The Chunking Decision That Makes or Breaks RAG Retrieval Quality

Chris Harper

Jun 23, 2026 · 21:07 UTC

Workflow

RAG

Best Practices

TL;DR: Chunk size and strategy, not model choice, are the biggest lever on RAG retrieval quality. Start with recursive chunking at 256–512 tokens, and measure before adding complexity.

When you add documents to a vector database, you almost never add them whole. Embedding a 20-page PDF as a single vector captures an average of everything, not the specific passage you need at retrieval time. You chunk first so each stored vector represents a coherent, bounded piece of meaning.

The core parameters

Chunk size. 256–512 tokens is a commonly recommended starting point for most retrieval tasks. Shorter chunks (around 128 tokens) help factoid queries ("What's the API rate limit?"); longer chunks (1024+ tokens) help analytical ones ("Explain how auth works"). Pick one, measure, adjust.
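One way to make "measure" concrete is a hit rate at k: the fraction of test queries for which at least one of the top k retrieved chunks contains the expected answer. The sketch below is an illustration under stated assumptions, not a method taken from the sources. The function name `hit_rate_at_k`, the substring-based relevance check, and the sample queries and chunks are all hypothetical.

```python
def hit_rate_at_k(results, expected, k=5):
    """Fraction of queries whose top-k retrieved chunks contain the expected snippet.

    results:  dict mapping query -> ranked list of retrieved chunk texts
    expected: dict mapping query -> answer snippet that should appear in a chunk
    """
    if not expected:
        return 0.0
    hits = 0
    for query, snippet in expected.items():
        top_k = results.get(query, [])[:k]
        # Substring match is a crude relevance test (an assumption for illustration).
        if any(snippet in chunk for chunk in top_k):
            hits += 1
    return hits / len(expected)


# Hypothetical retrieval output for one chunking configuration.
results = {
    "What's the API rate limit?": [
        "Auth uses OAuth 2.0.",
        "The rate limit is 100 requests/min.",
    ],
    "Explain how auth works": [
        "Billing is monthly.",
        "Invoices are emailed.",
    ],
}
expected = {
    "What's the API rate limit?": "100 requests/min",
    "Explain how auth works": "OAuth 2.0",
}

score = hit_rate_at_k(results, expected, k=2)
print(score)  # 0.5: the first query is answered, the second is not
```

Running the same query set against each candidate chunk size gives a like-for-like number to compare before committing to one.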

Overlap. 10–20% overlap (50–100 tokens on a 512-token chunk) prevents context from being severed at a boundary. However, a January 2026 systematic analysis found no measurable benefit in some retrieval settings — test before assuming the storage cost is worth it.
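To put a number on that storage cost, the following sketch counts how many chunks a document produces with and without overlap. It assumes plain fixed-size windows that advance by `chunk_size - overlap` tokens; the helper name `chunk_count` and the 10,000-token document are hypothetical, chosen only for the arithmetic.

```python
import math


def chunk_count(n_tokens, chunk_size, overlap):
    """Number of fixed-size windows needed to cover n_tokens with the given overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if n_tokens <= chunk_size:
        return 1
    stride = chunk_size - overlap  # how far each new window advances
    return 1 + math.ceil((n_tokens - chunk_size) / stride)


# Hypothetical 10,000-token document, 512-token chunks.
without_overlap = chunk_count(10_000, 512, 0)
with_overlap = chunk_count(10_000, 512, 64)

print(without_overlap)  # 20
print(with_overlap)     # 23
```

In this example a 64-token overlap means three extra vectors to embed and store for every twenty, which is the cost to weigh against any measured retrieval gain.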

Strategy. Three approaches in order of complexity:

Fixed-size: split every N characters regardless of sentence structure. A simple baseline, but it cuts sentences mid-thought.
Recursive: respects natural boundaries (paragraph → sentence → character) and preserves coherent meaning. Use this as your default.
Semantic: clusters adjacent sentences by embedding similarity before finalizing boundaries. Reported gains reach up to ~70% retrieval lift on benchmarks versus a naive fixed-size split, at the price of more compute at indexing time. Add it after you've established a baseline.
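The semantic approach can be sketched in a few lines. This is a simplified illustration, not a production algorithm: it starts a new chunk wherever the similarity between two adjacent sentences drops below a threshold. The `toy_embed` function is a bag-of-words stand-in for a real embedding model, and the function names, the 0.2 threshold, and the sample text are all assumptions.

```python
import math
import re
from collections import Counter


def toy_embed(sentence):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(text, embed=toy_embed, threshold=0.2):
    """Group adjacent sentences; break where similarity falls below threshold."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for prev, nxt in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(nxt)) < threshold:
            # Topic shift detected: close the current chunk.
            chunks.append(" ".join(current))
            current = []
        current.append(nxt)
    chunks.append(" ".join(current))
    return chunks


text = (
    "Access tokens expire after one hour. "
    "Expired access tokens must be refreshed. "
    "Billing invoices are issued monthly. "
    "Unpaid billing invoices are flagged."
)
chunks = semantic_chunks(text)
for chunk in chunks:
    print(chunk)
```

With the sample text, the two token sentences land in one chunk and the two billing sentences in another. Swapping `toy_embed` for a real embedding call is where the extra indexing cost comes from, since every sentence must be embedded before any boundary is chosen.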

Code (LangChain + ChromaDB)

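# Requires the langchain-text-splitters and tiktoken packages.
from langchain_text_splitters import RecursiveCharacterTextSplitter

# The plain constructor measures chunk_size in characters, not tokens.
# from_tiktoken_encoder makes chunk_size and chunk_overlap count tokens.
splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",
    chunk_size=512,
    chunk_overlap=64,
    # Tried in order: paragraph, line, sentence, word, then single character.
    separators=["\n\n", "\n", ". ", " ", ""],
)
# raw_text is the full document as a single string.
chunks = splitter.split_text(raw_text)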

Pass chunks directly to collection.add(documents=chunks, ids=[…]). Always attach a metadata dict to each chunk (ChromaDB's metadatas argument) that records the source file path; this is what lets you surface [Source: docs.md, section: auth] citations in your generated answer.
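A minimal sketch of that step follows. It assumes `collection` is a ChromaDB collection (or any object with a compatible `add()` method); the helper names `build_records` and `index_chunks`, the id scheme, and the metadata keys `source`, `chunk_index`, and `section` are hypothetical choices for illustration.

```python
def build_records(chunks, source_path, section=None):
    """Build parallel ids and metadatas lists for a list of chunk strings."""
    # Ids must be unique strings; source path plus position is one simple scheme.
    ids = [f"{source_path}:{i}" for i in range(len(chunks))]
    metadatas = []
    for i in range(len(chunks)):
        meta = {"source": source_path, "chunk_index": i}
        if section is not None:
            meta["section"] = section
        metadatas.append(meta)
    return ids, metadatas


def index_chunks(collection, chunks, source_path, section=None):
    """Add chunks plus per-chunk metadata to a ChromaDB-style collection."""
    ids, metadatas = build_records(chunks, source_path, section)
    collection.add(documents=chunks, ids=ids, metadatas=metadatas)
    return ids


# Example of the records produced for two chunks from one file.
ids, metadatas = build_records(["First chunk.", "Second chunk."], "docs.md", section="auth")
print(ids)
print(metadatas[0])
```

At query time the stored metadata comes back alongside each retrieved document, so the `source` and `section` values can be formatted into the citation string without a second lookup.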

The rule of thumb

Recursive 256–512 tokens gets you 80% of the way there. The 2026 techniques (contextual retrieval, late chunking, cross-granularity indexing) deliver the remaining lift — but they work on top of a well-chunked baseline, not instead of one.

Sources: Weaviate: Chunking Strategies for RAG, Firecrawl: Best Chunking Strategies 2026, Unstructured.io: Chunking for RAG Best Practices
