
The Chunking Decision That Makes or Breaks RAG Retrieval Quality
Chris Harper
Jun 23, 2026 · 21:07 UTC
TL;DR: Chunk size and strategy — not model choice — are the biggest lever on RAG retrieval quality. Start with recursive 256–512 tokens; measure before adding complexity.
When you add documents to a vector database, you almost never add them whole. A 20-page PDF as a single embedding captures an average of everything — not the specific passage you need at retrieval time. You chunk first so each stored vector represents a coherent, bounded piece of meaning.
The core parameters
Chunk size. 256–512 tokens is a widely recommended starting point for most retrieval tasks. Shorter chunks (around 128 tokens) tend to help factoid queries ("What's the API rate limit?"); longer chunks (1024+ tokens) tend to help analytical ones ("Explain how auth works"). Pick one, measure, adjust.
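One way to make "measure" concrete is a hit-rate@k check: for a small set of labelled queries, count how often at least one relevant chunk lands in the top k results. The sketch below is plain Python and assumes you already have, per query, the ranked chunk ids your retriever returned plus a hand-labelled set of relevant ids. The function name `hit_rate_at_k` and the example data are hypothetical.

```python
from typing import Dict, List, Set


def hit_rate_at_k(
    retrieved: Dict[str, List[str]],  # query -> ranked chunk ids returned by your retriever
    relevant: Dict[str, Set[str]],    # query -> chunk ids a human marked as answering it
    k: int = 5,
) -> float:
    """Fraction of labelled queries with at least one relevant chunk in the top-k results."""
    if not relevant:
        return 0.0
    hits = 0
    for query, gold in relevant.items():
        top_k = retrieved.get(query, [])[:k]  # missing queries count as misses
        if any(chunk_id in gold for chunk_id in top_k):
            hits += 1
    return hits / len(relevant)


if __name__ == "__main__":
    # Hypothetical labelled queries for one chunking configuration.
    retrieved = {
        "What's the API rate limit?": ["c7", "c2", "c9"],
        "Explain how auth works": ["c4", "c1", "c3"],
    }
    relevant = {
        "What's the API rate limit?": {"c2"},
        "Explain how auth works": {"c8"},
    }
    print(hit_rate_at_k(retrieved, relevant, k=3))  # 0.5
```

Run it once per chunking configuration on the same queries. Chunk ids change whenever you re-chunk, so the relevance labels have to be rebuilt for each configuration, for example by marking any chunk that contains the answer passage as relevant.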
Overlap. 10–20% overlap (50–100 tokens on a 512-token chunk) prevents context from being severed at a boundary. However, a January 2026 systematic analysis found no measurable benefit in some retrieval settings — test before assuming the storage cost is worth it.
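To see what overlap costs, here is a minimal sliding-window chunker over a pre-tokenized document: each window starts `size - overlap` tokens after the previous one, so the last `overlap` tokens of one chunk reappear at the start of the next. It assumes the tokens already come from whatever tokenizer your embedding model uses; the placeholder strings below only stand in for a real document, and `sliding_window` is a hypothetical helper, not a library function.

```python
from typing import List


def sliding_window(tokens: List[str], size: int, overlap: int) -> List[List[str]]:
    """Fixed-size windows; each repeats the last `overlap` tokens of the previous window."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    stride = size - overlap
    windows = []
    for start in range(0, len(tokens), stride):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return windows


if __name__ == "__main__":
    tokens = [f"t{i}" for i in range(1000)]  # stand-in for a 1,000-token document
    for overlap in (0, 64):
        chunks = sliding_window(tokens, size=512, overlap=overlap)
        stored = sum(len(c) for c in chunks)  # total tokens you would embed and store
        print(overlap, len(chunks), stored)
```

On this 1,000-token example with 512-token windows, 64 tokens of overlap turns 2 chunks (1,000 stored tokens) into 3 chunks (1,128 stored tokens). That extra storage and embedding work is what any retrieval gain from overlap has to justify.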
Strategy. Three approaches in order of complexity:
- Fixed-size — split every N characters regardless of sentence structure. Simple baseline, but splits sentences mid-thought.
- Recursive — respects natural boundaries (paragraph → sentence → character). Preserves coherent meaning. Use this as your default.
- Semantic — groups adjacent sentences by embedding similarity and places boundaries where the topic shifts. Some benchmarks report up to ~70% retrieval lift over naive fixed-size splitting, but it needs an embedding for every sentence at indexing time, so add it after you've established a baseline.
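A minimal sketch of the semantic strategy, assuming one common variant: split the text into sentences, embed each one, and start a new chunk wherever the cosine similarity between neighboring sentences falls below a threshold. To stay self-contained it uses a toy bag-of-words `toy_embed` as a stand-in for a real embedding model, and the default `threshold` of 0.2 is an arbitrary assumption, not a recommended value.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(sentence: str) -> Vector:
    """Stand-in embedding: bag-of-words counts. Swap in a real embedding model in practice."""
    return {w: float(c) for w, c in Counter(re.findall(r"[a-z]+", sentence.lower())).items()}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(text: str, threshold: float = 0.2,
                    embed: Callable[[str], Vector] = toy_embed) -> List[str]:
    """Group adjacent sentences; break where similarity to the previous sentence drops below threshold."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]  # one embedding per sentence: the extra indexing cost
    chunks, current = [], [sentences[0]]
    for prev_vec, vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev_vec, vec) < threshold:
            chunks.append(" ".join(current))  # topic shift: close the current chunk
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    text = ("Tokens expire after one hour. Refresh tokens renew expired tokens. "
            "The rate limit is 100 requests per minute. Exceeding the rate limit returns HTTP 429.")
    for chunk in semantic_chunks(text):  # two chunks: token expiry, then rate limiting
        print(chunk)
```

Some implementations pick the breakpoint from the distribution of neighbor distances (for example, a percentile) rather than a fixed threshold, which adapts better across documents.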
Code (LangChain + ChromaDB)
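```python
from langchain_text_splitters import RecursiveCharacterTextSplitter  # pip install langchain-text-splitters tiktoken

# from_tiktoken_encoder counts chunk_size/chunk_overlap in tokens;
# the plain constructor counts characters by default.
splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",
    chunk_size=512,    # tokens
    chunk_overlap=64,  # 12.5% of chunk_size
    separators=["\n\n", "\n", ". ", " ", ""],  # paragraph, line, sentence, word, char
)
raw_text = open("docs.md", encoding="utf-8").read()  # docs.md: placeholder document
chunks = splitter.split_text(raw_text)
```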
Pass the chunks to collection.add(documents=chunks, ids=[…]), and always include a metadatas list with one dict per chunk holding at least the source file path, plus the section if you track it. That metadata is what lets you surface citations like [Source: docs.md, section: auth] in your generated answer.
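A minimal sketch of that metadata bookkeeping, assuming one section label per batch of chunks: `build_records` returns parallel `ids`, `documents`, and `metadatas` lists whose keys match `collection.add`'s keyword arguments, and `format_citation` turns a stored metadata dict back into a citation string. Both helper names and the `source::index` id scheme are hypothetical.

```python
from typing import Dict, List


def build_records(chunks: List[str], source: str, section: str) -> Dict[str, list]:
    """Parallel ids/documents/metadatas lists with one metadata dict per chunk."""
    return {
        "ids": [f"{source}::{i}" for i in range(len(chunks))],  # unique and stable per chunk
        "documents": chunks,
        "metadatas": [
            {"source": source, "section": section, "chunk_index": i}
            for i in range(len(chunks))
        ],
    }


def format_citation(metadata: Dict[str, object]) -> str:
    """Render a stored metadata dict as an inline citation."""
    return f"[Source: {metadata['source']}, section: {metadata['section']}]"


if __name__ == "__main__":
    records = build_records(["Tokens expire after one hour."], "docs.md", "auth")
    print(format_citation(records["metadatas"][0]))  # [Source: docs.md, section: auth]
```

With a Chroma collection this would be `collection.add(**build_records(chunks, "docs.md", "auth"))`, called once per section. At query time each hit's metadata comes back in the result's `metadatas` field, ready to pass to `format_citation`.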
The rule of thumb
Recursive chunking at 256–512 tokens gets you roughly 80% of the way there. More advanced techniques (contextual retrieval, late chunking, cross-granularity indexing) can deliver the remaining lift, but they work on top of a well-chunked baseline, not instead of one.
Sources: Weaviate: Chunking Strategies for RAG, Firecrawl: Best Chunking Strategies 2026, Unstructured.io: Chunking for RAG Best Practices