Better RAG: Use Metadata Filters to Scope Your Search Before Similarity Runs

Chris Harper

4 min read

Jun 25, 2026 · 12:04 UTC

Tutorial

RAG

Embeddings

Best Practices

TL;DR: Stop searching your entire document store — use metadata filters to scope to the right subset before similarity runs, so you get faster and more accurate retrieval with one extra parameter.

What you'll be able to do after this:

Attach metadata fields (category, date, source, user) to your embeddings and filter them at query time using WHERE-style conditions
Implement pre-filtering in ChromaDB with the where parameter — no extra library needed
Combine metadata filters with vector search for queries like "most relevant chunk from Q4 2024 finance docs" instead of searching everything every time

Why this matters

From the last two posts: hybrid search gives you better recall; reranking gives you better precision. But neither fixes a more fundamental problem: you're searching all your documents on every query.

A 1M-chunk store with content from 50 different sources should not run an HR question through finance documents, and a question about Q4 2024 numbers should not surface docs from 2022. Without filtering, noise crowds out signal, latency goes up, and answers get sourced from the wrong place.

Metadata filtering is the WHERE clause your vector search is missing, and ChromaDB, Pinecone, Qdrant, and most other major vector databases support it out of the box.

Step 1: Attach metadata at ingest

import chromadb
from sentence_transformers import SentenceTransformer

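client = chromadb.Client()  # in-memory client; chromadb.PersistentClient(path=...) keeps data on disk
collection = client.create_collection("docs")
encoder = SentenceTransformer('all-MiniLM-L6-v2')  # 384-dim sentence embeddings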

documents = [
    "The Q4 2024 revenue was $2.3M, up 18% YoY.",
    "The company PTO policy allows 15 days annually.",
    "Q3 2024 operating expenses totaled $1.1M.",
    "Remote work requires manager approval for trips over 2 weeks.",
]
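# Keep metadata values scalar (str, int, float, bool); range operators
# such as $gte need numeric fields like "year".
metadatas = [
    {"category": "finance", "quarter": "Q4-2024", "year": 2024},
    {"category": "hr"},
    {"category": "finance", "quarter": "Q3-2024", "year": 2024},
    {"category": "hr"},
]

# Each chunk's metadata is stored next to its vector so query-time filters can use it.
collection.add(
    documents=documents,
    embeddings=encoder.encode(documents).tolist(),
    metadatas=metadatas,
    ids=[f"doc-{i}" for i in range(len(documents))],
)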
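
Dates need one extra step. ChromaDB accepts scalar metadata values (strings, integers, floats, booleans), and per its docs the range operators (`$gt`, `$gte`, `$lt`, `$lte`) compare numbers, so a date stored as a string such as "2025-01-15" cannot be range-filtered. One option is to convert dates to Unix timestamps at ingest. A minimal sketch, assuming a hypothetical `to_metadata` helper and a `published_at` field name of our own choosing (neither is part of ChromaDB):

```python
from datetime import date, datetime, timezone

# Scalar types we assume the vector store accepts as metadata values.
ALLOWED_TYPES = (str, int, float, bool)


def to_metadata(record: dict) -> dict:
    """Flatten a raw record into filterable metadata (hypothetical helper).

    - datetime/date values become integer Unix timestamps (UTC), so
      $gte / $lt filters compare numbers.
    - None values are dropped rather than stored.
    - Any other non-scalar value raises now instead of failing later at add().
    """
    out = {}
    for key, value in record.items():
        if value is None:
            continue
        if isinstance(value, datetime):  # check datetime first: it subclasses date
            if value.tzinfo is None:
                value = value.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
            out[key] = int(value.timestamp())
        elif isinstance(value, date):
            midnight = datetime(value.year, value.month, value.day, tzinfo=timezone.utc)
            out[key] = int(midnight.timestamp())
        elif isinstance(value, ALLOWED_TYPES):
            out[key] = value
        else:
            raise TypeError(f"metadata field {key!r} has unsupported type {type(value).__name__}")
    return out


meta = to_metadata({
    "category": "finance",
    "quarter": "Q4-2024",
    "published_at": date(2025, 1, 15),
    "owner": None,
})
print(meta)  # {'category': 'finance', 'quarter': 'Q4-2024', 'published_at': 1736899200}
```

A freshness filter then becomes a plain numeric comparison such as `{"published_at": {"$gte": cutoff}}`.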

Step 2: Filter at query time

query = "What was Q4 2024 revenue?"

results = collection.query(
    query_embeddings=encoder.encode([query]).tolist(),
    n_results=3,
    where={"category": {"$eq": "finance"}},   # <-- the WHERE clause
)
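# query() returns one list of hits per query embedding, so take index 0.
print(results["documents"][0])
# → ['The Q4 2024 revenue was $2.3M...', 'Q3 2024 operating expenses...']
# Only two documents match the filter, so fewer than n_results=3 come back.
# HR documents never appear: they were excluded before ANN ran.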

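The where parameter applies pre-filtering: ChromaDB evaluates it against the stored metadata before the approximate nearest-neighbor (ANN) search, not after. Only the matching subset is searched, which is more precise and usually faster, because irrelevant documents can never crowd out relevant ones.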
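Conceptually, pre-filtering narrows the candidate set first and ranks only what survives. The sketch below shows those semantics with exact (brute-force) cosine similarity over a toy in-memory store. It is not how ChromaDB works internally (a real index uses ANN), and the 2-D vectors and the `prefiltered_search` name are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def prefiltered_search(store, query_vec, predicate, k):
    """Return up to k (id, score) pairs, ranking ONLY items whose metadata passes predicate."""
    candidates = [item for item in store if predicate(item["metadata"])]  # step 1: filter
    scored = [(item["id"], cosine(query_vec, item["vector"])) for item in candidates]  # step 2: score subset
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]  # may hold fewer than k items if the subset is small


# Toy store: 2-D vectors stand in for real embeddings.
store = [
    {"id": "doc-0", "vector": [0.9, 0.1], "metadata": {"category": "finance"}},
    {"id": "doc-1", "vector": [0.95, 0.05], "metadata": {"category": "hr"}},
    {"id": "doc-2", "vector": [0.6, 0.4], "metadata": {"category": "finance"}},
    {"id": "doc-3", "vector": [0.1, 0.9], "metadata": {"category": "hr"}},
]

hits = prefiltered_search(store, [1.0, 0.0], lambda m: m["category"] == "finance", k=3)
print([doc_id for doc_id, _ in hits])  # ['doc-0', 'doc-2']
```

`doc-1` is the closest vector to the query overall, but it never competes because the filter removed it first; and only two results come back for `k=3`, which is the "fewer than n_results" caveat of pre-filtering.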

Filter operators (ChromaDB and Pinecone share this MongoDB-style syntax; Qdrant expresses the same conditions with its own must/should/must_not clauses)

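| Operator | Example | Matches when |
|---|---|---|
| `$eq` | `{"category": {"$eq": "finance"}}` | the field equals the value |
| `$ne` | `{"source": {"$ne": "archive"}}` | the field does not equal the value |
| `$gt` / `$gte` / `$lt` / `$lte` | `{"year": {"$gte": 2024}}` | the numeric field is >, >=, <, or <= the value |
| `$in` | `{"quarter": {"$in": ["Q3-2024", "Q4-2024"]}}` | the field equals any value in the list |
| `$and` | `{"$and": [{"category": {"$eq": "finance"}}, {"year": {"$gte": 2024}}]}` | every sub-filter matches |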
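To make the operator semantics concrete, here is a small reference evaluator for this filter dialect. It is a sketch, not any database's actual code: the `matches` name is hypothetical, a document missing the filtered field is assumed never to match (real stores may treat this differently, especially for `$ne`), and it mirrors the single-key rule that recent ChromaDB versions enforce, where one filter dict names exactly one field or operator and several fields must be combined with `$and`.

```python
import operator

# The table's comparison operators as Python callables: op(field_value, expected).
COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
}


def matches(metadata: dict, where: dict) -> bool:
    """Evaluate a Mongo-style where filter against one metadata dict (hypothetical sketch)."""
    if len(where) != 1:
        raise ValueError("a filter must name exactly one field or operator; combine with $and")
    key, condition = next(iter(where.items()))
    if key == "$and":
        # Logical AND: every sub-filter must match.
        return all(matches(metadata, sub) for sub in condition)
    if key not in metadata:
        return False  # assumption: a document without the field never matches
    if not isinstance(condition, dict):
        condition = {"$eq": condition}  # shorthand {"field": value} is treated as $eq
    op, expected = next(iter(condition.items()))
    return COMPARATORS[op](metadata[key], expected)


finance_2024 = {"category": "finance", "quarter": "Q4-2024", "year": 2024}
both = {"$and": [{"category": {"$eq": "finance"}}, {"year": {"$gte": 2024}}]}
print(matches(finance_2024, both))        # True
print(matches({"category": "hr"}, both))  # False
```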

Pre-filtering vs post-filtering

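Pre-filtering (what ChromaDB, Pinecone, and Qdrant do by default, applying the filter before or during the index search rather than afterwards): filter first, run ANN on the subset. Every result satisfies the filter; you get fewer than n_results back only when fewer documents match.
Post-filtering: run ANN across everything, then filter the hits. If no matching document makes it into the top-k, you get nothing back, and that risk grows as the index gets larger and the filter more selective.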

Pre-filtering is almost always what you want for RAG. Pinecone's "The Missing WHERE Clause in Vector Search" (listed in Sources below) has a clear visual of why post-filtering breaks on large indexes.
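The post-filtering failure is easy to reproduce on a toy index. In the sketch below (made-up 2-D vectors, with exact search standing in for ANN), the few finance documents are not the nearest neighbours overall, so post-filtering the global top-k finds nothing while pre-filtering still returns k finance hits.

```python
import math


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def top_k(items, query, k):
    """Exact nearest neighbours by cosine similarity (stand-in for ANN)."""
    return sorted(items, key=lambda it: cosine(query, it["vector"]), reverse=True)[:k]


def post_filter(items, query, category, k):
    # Search everything first, then discard non-matching hits.
    return [it for it in top_k(items, query, k) if it["category"] == category]


def pre_filter(items, query, category, k):
    # Restrict to matching items first, then search only that subset.
    return top_k([it for it in items if it["category"] == category], query, k)


# 1,000 "hr" docs clustered around the query, 5 "finance" docs further away.
store = [{"id": f"hr-{i}", "category": "hr", "vector": [1.0, 0.001 * i]} for i in range(1000)]
store += [{"id": f"fin-{i}", "category": "finance", "vector": [1.0, 2.0 + i]} for i in range(5)]

query = [1.0, 0.0]
print(len(post_filter(store, query, "finance", k=3)))  # 0: the global top 3 are all hr
print(len(pre_filter(store, query, "finance", k=3)))   # 3
```

Over-fetching (asking for a larger k and trimming after filtering) helps only until the first matching document sits deeper than the over-fetch, which on a large index with a selective filter can be very deep.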

When to reach for it

Add metadata filtering when users' questions are domain-scoped (HR vs finance vs legal), when you need access control (only docs this user can read), when freshness matters (docs from the last 90 days only), or when you have multi-tenant data on a shared index.
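The access-control, freshness, and multi-tenant cases usually combine into a single filter per request. A minimal sketch in the same operator syntax, assuming hypothetical metadata fields `tenant_id`, `acl_group` (one group per chunk), and `published_at` (a Unix timestamp), plus a hypothetical `scoped_where` helper:

```python
import time
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def scoped_where(tenant_id: str, user_groups: List[str], max_age_days: int,
                 now: Optional[float] = None) -> dict:
    """Build one where filter for tenant isolation, ACL, and freshness (hypothetical fields)."""
    now = time.time() if now is None else now
    cutoff = int(now) - max_age_days * SECONDS_PER_DAY
    return {
        "$and": [
            {"tenant_id": {"$eq": tenant_id}},          # multi-tenant: never cross tenants
            {"acl_group": {"$in": list(user_groups)}},  # access control: only the user's groups
            {"published_at": {"$gte": cutoff}},         # freshness: last max_age_days only
        ]
    }


where = scoped_where("acme", ["finance", "all-staff"], max_age_days=90, now=1_750_000_000)
print(where["$and"][2])  # {'published_at': {'$gte': 1742224000}}
```

The resulting dict goes into `where=` on `collection.query(...)` just like the single-condition filter in Step 2. The tenant and group values should come from the authenticated session, never from the user's question; otherwise the filter stops being a security boundary. Chunks readable by several groups would need a different encoding than a single `acl_group` string.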

Next up in Better RAG: evaluating retrieval quality — measuring whether your top-k results are actually relevant before they reach the LLM.

Sources: Pinecone: The Missing WHERE Clause in Vector Search, ChromaDB: Metadata Filtering docs, YouTube: Smarter RAG Starts with Metadata, Dataquest: Metadata Filtering and Hybrid Search