
Better RAG: Add Hybrid Search (BM25 + Semantic) to Catch What Vector Search Misses
Chris Harper
Jun 24, 2026 · 12:04 UTC
TL;DR: A large share of real user queries (roughly 35%, by one estimate) need exact keyword matching rather than semantic similarity. Hybrid search runs BM25 and vector retrieval in parallel, fuses the two ranked lists with Reciprocal Rank Fusion (RRF), and on the WANDS benchmark delivered a 7%+ NDCG lift over either method alone.
What you'll be able to do after this:
- Understand why vector-only RAG fails on product codes, error strings, and proper nouns, and why BM25 fills that gap
- Implement BM25 + semantic search from scratch with rank_bm25 + sentence-transformers and merge the results with RRF
- Drop in LangChain's EnsembleRetriever as a few-line shortcut when you're already using LangChain
Why vector-only RAG silently fails
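Vector embeddings capture meaning. Given "what's the timeout setting?", they will retrieve a chunk about "connection wait time" even though the query and the chunk share no words. They are unreliable on strings like ERR_TLS_CERT_EXPIRED, a part number like SKU-8842-B, or an API endpoint path. The tokenizer splits such strings into subword fragments the model has rarely seen in a meaningful context, so the embedding captures little of the exact identity that matters, and SKU-8842-B and SKU-8842-C can end up with nearly identical vectors.
BM25 (Best Match 25) is a ranking function built on term frequency and inverse document frequency (IDF), and it has been the default lexical scorer in engines such as Lucene and Elasticsearch for years. It knows nothing about meaning, but it rewards exact token matches, and a rare token such as an error code earns a high IDF weight, so a document containing it rises to the top.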
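To make "rare tokens get a high weight" concrete, here is a minimal from-scratch sketch of textbook Okapi BM25 with a Lucene-style smoothed IDF. The function name `bm25_scores` is hypothetical, and the defaults k1=1.5, b=0.75 mirror what rank_bm25's BM25Okapi uses; rank_bm25 computes IDF slightly differently (it floors negative IDF values with an epsilon), so its scores won't match these exactly.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against a tokenized query (Okapi BM25 sketch)."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: how many documents contain each term at least once
    df = Counter(term for doc in corpus_tokens for term in set(doc))
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue  # terms absent from this document add nothing
            # Smoothed IDF: rare terms get large weights, the "1 +" keeps it positive
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b)
            tf_part = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            )
            score += idf * tf_part
        scores.append(score)
    return scores


corpus = [d.lower().split() for d in [
    "Connection timeout default is 30 seconds",
    "Set ANTHROPIC_API_KEY in your environment",
    "ERR_TLS_CERT_EXPIRED means the server certificate has expired",
]]
scores = bm25_scores(["err_tls_cert_expired"], corpus)
print(scores)  # only the last document scores above 0
```

A term that appears in every document (like "the") gets an IDF close to zero, while a one-off identifier gets the largest IDF in the corpus, which is exactly why exact-match queries on codes work so well with BM25.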
On the WANDS e-commerce benchmark, a tuned hybrid setup reached 0.7497 NDCG, a 7.4% lift over BM25 alone (0.6983) and a 7.8% lift over vector search alone (0.6953). In practice the gap tends to be most visible on queries containing rare identifiers, which make up the long tail of most real corpora.
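For readers who want to check the arithmetic, here is a minimal sketch of NDCG@k and of the relative-lift calculation behind the percentages quoted above. The helper names are hypothetical; the sketch uses linear gains (some evaluations use 2^rel - 1 instead, and we don't know which convention the WANDS figures used), and it computes the ideal ordering from the same list, which assumes every judged document for the query appears in it.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: each relevance grade is discounted by log2(position + 1)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg_at_k(ranked_relevances, k):
    """DCG of the top-k results divided by the DCG of the best possible ordering."""
    ideal = dcg(sorted(ranked_relevances, reverse=True)[:k])
    return dcg(ranked_relevances[:k]) / ideal if ideal > 0 else 0.0


def relative_lift(new, baseline):
    """Relative improvement of `new` over `baseline`."""
    return (new - baseline) / baseline


# NDCG figures quoted above for WANDS
hybrid, bm25_only, vector_only = 0.7497, 0.6983, 0.6953
print(f"lift over BM25:   {relative_lift(hybrid, bm25_only):.1%}")    # 7.4%
print(f"lift over vector: {relative_lift(hybrid, vector_only):.1%}")  # 7.8%
```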
How Reciprocal Rank Fusion (RRF) merges the results
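Running both retrievers gives you two independent ranked lists. You can't simply add BM25 scores to cosine similarities: BM25 scores are unbounded and shift with query length and corpus statistics, while cosine similarity sits in [-1, 1]. RRF sidesteps this by ignoring raw scores and using only each document's rank: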
RRF_score(doc) = 1 / (k + rank_BM25) + 1 / (k + rank_semantic)
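where ranks start at 1 for the top result and k = 60 is a smoothing constant (the value used in the original RRF paper) that keeps the first few ranks from dominating. A document near the top of both lists accumulates the highest combined score, and a document that appears in only one list still earns that list's term. There is no score normalization and nothing to tune beyond k.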
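In a real system each retriever usually returns only its top-n candidates, so some documents show up in one list but not the other. Here is a minimal sketch of the formula above generalized to any number of (possibly partial) ranked lists; `rrf_fuse` and the document IDs are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs with Reciprocal Rank Fusion.

    Ranks start at 1. A document missing from a list gets no term from that list.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_top = ["doc_err", "doc_sku", "doc_tls"]      # hypothetical BM25 top-3
semantic_top = ["doc_tls", "doc_err", "doc_faq"]  # hypothetical semantic top-3
for doc_id, score in rrf_fuse([bm25_top, semantic_top]):
    print(f"{doc_id}: {score:.5f}")
```

Worked through by hand: doc_err scores 1/61 + 1/62 ≈ 0.03252 and doc_tls scores 1/63 + 1/61 ≈ 0.03227, so both beat doc_sku (1/62 ≈ 0.01613) and doc_faq (1/63 ≈ 0.01587), which each appear in only one list but still contribute.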
Code: build it from scratch
pip install rank_bm25 sentence-transformers torch
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util
import torch

docs = [
    "Connection timeout default is 30 seconds",
    "Set ANTHROPIC_API_KEY in your environment",
    "ERR_TLS_CERT_EXPIRED means the server certificate has expired",
]

# BM25 index over lowercased, whitespace-split tokens
bm25 = BM25Okapi([d.lower().split() for d in docs])

# Semantic index: one embedding per document
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_embs = model.encode(docs, convert_to_tensor=True)

def hybrid_search(query, top_k=3, k=60):
    # BM25 ranking: score the query once, then sort doc indices by score
    bm25_scores = bm25.get_scores(query.lower().split())
    bm25_ranks = sorted(range(len(docs)), key=lambda i: bm25_scores[i], reverse=True)

    # Semantic ranking: sort doc indices by cosine similarity to the query
    q_emb = model.encode(query, convert_to_tensor=True)
    sem_ranks = torch.argsort(util.cos_sim(q_emb, doc_embs)[0], descending=True).tolist()

    # RRF fusion: each list adds 1 / (k + rank), with ranks starting at 1
    rrf = {i: 0.0 for i in range(len(docs))}
    for ranking in (bm25_ranks, sem_ranks):
        for rank, idx in enumerate(ranking, start=1):
            rrf[idx] += 1.0 / (k + rank)
    top = sorted(rrf, key=rrf.get, reverse=True)[:top_k]
    return [docs[i] for i in top]

print(hybrid_search("ERR_TLS_CERT_EXPIRED"))  # exact-token query: BM25 pins the error-code doc
print(hybrid_search("how long does a connection wait"))  # paraphrase of the timeout doc
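BM25 is only as exact as its tokenizer. With the `.lower().split()` tokenization used in hybrid_search, the query "why ERR_TLS_CERT_EXPIRED?" produces the token `err_tls_cert_expired?`, trailing question mark included, which matches nothing in the index. Below is a minimal sketch of a more forgiving tokenizer, assuming identifiers are built from ASCII letters, digits, hyphens, and underscores (non-ASCII words would be dropped); `tokenize` and `TOKEN_RE` are hypothetical names.

```python
import re

# Runs of letters/digits, optionally joined by "-" or "_" (keeps SKU-8842-B whole)
TOKEN_RE = re.compile(r"[a-z0-9]+(?:[-_][a-z0-9]+)*")


def tokenize(text):
    """Lowercase, then extract word/identifier tokens, dropping surrounding punctuation."""
    return TOKEN_RE.findall(text.lower())


print("why ERR_TLS_CERT_EXPIRED?".lower().split())  # ['why', 'err_tls_cert_expired?']
print(tokenize("why ERR_TLS_CERT_EXPIRED?"))         # ['why', 'err_tls_cert_expired']
print(tokenize("Is SKU-8842-B in stock?"))           # ['is', 'sku-8842-b', 'in', 'stock']
```

Use the same function on both sides, for example `BM25Okapi([tokenize(d) for d in docs])` when indexing and `bm25.get_scores(tokenize(query))` when querying; a mismatch between index-time and query-time tokenization silently breaks exact matching.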
Drop-in shortcut: LangChain EnsembleRetriever
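If you're already on LangChain, EnsembleRetriever combines a BM25 retriever and a FAISS retriever with weighted RRF in a few lines (BM25Retriever needs the rank_bm25 package, and FAISS needs faiss-cpu):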
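from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.retrievers import EnsembleRetriever

# Reuses the `docs` list from the from-scratch example; same embedding model as before
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

bm25_r = BM25Retriever.from_texts(docs)
bm25_r.k = 4  # number of documents the BM25 retriever returns
faiss_r = FAISS.from_texts(docs, embedding=embeddings).as_retriever(search_kwargs={"k": 4})

hybrid = EnsembleRetriever(retrievers=[bm25_r, faiss_r], weights=[0.4, 0.6])
results = hybrid.invoke("ERR_TLS_CERT_EXPIRED")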
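weights=[0.4, 0.6] leans semantic, which is a reasonable starting point. Move toward [0.5, 0.5], or past it toward BM25, if your users frequently search by exact identifiers. When you're ready to move off in-memory retrieval, Weaviate offers native hybrid search with an alpha parameter (0 = pure keyword, 1 = pure vector), Pinecone supports hybrid queries through sparse-dense vectors, and with pgvector you can combine Postgres full-text search and vector search in SQL yourself.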
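To see what the weights actually do: to our understanding of LangChain's source, EnsembleRetriever applies weighted RRF, scaling each retriever's 1 / (rank + c) term by that retriever's weight, with c defaulting to 60. The sketch below reimplements that idea in plain Python under that assumption; `weighted_rrf` and the document IDs are hypothetical, and this is not LangChain's code.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Weighted RRF: each list's 1 / (rank + c) term is multiplied by its weight."""
    if len(ranked_lists) != len(weights):
        raise ValueError("need one weight per ranked list")
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)


bm25_top = ["sku_doc", "faq_doc"]      # hypothetical: BM25 ranks the exact-match doc first
semantic_top = ["faq_doc", "sku_doc"]  # hypothetical: the semantic side prefers the FAQ
print(weighted_rrf([bm25_top, semantic_top], weights=[0.4, 0.6]))  # ['faq_doc', 'sku_doc']
print(weighted_rrf([bm25_top, semantic_top], weights=[0.6, 0.4]))  # ['sku_doc', 'faq_doc']
```

Because 1 / (rank + 60) changes slowly from one rank to the next (1/61 ≈ 0.0164 versus 1/70 ≈ 0.0143), the weights mostly decide near-ties like this one rather than overriding large rank gaps, which is why modest shifts such as [0.4, 0.6] to [0.5, 0.5] are usually enough.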
Sources: MachineLearningMastery: Implementing Hybrid Semantic-Lexical Search in RAG, Weaviate: Hybrid Search Explained, LangChain EnsembleRetriever tutorial