Better RAG: Add Hybrid Search (BM25 + Semantic) to Catch What Vector Search Misses

Chris Harper

3 min read

Jun 24, 2026 · 12:04 UTC

Tutorial

RAG

Embeddings

Best Practices

TL;DR: About 35% of real user queries need exact-match keyword search, not semantic search. Hybrid search runs BM25 and vector retrieval in parallel, fuses the ranked lists with Reciprocal Rank Fusion (RRF), and on the WANDS benchmark discussed below lifts retrieval quality by 7%+ over either method alone.

What you'll be able to do after this:

Understand why vector-only RAG fails on product codes, error strings, and proper nouns, and why BM25 fills that gap
Implement BM25 + semantic search from scratch with rank_bm25 + sentence-transformers and merge the results with RRF
Drop in LangChain's EnsembleRetriever as a few-line shortcut when you're already using LangChain

Why vector-only RAG silently fails

Vector embeddings capture meaning. They're excellent at matching a question like "what's the timeout setting?" to a chunk about "connection wait time", even when the two share no words. They're unreliable on ERR_TLS_CERT_EXPIRED, a part number like SKU-8842-B, or an API endpoint path, because those strings carry little natural-language meaning for the embedding model to work with. Near-identical identifiers can land close together in vector space, so the exact one you asked for is not guaranteed to rank first.

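BM25 (Best Match 25) is a lexical scoring algorithm that combines term frequency, inverse document frequency, and document-length normalization. It is the statistical backbone behind most search engines. It knows nothing about meaning, but it reliably surfaces exact token matches, as long as the query and the documents are tokenized the same way.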
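To make the "exact token match" behavior concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is an illustration, not the rank_bm25 implementation: the function name bm25_scores, the parameter defaults k1=1.5 and b=0.75, and the non-negative IDF variant are all assumptions chosen for readability.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against the query with Okapi BM25 (sketch)."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in corpus_tokens for term in set(doc))
    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue  # no exact token match, so no contribution
            # Non-negative IDF variant (an assumption; libraries differ here).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Length normalization: long documents need more hits to score the same.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

corpus = [
    "connection timeout default is 30 seconds",
    "set anthropic_api_key in your environment",
    "err_tls_cert_expired means the server certificate has expired",
]
tokenized = [doc.split() for doc in corpus]
scores = bm25_scores("err_tls_cert_expired".split(), tokenized)
print(scores)  # only the third document gets a non-zero score
```

Only the document containing the literal token scores above zero, which is exactly the property vector search can't promise for rare identifiers.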

On the WANDS e-commerce benchmark, a tuned hybrid setup reached 0.7497 NDCG, a 7.4% lift over BM25 alone (0.6983) and a 7.8% lift over vector alone (0.6953). In practice the gap is most visible on the queries with rare identifiers that make up the long tail of any real corpus.
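If you want to check the lift arithmetic yourself, relative lift is just the gain divided by the baseline. The helper name relative_lift is made up for this sketch; the three NDCG values are the ones quoted above.

```python
def relative_lift(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100

hybrid, bm25_only, vector_only = 0.7497, 0.6983, 0.6953

print(f"vs BM25:   {relative_lift(hybrid, bm25_only):.1f}%")    # vs BM25:   7.4%
print(f"vs vector: {relative_lift(hybrid, vector_only):.1f}%")  # vs vector: 7.8%
```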

How Reciprocal Rank Fusion (RRF) merges the results

Running both retrievers gives you two independent ranked lists. You can't add BM25 scores to cosine similarity — the units are incompatible. RRF sidesteps this by converting each list to rank-based scores:

RRF_score(doc) = 1 / (k + rank_BM25) + 1 / (k + rank_semantic)

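where rank_BM25 and rank_semantic are the document's 1-based positions in each list, and k = 60 is a smoothing constant that keeps the top few ranks from dominating the sum. A document near the top of both lists accumulates the highest combined score. A document that appears in only one list still scores, but only from that list's term. There is no score normalization and nothing to tune beyond k.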
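Here is the formula worked through on two tiny ranked lists, as a stdlib-only sketch. The function name rrf_fuse and the document IDs are invented for illustration; the only thing taken from the formula is 1 / (k + rank) with 1-based ranks.

```python
def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs with Reciprocal Rank Fusion (sketch)."""
    scores = {}
    for ranked in ranked_lists:
        # start=1 gives 1-based ranks, matching the formula above.
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    fused = sorted(scores, key=scores.get, reverse=True)
    return fused, scores

bm25_ranking = ["doc_c", "doc_a", "doc_b"]      # doc_d not retrieved by BM25
semantic_ranking = ["doc_a", "doc_d", "doc_c"]  # doc_b not retrieved semantically

fused, scores = rrf_fuse([bm25_ranking, semantic_ranking])
print(fused)  # ['doc_a', 'doc_c', 'doc_d', 'doc_b']
```

doc_a (ranks 2 and 1) edges out doc_c (ranks 1 and 3) because 1/62 + 1/61 is slightly larger than 1/61 + 1/63, and the two single-list documents trail behind both.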

Code: build it from scratch

pip install rank_bm25 sentence-transformers torch

from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util
import torch

docs = [
    "Connection timeout default is 30 seconds",
    "Set ANTHROPIC_API_KEY in your environment",
    "ERR_TLS_CERT_EXPIRED means the server certificate has expired"
]

# BM25 index
bm25 = BM25Okapi([d.lower().split() for d in docs])

# Semantic index
model = SentenceTransformer('all-MiniLM-L6-v2')
doc_embs = model.encode(docs, convert_to_tensor=True)

def hybrid_search(query, top_k=3, k=60):
    # BM25 ranking: score the whole corpus once, then sort doc indices by score
    bm25_scores = bm25.get_scores(query.lower().split())
    bm25_ranks = sorted(range(len(docs)),
                        key=lambda i: bm25_scores[i],
                        reverse=True)
    # Semantic ranking: sort doc indices by cosine similarity to the query
    q_emb = model.encode(query, convert_to_tensor=True)
    sem_ranks = torch.argsort(util.cos_sim(q_emb, doc_embs)[0],
                              descending=True).tolist()
    # RRF fusion: enumerate() is 0-based, so rank + 1 is the 1-based rank
    rrf = {i: 0.0 for i in range(len(docs))}
    for ranks in (bm25_ranks, sem_ranks):
        for rank, idx in enumerate(ranks):
            rrf[idx] += 1.0 / (k + rank + 1)
    # Return the indices (into docs) of the top_k fused results
    return sorted(rrf, key=rrf.get, reverse=True)[:top_k]

print(hybrid_search("ERR_TLS_CERT_EXPIRED"))   # exact match wins
print(hybrid_search("how long does a connection wait"))  # semantic wins
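One thing to watch in the from-scratch version: lower().split() only splits on whitespace, so punctuation stays glued to tokens and "ERR_TLS_CERT_EXPIRED." (with a trailing period) will not match "ERR_TLS_CERT_EXPIRED". A small regex tokenizer is one way around that. The pattern and the tokenize name below are assumptions for this sketch, not part of rank_bm25; use the same function for both the corpus and the query.

```python
import re

# Runs of letters/digits, optionally joined by "_" or "-", so identifiers
# like ERR_TLS_CERT_EXPIRED and SKU-8842-B survive as single tokens.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:[_\-][A-Za-z0-9]+)*")

def tokenize(text):
    """Lowercase tokens with surrounding punctuation stripped (sketch)."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]

naive = "Got ERR_TLS_CERT_EXPIRED.".lower().split()
print(naive)                                  # ['got', 'err_tls_cert_expired.']
print(tokenize("Got ERR_TLS_CERT_EXPIRED."))  # ['got', 'err_tls_cert_expired']
print(tokenize("Is SKU-8842-B in stock?"))    # ['is', 'sku-8842-b', 'in', 'stock']
```

The trade-off is that ordinary hyphenated words also stay whole, so a query for one half of a hyphenated phrase won't match. Whether that matters depends on your corpus.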

Drop-in shortcut: LangChain EnsembleRetriever

If you're already on LangChain, EnsembleRetriever wires BM25 + FAISS with RRF in a few lines:

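# Requires: pip install langchain langchain-community faiss-cpu rank_bm25
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import FAISS
from langchain.retrievers import EnsembleRetriever

# Keyword retriever over the same docs list as above, returning the top 4 hits
bm25_r = BM25Retriever.from_texts(docs)
bm25_r.k = 4

# your_embeddings is a placeholder: pass any LangChain embeddings object here
faiss_r = FAISS.from_texts(docs, embedding=your_embeddings).as_retriever(search_kwargs={"k": 4})

# weights are applied in the same order as retrievers: 0.4 for BM25, 0.6 for FAISS
hybrid = EnsembleRetriever(retrievers=[bm25_r, faiss_r], weights=[0.4, 0.6])
results = hybrid.invoke("ERR_TLS_CERT_EXPIRED")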

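The weights follow the order of the retrievers list, so weights=[0.4, 0.6] gives BM25 0.4 and FAISS 0.6. That leans semantic and is a good starting point. Move toward [0.5, 0.5] if your users frequently search by exact identifiers. When you're ready to move off in-memory retrieval, the same idea carries over: Weaviate exposes hybrid search with an alpha parameter that balances keyword and vector scores, Pinecone supports hybrid queries over sparse and dense vectors, and pgvector can be paired with PostgreSQL full-text search.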
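To see what the weights actually do, here is a stdlib-only sketch of weighted RRF, where each list's contribution is scaled by its weight. This is an illustration of the idea under the assumption that weights multiply the 1 / (k + rank) term; it is not LangChain's code, and the function name, document IDs, and the deliberately exaggerated [0.8, 0.2] setting are made up for the demo.

```python
def weighted_rrf(ranked_lists, weights, k=60):
    """RRF where each ranked list's contribution is scaled by a weight (sketch)."""
    scores = {}
    for ranked, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# BM25 puts the exact-match doc first; the semantic retriever puts it last.
bm25_ranking = ["error_doc", "timeout_doc", "setup_doc"]
semantic_ranking = ["timeout_doc", "setup_doc", "error_doc"]

semantic_heavy = weighted_rrf([bm25_ranking, semantic_ranking], [0.4, 0.6])
bm25_heavy = weighted_rrf([bm25_ranking, semantic_ranking], [0.8, 0.2])

print(semantic_heavy)  # ['timeout_doc', 'error_doc', 'setup_doc']
print(bm25_heavy)      # ['error_doc', 'timeout_doc', 'setup_doc']
```

Because RRF scores are so close together at k = 60, modest weight changes only flip near-ties. That's a reason to tune weights against real queries rather than by feel.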

Sources: MachineLearningMastery: Implementing Hybrid Semantic-Lexical Search in RAG, Weaviate: Hybrid Search Explained, LangChain EnsembleRetriever tutorial
