
From Embeddings to Search: Your First Vector Database with ChromaDB
Chris Harper
Jun 23, 2026 · 21:05 UTC
TL;DR: ChromaDB turns your text into a searchable local vector database in a handful of lines of Python, with no server and no cloud account. It generates the embeddings for you, and swapping Client() for PersistentClient(path=...) makes the data persistent.
What you'll be able to do after this:
- Create a local vector database, add documents with auto-generated embeddings, and retrieve the most semantically similar results, all in under 20 lines of Python
- Understand how ChromaDB bridges the gap from "I can generate embeddings" (yesterday's post) to "I can search by meaning at scale"
- Build the core retrieve-then-generate loop that every RAG application shares
Why you need a vector database
In the previous tutorial you saw that sentence-transformers converts text to 384-dimensional vectors, where texts with similar meaning land close together. A vector database stores those vectors and runs the similarity search quickly: across thousands or millions of documents, typically in milliseconds. ChromaDB is one of the quickest ways to run one locally, because it installs with pip and runs inside your Python process.
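To make "runs the similarity search" concrete, here is a minimal sketch of the naive version: compare the query vector against every stored vector and keep the top k. The function names and the 3-dimensional toy vectors are illustrative assumptions, not part of ChromaDB. A vector database exists largely to avoid this full scan on every query.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(query_vec, stored, k=2):
    """Return the ids of the k stored vectors most similar to query_vec.

    `stored` maps an id to its vector. Every vector is compared on every
    query, so the cost grows linearly with the number of documents.
    """
    ranked = sorted(
        stored,
        key=lambda doc_id: cosine_similarity(query_vec, stored[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical 3-dimensional "embeddings" (real ones would have 384 dimensions).
toy_vectors = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.1, 0.9, 0.0],
    "doc3": [0.8, 0.2, 0.1],
}
nearest = brute_force_search([1.0, 0.0, 0.0], toy_vectors, k=2)
print(nearest)  # ['doc1', 'doc3']
```

This is fine for a few hundred documents. At larger scale, the sort over every stored vector becomes the bottleneck that an indexed vector store is meant to remove.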
Install and run
pip install chromadb
import chromadb
client = chromadb.Client()
collection = client.create_collection("my_docs")
collection.add(
    documents=[
        "Claude Code runs agentic tasks from the terminal",
        "ChromaDB stores embeddings and runs them locally",
        "sentence-transformers converts text to vectors",
        "RAG retrieves relevant context before generating",
    ],
    ids=["doc1", "doc2", "doc3", "doc4"],
)
results = collection.query(
    query_texts=["how do I generate embeddings?"],
    n_results=2,
)
print(results["documents"])
# [['sentence-transformers converts text to vectors',
# 'ChromaDB stores embeddings and runs them locally']]
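The double brackets in that output appear because query results are nested: one inner list per entry in query_texts. The sketch below flattens one query's slice into rows. The helper name top_hits is hypothetical, and sample_results is a hand-written stand-in for a real query result with made-up distance values. Smaller distances mean closer matches.

```python
def top_hits(results, query_index=0):
    """Flatten one query's slice of a Chroma-style result dict into rows."""
    ids = results["ids"][query_index]
    docs = results["documents"][query_index]
    distances = results["distances"][query_index]
    return [
        {"id": doc_id, "document": doc, "distance": dist}
        for doc_id, doc, dist in zip(ids, docs, distances)
    ]


# Hand-written stand-in for the dict collection.query() returns;
# the distance values are invented for illustration.
sample_results = {
    "ids": [["doc3", "doc2"]],
    "documents": [[
        "sentence-transformers converts text to vectors",
        "ChromaDB stores embeddings and runs them locally",
    ]],
    "distances": [[0.61, 0.87]],
}

hits = top_hits(sample_results)
for hit in hits:
    print(f'{hit["distance"]:.2f}  {hit["id"]}  {hit["document"]}')
```

With a real collection you would pass the return value of collection.query() instead of sample_results.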
You didn't call an embedding model yourself. Chroma ships with all-MiniLM-L6-v2 as its default embedding function and runs it at both add and query time (the model is downloaded the first time it is used). When you need more control, swap in your own model, or one of the official integrations for OpenAI, HuggingFace, or Cohere, by passing the embedding_function parameter when you create the collection.
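As a rough sketch of what "your own model" means: an embedding function is a callable that receives a list of strings and returns one vector per string. The class below is a hypothetical, non-semantic stand-in (it just hashes words into buckets) so the shape is visible without downloading a model. The parameter name input follows Chroma's callable signature as I understand it. Newer Chroma releases may expect you to subclass chromadb.EmbeddingFunction, so treat the wiring comment at the end as an assumption to check against your installed version.

```python
import hashlib


class ToyHashEmbedding:
    """Deterministic stand-in for a real embedding model (not semantic)."""

    def __init__(self, dimensions=8):
        self.dimensions = dimensions

    def __call__(self, input):
        # Receives a list of document strings, returns one vector per string.
        return [self._embed(text) for text in input]

    def _embed(self, text):
        # Count words into hash buckets: crude, but fixed-length and repeatable.
        vector = [0.0] * self.dimensions
        for word in text.lower().split():
            digest = hashlib.sha256(word.encode("utf-8")).digest()
            vector[digest[0] % self.dimensions] += 1.0
        return vector


embed = ToyHashEmbedding(dimensions=8)
vectors = embed(["RAG retrieves relevant context", "rag retrieves relevant context"])
print(len(vectors), len(vectors[0]))  # 2 8

# Assumed wiring, using the embedding_function parameter:
# collection = client.create_collection("my_docs", embedding_function=embed)
```

Whatever function indexes the documents must also embed the queries, otherwise the two sets of vectors are not comparable.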
Persist across restarts
client = chromadb.PersistentClient(path="/tmp/my_chroma")
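One line changes: Client() becomes PersistentClient(path=...). Now your entire knowledge base is written to disk, survives process restarts, and can be loaded by a web server or a CI job pointed at the same path. Two caveats: many systems clear /tmp on reboot, so use a durable directory for data you want to keep, and on a second run create_collection raises because the collection already exists, so call client.get_or_create_collection("my_docs") instead.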
Metadata and filtered search
Add a metadata dict to each document at index time:
collection.add(
    documents=chunks,
    metadatas=[{"source": "auth-docs.md", "section": "oauth"}, ...],
    ids=[...]
)
Query with a filter:
results = collection.query(
    query_texts=["how does token refresh work?"],
    where={"section": "oauth"},
    n_results=5,
)
Only documents matching the filter are searched — critical for multi-tenant apps or large knowledge bases with distinct domains.
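Filtering on more than one field needs a little more structure: a where dict holds a single top-level expression, so several conditions are combined with the "$and" operator. Here is a small sketch of a helper that builds such filters. The name build_where and the tenant_id field are hypothetical, chosen to match the multi-tenant case.

```python
def build_where(**conditions):
    """Build a Chroma-style `where` filter from keyword equality conditions.

    One condition maps to {"field": value}; several are wrapped in "$and",
    because a `where` dict is expected to hold one top-level expression.
    """
    clauses = [{field: value} for field, value in conditions.items()]
    if not clauses:
        return None  # no filter: search the whole collection
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}


single = build_where(section="oauth")
combined = build_where(tenant_id="acme", section="oauth")
print(single)    # {'section': 'oauth'}
print(combined)  # {'$and': [{'tenant_id': 'acme'}, {'section': 'oauth'}]}
```

The result can be passed directly as the where argument of collection.query().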
The full RAG loop
- Index: chunk your documents (see today's practitioner post on chunking strategies), then call collection.add() once
- Retrieve: call collection.query(query_texts=[user_question], n_results=5) at runtime
- Generate: pass the returned documents list as context in the Claude messages API
That loop (index once, retrieve at runtime, generate with context) is the pattern every RAG application is built on. ChromaDB handles storage and the Retrieve step completely, and the metadata field is what lets you show users [Source: auth-docs.md] citations in the answer.
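The hand-off from Retrieve to Generate can be sketched as plain string assembly: take the documents and metadatas lists from the query result, tag each chunk with its source, and build the user message. The function names and the prompt wording are assumptions for illustration, and sample_results is a hand-written stand-in with invented chunk text. The model call itself is left out.

```python
def build_context(results, query_index=0):
    """Join retrieved chunks into one context string, tagging each with its source."""
    docs = results["documents"][query_index]
    metas = results["metadatas"][query_index]
    blocks = []
    for doc, meta in zip(docs, metas):
        # A document added without metadata comes back with None here.
        source = (meta or {}).get("source", "unknown")
        blocks.append(f"[Source: {source}]\n{doc}")
    return "\n\n".join(blocks)


def build_prompt(question, results):
    """Assemble the user message for the generation step."""
    context = build_context(results)
    return (
        "Answer the question using only the context below. "
        "Cite sources in the form [Source: filename].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hand-written stand-in for collection.query() output; the chunk text is invented.
sample_results = {
    "documents": [["Refresh tokens are exchanged at the token endpoint."]],
    "metadatas": [[{"source": "auth-docs.md", "section": "oauth"}]],
}
prompt = build_prompt("how does token refresh work?", sample_results)
print(prompt)
```

The resulting string would go in the user message of the generation request. Keeping the source tag next to each chunk is what makes the citation recoverable in the answer.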
Sources: ChromaDB official docs, ChromaDB getting started, DataCamp: Learn How to Use Chroma DB