From Embeddings to Search: Your First Vector Database with ChromaDB

Chris Harper

3 min read

Jun 23, 2026 · 21:05 UTC

Tutorial

Vectors

RAG

TL;DR: ChromaDB turns your sentence-transformer embeddings into a searchable local database in 4 lines of Python — no server, no cloud account, and one argument change makes it persistent.

What you'll be able to do after this:

Create a local vector database, add documents with auto-generated embeddings, and retrieve the most semantically similar results by meaning — in under 20 lines of Python
Understand how ChromaDB bridges the gap from "I can generate embeddings" (yesterday's post) to "I can search by meaning at scale"
Build the core retrieve-then-generate loop that every RAG application shares

Why you need a vector database

In the previous tutorial you saw that sentence-transformers converts text to 384-dimensional vectors where similarity = meaning. A vector database stores those vectors and runs the similarity search fast — across thousands or millions of documents, in milliseconds. ChromaDB is the fastest path to running one locally.

Install and run

pip install chromadb

import chromadb

client = chromadb.Client()
collection = client.create_collection("my_docs")

collection.add(
    documents=[
        "Claude Code runs agentic tasks from the terminal",
        "ChromaDB stores embeddings and runs them locally",
        "sentence-transformers converts text to vectors",
        "RAG retrieves relevant context before generating"
    ],
    ids=["doc1", "doc2", "doc3", "doc4"]
)

results = collection.query(
    query_texts=["how do I generate embeddings?"],
    n_results=2
)
print(results["documents"])
# [['sentence-transformers converts text to vectors',
#   'ChromaDB stores embeddings and runs them locally']]

You didn't call an embedding model yourself. Chroma ships with all-MiniLM-L6-v2 as its default embedding function and runs it at both add and query time. Swap in your own model — or the official integrations for OpenAI, HuggingFace, or Cohere — via the embedding_function parameter when you need more control.

Persist across restarts

client = chromadb.PersistentClient(path="/tmp/my_chroma")

One argument. Now your entire knowledge base survives restarts and can be loaded by a web server or a CI job.

Metadata and filtered search

Add a metadata dict to each document at index time:

collection.add(
    documents=chunks,
    metadatas=[{"source": "auth-docs.md", "section": "oauth"}, ...],
    ids=[...]
)

Query with a filter:

results = collection.query(
    query_texts=["how does token refresh work?"],
    where={"section": "oauth"},
    n_results=5
)

Only documents matching the filter are searched — critical for multi-tenant apps or large knowledge bases with distinct domains.

The full RAG loop

Index — chunk your documents (see today's practitioner post on chunking strategies), call collection.add() once
Retrieve — call collection.query(query_texts=[user_question], n_results=5) at runtime
Generate — pass the returned documents list as context in the Claude messages API

That loop — index once, retrieve at runtime, generate with context — is the pattern every RAG application is built on. ChromaDB handles step 2 completely, and the metadata field is what lets you show users [Source: auth-docs.md] citations in the answer.