
What Are Embeddings? From Zero to Semantic Search in 10 Lines of Python
Chris Harper
Jun 23, 2026 · 18:15 UTC
TL;DR: Embeddings convert text to vectors in which distance reflects meaning. Three lines of Python with sentence-transformers (import, load a model, encode) produce vectors you can use for search, RAG, and AI memory.
What you'll be able to do after this:
- Generate semantic embeddings from any text with three lines of Python using a pretrained model
- Find the most similar document to a query using cosine similarity, the core of embedding-based retrieval in RAG pipelines
- Understand the vector space model that underpins RAG, semantic search, and AI memory
Before you can build RAG, before a vector database makes sense, before you can evaluate retrieval quality — you need to understand what an embedding is and how to generate one.
What an embedding is. The text "The weather is lovely today" becomes a list of 384 floating-point numbers. Two sentences with similar meaning end up as vectors geometrically close in that 384-dimensional space — even with no shared words. "It's so sunny outside!" is a neighbor; "She drove to the stadium." is far away.
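To make "geometrically close" concrete, here is a minimal sketch of cosine similarity in plain Python. The three vectors are hand-made, 3-dimensional stand-ins with invented values, not output from a real model; real all-MiniLM-L6-v2 embeddings have 384 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" (made-up values, 3 dimensions instead of 384)
weather = [0.9, 0.1, 0.0]
sunny = [0.8, 0.3, 0.1]
stadium = [0.0, 0.2, 0.9]

print(cosine_similarity(weather, sunny))    # close to 1: vectors point the same way
print(cosine_similarity(weather, stadium))  # close to 0: vectors are nearly orthogonal
```

Cosine similarity depends only on the angle between vectors, not their length, which is one reason it is a common default for comparing embeddings.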
The 10-line walkthrough. Using the sentence-transformers library:
pip install -U sentence-transformers

from sentence_transformers import SentenceTransformer, util

# Load a small pretrained model (downloaded on first use)
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The weather is lovely today",
    "It's so sunny outside!",
    "She drove to the stadium.",
]

# One 384-dimensional vector per sentence
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 384)

# Pairwise cosine similarity: a 3x3 matrix of scores
similarities = util.cos_sim(embeddings, embeddings)
# sentences 0 and 1 score roughly 0.67; sentence 2 scores roughly 0.10 to 0.14 against the other two
That similarity computation is the core of embedding-based RAG retrieval: embed your query, embed your documents, and return the documents with the highest cosine similarity to the query.
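The retrieval step can be sketched as a small ranking function. This is an illustrative outline, not the library's implementation: `top_k` is a hypothetical helper, and the vectors are made-up 3-dimensional stand-ins for what `model.encode(...)` would return.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k documents most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Made-up vectors standing in for document and query embeddings
doc_vecs = [
    [0.9, 0.1, 0.0],  # document 0
    [0.8, 0.3, 0.1],  # document 1
    [0.0, 0.2, 0.9],  # document 2
]
query_vec = [1.0, 0.2, 0.0]

results = top_k(query_vec, doc_vecs, k=2)
print(results)  # documents 0 and 1 rank ahead of document 2
```

With a real model, `doc_vecs` would come from `model.encode(documents)` and `query_vec` from `model.encode(query)`. The sentence-transformers library also ships `util.semantic_search`, which performs this kind of ranking over larger collections.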
Run it now. HuggingFace's "Getting Started With Embeddings" blog walks through a real-world semantic search over a Medicare FAQ dataset — with a Colab notebook you can open and run immediately. Start with all-MiniLM-L6-v2 (fast, light, general-purpose); move to all-mpnet-base-v2 for better quality on more demanding tasks. Over 10,000 pretrained models are on the sentence-transformers HuggingFace page — many optimized for code, multilingual text, or long documents.
Where this leads. Once you can generate embeddings and compute similarity, every subsequent RAG concept clicks: chunking is about what unit you embed, pgvector/Chroma/FAISS store the resulting vectors, retrieval is the cosine search you just ran, and reranking is a second pass on the top-k results. This 10-line example is the foundation.
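As one illustration of "what unit you embed", here is a minimal sketch of a word-window chunker with overlap. The function name `chunk_words` and its default sizes are assumptions chosen for the example; real pipelines often chunk by sentences, tokens, or document structure instead.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into overlapping chunks of at most chunk_size words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks

# Twelve placeholder words: w0 ... w11
text = " ".join(f"w{i}" for i in range(12))
chunks = chunk_words(text, chunk_size=5, overlap=2)
for chunk in chunks:
    print(chunk)
```

Each chunk would then be passed to `model.encode` and stored as its own vector. The overlap keeps a sentence that straddles a boundary from being split across two chunks with no shared context.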
Sources: Sentence Transformers official docs (sbert.net), HuggingFace: Getting Started With Embeddings, Colab notebook, sentence-transformers on HuggingFace Hub