What Are Embeddings? From Zero to Semantic Search in 10 Lines of Python

Chris Harper


Jun 23, 2026 · 18:15 UTC

Tutorial

Embeddings

RAG

TL;DR: Embeddings convert text into vectors where distance reflects meaning. A few lines of Python with sentence-transformers produce numbers you can use for search, RAG, and AI memory.

What you'll be able to do after this:

Generate semantic embeddings from any text with a few lines of Python using a pretrained model
Find the document most similar to a query using cosine similarity, the core of RAG retrieval
Understand the vector space model that underpins RAG, semantic search, and AI memory

Before you can build RAG, before a vector database makes sense, and before you can evaluate retrieval quality, you need to understand what an embedding is and how to generate one.

What an embedding is. An embedding model turns the text "The weather is lovely today" into a list of 384 floating-point numbers (384 is the output size of all-MiniLM-L6-v2, the model used below; other models use other sizes). Sentences with similar meaning end up as vectors that sit close together in that 384-dimensional space, even when they share no words: "It's so sunny outside!" is a neighbor, while "She drove to the stadium." is far away.
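"Close" is usually measured with cosine similarity: cos(a, b) = (a · b) / (|a| |b|), the dot product divided by the product of the vector lengths. A score near 1.0 means the vectors point the same way; a score near 0 means they are unrelated. The from-scratch sketch below computes it by hand. The three-dimensional vectors are made-up stand-ins for real 384-dimensional embeddings, and the names are illustrative assumptions, not model output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for real sentence embeddings
weather = [0.9, 0.1, 0.0]   # "The weather is lovely today"
sunny = [0.8, 0.2, 0.1]     # "It's so sunny outside!"
stadium = [0.1, 0.0, 0.9]   # "She drove to the stadium."

print(round(cosine_similarity(weather, sunny), 3))    # high: similar meaning
print(round(cosine_similarity(weather, stadium), 3))  # low: unrelated meaning
```

This is the same quantity `util.cos_sim` computes for every pair of rows when you pass it real embeddings.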

The 10-line walkthrough. Using the sentence-transformers library:

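```shell
pip install -U sentence-transformers
```

```python
from sentence_transformers import SentenceTransformer, util

# Load a small, general-purpose pretrained model (384-dimensional output)
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The weather is lovely today",
    "It's so sunny outside!",
    "She drove to the stadium.",
]

# Encode each sentence into one 384-dimensional vector
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 384)

# Pairwise cosine similarity matrix, shape (3, 3)
similarities = util.cos_sim(embeddings, embeddings)
print(similarities)
# sentences 0 and 1 score about 0.67; sentence 2 scores only about 0.1 against both
```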

That similarity computation is how RAG retrieval works: embed your query, embed your documents, and return the documents with the highest cosine similarity to the query.
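The retrieval step can be sketched without any library: score every document vector against the query vector, sort, and keep the top k. This is a minimal sketch under stated assumptions. The `retrieve` helper and the toy vectors are hypothetical stand-ins for `model.encode(...)` output, and larger systems typically delegate this loop to a vector store or index.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs for the k documents most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; in a real pipeline these come from model.encode(...)
doc_vecs = [
    [0.9, 0.1, 0.0],  # e.g. a document about sunny weather
    [0.1, 0.0, 0.9],  # e.g. a document about driving to a stadium
    [0.7, 0.3, 0.1],  # e.g. a document about a warm afternoon
]
query_vec = [0.85, 0.15, 0.05]

for index, score in retrieve(query_vec, doc_vecs, k=2):
    print(index, round(score, 3))
```

Sorting every score is fine for a handful of documents; the point is that "retrieval" here is nothing more than ranking by cosine similarity.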

Run it now. Hugging Face's "Getting Started With Embeddings" blog post walks through a real-world semantic search over a Medicare FAQ dataset, with a Colab notebook you can open and run immediately. Start with all-MiniLM-L6-v2 (fast, light, general-purpose); move to all-mpnet-base-v2 for higher quality on more demanding tasks, at some cost in speed. Over 10,000 pretrained sentence-transformers models are available on the Hugging Face Hub, many specialized for code, multilingual text, or long documents.

Where this leads. Once you can generate embeddings and compute similarity, every subsequent RAG concept clicks: chunking is about what unit you embed, pgvector/Chroma/FAISS store the resulting vectors, retrieval is the cosine search you just ran, and reranking is a second pass on the top-k results. This 10-line example is the foundation.
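Chunking decides what unit gets embedded. One simple, common strategy (an assumption here, not the only option) is fixed-size word windows with some overlap, so text cut at a boundary still appears intact in a neighboring chunk. The `chunk_words` helper and its default sizes are hypothetical.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of chunk_size words; consecutive windows share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(sample, chunk_size=5, overlap=2):
    print(chunk)
```

Each chunk would then be passed to `model.encode(...)` and stored in the vector store of your choice. Keeping chunks short also matters because, according to its model card, all-MiniLM-L6-v2 truncates input longer than 256 word pieces.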

Sources: Sentence Transformers official documentation (sbert.net); Hugging Face blog, "Getting Started With Embeddings", and its Colab notebook; sentence-transformers models on the Hugging Face Hub

CloudCodeTree
