
Your First Complete RAG Pipeline: Ingestion, Retrieval, and Generation in 50 Lines
Chris Harper
2 min read
Jul 1, 2026 · 12:03 UTC
TL;DR: Build a working RAG system from scratch — load docs, embed into a vector store, retrieve by semantic similarity, generate with context. Under 50 lines, no framework magic.
What you'll be able to do after this:
- Ingest any document into a local vector store and retrieve semantically relevant chunks on any query
- Wire retrieval into a generation step to get grounded answers with source citations
- Understand each stage of the pipeline so you can tune chunk size, swap models, or move to a hosted vector DB
You've seen embeddings, vector stores, and chunking in isolation. This is where they connect. Every production RAG system — whether it's a customer support bot, a codebase Q&A tool, or a document search API — runs the same five-stage loop:
Load → Split → Embed → Retrieve → Generate
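Before the library version, it can help to see the five stages with nothing hidden. The following is a dependency-free sketch, not how LangChain or ChromaDB work internally: the bag-of-words "embedding", the word-based splitter, and every function name here are assumptions made for illustration, and the final stage only builds the prompt instead of calling a model.

```python
import math
import re
from collections import Counter

def load() -> str:
    # Stage 1, Load: a hard-coded string stands in for a real document loader.
    return (
        "Agents use a planning module to break a task into smaller steps. "
        "Memory lets an agent store and recall past observations. "
        "Tool use lets an agent call external APIs such as a calculator or search."
    )

def split(text: str, chunk_size: int = 12, chunk_overlap: int = 2) -> list[str]:
    # Stage 2, Split: overlapping windows, counted in words (not characters)
    # so the toy output stays readable.
    words = text.split()
    step = chunk_size - chunk_overlap
    starts = range(0, max(len(words) - chunk_overlap, 1), step)
    return [" ".join(words[i:i + chunk_size]) for i in starts]

def embed(text: str) -> Counter:
    # Stage 3, Embed: word counts as a toy vector; a real system uses a model.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    # Stage 4, Retrieve: rank every chunk by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question: str, chunks: list[str]) -> str:
    # Stage 5, Generate: this string is what would be sent to the LLM.
    context = "\n\n".join(chunks)
    return f"Answer based only on the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

question = "How does an agent recall memory?"
chunks = split(load())
index = [(chunk, embed(chunk)) for chunk in chunks]
top = retrieve(question, index, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Each function maps to one stage, so swapping in a real embedding model or vector store means replacing a single function while the surrounding flow stays the same.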
Here's the runnable version using LangChain, ChromaDB, and OpenAI. Set OPENAI_API_KEY, then:
pip install langchain langchain-openai langchain-chroma langchain-community langchain-text-splitters beautifulsoup4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
# 1. Load — swap WebBaseLoader for PyPDFLoader, CSVLoader, etc.
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
docs = loader.load()
# 2. Split — 1000-char chunks, 200-char overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = splitter.split_documents(docs)
# 3. Embed + Store: Chroma is in-memory here; pass persist_directory to save to disk
vectorstore = Chroma.from_documents(splits, OpenAIEmbeddings())
# 4. Retrieve — top-k semantic search (default k=4)
retriever = vectorstore.as_retriever()
# 5. Generate — prompt injects retrieved context
llm = ChatOpenAI(model="gpt-4o-mini")
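prompt = ChatPromptTemplate.from_template(
    "Answer based only on the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
)
def format_docs(docs):
    # Join chunk text so the prompt gets plain text, not a list of Document objects
    return "\n\n".join(doc.page_content for doc in docs)
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt | llm | StrOutputParser()
)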
print(rag_chain.invoke("What is an LLM agent?"))
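The chain above returns only the answer string. One way to work toward source citations is to number each retrieved chunk and attach its source before it reaches the prompt. This is a minimal sketch under stated assumptions: `Doc` is a hypothetical stand-in that mirrors the `page_content` and `metadata` fields of a LangChain Document, and `format_docs_with_sources` is an illustrative helper, not a library function.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs_with_sources(docs) -> str:
    # Prefix each chunk with a bracketed number and its source so the model
    # has something concrete to cite, e.g. "[1]".
    blocks = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "unknown")
        blocks.append(f"[{i}] (source: {source})\n{doc.page_content}")
    return "\n\n".join(blocks)

docs = [
    Doc("Agents plan by decomposing tasks.",
        {"source": "https://lilianweng.github.io/posts/2023-06-23-agent/"}),
    Doc("Memory stores past observations."),  # no metadata: falls back to "unknown"
]
context = format_docs_with_sources(docs)
print(context)
```

In the chain above, a function like this could sit directly after the retriever in the "context" slot (`retriever | format_docs_with_sources`), with one extra sentence in the prompt asking the model to cite the bracketed numbers it used. WebBaseLoader records the page URL under the "source" metadata key, which is the key this sketch reads.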
What to try next:
- Swap the loader: PyPDFLoader("my_doc.pdf") or DirectoryLoader("./docs") for local files
- Add a persist_directory to ChromaDB so the vector store survives between runs
- Raise k in as_retriever(search_kwargs={"k": 8}) to widen recall on dense docs
- Try a sentence-transformers embedding model instead of OpenAI to keep the embedding step local (generation still calls OpenAI)
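To check whether raising k actually widens recall on your documents, one option is to measure recall at k on a few queries where you already know which chunks are relevant. The sketch below is illustrative only: the chunk ids, the ranking, and the relevance labels are made-up data, and `recall_at_k` is a hypothetical helper.

```python
def recall_at_k(ranked_ids, relevant_ids, k: int) -> float:
    """Fraction of the relevant chunks that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

# Hypothetical ranking for one query: chunk ids ordered by similarity, best first.
ranked = ["c7", "c2", "c9", "c4", "c1", "c5", "c3", "c8"]
# Hypothetical ground truth: the chunks a human marked as relevant.
relevant = {"c2", "c4", "c5"}

recall_k4 = recall_at_k(ranked, relevant, k=4)  # c2 and c4 found, c5 missed
recall_k8 = recall_at_k(ranked, relevant, k=8)  # all three found
print(recall_k4, recall_k8)
```

A larger k also puts more text into the prompt, so the useful setting is usually the smallest k at which recall stops improving.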
The video linked below walks through this pattern step by step, explains why each piece exists, and covers advanced query translation strategies once the basic pipeline is working.
Sources: Learn RAG From Scratch — freeCodeCamp | YouTube walkthrough (2.5 hr, Lance Martin / LangChain) | LangChain RAG tutorial