Your First Complete RAG Pipeline: Ingestion, Retrieval, and Generation in 50 Lines

Chris Harper

2 min read

Jul 1, 2026 · 12:03 UTC

Tutorial

RAG

Embeddings

TL;DR: Build a working RAG system end to end: load docs, embed them into a vector store, retrieve by semantic similarity, and generate answers grounded in the retrieved context. Under 50 lines, with every stage written out explicitly.

What you'll be able to do after this:

Ingest any document into a local vector store and retrieve semantically relevant chunks on any query
Wire retrieval into a generation step to get grounded answers with source citations
Understand each stage of the pipeline so you can tune chunk size, swap models, or move to a hosted vector DB

You've seen embeddings, vector stores, and chunking in isolation. This is where they connect. Most production RAG systems, whether a customer support bot, a codebase Q&A tool, or a document search API, run the same five-stage loop:

Load → Split → Embed → Retrieve → Generate
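To make the loop concrete before any library gets involved, here is a dependency-free toy version of all five stages. It is a sketch under loud assumptions: sentence splitting stands in for a real text splitter, bag-of-words term counts stand in for an embedding model (so it matches shared words, not meaning), and the generate step stops at building the prompt instead of calling an LLM. The names `split_sentences`, `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter

# 1. Load: a hard-coded string stands in for a document loader (assumption).
DOCUMENT = (
    "An LLM agent uses a language model as its controller. "
    "The agent plans by breaking a task into smaller subgoals. "
    "Memory lets the agent store and recall past observations. "
    "Tool use lets the agent call external APIs such as search or a calculator."
)


# 2. Split: one chunk per sentence (a simplification of a real text splitter).
def split_sentences(text: str) -> list[str]:
    return [s for s in re.split(r"(?<=\.)\s+", text.strip()) if s]


# 3. Embed: bag-of-words counts as a toy stand-in for an embedding model.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# 4. Retrieve: rank stored chunks by cosine similarity to the query, keep the top k.
def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# 5. Generate: inject retrieved chunks into the prompt (the LLM call is omitted).
def build_prompt(question: str, chunks: list[str]) -> str:
    context = "\n\n".join(chunks)
    return (
        "Answer based only on the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


chunks = split_sentences(DOCUMENT)
index = [(chunk, embed(chunk)) for chunk in chunks]  # the toy "vector store"
question = "How does the agent use memory?"
top = retrieve(question, index, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Here the query shares "memory", "the", and "agent" with the third sentence, which gives it the highest cosine score. Replacing `embed` with a real embedding model and `index` with a vector database keeps the same five stages; only the components get stronger.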

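Here's the runnable version using LangChain, ChromaDB, and OpenAI. Set the OPENAI_API_KEY environment variable, then install the dependencies (WebBaseLoader also needs beautifulsoup4):

pip install langchain langchain-openai langchain-chroma langchain-community beautifulsoup4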

from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# 1. Load — swap WebBaseLoader for PyPDFLoader, CSVLoader, etc.
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
docs = loader.load()

# 2. Split — 1000-char chunks, 200-char overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = splitter.split_documents(docs)

# 3. Embed + Store: in-memory by default; pass persist_directory="./chroma_db" to save to disk
vectorstore = Chroma.from_documents(splits, OpenAIEmbeddings())

# 4. Retrieve — top-k semantic search (default k=4)
retriever = vectorstore.as_retriever()

# 5. Generate: flatten retrieved Documents to plain text, then inject them into the prompt
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_template(
    "Answer based only on the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
)
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt | llm | StrOutputParser()
)

print(rag_chain.invoke("What is an LLM agent?"))
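The splitter settings chunk_size=1000 and chunk_overlap=200 mean neighboring chunks share up to 200 characters, so context that straddles a boundary survives in at least one chunk. The sketch below shows that arithmetic with a plain fixed-window splitter. It is a simplification, not how RecursiveCharacterTextSplitter works internally: the real splitter first tries to break on paragraph, line, and word boundaries, so its chunks come out shorter and less regular. `fixed_window_chunks` is a hypothetical helper.

```python
def fixed_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Hypothetical helper: fixed-size character windows with overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window advances 800 chars with the defaults
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
        start += step
    return chunks


text = "".join(str(i % 10) for i in range(4500))  # 4,500 characters of filler
chunks = fixed_window_chunks(text)
print(len(chunks))  # → 6
print([len(c) for c in chunks])  # → [1000, 1000, 1000, 1000, 1000, 500]
```

Windows start at 0, 800, 1600, 2400, 3200, and 4000, so the last chunk is a 500-character tail. Raising the overlap produces more, more redundant chunks; lowering it saves storage and embedding calls but risks splitting a fact across two chunks.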

What to try next:

Swap the loader: PyPDFLoader("my_doc.pdf") or DirectoryLoader("./docs") for local files
Add a persist_directory to ChromaDB so the vector store survives between runs
Raise k in as_retriever(search_kwargs={"k": 8}) to widen recall on dense docs
Try a sentence-transformers embedding model instead of OpenAI to keep embedding local (generation still calls OpenAI unless you also swap the LLM)
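The chain above returns only the answer text, so the source citations listed as a goal at the top need one more step. A common approach is to number each retrieved chunk, tag it with its source, and ask the model to cite those numbers. A minimal sketch, assuming documents expose `page_content` and `metadata` the way LangChain's `Document` does (WebBaseLoader records the page URL under `metadata["source"]`, and the splitter carries metadata over to each chunk). `Doc`, `format_docs_with_sources`, `CITED_PROMPT`, and the example.com URLs are hypothetical, so the example runs without LangChain installed.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in with the same two fields as LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs_with_sources(docs) -> str:
    """Number each chunk and prefix it with its source so the model can cite [n]."""
    blocks = []
    for n, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "unknown source")
        blocks.append(f"[{n}] ({source})\n{doc.page_content}")
    return "\n\n".join(blocks)


# Prompt wording is an assumption; adjust it to your citation style.
CITED_PROMPT = (
    "Answer based only on the numbered context below. "
    "Cite the chunks you used as [1], [2], etc.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

docs = [
    Doc("Agents use an LLM as the core controller.", {"source": "https://example.com/a"}),
    Doc("Memory stores past observations.", {"source": "https://example.com/b"}),
]
context = format_docs_with_sources(docs)
print(CITED_PROMPT.format(context=context, question="What is an LLM agent?"))
```

To wire this into the LangChain chain, use `{"context": retriever | format_docs_with_sources, "question": RunnablePassthrough()}` as the first step and build the prompt with `ChatPromptTemplate.from_template(CITED_PROMPT)`; LangChain wraps a plain function piped after a runnable as a `RunnableLambda`.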

The video linked below walks through this pattern step by step, explains why each piece exists, and covers advanced query translation strategies once the basic pipeline is working.

Sources: Learn RAG From Scratch — freeCodeCamp | YouTube walkthrough (2.5 hr, Lance Martin / LangChain) | LangChain RAG tutorial

CloudCodeTree
