Your First Complete RAG Pipeline: Ingestion, Retrieval, and Generation in 50 Lines

Chris Harper

2 min read

Jul 1, 2026 · 12:03 UTC

AI
Tutorial
RAG
Embeddings

TL;DR: Build a working RAG system end to end: load docs, embed them into a vector store, retrieve by semantic similarity, and generate with the retrieved context. Under 50 lines, with every stage visible in the code.

What you'll be able to do after this:

  • Ingest any document into a local vector store and retrieve semantically relevant chunks for any query
  • Wire retrieval into a generation step to get answers grounded in the retrieved chunks, each of which carries its source in its metadata
  • Understand each stage of the pipeline so you can tune chunk size, swap models, or move to a hosted vector DB

You've seen embeddings, vector stores, and chunking in isolation. This is where they connect. Most production RAG systems, whether a customer support bot, a codebase Q&A tool, or a document search API, run the same five-stage loop:

Load → Split → Embed → Retrieve → Generate
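
To make the five stages concrete without any libraries, here is a minimal stdlib-only sketch. Everything in it is an assumption made for illustration: the two-document corpus, the sentence-based splitter, and the bag-of-words embed function that stands in for a real embedding model. The Generate stage stops at building the prompt, since no LLM is called.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory corpus standing in for loaded documents (Load).
DOCS = {
    "agents.txt": "An LLM agent uses a language model as its controller. "
                  "The agent plans tasks and calls tools.",
    "cooking.txt": "Simmer the tomato sauce slowly. "
                   "Add basil before serving the pasta.",
}

def split(text):
    """Split: one chunk per sentence (a real splitter works on size and overlap)."""
    return re.split(r"(?<=[.!?])\s+", text.strip())

def embed(text):
    """Embed: toy bag-of-words vector; a real pipeline calls an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(docs):
    """Load + Split + Embed: return (source, chunk, vector) triples."""
    return [(src, chunk, embed(chunk)) for src, text in docs.items() for chunk in split(text)]

def retrieve(index, question, k=2):
    """Retrieve: rank every chunk by similarity to the question, keep the top k."""
    q = embed(question)
    return sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)[:k]

def build_prompt(question, hits):
    """Generate (prompt only): number each chunk and tag it with its source."""
    context = "\n".join(f"[{i}] ({src}) {chunk}" for i, (src, chunk, _) in enumerate(hits, 1))
    return f"Answer based only on the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

question = "What is an LLM agent?"
index = build_index(DOCS)
hits = retrieve(index, question, k=2)
prompt = build_prompt(question, hits)
print(prompt)
```

Word-count vectors only match on shared words, so this toy retriever misses paraphrases. Closing that gap is the job of a learned embedding model.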

Here's the runnable version using LangChain, ChromaDB, and OpenAI. Set OPENAI_API_KEY, then:

pip install langchain langchain-openai langchain-chroma langchain-community beautifulsoup4

from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# 1. Load: swap WebBaseLoader for PyPDFLoader, CSVLoader, etc.
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
docs = loader.load()

# 2. Split: 1000-char chunks, 200-char overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = splitter.split_documents(docs)

# 3. Embed + Store: Chroma keeps the index in memory unless you pass persist_directory
vectorstore = Chroma.from_documents(splits, OpenAIEmbeddings())

# 4. Retrieve: top-k semantic search (default k=4)
retriever = vectorstore.as_retriever()

def format_docs(docs):
    # Join chunk text so the prompt receives plain text, not Document reprs
    return "\n\n".join(doc.page_content for doc in docs)

# 5. Generate: prompt injects retrieved context
llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_template(
    "Answer based only on the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
)
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt | llm | StrOutputParser()
)

print(rag_chain.invoke("What is an LLM agent?"))
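
The chunk_size and chunk_overlap settings in the Split step are easier to reason about with a small example. The sketch below is a simplified fixed-window splitter, not RecursiveCharacterTextSplitter's actual algorithm (which prefers to break on separators such as paragraphs and spaces, so real chunks are often shorter and overlaps approximate). The function name split_with_overlap is hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Fixed-window splitter: each chunk starts (chunk_size - chunk_overlap)
    characters after the previous one, so neighbours share chunk_overlap chars."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(text, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
# abcdefghij
# ghijklmnop
# mnopqrstuv
# stuvwxyz
```

By the same arithmetic, chunk_size=1000 with chunk_overlap=200 advances roughly 800 characters per chunk, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.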

What to try next:

  • Swap the loader: PyPDFLoader("my_doc.pdf") or DirectoryLoader("./docs") for local files
  • Add a persist_directory to ChromaDB so the vector store survives between runs
  • Raise k in as_retriever(search_kwargs={"k": 8}) to widen recall on dense docs
  • Try a sentence-transformers embedding model instead of OpenAI to run fully local
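
The persistence bullet comes down to one idea: embed once, write the vectors to disk, and reload them on the next run instead of paying for embedding again. The stdlib-only sketch below illustrates that idea with a JSON file. The helper names, the file layout, and the fake two-dimensional vector are all assumptions made for illustration; this is not Chroma's storage format, and with Chroma itself the persist_directory argument named in the bullet is what enables it.

```python
import json
import tempfile
from pathlib import Path

def load_or_build(path, build):
    """Return (rows, built). Reuse the saved index if it exists, else build and save it."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8")), False
    rows = build()
    path.write_text(json.dumps(rows), encoding="utf-8")
    return rows, True

embed_calls = []

def expensive_embed_all():
    """Stand-in for the embedding step; records each call so reuse is visible."""
    embed_calls.append(1)
    return [{"text": "An LLM agent plans and uses tools.", "vector": [0.1, 0.9]}]

with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "index.json"
    first, built_first = load_or_build(store, expensive_embed_all)    # builds and saves
    second, built_second = load_or_build(store, expensive_embed_all)  # loads from disk

print(built_first, built_second, len(embed_calls))
```

The second call never touches the embedding function, which is the saving a persisted vector store gives you between runs.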

The video linked below walks through this pattern step by step, explains why each piece exists, and covers advanced query translation strategies once the basic pipeline is working.

Sources: Learn RAG From Scratch — freeCodeCamp | YouTube walkthrough (2.5 hr, Lance Martin / LangChain) | LangChain RAG tutorial