What Is RAG and Why It Works
Understand the retrieve-then-generate pattern before writing any code
The problem with LLMs out of the box
Large language models like Gemini are trained on a snapshot of the internet. They know a lot, but they have never seen your documents. Ask Gemini about your invoice from last Tuesday, and it will either refuse or guess.
When it guesses, the result is called hallucination: the model generates plausible-sounding text that isn't grounded in facts.
What RAG does
RAG stands for Retrieval-Augmented Generation. Instead of asking the model to remember your data, you:
- Find the relevant passages from your document.
- Hand them to the model as context.
- Ask the model to answer using only that context.
The model stops guessing and starts reasoning over real information.
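The last two steps (hand over context, then ask for an answer from that context only) come down to assembling a prompt. Here is a minimal sketch, not the course's final code. The function name `build_prompt`, the prompt wording, and the sample passages are all hypothetical and chosen only for illustration.

```python
def build_prompt(question, passages):
    """Assemble a prompt that restricts the model to the supplied passages."""
    # Number each passage so an answer can point back to its source.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


# Hypothetical passages standing in for text retrieved from your document.
passages = [
    "Invoice 1042 was issued on Tuesday for a total of $310.00.",
    "Payment is due within 30 days of the issue date.",
]
prompt = build_prompt("What is the total on invoice 1042?", passages)
print(prompt)
```

The explicit "say you don't know" instruction gives the model a way out when the retrieved passages do not cover the question, so it is less tempted to guess.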
The pipeline you will build
Your PDF
↓ extract text
Raw text
↓ split into chunks
List of chunks
↓ embed each chunk (Gemini API)
List of vectors
↓ embed your question, find closest vectors
Top matching chunks
↓ build prompt: context + question
Prompt
↓ generate answer (Gemini API)
Answer
Each lesson builds one stage of this pipeline. By Lesson 6, you will wire them into a single CLI app.
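To make the middle of the pipeline concrete before any API is involved, here is a toy sketch of the chunk, embed, and find-closest stages. It makes two assumptions that differ from what you will build: chunks are split on sentence boundaries, and `embed` is a bag-of-words counter standing in for the Gemini embedding call. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text):
    """Split text into one chunk per sentence (a simplification)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def embed(text):
    """Toy stand-in for an embedding model: word counts as a sparse vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(question, chunks, k=2):
    """Return the k chunks whose vectors are closest to the question's."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


text = (
    "Invoice 1042 was issued on Tuesday. "
    "The total amount due is 310 dollars. "
    "Shipping is handled by a regional courier. "
    "Returns are accepted within 14 days of delivery."
)
chunks = split_into_chunks(text)
best = top_chunks("What is the total amount due?", chunks, k=1)
print(best[0])
```

A real embedding model places texts with similar meaning close together even when they share no words. The word-count version here only rewards literal overlap, but the retrieval logic around it has the same shape.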
What you need
- Python 3.9 or later
- A free Google AI Studio account — create one and generate an API key before Lesson 3
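If you want to verify both prerequisites from a script, a small check like the following may help. The environment variable name `GEMINI_API_KEY` is an assumption about where you will store your key; use whatever name your setup actually uses. The helper `missing_prerequisites` is hypothetical.

```python
import os
import sys


def missing_prerequisites(env, version=sys.version_info):
    """Return a list of human-readable problems; empty means ready to go."""
    problems = []
    if tuple(version[:2]) < (3, 9):
        problems.append("Python 3.9 or later is required")
    # Assumed variable name; adjust if you store the key elsewhere.
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems


for problem in missing_prerequisites(os.environ):
    print("Missing:", problem)
```

Keeping the key in an environment variable rather than in source code keeps it out of version control.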