What Is RAG and Why It Works

Understand the retrieve-then-generate pattern before writing any code

The problem with LLMs out of the box

Large language models like Gemini are trained on a fixed snapshot of text, much of it from the public internet. They know a lot, but they have never seen your documents. Ask Gemini about your invoice from last Tuesday, and it will either refuse or guess.

The guess is the dangerous case. It is called hallucination: the model generates plausible-sounding text that isn't grounded in facts.

What RAG does

RAG stands for Retrieval-Augmented Generation. Instead of asking the model to remember your data, you:

  1. Find the relevant passages from your document.
  2. Hand them to the model as context.
  3. Ask the model to answer using only that context.

The model no longer has to guess: it reasons over the passages you supplied. It can still make mistakes, but each claim can now be checked against the retrieved text.
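
The three steps above can be sketched in a few lines of Python. This is a toy illustration only, and nothing here needs to be run yet: relevance is approximated by counting shared words, which is an assumption made for this sketch and not how later lessons retrieve passages. The function names `find_relevant` and `build_prompt` are hypothetical, and no model is called; the sketch stops at the prompt you would send.

```python
# Toy illustration of retrieve-then-generate. No API calls are made.
import re

def words(text):
    """Lowercase word set, used here as a crude relevance signal."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def find_relevant(passages, question, top_k=1):
    """Step 1: rank passages by how many words they share with the question."""
    q = words(question)
    ranked = sorted(passages, key=lambda p: len(words(p) & q), reverse=True)
    return ranked[:top_k]

def build_prompt(context_passages, question):
    """Steps 2 and 3: hand over the context and restrict the answer to it."""
    context = "\n".join(f"- {p}" for p in context_passages)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

passages = [
    "Invoice 1042 was issued on Tuesday for 310 euros.",
    "Our office is closed on public holidays.",
    "Shipping takes three to five business days.",
]
question = "How much was the invoice from Tuesday?"

top = find_relevant(passages, question)
prompt = build_prompt(top, question)
print(prompt)
```

The instruction to say so when the context lacks the answer matters as much as the context itself: it gives the model an explicit alternative to guessing.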

The pipeline you will build

Your PDF
  ↓  extract text
Raw text
  ↓  split into chunks
List of chunks
  ↓  embed each chunk (Gemini API)
List of vectors
  ↓  embed your question, find closest vectors
Top matching chunks
  ↓  build prompt: context + question
Prompt
  ↓  generate answer (Gemini API)
Answer

Each lesson builds one stage of this pipeline. By Lesson 6, you will wire them into a single CLI app.
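
To make the middle of the diagram concrete, here is a minimal sketch of the chunk, embed, and find-closest stages. It rests on loud assumptions: `toy_embed` is a hypothetical stand-in that counts words, where the real pipeline would call the Gemini API for embeddings, and the chunker splits on a fixed word count. PDF extraction and answer generation are left out. The similarity measure shown is cosine similarity, one common way to compare vectors.

```python
import math
import re
from collections import Counter

def split_into_chunks(text, max_words=8):
    """Split raw text into consecutive chunks of at most max_words words."""
    tokens = text.split()
    return [" ".join(tokens[i:i + max_words])
            for i in range(0, len(tokens), max_words)]

def toy_embed(text):
    """Stand-in for an embedding model: a sparse word-count vector.
    (Assumption for this sketch; the pipeline uses the Gemini API here.)"""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_chunks(chunks, question, k=1):
    """Embed the question and return the k chunks with the closest vectors."""
    q_vec = toy_embed(question)
    scored = [(cosine_similarity(toy_embed(c), q_vec), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

raw_text = (
    "The warranty covers battery defects for two years. "
    "Returns are accepted within thirty days of delivery. "
    "Support is available by email on weekdays."
)
chunks = split_into_chunks(raw_text)
best = top_chunks(chunks, "How long does the warranty cover the battery?")
print(best)
```

Swapping `toy_embed` for a real embedding call changes the quality of the match, not the shape of the code: every stage still takes the previous stage's output as its input.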

What you need

  • Python 3.9 or later
  • A free Google AI Studio account — create one and generate an API key before Lesson 3
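
If you want to confirm both requirements before starting, a small check like the one below can help. It is a sketch under one assumption: that you store your key in an environment variable named `GEMINI_API_KEY`. That name and the function `check_prerequisites` are choices made for this example, not something this course has specified.

```python
import os
import sys

def check_prerequisites(env=os.environ, version=sys.version_info):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if tuple(version[:2]) < (3, 9):
        problems.append("Python 3.9 or later is required.")
    # GEMINI_API_KEY is an assumed variable name for this sketch.
    if not env.get("GEMINI_API_KEY"):
        problems.append("Set GEMINI_API_KEY to your Google AI Studio API key.")
    return problems

for problem in check_prerequisites():
    print("Missing:", problem)
```

The script prints nothing when both requirements are met.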
