How RAG Assembles the Final Prompt

Understand the prompt template before writing the generation code

The prompt is the bridge

Retrieval gives you relevant text. Generation turns that text into an answer. The prompt is what connects them.

A RAG prompt has three parts:

  1. System instruction — tells the model its role and constraints
  2. Context — the retrieved chunks, pasted verbatim
  3. Question — what the user asked

A concrete example

You are a helpful assistant. Answer the question using only the context below.
If the answer is not in the context, say "I don't know."

Context:
The total amount due is $450.00, payable by March 31, 2025.
Payment can be made by bank transfer or credit card.

Question:
What is the total amount due?

The model reads the context and answers: "The total amount due is $450.00."

It does not guess. It does not hallucinate. It reads.
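The prompt above can be produced with plain string assembly. Here is a minimal sketch of build_prompt (the exact wording of the system instruction and the chunk separator are choices you can adjust):

```python
def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble system instruction, retrieved context, and user question."""
    # Paste the retrieved chunks verbatim, one per line.
    context = "\n".join(context_chunks)
    return (
        "You are a helpful assistant. Answer the question using only the context below.\n"
        'If the answer is not in the context, say "I don\'t know."\n\n'
        f"Context:\n{context}\n\n"
        f"Question:\n{question}\n"
    )
```

Keeping the question last matters: the model answers what it read most recently, and the instruction and context frame that answer.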

What you will write

  • build_prompt(question, context_chunks) — assembles the string above
  • generate_answer(prompt) — sends it to Gemini and returns the response text
  • print_result(answer, source_chunks) — displays the answer and its sources
  • save_embeddings(chunks, embeddings, cache_path) — caches vectors to disk
  • load_embeddings(cache_path) — loads cached vectors
  • main() — wires the full pipeline from CLI arguments
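For the caching pair, any format that keeps chunks and vectors aligned will do. A minimal sketch using a JSON file (the format is an assumption, not a requirement of the pipeline):

```python
import json
from pathlib import Path


def save_embeddings(chunks: list[str], embeddings: list[list[float]], cache_path: str) -> None:
    """Cache chunks alongside their vectors so they stay aligned on disk."""
    payload = {"chunks": chunks, "embeddings": embeddings}
    Path(cache_path).write_text(json.dumps(payload))


def load_embeddings(cache_path: str):
    """Return (chunks, embeddings) from the cache, or None if no cache exists."""
    path = Path(cache_path)
    if not path.exists():
        return None
    data = json.loads(path.read_text())
    return data["chunks"], data["embeddings"]
```

Storing chunks and vectors in one file avoids the classic bug where the cache and the source document drift out of sync: if either half is missing, you simply re-embed.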
