How RAG Assembles the Final Prompt

Understand the prompt template before writing the generation code

The prompt is the bridge

Retrieval gives you relevant text. Generation turns that text into an answer. The prompt is what connects them.

A RAG prompt has three parts:

  1. System instruction — tells the model its role and constraints
  2. Context — the retrieved chunks, pasted verbatim
  3. Question — what the user asked

A concrete example

You are a helpful assistant. Answer the question using only the context below.
If the answer is not in the context, say "I don't know."

Context:
The total amount due is $450.00, payable by March 31, 2025.
Payment can be made by bank transfer or credit card.

Question:
What is the total amount due?

The model reads the context and answers: "The total amount due is $450.00."

It does not guess. It does not hallucinate. It reads.
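The prompt above can be produced with plain string assembly. Here is a minimal sketch of build_prompt (the exact wording of the system instruction and the chunk separator are choices you can adjust):

```python
def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble system instruction, retrieved context, and user question."""
    # Paste the retrieved chunks verbatim, one per line.
    context = "\n".join(context_chunks)
    return (
        "You are a helpful assistant. Answer the question using only the context below.\n"
        'If the answer is not in the context, say "I don\'t know."\n\n'
        f"Context:\n{context}\n\n"
        f"Question:\n{question}\n"
    )
```

Keeping the question last matters: the model answers what it read most recently, and the instruction and context frame that answer.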

What you will write

  • build_prompt(question, context_chunks) — assembles the string above
  • generate_answer(prompt) — sends it to Gemini and returns the response text
  • print_result(answer, source_chunks) — displays the answer and its sources
  • save_embeddings(chunks, embeddings, cache_path) — caches vectors to disk
  • load_embeddings(cache_path) — loads cached vectors
  • main() — wires the full pipeline from CLI arguments
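For the caching pair, any format that keeps chunks and vectors aligned will do. A minimal sketch using a JSON file (the format is an assumption, not a requirement of the pipeline):

```python
import json
from pathlib import Path


def save_embeddings(chunks: list[str], embeddings: list[list[float]], cache_path: str) -> None:
    """Cache chunks alongside their vectors so they stay aligned on disk."""
    payload = {"chunks": chunks, "embeddings": embeddings}
    Path(cache_path).write_text(json.dumps(payload))


def load_embeddings(cache_path: str):
    """Return (chunks, embeddings) from the cache, or None if no cache exists."""
    path = Path(cache_path)
    if not path.exists():
        return None
    data = json.loads(path.read_text())
    return data["chunks"], data["embeddings"]
```

Storing chunks and vectors in one file avoids the classic bug where the cache and the source document drift out of sync: if either half is missing, you simply re-embed.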
