Build the Prompt
Assemble the system instruction, context chunks, and question into one string
From retrieved chunks to a prompt
In the previous chapter you saw the three parts of a RAG prompt: system instruction, context, and question. Now you write a function that assembles them into a single string the model can read.
Why the system instruction matters
The first line of the prompt sets two rules:
- "Answer using only the context below" — this is grounding. It tells the model to treat the retrieved chunks as its only source of truth, ignoring anything it learned during training.
- "If the answer is not in the context, say 'I don't know'" — this is the safety net against hallucination. Without it, the model may invent plausible-sounding facts when the chunks do not contain the answer.
Both rules work together: grounding limits *where* the model looks, and the fallback instruction limits *what* it says when it finds nothing.
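A minimal sketch of that instruction as a single Python string, matching the template used later in this chapter:

```python
# The system instruction combines both rules in one string.
SYSTEM_INSTRUCTION = (
    "You are a helpful assistant. "
    "Answer the question using only the context below.\n"         # rule 1: grounding
    "If the answer is not in the context, say \"I don't know.\""  # rule 2: fallback
)

print(SYSTEM_INSTRUCTION)
```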
How the template is assembled
The build_prompt function takes a question and a list of context chunks. It produces a single string — a prompt template filled with real data.
| Part | Source | Separator |
|---|---|---|
| System instruction | Hard-coded in the template | — |
| Context | Retrieved chunks joined with "\n\n" | Double newline between chunks |
| Question | Passed in by the caller | Placed at the end |
The double-newline separator between chunks gives the model a clear visual boundary. Without it, two adjacent chunks could blend together and confuse the model about where one piece of evidence ends and the next begins.
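You can see the difference with two toy chunks (the chunk text here is made up for illustration): joined with an empty string, the sentences run together; joined with a double newline, a blank line marks each boundary.

```python
chunks = [
    "Refunds are processed within 5 business days.",
    "Shipping is free on orders over $50.",
]

blended = "".join(chunks)        # no boundary: the two chunks run together
separated = "\n\n".join(chunks)  # blank line marks where each chunk ends

print(separated)
```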
def build_prompt(question, context_chunks):
context = "\n\n".join(context_chunks)
prompt = f"You are a helpful assistant. Answer the question using only the context below.\nIf the answer is not in the context, say \"I don't know.\"\n\nContext:\n{context}\n\nQuestion:\n{question}"
return prompt
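Calling the function with a toy question and two made-up chunks shows the assembled prompt. The function is repeated here so the sketch runs on its own; the chunk text is illustrative only.

```python
# build_prompt as defined above, repeated so this example is self-contained.
def build_prompt(question, context_chunks):
    context = "\n\n".join(context_chunks)
    prompt = f"You are a helpful assistant. Answer the question using only the context below.\nIf the answer is not in the context, say \"I don't know.\"\n\nContext:\n{context}\n\nQuestion:\n{question}"
    return prompt

# Toy chunks, illustrative text only.
chunks = [
    "The return window is 30 days from delivery.",
    "Items must be unused and in original packaging.",
]
prompt = build_prompt("How long is the return window?", chunks)
print(prompt)
```

The printed prompt starts with the system instruction, shows both chunks separated by a blank line under "Context:", and ends with the question.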
Instructions
Complete the build_prompt function. The starter code provides the signature.
- Create a variable named `context`. Assign it `"\n\n".join(context_chunks)`.
- Create a variable named `prompt`. Assign it the following f-string exactly:
  f"You are a helpful assistant. Answer the question using only the context below.\nIf the answer is not in the context, say \"I don't know.\"\n\nContext:\n{context}\n\nQuestion:\n{question}"
- Return `prompt`.
import os
import time
import numpy as np
import pypdf
from dotenv import load_dotenv
from google import genai
from google.genai import types
def extract_text(pdf_path):
reader = pypdf.PdfReader(pdf_path)
pages = [page.extract_text() for page in reader.pages]
return "\n".join(pages)
def chunk_text(text, chunk_size=500, overlap=100):
chunks = []
for i in range(0, len(text), chunk_size - overlap):
chunks.append(text[i : i + chunk_size])
return chunks
def preview_chunks(chunks):
print(f"Total chunks: {len(chunks)}")
print(f"First chunk:\n{chunks[0]}")
def create_client():
load_dotenv()
api_key = os.getenv("GEMINI_API_KEY")
client = genai.Client(api_key=api_key)
return client
def embed_text(client, text):
result = client.models.embed_content(model="gemini-embedding-001", contents=text, config=types.EmbedContentConfig(task_type="RETRIEVAL_DOCUMENT"))
return result.embeddings[0].values
def embed_all_chunks(client, chunks):
BATCH_SIZE = 90
embeddings = []
for i in range(0, len(chunks), BATCH_SIZE):
batch = chunks[i : i + BATCH_SIZE]
for chunk in batch:
embeddings.append(embed_text(client, chunk))
if i + BATCH_SIZE < len(chunks):
print("Rate limit pause — waiting 60 seconds...")
time.sleep(60)
return embeddings
def cosine_similarity(vec_a, vec_b):
dot = np.dot(vec_a, vec_b)
norm = np.linalg.norm(vec_a) * np.linalg.norm(vec_b)
return dot / norm
def search(client, query, chunks, embeddings, top_k=3):
result = client.models.embed_content(model="gemini-embedding-001", contents=query, config=types.EmbedContentConfig(task_type="RETRIEVAL_QUERY"))
query_vector = result.embeddings[0].values
scores = [(cosine_similarity(query_vector, emb), chunk) for emb, chunk in zip(embeddings, chunks)]
scores.sort(key=lambda x: x[0], reverse=True)
return [chunk for _, chunk in scores[:top_k]]
def test_search(client, pdf_path, question):
text = extract_text(pdf_path)
chunks = chunk_text(text)
embeddings = embed_all_chunks(client, chunks)
results = search(client, question, chunks, embeddings)
for i, chunk in enumerate(results, 1):
print(f"Result {i}:\n{chunk}\n")
def build_prompt(question, context_chunks):
# Step 1: Join context_chunks with double newline
# Step 2: Build the prompt f-string
# Step 3: Return prompt
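The retrieval step that feeds build_prompt can be exercised without any API calls by scoring made-up vectors. This sketch reuses the cosine_similarity and ranking logic from the starter code; the three-dimensional "embeddings" are toy values standing in for real model output.

```python
import numpy as np

def cosine_similarity(vec_a, vec_b):
    dot = np.dot(vec_a, vec_b)
    norm = np.linalg.norm(vec_a) * np.linalg.norm(vec_b)
    return dot / norm

# Toy 3-dimensional embeddings standing in for real embedding output.
chunks = ["refund policy", "shipping rates", "warranty terms"]
embeddings = [
    [0.9, 0.1, 0.0],  # "refund policy"
    [0.1, 0.9, 0.0],  # "shipping rates"
    [0.0, 0.2, 0.9],  # "warranty terms"
]
query_vector = [0.8, 0.2, 0.1]  # a query pointing toward "refund policy"

# Same score-and-sort logic as the search function above.
scores = [(cosine_similarity(query_vector, emb), chunk)
          for emb, chunk in zip(embeddings, chunks)]
scores.sort(key=lambda x: x[0], reverse=True)
top_chunks = [chunk for _, chunk in scores[:2]]
print(top_chunks)  # → ['refund policy', 'shipping rates']
```

The two highest-scoring chunks are exactly what search would hand to build_prompt as context_chunks.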