Build the Prompt
Assemble the system instruction, context chunks, and question into one string
From retrieved chunks to a prompt
In the previous chapter you saw the three parts of a RAG prompt: system instruction, context, and question. Now you write a function that assembles them into a single string the model can read.
Why the system instruction matters
The first line of the prompt sets two rules:
- "Answer using only the context below" — this is grounding. It tells the model to treat the retrieved chunks as its only source of truth, ignoring anything it learned during training.
- "If the answer is not in the context, say 'I don't know'" — this is the safety net against hallucination. Without it, the model may invent plausible-sounding facts when the chunks do not contain the answer.
Both rules work together: grounding limits *where* the model looks, and the fallback instruction limits *what* it says when it finds nothing.
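A minimal sketch of that instruction as a single Python string, matching the template used later in this chapter:

```python
# The system instruction combines both rules in one string.
SYSTEM_INSTRUCTION = (
    "You are a helpful assistant. "
    "Answer the question using only the context below.\n"         # rule 1: grounding
    "If the answer is not in the context, say \"I don't know.\""  # rule 2: fallback
)

print(SYSTEM_INSTRUCTION)
```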
How the template is assembled
The build_prompt function takes a question and a list of context chunks. It produces a single string — a prompt template filled with real data.
| Part | Source | Separator |
|---|---|---|
| System instruction | Hard-coded in the template | — |
| Context | Retrieved chunks joined with "\n\n" | Double newline between chunks |
| Question | Passed in by the caller | Placed at the end |
The double-newline separator between chunks gives the model a clear visual boundary. Without it, two adjacent chunks could blend together and confuse the model about where one piece of evidence ends and the next begins.
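You can see the difference with two toy chunks (the chunk text here is made up for illustration): joined with an empty string, the sentences run together; joined with a double newline, a blank line marks each boundary.

```python
chunks = [
    "Refunds are processed within 5 business days.",
    "Shipping is free on orders over $50.",
]

blended = "".join(chunks)        # no boundary: the two chunks run together
separated = "\n\n".join(chunks)  # blank line marks where each chunk ends

print(separated)
```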
def build_prompt(question, context_chunks):
context = "\n\n".join(context_chunks)
prompt = f"You are a helpful assistant. Answer the question using only the context below.\nIf the answer is not in the context, say \"I don't know.\"\n\nContext:\n{context}\n\nQuestion:\n{question}"
return prompt
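Calling the function with a toy question and two made-up chunks shows the assembled prompt. The function is repeated here so the sketch runs on its own; the chunk text is illustrative only.

```python
# build_prompt as defined above, repeated so this example is self-contained.
def build_prompt(question, context_chunks):
    context = "\n\n".join(context_chunks)
    prompt = f"You are a helpful assistant. Answer the question using only the context below.\nIf the answer is not in the context, say \"I don't know.\"\n\nContext:\n{context}\n\nQuestion:\n{question}"
    return prompt

# Toy chunks, illustrative text only.
chunks = [
    "The return window is 30 days from delivery.",
    "Items must be unused and in original packaging.",
]
prompt = build_prompt("How long is the return window?", chunks)
print(prompt)
```

The printed prompt starts with the system instruction, shows both chunks separated by a blank line under "Context:", and ends with the question.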
Instructions
Complete the build_prompt function. The starter code provides the signature.
- Create a variable named `context`. Assign it `"\n\n".join(context_chunks)`.
- Create a variable named `prompt`. Assign it the following f-string exactly:
  f"You are a helpful assistant. Answer the question using only the context below.\nIf the answer is not in the context, say \"I don't know.\"\n\nContext:\n{context}\n\nQuestion:\n{question}"
- Return `prompt`.
import os
import time
import numpy as np
import pypdf
from dotenv import load_dotenv
from google import genai
from google.genai import types
def extract_text(pdf_path):
reader = pypdf.PdfReader(pdf_path)
pages = [page.extract_text() for page in reader.pages]
return "\n".join(pages)
def chunk_text(text, chunk_size=500, overlap=100):
chunks = []
for i in range(0, len(text), chunk_size - overlap):
chunks.append(text[i : i + chunk_size])
return chunks
def preview_chunks(chunks):
print(f"Total chunks: {len(chunks)}")
print(f"First chunk:\n{chunks[0]}")
def create_client():
load_dotenv()
api_key = os.getenv("GEMINI_API_KEY")
client = genai.Client(api_key=api_key)
return client
def embed_text(client, text):
result = client.models.embed_content(model="gemini-embedding-001", contents=text, config=types.EmbedContentConfig(task_type="RETRIEVAL_DOCUMENT"))
return result.embeddings[0].values
def embed_all_chunks(client, chunks):
BATCH_SIZE = 90
embeddings = []
for i in range(0, len(chunks), BATCH_SIZE):
batch = chunks[i : i + BATCH_SIZE]
for chunk in batch:
embeddings.append(embed_text(client, chunk))
if i + BATCH_SIZE < len(chunks):
print("Rate limit pause — waiting 60 seconds...")
time.sleep(60)
return embeddings
def cosine_similarity(vec_a, vec_b):
dot = np.dot(vec_a, vec_b)
norm = np.linalg.norm(vec_a) * np.linalg.norm(vec_b)
return dot / norm
def search(client, query, chunks, embeddings, top_k=3):
result = client.models.embed_content(model="gemini-embedding-001", contents=query, config=types.EmbedContentConfig(task_type="RETRIEVAL_QUERY"))
query_vector = result.embeddings[0].values
scores = [(cosine_similarity(query_vector, emb), chunk) for emb, chunk in zip(embeddings, chunks)]
scores.sort(key=lambda x: x[0], reverse=True)
return [chunk for _, chunk in scores[:top_k]]
def test_search(client, pdf_path, question):
text = extract_text(pdf_path)
chunks = chunk_text(text)
embeddings = embed_all_chunks(client, chunks)
results = search(client, question, chunks, embeddings)
for i, chunk in enumerate(results, 1):
print(f"Result {i}:\n{chunk}\n")
def build_prompt(question, context_chunks):
# Step 1: Join context_chunks with double newline
# Step 2: Build the prompt f-string
# Step 3: Return prompt
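The retrieval step that feeds build_prompt can be exercised without any API calls by scoring made-up vectors. This sketch reuses the cosine_similarity and ranking logic from the starter code; the three-dimensional "embeddings" are toy values standing in for real model output.

```python
import numpy as np

def cosine_similarity(vec_a, vec_b):
    dot = np.dot(vec_a, vec_b)
    norm = np.linalg.norm(vec_a) * np.linalg.norm(vec_b)
    return dot / norm

# Toy 3-dimensional embeddings standing in for real embedding output.
chunks = ["refund policy", "shipping rates", "warranty terms"]
embeddings = [
    [0.9, 0.1, 0.0],  # "refund policy"
    [0.1, 0.9, 0.0],  # "shipping rates"
    [0.0, 0.2, 0.9],  # "warranty terms"
]
query_vector = [0.8, 0.2, 0.1]  # a query pointing toward "refund policy"

# Same score-and-sort logic as the search function above.
scores = [(cosine_similarity(query_vector, emb), chunk)
          for emb, chunk in zip(embeddings, chunks)]
scores.sort(key=lambda x: x[0], reverse=True)
top_chunks = [chunk for _, chunk in scores[:2]]
print(top_chunks)  # → ['refund policy', 'shipping rates']
```

The two highest-scoring chunks are exactly what search would hand to build_prompt as context_chunks.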