Tell the Model What It Knows
Inject the list of indexed files into the prompt so the model can answer meta questions
A question the model can't answer yet
Some questions are about the content of your files. Others are about the index itself: "What files do you have?" or "How many documents did you index?" These are meta questions: questions about what the assistant has indexed, not about what any one file says.
Try asking your assistant: "What files do you have?" Right now it answers "I don't know" — because the file list isn't in the retrieved chunks. The model only sees what search finds, and meta questions about the index return nothing useful.
The fix is to inject the file list as a system-level line at the top of every prompt. The model then has that information available for every question, not just the ones that happen to retrieve a relevant chunk.
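As a minimal sketch of that idea (the helper name `build_prompt_head` is illustrative, not part of the chapter's code), the injected line sits above the usual instructions, so the model sees it on every turn:

```python
def build_prompt_head(file_list):
    # Inject the indexed file names as a system-level line at the
    # top of the prompt, before the regular instructions.
    files_line = ""
    if file_list:
        files_line = f"You have access to these files: {', '.join(file_list)}\n"
    return files_line + "You are a helpful assistant. Answer the question using only the context below.\n"

print(build_prompt_head(["notes.md", "todo.txt"]))
# You have access to these files: notes.md, todo.txt
# You are a helpful assistant. Answer the question using only the context below.
```

With an empty list, the head collapses back to the plain instruction line, which is exactly the backward-compatible behavior the next section relies on.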
Why a separate parameter
Adding `file_list=None` as an optional parameter keeps `build_prompt` reusable. If no file list is passed, the function behaves exactly as before. The `if file_list:` guard means `None` or an empty list produces no extra line.
The cost is small: a short comma-separated list of filenames adds a few tokens per call, but it makes the assistant dramatically more useful for users who want to know what they can ask about.
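To put a rough number on that cost (a hypothetical twelve-file index, using the common ~4 characters per token heuristic as an assumption):

```python
# Hypothetical example: 12 indexed files, joined the same way the chapter does.
file_list = [f"doc_{i}.md" for i in range(12)]
files_line = f"You have access to these files: {', '.join(file_list)}\n"

# Rough estimate with the ~4 characters per token heuristic.
print(f"{len(files_line)} chars, roughly {len(files_line) // 4} tokens")
```

Even a few dozen filenames stays well under a hundred tokens per call, which is negligible next to the retrieved context chunks.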
Instructions
- Add `file_list=None` to `build_prompt`'s signature after `context_chunks`.
- Inside `build_prompt`, on the line after `context = ...`, add `files_line = ""`.
- On the next line, add `if file_list:` to check whether any files were indexed. If the list contains items, set `files_line = f"You have access to these files: {', '.join(file_list)}\n"`.
- In `chat_loop`, before the `while True:` line, add `file_list = sorted(set(chunk["source"] for chunk in chunks))`.
- Update the `build_prompt` call inside the loop to pass `file_list` as the third argument.
- Back in `build_prompt`, replace `return prompt` with a `return` that concatenates the string in multi-line format:
return (
    f"{files_line}"
    "You are a helpful assistant. Answer the question using only the context below.\n"
    "If the answer is not in the context, say \"I don't know.\"\n\n"
    f"Context:\n{context}\n\n"
    f"Question:\n{question}"
)
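Assembled from the six steps above, the finished function looks like this (the sample chunk and filename in the usage example are illustrative):

```python
def build_prompt(question, context_chunks, file_list=None):
    context = "\n\n".join(chunk["text"] for chunk in context_chunks)
    files_line = ""
    if file_list:
        # Meta questions ("What files do you have?") are answerable
        # because this line rides along with every prompt.
        files_line = f"You have access to these files: {', '.join(file_list)}\n"
    return (
        f"{files_line}"
        "You are a helpful assistant. Answer the question using only the context below.\n"
        "If the answer is not in the context, say \"I don't know.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question:\n{question}"
    )

chunks = [{"text": "Q3 revenue grew 12%.", "source": "report.md"}]
print(build_prompt("What files do you have?", chunks, ["report.md"]))
```

Omitting the third argument reproduces the old behavior exactly, which is why callers that never pass `file_list` keep working unchanged.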
import json
import os
import sys
import time
import numpy as np
from dotenv import load_dotenv
from google import genai
from google.genai import types
from files import index_folder
def create_client():
    load_dotenv()
    api_key = os.getenv("GEMINI_API_KEY")
    client = genai.Client(api_key=api_key)
    return client
def embed_text(client, text):
    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents=text,
        config=types.EmbedContentConfig(task_type="RETRIEVAL_DOCUMENT"),
    )
    return result.embeddings[0].values
def embed_all_chunks(client, texts):
    BATCH_SIZE = 90
    embeddings = []
    for i in range(0, len(texts), BATCH_SIZE):
        batch = texts[i : i + BATCH_SIZE]
        for text in batch:
            embeddings.append(embed_text(client, text))
        if i + BATCH_SIZE < len(texts):
            print("Rate limit pause — waiting 60 seconds...")
            time.sleep(60)
    return embeddings
def cosine_similarity(vec_a, vec_b):
    dot = np.dot(vec_a, vec_b)
    norm = np.linalg.norm(vec_a) * np.linalg.norm(vec_b)
    return dot / norm
def search(client, query, chunks, embeddings, top_k=3):
    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents=query,
        config=types.EmbedContentConfig(task_type="RETRIEVAL_QUERY"),
    )
    query_vector = result.embeddings[0].values
    scores = [(cosine_similarity(query_vector, emb), chunk) for emb, chunk in zip(embeddings, chunks)]
    scores.sort(key=lambda x: x[0], reverse=True)
    return [chunk for _, chunk in scores[:top_k]]
def build_prompt(question, context_chunks):
    context = "\n\n".join(chunk["text"] for chunk in context_chunks)
    # Step 1: add file_list=None parameter to the signature above
    # Step 2: add files_line = ""
    # Step 3: add the if file_list: guard that sets files_line
    prompt = f"You are a helpful assistant. Answer the question using only the context below.\nIf the answer is not in the context, say \"I don't know.\"\n\nContext:\n{context}\n\nQuestion:\n{question}"
    # Step 6: replace the return below with the multi-line format
    return prompt
def generate_answer(client, prompt):
    response = client.models.generate_content(model="gemini-2.5-flash", contents=prompt)
    return response.text
def save_embeddings(chunks, embeddings, cache_path):
    data = {"chunks": chunks, "embeddings": embeddings}
    with open(cache_path, "w") as f:
        json.dump(data, f)
def load_embeddings(cache_path):
    if not os.path.exists(cache_path):
        return None
    with open(cache_path) as f:
        data = json.load(f)
    return data["chunks"], data["embeddings"]
def chat_loop(client, chunks, embeddings):
    # Step 4: add file_list = sorted(set(chunk["source"] for chunk in chunks))
    print("Assistant ready. Type your question, or /help for commands.\n")
    while True:
        question = input("You: ").strip()
        if not question:
            continue
        top_chunks = search(client, question, chunks, embeddings)
        # Step 5: update the call below to pass file_list
        prompt = build_prompt(question, top_chunks)
        answer = generate_answer(client, prompt)
        print(f"Assistant: {answer}")
def main():
    if len(sys.argv) < 2:
        print("Usage: python app.py <folder>")
        sys.exit(1)
    folder = sys.argv[1]
    cache_path = folder.rstrip("/\\") + ".cache.json"
    client = create_client()
    cached = load_embeddings(cache_path)
    if cached:
        chunks, embeddings = cached
        print(f"Loaded cache from {cache_path}")
    else:
        print(f"Indexing {folder}...")
        chunks = index_folder(folder)
        texts = [chunk["text"] for chunk in chunks]
        file_count = len(set(chunk["source"] for chunk in chunks))
        print(f"Indexed {len(chunks)} chunks from {file_count} files.")
        embeddings = embed_all_chunks(client, texts)
        save_embeddings(chunks, embeddings, cache_path)
        print(f"Cache saved to {cache_path}")
    chat_loop(client, chunks, embeddings)

if __name__ == "__main__":
    main()