Tell the Model What It Knows
Inject the list of indexed files into the prompt so the model can answer meta questions
A question the model can't answer yet
Some questions are about the content of your files. Others are about the index itself: "What files do you have?" or "How many documents did you index?" These are meta questions: questions about what the assistant has indexed, not about what any one file says.
Try asking your assistant: "What files do you have?" Right now it answers "I don't know" — because the file list isn't in the retrieved chunks. The model only sees what search finds, and meta questions about the index return nothing useful.
The fix is to inject the file list as a system-level line at the top of every prompt. The model then has that information available for every question, not just the ones that happen to retrieve a relevant chunk.
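As a minimal sketch of that idea (the helper name `build_prompt_head` is illustrative, not part of the chapter's code), the injected line sits above the usual instructions, so the model sees it on every turn:

```python
def build_prompt_head(file_list):
    # Inject the indexed file names as a system-level line at the
    # top of the prompt, before the regular instructions.
    files_line = ""
    if file_list:
        files_line = f"You have access to these files: {', '.join(file_list)}\n"
    return files_line + "You are a helpful assistant. Answer the question using only the context below.\n"

print(build_prompt_head(["notes.md", "todo.txt"]))
# You have access to these files: notes.md, todo.txt
# You are a helpful assistant. Answer the question using only the context below.
```

With an empty list, the head collapses back to the plain instruction line, which is exactly the backward-compatible behavior the next section relies on.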
Why a separate parameter
Adding `file_list=None` as an optional parameter keeps `build_prompt` reusable. If no file list is passed, the function behaves exactly as before. The `if file_list:` guard means `None` or an empty list produces no extra line.
The cost is small: a short comma-separated list of filenames adds a few tokens per call, but it makes the assistant dramatically more useful for users who want to know what they can ask about.
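To put a rough number on that cost (a hypothetical twelve-file index, using the common ~4 characters per token heuristic as an assumption):

```python
# Hypothetical example: 12 indexed files, joined the same way the chapter does.
file_list = [f"doc_{i}.md" for i in range(12)]
files_line = f"You have access to these files: {', '.join(file_list)}\n"

# Rough estimate with the ~4 characters per token heuristic.
print(f"{len(files_line)} chars, roughly {len(files_line) // 4} tokens")
```

Even a few dozen filenames stays well under a hundred tokens per call, which is negligible next to the retrieved context chunks.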
Instructions
- Add `file_list=None` to `build_prompt`'s signature after `context_chunks`.
- Inside `build_prompt`, on the line after `context = ...`, add `files_line = ""`.
- On the next line, add `if file_list:` to check whether any files were indexed. If the list contains items, set `files_line = f"You have access to these files: {', '.join(file_list)}\n"`.
- In `chat_loop`, before the `while True:` line, add `file_list = sorted(set(chunk["source"] for chunk in chunks))`.
- Update the `build_prompt` call inside the loop to pass `file_list` as the third argument.
- Back in `build_prompt`, replace `return prompt` with a `return` that concatenates the string in multi-line format:
return (
    f"{files_line}"
    "You are a helpful assistant. Answer the question using only the context below.\n"
    "If the answer is not in the context, say \"I don't know.\"\n\n"
    f"Context:\n{context}\n\n"
    f"Question:\n{question}"
)
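Assembled from the six steps above, the finished function looks like this (the sample chunk and filename in the usage example are illustrative):

```python
def build_prompt(question, context_chunks, file_list=None):
    context = "\n\n".join(chunk["text"] for chunk in context_chunks)
    files_line = ""
    if file_list:
        # Meta questions ("What files do you have?") are answerable
        # because this line rides along with every prompt.
        files_line = f"You have access to these files: {', '.join(file_list)}\n"
    return (
        f"{files_line}"
        "You are a helpful assistant. Answer the question using only the context below.\n"
        "If the answer is not in the context, say \"I don't know.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question:\n{question}"
    )

chunks = [{"text": "Q3 revenue grew 12%.", "source": "report.md"}]
print(build_prompt("What files do you have?", chunks, ["report.md"]))
```

Omitting the third argument reproduces the old behavior exactly, which is why callers that never pass `file_list` keep working unchanged.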
import json
import os
import sys
import time
import numpy as np
from dotenv import load_dotenv
from google import genai
from google.genai import types
from files import index_folder
def create_client():
    load_dotenv()
    api_key = os.getenv("GEMINI_API_KEY")
    client = genai.Client(api_key=api_key)
    return client
def embed_text(client, text):
    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents=text,
        config=types.EmbedContentConfig(task_type="RETRIEVAL_DOCUMENT"),
    )
    return result.embeddings[0].values
def embed_all_chunks(client, texts):
    BATCH_SIZE = 90
    embeddings = []
    for i in range(0, len(texts), BATCH_SIZE):
        batch = texts[i : i + BATCH_SIZE]
        for text in batch:
            embeddings.append(embed_text(client, text))
        if i + BATCH_SIZE < len(texts):
            print("Rate limit pause — waiting 60 seconds...")
            time.sleep(60)
    return embeddings
def cosine_similarity(vec_a, vec_b):
    dot = np.dot(vec_a, vec_b)
    norm = np.linalg.norm(vec_a) * np.linalg.norm(vec_b)
    return dot / norm
def search(client, query, chunks, embeddings, top_k=3):
    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents=query,
        config=types.EmbedContentConfig(task_type="RETRIEVAL_QUERY"),
    )
    query_vector = result.embeddings[0].values
    scores = [(cosine_similarity(query_vector, emb), chunk) for emb, chunk in zip(embeddings, chunks)]
    scores.sort(key=lambda x: x[0], reverse=True)
    return [chunk for _, chunk in scores[:top_k]]
def build_prompt(question, context_chunks):
    context = "\n\n".join(chunk["text"] for chunk in context_chunks)
    # Step 1: add file_list=None parameter to the signature above
    # Step 2: add files_line = ""
    # Step 3: add the if file_list: guard that sets files_line
    prompt = f"You are a helpful assistant. Answer the question using only the context below.\nIf the answer is not in the context, say \"I don't know.\"\n\nContext:\n{context}\n\nQuestion:\n{question}"
    # Step 6: replace the return below with the multi-line format
    return prompt
def generate_answer(client, prompt):
    response = client.models.generate_content(model="gemini-2.5-flash", contents=prompt)
    return response.text
def save_embeddings(chunks, embeddings, cache_path):
    data = {"chunks": chunks, "embeddings": embeddings}
    with open(cache_path, "w") as f:
        json.dump(data, f)
def load_embeddings(cache_path):
    if not os.path.exists(cache_path):
        return None
    with open(cache_path) as f:
        data = json.load(f)
    return data["chunks"], data["embeddings"]
def chat_loop(client, chunks, embeddings):
    # Step 4: add file_list = sorted(set(chunk["source"] for chunk in chunks))
    print("Assistant ready. Type your question, or /help for commands.\n")
    while True:
        question = input("You: ").strip()
        if not question:
            continue
        top_chunks = search(client, question, chunks, embeddings)
        # Step 5: update the call below to pass file_list
        prompt = build_prompt(question, top_chunks)
        answer = generate_answer(client, prompt)
        print(f"Assistant: {answer}")
def main():
    if len(sys.argv) < 2:
        print("Usage: python app.py <folder>")
        sys.exit(1)
    folder = sys.argv[1]
    cache_path = folder.rstrip("/\\") + ".cache.json"
    client = create_client()
    cached = load_embeddings(cache_path)
    if cached:
        chunks, embeddings = cached
        print(f"Loaded cache from {cache_path}")
    else:
        print(f"Indexing {folder}...")
        chunks = index_folder(folder)
        texts = [chunk["text"] for chunk in chunks]
        file_count = len(set(chunk["source"] for chunk in chunks))
        print(f"Indexed {len(chunks)} chunks from {file_count} files.")
        embeddings = embed_all_chunks(client, texts)
        save_embeddings(chunks, embeddings, cache_path)
        print(f"Cache saved to {cache_path}")
    chat_loop(client, chunks, embeddings)

if __name__ == "__main__":
    main()