Write the Search Function
Score every chunk against a query and return the top matches
Putting the pieces together
You can now extract text, split it into chunks, embed those chunks, and score two vectors with cosine similarity. The search function ties all of this together: given a user's question, find the chunks that are most likely to contain the answer.
Why the query needs its own embedding
When you embedded chunks in the previous lesson, you used task_type="RETRIEVAL_DOCUMENT". For the user's question you use a different task type: "RETRIEVAL_QUERY".
Why two types? The embedding model is trained to place a short question near the documents that answer it — even though the question and the answer use different words. Setting the task type tells the model which role the text plays, so it can optimise the vector accordingly.
The search algorithm
The function takes five arguments:
| Argument | Purpose |
|---|---|
| `client` | The Gemini API client |
| `query` | The user's question as a string |
| `chunks` | The list of text chunks from the PDF |
| `embeddings` | The list of embedding vectors (one per chunk, same order) |
| `top_k` | How many results to return (default 3) |
The steps inside the function:
- Embed the query to get a query vector.
- Score every chunk by computing cosine similarity between the query vector and the chunk's embedding.
- Sort by score, highest first.
- Return the top-k chunks.
Python patterns in this function
The solution uses two patterns you will see often in Python:
- `zip(embeddings, chunks)` walks two lists in lockstep, pairing the first embedding with the first chunk, the second with the second, and so on.
- List comprehension builds a new list in a single expression. `[(score, chunk) for emb, chunk in zip(...)]` creates a list of `(score, chunk)` pairs.
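To see the two patterns working together, here is a tiny standalone sketch with made-up scores and chunk names (no API calls involved):

```python
# Toy data standing in for real similarity scores and text chunks
scores_demo = [0.91, 0.12, 0.77]
chunks_demo = ["chunk A", "chunk B", "chunk C"]

# zip pairs each score with its chunk; the comprehension
# builds a new list of (score, chunk) tuples in one expression
pairs = [(score, chunk) for score, chunk in zip(scores_demo, chunks_demo)]
print(pairs)  # [(0.91, 'chunk A'), (0.12, 'chunk B'), (0.77, 'chunk C')]
```

In the real function, the score comes from `cosine_similarity` instead of a hard-coded list, but the shape of the expression is the same.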
```python
def search(client, query, chunks, embeddings, top_k=3):
    # Embed the question as a query, not a document
    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents=query,
        config=types.EmbedContentConfig(task_type="RETRIEVAL_QUERY"),
    )
    query_vector = result.embeddings[0].values
    # Score every chunk against the query vector
    scores = [(cosine_similarity(query_vector, emb), chunk)
              for emb, chunk in zip(embeddings, chunks)]
    scores.sort(key=lambda x: x[0], reverse=True)
    return [chunk for _, chunk in scores[:top_k]]
```
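You can exercise the scoring-and-sorting half of `search` in isolation. This sketch substitutes tiny made-up 2-D vectors for real high-dimensional embeddings (the chunk texts and vectors are invented for illustration), so it runs without any API call:

```python
import numpy as np

def cosine_similarity(vec_a, vec_b):
    dot = np.dot(vec_a, vec_b)
    norm = np.linalg.norm(vec_a) * np.linalg.norm(vec_b)
    return dot / norm

# Toy 2-D "embeddings" standing in for real embedding vectors
query_vector = [1.0, 0.0]
embeddings = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
chunks = ["about dogs", "about taxes", "dogs and taxes"]

# Score, sort descending, keep the top 2 — same steps as in search()
scores = [(cosine_similarity(query_vector, emb), chunk)
          for emb, chunk in zip(embeddings, chunks)]
scores.sort(key=lambda x: x[0], reverse=True)
top_2 = [chunk for _, chunk in scores[:2]]
print(top_2)  # ['about dogs', 'dogs and taxes']
```

The chunk whose vector points most nearly in the same direction as the query vector wins, which is exactly what happens with real embeddings, just in far more dimensions.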
Instructions
Complete the search function. The starter code provides the signature.
- Create a variable named `result`. Assign it `client.models.embed_content(model="gemini-embedding-001", contents=query, config=types.EmbedContentConfig(task_type="RETRIEVAL_QUERY"))`.
- Create a variable named `query_vector`. Assign it `result.embeddings[0].values`.
- Create a variable named `scores`. Assign it a list comprehension that produces `(cosine_similarity(query_vector, emb), chunk)` for each `emb, chunk` in `zip(embeddings, chunks)`.
- Sort `scores` by calling `scores.sort(key=lambda x: x[0], reverse=True)`.
- Return a list comprehension that extracts `chunk` from each `_, chunk` in `scores[:top_k]`.
```python
import os
import time

import numpy as np
import pypdf
from dotenv import load_dotenv
from google import genai
from google.genai import types


def extract_text(pdf_path):
    reader = pypdf.PdfReader(pdf_path)
    pages = [page.extract_text() for page in reader.pages]
    return "\n".join(pages)


def chunk_text(text, chunk_size=500, overlap=100):
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i : i + chunk_size])
    return chunks


def preview_chunks(chunks):
    print(f"Total chunks: {len(chunks)}")
    print(f"First chunk:\n{chunks[0]}")


def create_client():
    load_dotenv()
    api_key = os.getenv("GEMINI_API_KEY")
    client = genai.Client(api_key=api_key)
    return client


def embed_text(client, text):
    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents=text,
        config=types.EmbedContentConfig(task_type="RETRIEVAL_DOCUMENT"),
    )
    return result.embeddings[0].values


def embed_all_chunks(client, chunks):
    BATCH_SIZE = 90
    embeddings = []
    for i in range(0, len(chunks), BATCH_SIZE):
        batch = chunks[i : i + BATCH_SIZE]
        for chunk in batch:
            embeddings.append(embed_text(client, chunk))
        if i + BATCH_SIZE < len(chunks):
            print("Rate limit pause — waiting 60 seconds...")
            time.sleep(60)
    return embeddings


def cosine_similarity(vec_a, vec_b):
    dot = np.dot(vec_a, vec_b)
    norm = np.linalg.norm(vec_a) * np.linalg.norm(vec_b)
    return dot / norm


def search(client, query, chunks, embeddings, top_k=3):
    # Step 1: Embed the query with task_type="RETRIEVAL_QUERY"
    # Step 2: Extract query_vector from result
    # Step 3: Score every chunk with cosine_similarity
    # Step 4: Sort scores descending
    # Step 5: Return top_k chunks
    pass
```