Why Chunking Matters

Understand token limits and the role of overlap before writing code

Why you can't send a whole PDF to the model

Gemini, like every other LLM, has a context window: a maximum number of tokens it can process at once. A 100-page PDF may contain 80,000 tokens. Even if the model could accept it, sending the entire document for every question is slow and expensive.
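To sanity-check that 80,000-token figure, we can use a common rule of thumb: English text averages roughly four characters per token. This is only a heuristic, not the real tokenizer a model uses, and the page length below is an assumed value for illustration.

```python
# Rough heuristic: ~4 characters per token for English text.
# Real tokenizers vary; use the model's own token counter for exact numbers.
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Back-of-envelope token count from character length."""
    return len(text) // CHARS_PER_TOKEN

# A 100-page PDF at an assumed ~3,200 characters per page:
document = "x" * (100 * 3_200)
print(estimate_tokens(document))  # → 80000
```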

More importantly, the model performs better when the context is focused. A 500-token passage about payment terms is more useful than 80,000 tokens of noise when you ask "What is the total amount due?"

What chunking is

Chunking splits your document into small, overlapping pieces of text. Each chunk is a candidate for retrieval.

A typical chunk is 500 characters long. That's large enough to hold a complete idea but small enough to keep retrieval results focused. Chunks overlap by 100 characters — about 1-2 sentences. This overlap ensures that sentences split across a boundary still appear whole in at least one chunk.

|--- chunk 1 (500 chars) ---|
                        |--- chunk 2 (500 chars) ---|
                                                |--- chunk 3 ---|
        ↑ overlap (100 chars)        ↑ overlap (100 chars)
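The sliding window shown above can be sketched in a few lines. This is a minimal version for illustration; the chunking function you will actually write comes in Chapter 3, and the function name here is just a placeholder.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    each sharing `overlap` characters with the previous chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window already reached the end of the text
    return chunks
```

With the defaults, a 1,000-character document produces three chunks starting at characters 0, 400, and 800; the last 100 characters of one chunk are the first 100 of the next.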

The two parameters you will use

Parameter     Typical value   Effect
chunk_size    500             Maximum characters per chunk
overlap       100             Characters shared between adjacent chunks

You will write the chunking function in Chapter 3.