Why Chunking Matters
Understand token limits and the role of overlap before writing code
Why you can't send a whole PDF to the model
Gemini, like every other LLM, has a context window — a maximum number of tokens it can process at once. A 100-page PDF may contain 80,000 tokens. Even if the model could accept all of them, sending the entire document with every question is slow and expensive.
More importantly, the model performs better when the context is focused. A 500-token passage about payment terms is more useful than 80,000 tokens of noise when you ask "What is the total amount due?"
What chunking is
Chunking splits your document into small, overlapping pieces of text. Each chunk is a candidate for retrieval.
A typical chunk is 500 characters long. That's large enough to hold a complete idea but small enough to keep retrieval results focused. Chunks overlap by 100 characters — about 1-2 sentences. This overlap ensures that sentences split across a boundary still appear whole in at least one chunk.
|--- chunk 1 (500 chars) ---|
                        |--- chunk 2 (500 chars) ---|
                                                |--- chunk 3 ---|
                        ↑ overlap (100 chars)   ↑ overlap (100 chars)

The two parameters you will use
| Parameter | Typical value | Effect |
|---|---|---|
| chunk_size | 500 | Maximum characters per chunk |
| overlap | 100 | Characters shared between adjacent chunks |
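To make the two parameters concrete, here is a minimal sketch of a character-based chunker that steps through the text in increments of `chunk_size - overlap`. The function name and signature are illustrative, not the exact code you will write later:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    where adjacent chunks share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    step = chunk_size - overlap  # advance by 400 chars with the defaults
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

With the defaults, a 1,200-character document yields three chunks: two of 500 characters and a final shorter one, with each pair of neighbors sharing 100 characters.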
You will write the chunking function in Chapter 3.