Split Text into Fixed-Size Chunks with Overlap

Write the chunking function that divides text into overlapping pieces

How the chunking loop works

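To split text with overlap, use a for loop whose stride (the distance between consecutive chunk start positions) is smaller than the chunk size: stride = chunk_size - overlap. The overlap must be smaller than chunk_size; otherwise the stride is zero or negative and the loop cannot move forward through the text.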

If chunk_size = 500 and overlap = 100, the stride is 500 - 100 = 400. The loop starts at 0, then 400, then 800, and so on.
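
The start positions can be checked with a few lines of Python. This is only an illustrative sketch: text_length is a hypothetical value chosen for the example and is not part of the exercise.

```python
# Values from the example above; text_length is an assumed, made-up length.
chunk_size = 500
overlap = 100
text_length = 1200

# The stride is how far the loop advances between chunk starts.
stride = chunk_size - overlap

# range() yields every start position below text_length.
starts = list(range(0, text_length, stride))
print(stride)  # 400
print(starts)  # [0, 400, 800]
```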

At each position i, slice text[i : i + chunk_size] to get a chunk. Python slices stop at the end of the string, so the final chunk may be shorter than chunk_size.

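chunks = []  # collects the pieces in order
# Step by the stride so consecutive chunks share `overlap` characters.
for i in range(0, len(text), chunk_size - overlap):
    chunk = text[i : i + chunk_size]
    chunks.append(chunk)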
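
To see the overlap concretely, the same loop can be run on a short string with small numbers. The sample text and the values 10 and 3 are assumptions picked so the result is easy to read; they do not come from the exercise.

```python
# Hypothetical sample input, small enough to inspect by eye.
text = "abcdefghijklmnopqrstuvwxyz"
chunk_size = 10
overlap = 3

chunks = []
for i in range(0, len(text), chunk_size - overlap):
    chunks.append(text[i : i + chunk_size])

print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']

# The tail of each chunk matches the head of the next one.
for left, right in zip(chunks, chunks[1:]):
    assert left[-overlap:] == right[:overlap]
```

Each chunk begins with the last three characters of the previous one, and the last chunk is shorter because the slice runs out of text.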

Instructions

Complete the chunk_text function. The starter code provides the signature.

  1. Create an empty list named chunks.
  2. Create a for loop with a variable named i. Start at 0, end at len(text), and increment by chunk_size - overlap.
  3. Inside the loop, append text[i : i + chunk_size] to chunks.
  4. After the loop, return chunks.
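
Put together, the four steps might look like the sketch below. The starter code is not shown here, so the parameter names and their order are assumptions based on the names used in this lesson. The ValueError guard is an extra that the instructions do not ask for.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into chunks of chunk_size characters that overlap.

    Signature assumed from the lesson's variable names; the real
    starter code may differ.
    """
    # Extra guard (not required by the exercise): a stride of zero
    # or less would prevent the loop from advancing.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    chunks = []                                          # step 1
    for i in range(0, len(text), chunk_size - overlap):  # step 2
        chunks.append(text[i : i + chunk_size])          # step 3
    return chunks                                        # step 4


# Usage with a made-up 1200-character string.
pieces = chunk_text("a" * 1200, 500, 100)
print([len(p) for p in pieces])  # [500, 500, 400]
```

With these numbers the chunks start at 0, 400, and 800, and the last one is shorter because only 400 characters remain.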