Lesson Complete!

Extract and Chunk Text

What you did in this lesson

  • Learned why token limits make chunking necessary
  • Wrote extract_text() — pulls all page text into one string
  • Wrote chunk_text() — splits that string into overlapping pieces
  • Wrote preview_chunks() — inspects the output before moving on
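The lesson's own code is not reproduced on this page. As a recap, here is a minimal Python sketch of how chunk_text() and preview_chunks() might work. It assumes chunk size is measured in characters as a rough stand-in for tokens. The parameter names chunk_size, overlap, n, and width, and their defaults, are hypothetical and not taken from the lesson.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of at most chunk_size characters.

    Assumption: sizes are in characters, not model tokens. Each chunk starts
    (chunk_size - overlap) characters after the previous one, so consecutive
    chunks share `overlap` characters of context.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


def preview_chunks(chunks: list[str], n: int = 3, width: int = 80) -> None:
    """Print the chunk count and the start of the first n chunks (hypothetical format)."""
    print(f"{len(chunks)} chunks total")
    for i, chunk in enumerate(chunks[:n]):
        print(f"[{i}] ({len(chunk)} chars) {chunk[:width]!r}")


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
preview_chunks(chunks)
```

The overlap means a sentence cut at one chunk boundary still appears whole, or nearly whole, in the neighboring chunk. The cost is that some text is stored and later embedded twice.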

What comes next

You now have a list of text chunks. In Lesson 3, you will convert each chunk into a vector, a list of numbers that captures its meaning. This is what makes semantic search possible.