Preview Your Chunks
Print the chunk count and inspect the first chunk
Inspecting your output
Before moving to embeddings, verify that chunking works. A quick preview function prints two things:
- The total number of chunks
- The text of the first chunk
This is a debugging habit worth keeping throughout the pipeline.
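Beyond the count and the first chunk, you can also check that neighbouring chunks really overlap. With the sliding window used in chunk_text, chunk k+1 starts exactly chunk_size - overlap characters into chunk k. So the next overlap characters of chunk k must equal the start of chunk k+1. The following is a minimal sketch built on that assumption. check_overlap is a hypothetical helper name, not part of the exercise, and the alphabet string stands in for real extracted text.

```python
def chunk_text(text, chunk_size=500, overlap=100):
    # Same sliding-window logic as the starter code.
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i : i + chunk_size])
    return chunks


def check_overlap(chunks, chunk_size=500, overlap=100):
    # Hypothetical helper: chunk k+1 should begin (chunk_size - overlap)
    # characters into chunk k, so that region of chunk k must match the
    # head of chunk k+1. Returns the indices k of mismatched pairs.
    step = chunk_size - overlap
    mismatches = []
    for k in range(len(chunks) - 1):
        if chunks[k][step : step + overlap] != chunks[k + 1][:overlap]:
            mismatches.append(k)
    return mismatches


# Synthetic stand-in for extracted text: the alphabet repeated, 1,250 chars.
text = "".join(chr(ord("a") + i % 26) for i in range(1250))
print(check_overlap(chunk_text(text)))  # → []

# A chunker with no overlap fails the check for every neighbouring pair.
no_overlap = [text[i : i + 500] for i in range(0, len(text), 500)]
print(check_overlap(no_overlap))  # → [0, 1]
```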
Instructions
Complete the preview_chunks function. The starter code provides the signature.
- Print f"Total chunks: {len(chunks)}".
- Print f"First chunk:\n{chunks[0]}".
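import pypdf

def extract_text(pdf_path):
    # Read every page and join the extracted text with newlines.
    reader = pypdf.PdfReader(pdf_path)
    pages = [page.extract_text() for page in reader.pages]
    return "\n".join(pages)

def chunk_text(text, chunk_size=500, overlap=100):
    # Slide a fixed-size window forward by chunk_size - overlap characters,
    # so neighbouring chunks overlap by up to `overlap` characters.
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i : i + chunk_size])
    return chunks

def preview_chunks(chunks):
    # Step 1: Print total chunk count
    print(f"Total chunks: {len(chunks)}")
    # Step 2: Print first chunk
    print(f"First chunk:\n{chunks[0]}")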
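With the defaults chunk_size=500 and overlap=100, windows start every 400 characters. The chunk count is therefore the text length divided by 400, rounded up. A 1,200-character text gives 3 chunks of 500, 500 and 400 characters. The sketch below checks this on synthetic strings instead of a real PDF; the strings are assumptions for illustration. It copies chunk_text from the starter code. It also adds a guard to preview_chunks that is not part of the exercise: if the extracted text is empty, chunk_text returns an empty list, and chunks[0] would then raise IndexError.

```python
def chunk_text(text, chunk_size=500, overlap=100):
    # Copied from the starter code: fixed-size windows that advance by
    # chunk_size - overlap characters.
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i : i + chunk_size])
    return chunks


def preview_chunks(chunks):
    print(f"Total chunks: {len(chunks)}")
    # Guard added for this sketch only: chunks[0] raises IndexError
    # when the list is empty.
    if not chunks:
        print("First chunk: <none>")
        return
    print(f"First chunk:\n{chunks[0]}")


# Synthetic stand-ins for extract_text() output (assumptions for illustration).
print([len(c) for c in chunk_text("x" * 1200)])  # → [500, 500, 400]

preview_chunks(chunk_text("abcdefghij", chunk_size=4, overlap=1))
# → Total chunks: 4
#   First chunk:
#   abcd

preview_chunks(chunk_text(""))
# → Total chunks: 0
#   First chunk: <none>
```

The small example also exposes a quirk of this chunker. The last window starts at character 9 and holds only "j", which is already contained in the previous chunk "ghij". Printing the last chunk next to the first is a cheap way to spot such tails.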