Clean Up and List Files

Remove development helpers and add a directory walker that filters by extension

The RAG app you built in the previous course included two helper functions — test_search and preview_chunks — that were useful during development. They let you inspect chunks and verify that search worked. The finished assistant doesn't need them, so this chapter removes them before adding new capabilities.

Walking a directory with os.walk

To index a folder of files, you need a way to list every file inside it — including files in subfolders. Python's os.walk does this:

import os

for dirpath, _, filenames in os.walk(folder):
    for filename in filenames:
        # Join the directory and the file name to get the file's full path
        full_path = os.path.join(dirpath, filename)

os.walk yields a 3-tuple for each directory it visits: the directory path, a list of its subdirectory names, and a list of its file names. The name _ is a conventional placeholder for a value you don't use, here the subdirectory list. You don't need that list because os.walk descends into subdirectories on its own.
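To make the shape of those tuples concrete, here is a minimal sketch that walks a throwaway directory tree. The file and folder names (notes.txt, sub, script.py) are made up for illustration and are not part of the course project.

```python
import os
import tempfile

# Build a small temporary tree: root/notes.txt and root/sub/script.py
# (illustrative names only, not part of the course project)
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("notes.txt", os.path.join("sub", "script.py")):
        with open(os.path.join(root, rel), "w") as f:
            f.write("placeholder")

    visited = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Record each directory relative to root so the result is easy to read
        visited.append(
            (os.path.relpath(dirpath, root), sorted(dirnames), sorted(filenames))
        )

print(visited)
# [('.', ['sub'], ['notes.txt']), ('sub', [], ['script.py'])]
```

The root directory comes first because os.walk works top-down by default, and the subdirectory gets its own tuple without any extra code.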

Filtering by extension

Not every file in a folder contains text the assistant can use. Binary files, images, and compiled artifacts would produce garbage, or raise a decoding error, if you tried to read them as text. A SUPPORTED_EXTENSIONS constant defines the exact set of extensions the assistant accepts:

SUPPORTED_EXTENSIONS = {".txt", ".md", ".py", ".js", ".ts", ".yaml", ".yml", ".json"}

Using a set makes the membership check (ext in SUPPORTED_EXTENSIONS) fast regardless of how many extensions you add.
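The check itself can be tried on a few file names before wiring it into the walker. This is a small sketch: is_supported is a hypothetical helper used only for this demonstration, and the file names are invented.

```python
import os

SUPPORTED_EXTENSIONS = {".txt", ".md", ".py", ".js", ".ts", ".yaml", ".yml", ".json"}


def is_supported(filename):
    """Hypothetical helper: True if the file's extension is in the set."""
    # splitext returns (root, ext); ext keeps its leading dot, e.g. ".md"
    _, ext = os.path.splitext(filename)
    return ext in SUPPORTED_EXTENSIONS


names = ["README.md", "logo.png", "archive.tar.gz", "Makefile", "NOTES.TXT"]
results = {name: is_supported(name) for name in names}
print(results)
```

Only README.md passes. archive.tar.gz is rejected because splitext returns just the last suffix (.gz), Makefile has no extension at all, and NOTES.TXT fails because the comparison is case-sensitive. If uppercase extensions matter for your files, comparing ext.lower() instead would be one way to handle them.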

Instructions

  1. Delete the preview_chunks function.
  2. Delete the test_search function.
  3. Define a module-level constant called SUPPORTED_EXTENSIONS. Set it to a set containing these 8 strings: ".txt", ".md", ".py", ".js", ".ts", ".yaml", ".yml", ".json". The indexer checks this constant before reading any file.
  4. Define a function called list_files that takes folder. Inside it, create an empty list called file_paths to hold the matching file paths.
  5. Add the outer loop for dirpath, _, filenames in os.walk(folder): to visit the folder and every subdirectory recursively. Inside that loop, add for filename in filenames: to iterate over each file.
  6. Inside the inner loop:
    • Call os.path.splitext(filename) and unpack the result into _, ext to extract the extension. The value in ext includes the leading dot, which matches the entries in SUPPORTED_EXTENSIONS.
    • If ext in SUPPORTED_EXTENSIONS, append os.path.join(dirpath, filename) to file_paths. This records the full path of every supported file found.
  7. Return file_paths.
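
Put together, steps 3 through 7 might look like the sketch below. The function follows the instructions above; the temporary folder and its file names are assumptions added only so the example can run by itself.

```python
import os
import tempfile

SUPPORTED_EXTENSIONS = {".txt", ".md", ".py", ".js", ".ts", ".yaml", ".yml", ".json"}


def list_files(folder):
    """Return the full path of every supported file under folder."""
    file_paths = []
    # os.walk visits folder and every subdirectory beneath it
    for dirpath, _, filenames in os.walk(folder):
        for filename in filenames:
            # Keep only the extension; the root part is not needed
            _, ext = os.path.splitext(filename)
            if ext in SUPPORTED_EXTENSIONS:
                file_paths.append(os.path.join(dirpath, filename))
    return file_paths


# Usage on a throwaway folder (illustrative file names, not the course data)
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "docs"))
    for rel in ("app.py", "logo.png", os.path.join("docs", "guide.md")):
        with open(os.path.join(root, rel), "w") as f:
            f.write("placeholder")

    # Sort and make paths relative to root so the result is stable to read
    found = sorted(os.path.relpath(p, root) for p in list_files(root))

print(found)
```

The result contains app.py and the guide.md file inside docs, while logo.png is skipped. The order in which os.walk reports files depends on the file system, which is why the usage example sorts the paths before printing.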
