Collect Text from All Files

Add the first part of the folder indexer: list files, read each one, and accumulate results

With list_files and read_file in place, you can now write the function that ties them together: index_folder. This function is the entry point for turning a directory on disk into a set of text strings ready for embedding.

The indexer pattern

The pattern has three steps:

  1. List — call list_files(folder) to get every supported file path.
  2. Read — call read_file(path) for each path. If it returns None, skip the file.
  3. Collect — accumulate the successfully-read text strings into a list.

This chapter covers steps 1–3. The next chapter replaces the raw text accumulation with chunking, which splits each file's content into overlapping segments and records which file each segment came from.

Instructions

  1. Define a function called index_folder that takes one parameter, folder.
  2. Call list_files(folder) and store the result in file_paths.
  3. Create an empty list called texts.
  4. Loop over file_paths. For each path:
    • Call read_file(path) and store the result in text.
    • If not text, add continue. read_file returns None for files it cannot read (binary files, encoding errors), and skipping them prevents errors downstream. Because the check is not text rather than text is None, empty files are skipped as well.
    • Append text to texts.
  5. Return texts.
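
The steps above can be sketched as follows. The list_files and read_file definitions here are hypothetical stand-ins so the example runs on its own; the versions you wrote in earlier chapters may differ, and the set of supported extensions is an assumption made only for illustration. Only index_folder follows the instructions in this chapter.

```python
import os
import tempfile

# Assumed for illustration only; your list_files may support other types.
SUPPORTED_EXTENSIONS = {".txt", ".md"}


def list_files(folder):
    """Stand-in helper: return sorted paths of supported files under folder."""
    paths = []
    for root, _dirs, names in os.walk(folder):
        for name in names:
            if os.path.splitext(name)[1].lower() in SUPPORTED_EXTENSIONS:
                paths.append(os.path.join(root, name))
    return sorted(paths)


def read_file(path):
    """Stand-in helper: return the file's text, or None if it cannot be read."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except (UnicodeDecodeError, OSError):
        return None


def index_folder(folder):
    """List supported files, read each one, and collect the readable text."""
    file_paths = list_files(folder)
    texts = []
    for path in file_paths:
        text = read_file(path)
        if not text:
            # None (unreadable) and "" (empty) are both skipped here.
            continue
        texts.append(text)
    return texts


# Try it on a temporary folder with one good file, one undecodable file,
# and one empty file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.txt"), "w", encoding="utf-8") as f:
        f.write("hello")
    with open(os.path.join(tmp, "b.md"), "wb") as f:
        f.write(b"\xff\xfe\x00bad")  # not valid UTF-8, so read_file returns None
    with open(os.path.join(tmp, "c.txt"), "w", encoding="utf-8") as f:
        f.write("")
    result = index_folder(tmp)

print(result)  # ['hello']
```

Only the readable, non-empty file contributes to the result. The returned list does not record which file each string came from; that association is added in the next chapter, when chunking replaces the raw text accumulation.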