Parse Command-Line Arguments

Read the PDF path and question from the terminal using sys.argv

From functions to a working app

You now have every function the RAG pipeline needs: extract, chunk, embed, search, prompt, generate, print, and cache. But they are loose pieces: nothing calls them in order yet. The next three chapters wire them together into a main() function you can run from the terminal.

After these chapters, you will run:

python app.py invoice.pdf "What is the total amount due?"

and get back an answer grounded in the PDF's content. This chapter handles the first step: reading the user's input.

Reading arguments from the command line

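Python's sys.argv (from the sys module) is a list of strings: the script name first, then each argument the user typed after it. The shell splits arguments on spaces, so a question that contains spaces must be wrapped in quotes to arrive as a single string: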

Index         Value
sys.argv[0]   "app.py" (the script name)
sys.argv[1]   "invoice.pdf" (the PDF path)
sys.argv[2]   "What is the total amount due?" (the question)

The main() function reads sys.argv[1] and sys.argv[2] to get the PDF path and the question.
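As a minimal sketch of the indexing described above, the helper below takes an argv-style list and returns the two values. The name read_args and the usage message are illustrative assumptions, not part of the starter code; the length check is an optional extra that avoids an IndexError when an argument is missing.

```python
import sys


def read_args(argv):
    """Return (pdf_path, question) from an argv-style list of strings.

    read_args is a hypothetical helper used only for illustration.
    """
    # argv[0] is the script name, so two user arguments mean a length of 3.
    if len(argv) < 3:
        raise SystemExit('Usage: python app.py <pdf_path> "<question>"')
    return argv[1], argv[2]


# Simulates: python app.py invoice.pdf "What is the total amount due?"
example_argv = ["app.py", "invoice.pdf", "What is the total amount due?"]
pdf_path, question = read_args(example_argv)
print(pdf_path)
print(question)
```

In a real run you would pass sys.argv instead of example_argv.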

The if __name__ guard

The line if __name__ == "__main__" tells Python: *run main() only when this file is executed directly*. If another script imports a function from this file (like from app import search), the guard prevents the full pipeline from running automatically.
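The guard can be sketched in a few lines. Here search is a simplified stand-in (a plain substring match, not the course's embedding search) and main only prints a message, so the example stays self-contained; both bodies are assumptions made for illustration.

```python
def search(query, chunks):
    """Simplified stand-in: return the chunks that contain the query text."""
    return [chunk for chunk in chunks if query.lower() in chunk.lower()]


def main():
    # Placeholder for the full pipeline that later chapters build.
    print("Running the full pipeline...")


# True only when this file is executed directly (python app.py ...).
# When another script runs `from app import search`, __name__ is "app",
# so main() is skipped and only the function definitions are loaded.
if __name__ == "__main__":
    main()
```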

The cache path convention

Each PDF gets its own cache file. The path is the PDF path with .cache.json appended:

invoice.pdf → invoice.pdf.cache.json
report.pdf  → report.pdf.cache.json

This chapter sets up cache_path as a variable. The next chapter uses it to decide whether to embed or load from disk.
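The convention is plain string concatenation, which the short sketch below illustrates. The function name make_cache_path and the docs/ example path are hypothetical and used only here; in the chapter itself the concatenation is written inline.

```python
def make_cache_path(pdf_path):
    """Append the cache suffix to the full PDF path (hypothetical helper)."""
    return pdf_path + ".cache.json"


# Any directory part of the path is kept, so the cache file
# ends up next to the PDF it belongs to.
for name in ["invoice.pdf", "report.pdf", "docs/invoice.pdf"]:
    print(name, "->", make_cache_path(name))
```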

Instructions

Start the main function. The starter code provides the signature and the if __name__ guard.

  1. Create a variable named pdf_path. Assign it sys.argv[1].
  2. Create a variable named question. Assign it sys.argv[2].
  3. Create a variable named cache_path. Assign it pdf_path + ".cache.json".
  4. Create a variable named client. Assign it create_client().
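One possible shape for the four steps is sketched below. The create_client here is a placeholder standing in for the function written earlier in the course, sys.argv is overwritten only to simulate the terminal command, and the return statement is an addition so the values can be inspected; none of these three details come from the starter code.

```python
import sys


def create_client():
    """Placeholder for the course's create_client(); returns a dummy object."""
    return object()


def main():
    pdf_path = sys.argv[1]                 # step 1: path to the PDF
    question = sys.argv[2]                 # step 2: the user's question
    cache_path = pdf_path + ".cache.json"  # step 3: per-PDF cache file
    client = create_client()               # step 4: API client
    # Returned only so this sketch can display the values.
    return pdf_path, question, cache_path, client


# Simulate: python app.py invoice.pdf "What is the total amount due?"
sys.argv = ["app.py", "invoice.pdf", "What is the total amount due?"]
pdf_path, question, cache_path, client = main()
print(pdf_path)
print(question)
print(cache_path)
```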