One-Shot vs. Interactive
Understand why a persistent loop is better than restarting the program for each question
The cost of restarting
Right now the assistant runs once: it indexes your folder (or loads from cache) and exits. To ask a second question, you restart the program. Even with the cache, every restart means loading the JSON file and re-creating the client before your query can even be embedded.
An interactive assistant does that setup once and then waits for questions. The startup cost is paid exactly once per session.
The REPL pattern
A REPL (Read-Eval-Print Loop) reads input, processes it, prints output, and repeats. The Python pattern is:
while True:
    question = input("You: ").strip()
    if not question:
        continue
    # search, prompt, generate, print

The while True statement keeps the loop running until the program is explicitly stopped (Ctrl+C or a quit command you add later).
The input("You: ") call displays You: as a prompt and blocks until the user presses Enter.
The .strip() call removes leading and trailing whitespace so accidental spaces don't cause unexpected behavior.
The if not question: continue check skips to the next iteration when the user presses Enter without typing anything, preventing a search with an empty query.
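One way to try the pattern without a terminal is to pass the reading and printing functions in as parameters. This is a minimal sketch under stated assumptions, not the loop the later chapters build: run_loop, answer_fn, read_fn, write_fn, and scripted_reader are hypothetical names, and answer_fn stands in for the search, prompt, and generate steps. It also catches the exceptions that input() raises on Ctrl+C and on end of input, so the loop exits cleanly instead of printing a traceback.

```python
from typing import Callable, List


def run_loop(
    answer_fn: Callable[[str], str],
    read_fn: Callable[[str], str] = input,
    write_fn: Callable[[str], None] = print,
) -> int:
    """Read questions until input ends; return how many were answered."""
    answered = 0
    while True:
        try:
            question = read_fn("You: ").strip()
        except (EOFError, KeyboardInterrupt):
            # End of input or Ctrl+C: leave the loop cleanly.
            break
        if not question:
            continue  # empty line: ask again without searching
        write_fn(answer_fn(question))
        answered += 1
    return answered


def scripted_reader(lines: List[str]) -> Callable[[str], str]:
    """Hypothetical test helper: replays lines, then raises EOFError like input()."""
    remaining = iter(lines)

    def read(prompt: str) -> str:
        try:
            return next(remaining)
        except StopIteration:
            raise EOFError from None

    return read
```

Calling run_loop(str.upper) in a terminal echoes each question in upper case until you press Ctrl+C. In a test, scripted_reader supplies the input lines and a list's append method can collect the output.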
What the next chapters do
Before wiring the loop, you will update build_prompt to handle the dict chunks that index_folder returns. Then you will add the loop function, connect the pipeline inside it, and update main to call it.