Streaming & Commands
Your complete AI CLI assistant
You've built a fully functional RAG assistant that runs entirely from the terminal. Here is what it can do.
Ask a question across all indexed files:
You: what endpoints does the API have?
Assistant: The API exposes the following endpoints...

The assistant searches across all indexed files, finds the most relevant chunks, and streams the answer token by token as the model generates it.
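The ranking step can be sketched in plain Python. This is a minimal illustration, not the course's exact code: vectors are plain lists standing in for real Gemini embeddings, and the chunk texts are toy examples.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_chunks(query_vector, chunks, k=3):
    # chunks: list of (text, vector) pairs; return the k most similar texts.
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vector, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy 2-D vectors stand in for real embeddings.
chunks = [("auth docs", [1.0, 0.0]), ("endpoints list", [0.0, 1.0]), ("setup", [0.5, 0.5])]
print(top_chunks([0.1, 0.9], chunks, k=2))  # → ['endpoints list', 'setup']
```

In the real assistant the query vector comes from the same embedding model as the chunk vectors, so both live in the same space and the comparison is meaningful.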
Ask about a specific file using @filename:
You: @readme.md what does it cover?
Assistant: The readme covers installation, configuration...

The @filename syntax skips the vector search and sends all chunks from that file directly to the model. Use this when you already know which file has the answer.
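The routing check is a small piece of string handling. A minimal sketch, assuming the index is a dict mapping filename to chunk texts (the function name and index shape are illustrative, not the course's exact code):

```python
def route_query(query, index):
    # index maps filename -> list of chunk texts.
    # If the query starts with @filename, return that file's chunks directly;
    # otherwise return None to signal that a vector search is needed.
    if query.startswith("@"):
        name, _, rest = query[1:].partition(" ")
        if name in index:
            return index[name], rest
    return None, query

index = {"readme.md": ["install steps", "config options"]}
chunks, question = route_query("@readme.md what does it cover?", index)
print(chunks)    # → ['install steps', 'config options']
print(question)  # → 'what does it cover?'
```

A query with no @ prefix (or one naming an unindexed file) falls through to the normal search path unchanged.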
Ask a follow-up question:
You: what authentication method does it use?
Assistant: Based on our earlier discussion, the API uses...

The assistant stores every exchange in history and includes it in each prompt, so follow-up questions work without repeating context.
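Prompt assembly comes down to concatenating three pieces in a fixed order: retrieved context, prior exchanges, then the new question. A simplified sketch, assuming history is a list of (user, assistant) pairs (the exact layout in the course code may differ):

```python
def build_prompt(context_chunks, history, question):
    # Retrieved context first, then the conversation so far, then the new question.
    parts = ["Context:"]
    parts += context_chunks
    for user_msg, assistant_msg in history:
        parts.append(f"User: {user_msg}")
        parts.append(f"Assistant: {assistant_msg}")
    parts.append(f"User: {question}")
    return "\n".join(parts)

history = [("what endpoints does the API have?", "It exposes /users and /orders.")]
prompt = build_prompt(["API reference chunk"], history, "what authentication method does it use?")
print(prompt)
```

Because the history rides along in every prompt, the model can resolve references like "it" in the follow-up without any special machinery.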
Use slash commands to control the session:
You: /new
New conversation started. I won't remember what we discussed before.
You: /quit
Goodbye!

What you built
Over these four lessons, you assembled each piece of the assistant from scratch:
- File indexing (files.py): reads any file type, splits content into chunks, attaches source metadata
- Embedding and search: converts text to vectors with Gemini, ranks chunks by cosine similarity
- Prompt construction: assembles context, conversation history, and the question into a single prompt
- Streaming output: prints tokens as they arrive instead of waiting for the full response
- Slash commands: lets users list files, reset history, or exit without sending a question to the model
- File routing: detects @filename syntax and bypasses vector search for targeted queries
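The slash-command piece is a simple dispatch on the first word of the input line. A hedged sketch of the idea (the state dict and function name here are illustrative, not the course's exact code):

```python
def handle_command(line, state):
    # Return True if the line was a slash command, so the REPL loop
    # can skip the embed/search/model round-trip entirely.
    if not line.startswith("/"):
        return False
    cmd = line.split()[0]
    if cmd == "/new":
        state["history"].clear()
        print("New conversation started. I won't remember what we discussed before.")
    elif cmd == "/files":
        print("\n".join(state["files"]))
    elif cmd == "/quit":
        state["running"] = False
        print("Goodbye!")
    else:
        print(f"Unknown command: {cmd}")
    return True

# In the REPL loop, anything handle_command() claims never reaches the model:
#   if handle_command(line, state):
#       continue
#   ...otherwise embed, search, build prompt, stream the answer...
```

Handling commands before the model call keeps them instant and free: /new and /quit never consume tokens.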
Point the assistant at any folder of documents — API references, internal wikis, course notes, codebases — and it becomes a searchable, conversational interface to that content.