Switch to Streaming in the Loop

Delete generate_answer and replace its call in chat_loop with stream_answer

Wiring up the streaming function

chat_loop still calls generate_answer, but stream_answer does the same job and prints tokens as they arrive instead of waiting for the full response. That makes generate_answer redundant.

The replacement also changes how the response prints. Before, generate_answer returned the full text and chat_loop printed it all at once with print(f"Assistant: {answer}"). With streaming, you print the "Assistant: " prefix first, passing end="" so no newline follows and the tokens appear right after it, and flush=True so the prefix is written out immediately rather than sitting in the output buffer. Then you call stream_answer, which prints each token as it arrives. The old print statement is no longer needed.
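To see why the prefix is printed with end="" and flush=True, here is a small standalone sketch. The helper show_prefix_then_tokens and the hard-coded token list are hypothetical and exist only for this illustration; the output goes to an in-memory buffer so the exact text can be inspected.

```python
import io


def show_prefix_then_tokens(tokens, out):
    """Hypothetical helper: write the prefix, then each token, on one line."""
    # end="" suppresses the newline so tokens continue on the same line;
    # flush=True pushes the text out immediately instead of buffering it.
    print("Assistant: ", end="", flush=True, file=out)
    for token in tokens:
        print(token, end="", flush=True, file=out)
    print(file=out)  # finish the line once all tokens are written


buffer = io.StringIO()
show_prefix_then_tokens(["Hel", "lo", "!"], buffer)
print(repr(buffer.getvalue()))  # 'Assistant: Hello!\n'
```

Without end="", the prefix would end with a newline and the tokens would start on the line below it.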

Instructions

  1. Delete the entire generate_answer function — stream_answer replaces it.
  2. In chat_loop, replace answer = generate_answer(client, prompt) with two lines:
    • print("Assistant: ", end="", flush=True)
    • answer = stream_answer(client, prompt)
  3. Remove the print(f"Assistant: {answer}") line — streaming already printed the response as it arrived.
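After the three steps, the loop might look roughly like the sketch below. The lesson's real stream_answer and client are not shown here, so this version makes several assumptions: FakeClient and its stream method are hypothetical stand-ins for the real API client, stream_answer is assumed to return the full text after printing it, and chat_loop takes a list of prompts instead of reading user input so that the example runs on its own.

```python
from typing import Iterable, Iterator, List


class FakeClient:
    """Hypothetical stand-in for the real client; yields canned tokens."""

    def stream(self, prompt: str) -> Iterator[str]:
        for token in ["Echo", ": ", prompt]:
            yield token


def stream_answer(client, prompt: str) -> str:
    """Assumed shape: print each token as it arrives, return the full text."""
    parts = []
    for token in client.stream(prompt):
        print(token, end="", flush=True)
        parts.append(token)
    print()  # end the line after the last token
    return "".join(parts)


def chat_loop(client, prompts: Iterable[str]) -> List[str]:
    """Simplified loop: prompts are passed in rather than read from input."""
    answers = []
    for prompt in prompts:
        # Step 2: print the prefix, then stream the answer right after it.
        print("Assistant: ", end="", flush=True)
        answer = stream_answer(client, prompt)
        # Step 3: no print(f"Assistant: {answer}") here; it was already shown.
        answers.append(answer)
    return answers


chat_loop(FakeClient(), ["hello"])
```

The returned answer is still kept in a variable, which is useful if the loop needs the text later, for example to store conversation history.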