Build the Streaming Function

Replace the blocking generate_answer with stream_answer that prints tokens as they arrive


Printing tokens without buffering

The stream_answer function receives chunks from the API as the model generates them. Two arguments to print ensure each chunk appears immediately:

  • end="" — replaces the default newline so each chunk runs onto the same line as the previous one.
  • flush=True — forces Python to write the chunk to the terminal immediately. Without it, Python's output buffer may hold several chunks before displaying them, which defeats the purpose of streaming.
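A minimal, API-free sketch of these two arguments. It writes to an in-memory buffer instead of the terminal so the exact output can be inspected; in the real function the chunks come from the model rather than a hard-coded list:

```python
import io
import sys

buf = io.StringIO()  # stands in for the terminal so we can inspect the output

# Simulated chunks arriving one at a time, as from a streaming API.
for token in ["Stre", "aming ", "tokens"]:
    print(token, end="", file=buf, flush=True)  # no newline between chunks
print(file=buf)  # bare print() adds the final newline

sys.stdout.write(buf.getvalue())
```

Without `end=""`, each chunk would land on its own line; without `flush=True`, chunks could sit in Python's output buffer instead of appearing as they arrive.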

After the loop finishes, a bare print() moves the cursor to the next line so the next prompt appears cleanly below the response.

The function also accumulates each chunk into full_text and returns it. History and prompt-building still need the complete response string, not individual chunks.

Instructions

  1. Define a function called stream_answer that takes client and prompt as arguments.
  2. Inside stream_answer, create a variable called full_text and assign it an empty string "".
  3. Write a for loop: for chunk in client.models.generate_content_stream(model="gemini-2.5-flash", contents=prompt):. Use chunk as the loop variable.
  4. Inside the loop, add if chunk.text: to skip empty chunks. Inside that block:
    • Add print(chunk.text, end="", flush=True). The end="" keeps the cursor on the same line so tokens run together, and flush=True forces the output to appear immediately rather than waiting in a buffer.
    • Add full_text += chunk.text to accumulate the complete response text.
  5. After the loop (back at the function's indentation level), call print() with no arguments — this moves the cursor to the next line after the stream finishes.
  6. Return full_text.
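Assembled, the steps above produce a function like the sketch below. The stub client is purely illustrative: it mimics the shape of the google-genai streaming response (objects with a .text attribute) so the sketch runs without network access. Against the real SDK you would pass an actual genai.Client instead.

```python
from types import SimpleNamespace

def stream_answer(client, prompt):
    full_text = ""
    for chunk in client.models.generate_content_stream(
        model="gemini-2.5-flash", contents=prompt
    ):
        if chunk.text:  # skip empty chunks
            print(chunk.text, end="", flush=True)
            full_text += chunk.text
    print()  # move the cursor to the next line after the stream finishes
    return full_text

# Hypothetical stub standing in for a real client, for demonstration only.
class _StubModels:
    def generate_content_stream(self, model, contents):
        for text in ["Streaming ", "", "works."]:
            yield SimpleNamespace(text=text)

stub_client = SimpleNamespace(models=_StubModels())
answer = stream_answer(stub_client, "Does streaming work?")
```

Note that the empty middle chunk is skipped by the `if chunk.text:` guard, so it never reaches print or full_text.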