This page explains how a PolyAI agent processes a conversation, from caller input to response generation.

Processing stages

A conversation moves through the following stages:

Advanced: How response streaming works

PolyAI agents don’t wait for the full response before speaking. Instead, responses are processed and streamed in real time:

  • LLM Streaming: Words are generated and sent continuously.
  • Chunking: Before reaching text-to-speech (TTS), responses are broken into chunks for controlled delivery.
  • Postprocessing: Stop keywords remove unnecessary phrases before they are spoken.
  • TTS Streaming: The caller hears speech as soon as it’s processed, rather than waiting for the entire response.
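
The four steps above can be sketched as a chain of Python generators, where each stage hands results to the next as soon as they are ready. This is an illustrative sketch only, not PolyAI's implementation: the function names (`llm_stream`, `chunk`, `postprocess`, `tts_stream`) are hypothetical, chunking at sentence-ending punctuation is an assumption, and stop keywords are modeled here as simple phrase removal.

```python
from typing import Iterable, Iterator

SENTENCE_ENDERS = (".", "!", "?")


def llm_stream() -> Iterator[str]:
    """Hypothetical stand-in for an LLM that yields words as they are generated."""
    reply = "Sure thing! Your table is booked for 7 pm. Anything else?"
    for word in reply.split():
        yield word


def chunk(words: Iterable[str]) -> Iterator[str]:
    """Group words into chunks, here (by assumption) at sentence boundaries."""
    buffer = []
    for word in words:
        buffer.append(word)
        if word.endswith(SENTENCE_ENDERS):
            yield " ".join(buffer)
            buffer = []
    if buffer:  # flush any trailing words that never hit a boundary
        yield " ".join(buffer)


def postprocess(chunks: Iterable[str], stop_keywords: Iterable[str]) -> Iterator[str]:
    """Remove stop-keyword phrases; drop chunks that end up empty."""
    keywords = list(stop_keywords)
    for text in chunks:
        for keyword in keywords:
            text = text.replace(keyword, "")
        text = " ".join(text.split())  # tidy leftover whitespace
        if text:
            yield text


def tts_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Placeholder for TTS: a real engine would yield audio, not text."""
    for text in chunks:
        yield text


def speak(words: Iterable[str], stop_keywords: Iterable[str]) -> Iterator[str]:
    """Wire the stages together; nothing runs until the caller iterates."""
    return tts_stream(postprocess(chunk(words), stop_keywords))


spoken = list(speak(llm_stream(), stop_keywords=["Sure thing!"]))
print(spoken)
```

Because every stage is a generator, the first chunk reaches the TTS placeholder before the stand-in LLM has produced the rest of the reply, which mirrors the "don't wait for the full response" behavior described above.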

Watch it in action

This video visualizes the conversation flow, showing how responses are processed, chunked, and streamed: