
# Conversation flow

> How PolyAI agents process conversations.

This page traces each stage a PolyAI agent uses to turn a user's input into a response.

<img src="https://mintcdn.com/polyai/ZVzKq3G72RK1ScpE/images/operations/diagram-full-static.png?fit=max&auto=format&n=ZVzKq3G72RK1ScpE&q=85&s=2f35945308641690531bc4c85de23b22" alt="PolyAI Voice Agent Conversation Flow" width="4920" height="3458" data-path="images/operations/diagram-full-static.png" />

<Warning>The agent's greeting is sent directly to the user without LLM processing or behavioral rules. For voice, the greeting text is converted to speech; for webchat, it's displayed as text. Write the greeting in the language your users expect. Behavioral rules and agent logic apply only from the second turn of the conversation onward.</Warning>
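
The greeting rule above can be modeled as a single branch in a turn handler. This is a minimal illustrative sketch, not PolyAI's implementation: `GREETING`, `handle_turn`, and the `run_pipeline` callable are hypothetical names standing in for the configured greeting and the processing stages described below.

```python
from typing import Callable, Optional

# Hypothetical greeting text; in practice this is whatever greeting you configure.
GREETING = "Hi, thanks for calling. How can I help you today?"


def handle_turn(turn_index: int, user_input: Optional[str],
                run_pipeline: Callable[[str], str]) -> str:
    """Return the agent's output for one turn (turn_index 0 is the greeting)."""
    if turn_index == 0:
        # Sent verbatim: no LLM call and no behavioral rules are applied.
        return GREETING
    # From the second turn onward, input goes through retrieval, the LLM, and rules.
    return run_pipeline(user_input or "")
```

Because the first branch never reaches the LLM, no prompt instruction (including a language or behavioral rule) can alter the greeting, which is why it must already be written the way users should hear or read it.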

## Processing stages

A conversation moves through the following stages:

<AccordionGroup>
  <Accordion title="1. Input and processing">
    * **User**: The user provides input: speech (voice) or text (webchat/SMS).
    * **Input capture**: For voice, the audio stream is captured and sent for transcription. For webchat/SMS, text is received directly.
    * **ASR Provider** (voice only): Receives the raw audio stream for transcription.
    * **[ASR Service](/speech-recognition/introduction)** (voice only): Converts the audio into text using [automatic speech recognition](https://en.wikipedia.org/wiki/Speech_recognition).
    * **ASR Processing** (voice only): Checks the transcript for likely transcription errors and applies any relevant corrections.
    * **Transcript/Text → Processed Input**: The corrected transcript (voice) or received text (webchat/SMS) becomes the processed input, which is passed to [Retrieval](/managed-topics/RAG/introduction).
    * **Retrieval**: Retrieves relevant topics from the [Knowledge area](/managed-topics/introduction) using [RAG (retrieval-augmented generation)](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) to provide context for the response.
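
The correction and retrieval steps can be sketched as two small functions. This is a simplified, hypothetical model: the `CORRECTIONS` table, `STOPWORDS`, and both functions are illustrative, and plain word overlap is used as a stand-in for the embedding-based similarity that RAG systems typically rely on. The usage at the end corrects a transcript and retrieves the single matching topic.

```python
import re

# Hypothetical phrase-level corrections for common mis-transcriptions.
CORRECTIONS = {"check in": "check-in", "wi fi": "wifi"}
STOPWORDS = {"a", "an", "at", "is", "on", "the", "what", "your"}


def process_transcript(raw: str) -> str:
    """Lowercase an ASR transcript and apply phrase corrections."""
    text = raw.lower()
    for wrong, right in CORRECTIONS.items():
        text = text.replace(wrong, right)
    return text


def _words(text: str) -> set[str]:
    """Tokenize into lowercase words, dropping stopwords."""
    return set(re.findall(r"[a-z0-9-]+", text.lower())) - STOPWORDS


def retrieve_topics(query: str, topics: dict[str, str], k: int = 2) -> list[str]:
    """Return up to k topic names ranked by word overlap with the query."""
    query_words = _words(query)
    scored = [(len(query_words & _words(f"{name} {body}")), name)
              for name, body in topics.items()]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [name for _, name in ranked[:k]]


topics = {
    "check_in": "Check-in starts at 3pm.",
    "wifi": "The wifi password is on your key card.",
    "parking": "Parking costs $20 per night.",
}
query = process_transcript("What time is check in?")  # "what time is check-in?"
print(retrieve_topics(query, topics))  # ['check_in']
```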
  </Accordion>

  <Accordion title="2. Compute prompt and generate response">
    * **Compute Prompt**: The system builds an [LLM](https://en.wikipedia.org/wiki/Large_language_model) prompt using retrieved topics, system knowledge, and conversation history.
    * **Run LLM**: The LLM processes the request and determines whether to return:
      * **Returned Text**: A direct text response.
      * **Returned Function**: A tool call (if applicable).
    * **Execute Function (if applicable)**: Runs the function and passes the result back to the LLM.
    * **LLM Refinement**: If a function result is returned, the LLM updates its response before proceeding.
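
Prompt assembly and the text-or-function loop can be sketched as follows. It is a hypothetical illustration: the reply shape (`{"text": ...}` or `{"function": ..., "args": ...}`), `fake_llm`, and `check_booking` are assumptions made for the example, not PolyAI's function-calling API. The loop is capped so a model that keeps requesting functions cannot run forever.

```python
from typing import Any, Callable

Reply = dict[str, Any]  # assumed shape: {"text": str} or {"function": str, "args": dict}


def compute_prompt(system: str, topics: list[str], history: list[str], user_input: str) -> str:
    """Combine system knowledge, retrieved topics, and conversation history."""
    return "\n".join([system, "Topics:", *topics, "History:", *history, f"User: {user_input}"])


def run_llm(prompt: str, llm: Callable[[str], Reply],
            functions: dict[str, Callable[..., Any]], max_calls: int = 3) -> str:
    """Call the LLM until it returns text, executing any function calls in between."""
    for _ in range(max_calls):
        reply = llm(prompt)
        if "text" in reply:  # Returned Text: the response is ready
            return reply["text"]
        # Returned Function: run it and pass the result back for refinement.
        result = functions[reply["function"]](**reply.get("args", {}))
        prompt += f"\nFunction {reply['function']} returned: {result}"
    raise RuntimeError("no text response after max_calls")


def fake_llm(prompt: str) -> Reply:
    """Stand-in LLM: requests a booking lookup, then answers using its result."""
    if "returned:" not in prompt:
        return {"function": "check_booking", "args": {"ref": "AB12"}}
    return {"text": "Booking AB12 is confirmed."}


prompt = compute_prompt("You are a hotel agent.", ["check_in"], [], "Is AB12 booked?")
print(run_llm(prompt, fake_llm, {"check_booking": lambda ref: f"{ref}: confirmed"}))
```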
  </Accordion>

  <Accordion title="3. Streaming and chunking">
    * **Chunk LLM Output**: The response is broken into chunks for delivery.
    * **Postprocess Chunks**: Applies rules such as [stop keywords](/response-control/stop-keywords) to remove unnecessary phrases.
    * **Stream Partial Responses**: The system sends chunks as soon as they are ready, rather than waiting for the full response.
    * **TTS Service** (voice only): Converts text chunks into speech using [text-to-speech synthesis](https://en.wikipedia.org/wiki/Speech_synthesis). Configure voices in [voice settings](/voice/introduction).
    * **Response delivery**: For voice, synthesized speech is streamed to the user. For webchat/SMS, text responses are sent directly.
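
Chunking and postprocessing can be sketched as a generator that groups streamed tokens into sentence-sized chunks. This rests on assumptions: splitting on sentence-ending punctuation is one plausible chunking rule, and `postprocess` simply deletes listed phrases as a simplified stand-in for [stop keywords](/response-control/stop-keywords), whose exact behavior is configured in the platform. Each chunk is yielded as soon as its sentence closes, so delivery can begin before the LLM finishes.

```python
import re
from typing import Iterable, Iterator

# Split after ., ! or ? when followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each complete sentence as soon as the streamed tokens close it."""
    buffer = ""
    for token in tokens:
        buffer += token
        *complete, buffer = SENTENCE_END.split(buffer)
        yield from complete
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains when the stream ends


def postprocess(chunk: str, blocked_phrases: list[str]) -> str:
    """Remove blocked phrases and tidy whitespace before the chunk is delivered."""
    for phrase in blocked_phrases:
        chunk = chunk.replace(phrase, "")
    return " ".join(chunk.split())


tokens = ["Sure", ". As", " an AI, I", " can help.", " Check-in", " is at 3pm."]
for chunk in chunk_stream(tokens):
    print(postprocess(chunk, ["As an AI, "]))  # each line could go straight to TTS or webchat
```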
  </Accordion>

  <Accordion title="4. Post-processing and handoff">
    * **Live Handoff (if applicable)**: If escalation is needed, the agent triggers a [live handoff](/call-handoff/introduction). For voice, this transfers the call; for webchat, this can route to a live chat agent.
    * **Conversation Logs**: The system stores conversation history and logs for [analytics](/analytics/conversations/introduction).
    * **Final Response**: The user receives the response progressively as it streams, rather than waiting for the entire message to be generated.
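
Per-turn logging and channel-dependent handoff can be pictured as a small record type. Everything here is hypothetical (`Conversation`, `TurnLog`, and the handoff labels are illustrative names, not PolyAI's data model); it only shows one log entry kept per turn and a handoff target that depends on the channel.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TurnLog:
    turn: int
    user_input: str
    agent_response: str
    handoff: bool = False


@dataclass
class Conversation:
    channel: str  # "voice" or "webchat"
    turns: list[TurnLog] = field(default_factory=list)

    def record(self, user_input: str, agent_response: str, handoff: bool = False) -> None:
        """Store one turn so it can later feed analytics."""
        self.turns.append(TurnLog(len(self.turns) + 1, user_input, agent_response, handoff))

    def handoff_target(self) -> str:
        """Voice handoffs transfer the call; webchat can route to a live chat agent."""
        return "transfer call" if self.channel == "voice" else "route to live chat agent"

    def to_json(self) -> str:
        """Serialize the whole conversation, including nested turns."""
        return json.dumps(asdict(self))


conv = Conversation(channel="voice")
conv.record("Can I speak to a person?", "Let me transfer you.", handoff=True)
print(conv.handoff_target(), conv.to_json())
```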
  </Accordion>
</AccordionGroup>

<div className="full-only">
  ## Advanced: How response streaming works

  PolyAI agents don't wait for the full response to be generated before delivering it. Instead, responses are processed and streamed **in real time**:

  * **LLM Streaming**: Words are generated and sent continuously.
  * **Chunking**: Responses are broken into chunks for controlled delivery.
  * **Postprocessing**: [Stop keywords](/response-control/stop-keywords) remove unnecessary phrases before delivery.
  * **Response Streaming**: For voice, users hear speech as soon as it's processed via TTS. For webchat, text appears progressively as it's generated.
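
The benefit of streaming is that delivery of the first chunk overlaps with generation of the rest. The sketch below simulates this with `asyncio`; `generate` and `deliver` are hypothetical stand-ins for the LLM stream and for TTS playback or webchat rendering, and the delay is arbitrary. The recorded events show the first chunk delivered before the second one is generated.

```python
import asyncio
from collections.abc import AsyncIterator


async def generate(chunks: list[str], events: list[str]) -> AsyncIterator[str]:
    """Simulated LLM stream: emits one chunk at a time."""
    for chunk in chunks:
        await asyncio.sleep(0.01)  # pretend generation time
        events.append(f"generated: {chunk}")
        yield chunk


async def deliver(stream: AsyncIterator[str], events: list[str]) -> None:
    """Simulated delivery: each chunk is spoken (voice) or shown (webchat) on arrival."""
    async for chunk in stream:
        events.append(f"delivered: {chunk}")


async def main() -> list[str]:
    events: list[str] = []
    await deliver(generate(["Hi there.", "Check-in is at 3pm."], events), events)
    return events


for event in asyncio.run(main()):
    print(event)
```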
</div>

### Watch it in action

This video visualizes the conversation flow, showing how responses are processed, chunked, and streamed:

<div style={{ display: "flex", justifyContent: "center", margin: "20px 0" }}>
  <video controls width="800" style={{ maxWidth: "100%", borderRadius: "8px" }}>
    <source src="https://res.cloudinary.com/dtdd8khwd/video/upload/v1741790751/xcs5zrejbxnckibtm39f.mov" type="video/mp4" />

    Your browser does not support the video tag.
  </video>
</div>

## Next steps

<CardGroup cols={2}>
  <Card title="Architecture overview" icon="diagram-project" href="/glossary/architecture">
    Understand system components and data flow
  </Card>

  <Card title="Agent settings" icon="sliders" href="/agent-settings/introduction">
    Configure your agent's personality and behavior
  </Card>

  <Card title="Knowledge setup" icon="brain" href="/managed-topics/introduction">
    Add managed topics and knowledge sources
  </Card>

  <Card title="Speech recognition" icon="ear" href="/speech-recognition/introduction">
    Tune ASR and input processing
  </Card>
</CardGroup>
