How conversations flow
When a caller connects to your PolyAI agent, the conversation passes through several key stages:

1. Telephony layer
The telephony layer handles the phone connection between the caller and your agent. PolyAI supports multiple telephony providers, including Twilio, Amazon Connect, and SIP-based systems.

2. Speech recognition (ASR)
The caller’s speech is converted to text using automatic speech recognition (ASR). PolyAI uses advanced models optimized for conversational accuracy, with support for:

- Multiple languages and accents
- Industry-specific vocabulary
- Real-time transcription
- ASR biasing and keyphrase boosting for domain-specific terms
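ASR biasing works by telling the recognizer which phrases to favour when transcribing ambiguous audio. The sketch below shows the general shape of such a configuration; the field names (`biasing`, `boost`) are illustrative assumptions, not PolyAI's actual API.

```python
# Hypothetical sketch of an ASR biasing configuration: keyphrases the
# recognizer should favour when transcribing domain-specific speech.
# Field names below are illustrative, not PolyAI's actual schema.

def build_asr_config(language: str, keyphrases: list[str], boost: float = 10.0) -> dict:
    """Assemble a biasing config that boosts domain-specific terms."""
    return {
        "language": language,
        "biasing": [{"phrase": p, "boost": boost} for p in keyphrases],
    }

config = build_asr_config("en-GB", ["loyalty card", "branch locator"])
print(config["biasing"][0])  # → {'phrase': 'loyalty card', 'boost': 10.0}
```

A higher boost value makes the recognizer more likely to prefer that phrase over acoustically similar alternatives.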
3. Agent service
The agent service is the core of the system. It receives the transcribed user input and coordinates:

- Language understanding (NLU): Interprets what the user said, identifying their intent and extracting entities
- Decision making (Policy engine): Determines the appropriate response based on your configured Managed Topics, flows, and rules by executing nodes in priority order
- Action execution: Triggers any necessary function calls or API integrations
- Context management: Maintains dialogue context and turn history throughout the conversation
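The coordination described above can be sketched as a single-turn pipeline: understand, decide, act, and record. Everything here is a stand-in for illustration — the real NLU, policy engine, and context store are services, not the toy functions below.

```python
# Illustrative single-turn pipeline: NLU -> policy decision -> context update.
# The handlers and intent logic are toy stand-ins, not PolyAI's implementation.
from dataclasses import dataclass, field

@dataclass
class DialogueContext:
    history: list = field(default_factory=list)   # turn history
    variables: dict = field(default_factory=dict)  # state variables

def understand(utterance: str) -> dict:
    # Stand-in NLU: a real system runs a model to get intent + entities.
    intent = "opening_hours" if "open" in utterance.lower() else "fallback"
    return {"intent": intent, "entities": {}}

def decide(nlu: dict, ctx: DialogueContext) -> str:
    # Stand-in policy: map the intent to a handler (real policies also
    # consult flows, rules, and node priority).
    handlers = {"opening_hours": "answer_hours", "fallback": "clarify"}
    return handlers[nlu["intent"]]

def run_turn(utterance: str, ctx: DialogueContext) -> str:
    nlu = understand(utterance)
    action = decide(nlu, ctx)
    ctx.history.append({"user": utterance, "intent": nlu["intent"], "action": action})
    return action

ctx = DialogueContext()
print(run_turn("When are you open?", ctx))  # → answer_hours
```

The key structural point is that context is threaded through every stage, so later turns can refer back to earlier ones.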
4. Response generation
Based on the decision engine’s output, the system generates an appropriate response using your agent’s configured voice, tone, and knowledge. This may involve:

- Retrieving relevant information using RAG (Retrieval-Augmented Generation)
- Applying global rules and response control filters
- Generating contextually appropriate responses via the LLM
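One way to picture global rules and response control filters is as a chain of transforms applied to the LLM's draft before it is spoken. The rules below are invented examples, assuming rules behave as simple predicates/transforms; real rules are configured in Agent Studio, not coded this way.

```python
# Sketch: global rules as post-generation filters over a draft response.
# Both rules are invented examples, not built-in PolyAI rules.

def rule_no_pricing(text: str) -> str:
    # Example constraint: never quote prices directly.
    return "Let me transfer you to a colleague for pricing." if "£" in text else text

def rule_max_length(text: str, limit: int = 200) -> str:
    # Example constraint: keep spoken responses short.
    return text[:limit]

GLOBAL_RULES = [rule_no_pricing, rule_max_length]

def finalize(draft: str) -> str:
    for rule in GLOBAL_RULES:
        draft = rule(draft)
    return draft

print(finalize("Tickets cost £20."))
```

Ordering matters in such a chain: a length cap applied before a substitution rule could truncate the text the substitution needs to match.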
5. Text-to-speech (TTS)
The generated response is converted to natural-sounding speech and played back to the caller. PolyAI supports:

- Multiple TTS providers and custom voices
- SSML markup for fine-grained control over pronunciation, pauses, and emphasis
- Custom pronunciations using IPA notation
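The SSML and IPA features above look like this in practice. The `<break>` and `<phoneme>` elements are standard SSML; which tags a given TTS provider honours can vary.

```python
# Standard SSML controlling a pause (<break>) and an IPA pronunciation
# (<phoneme>). The sentence and the name "Niamh" are illustrative.

ssml = (
    "<speak>"
    "Your table is booked."
    '<break time="400ms"/>'
    'Ask for <phoneme alphabet="ipa" ph="ˈnʲiəvˠ">Niamh</phoneme> on arrival.'
    "</speak>"
)
print(ssml)
```

Here the break inserts a 400 ms pause between sentences, and the phoneme tag forces the Irish pronunciation of "Niamh" instead of a spelling-based guess.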
Data storage
During a conversation, PolyAI maintains several types of data:

| Data type | Purpose | Retention |
|---|---|---|
| Dialogue context | Tracks the full dialogue history, state variables, and turn data for the current call | Duration of call |
| Turn data | Stores individual exchanges (user input, agent response, intents, entities) for analytics and review | Configurable |
| Conversation metadata | Records conversation-level information (duration, variant, environment) | Configurable |
| Metrics | Records events for reporting and dashboards | Configurable |
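A turn-data record like the one in the table above might be shaped as follows. The field names are assumptions for illustration, not PolyAI's storage schema.

```python
# Illustrative shape of a stored turn record (user input, agent response,
# intent, entities). Field names are assumptions, not PolyAI's schema.
import time

def make_turn_record(user_input: str, agent_response: str,
                     intent: str, entities: dict) -> dict:
    return {
        "timestamp": time.time(),
        "user_input": user_input,
        "agent_response": agent_response,
        "intent": intent,
        "entities": entities,
    }

record = make_turn_record("Table for two at 7pm", "Booked for 7pm.",
                          "make_booking", {"party_size": 2, "time": "19:00"})
print(record["intent"])  # → make_booking
```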
Key components you configure
As a builder in Agent Studio, you control how the agent behaves through:

- Managed Topics: Information the agent uses to answer questions
- Flows: Structured conversation paths for complex tasks
- Functions: Custom logic and external integrations
- Rules: Global behavior constraints
- Voice settings: How the agent sounds
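Taken together, these five components can be thought of as one agent configuration. The structure below is a hypothetical sketch to show how they relate; it is not Agent Studio's actual export format.

```python
# Hypothetical grouping of the five builder-configured components.
# Every name and value here is illustrative, not an Agent Studio schema.

agent_config = {
    "managed_topics": ["opening_hours", "booking_policy"],        # Q&A knowledge
    "flows": ["make_reservation", "cancel_reservation"],          # structured tasks
    "functions": ["lookup_booking_api"],                          # external integrations
    "rules": ["never quote prices", "always confirm the name"],   # global constraints
    "voice": {"voice_id": "en-GB-neutral", "speed": 1.0},         # how the agent sounds
}

print(sorted(agent_config))
```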
Processing a single turn
Each turn in a conversation follows this sequence:

Retrieve knowledge
Relevant information is fetched from your Managed Topics using RAG (Retrieval-Augmented Generation) via the Ragdoll service.
Generate response
The LLM composes a response based on all available context, applying global rules and response control filters.
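The two steps above can be sketched as retrieve-then-generate. The lexical-overlap scoring and canned generation below are toy stand-ins for the real Ragdoll retrieval and LLM call.

```python
# Minimal retrieve-then-generate sketch. Topic texts are invented examples;
# real retrieval is vector-based (Ragdoll) and generation is an LLM call.

TOPICS = {
    "opening hours": "We are open 9am to 5pm, Monday to Friday.",
    "parking": "Free parking is available behind the building.",
}

def retrieve(query: str) -> list[str]:
    # Toy lexical overlap in place of semantic retrieval.
    return [text for topic, text in TOPICS.items()
            if any(word in query.lower() for word in topic.split())]

def generate(query: str, passages: list[str]) -> str:
    # Toy generation: ground the answer in the top retrieved passage.
    if not passages:
        return "I'm sorry, I don't have that information."
    return passages[0]

query = "Where can I find parking?"
print(generate(query, retrieve(query)))  # → Free parking is available behind the building.
```

The important property RAG provides is that the response is grounded in retrieved Managed Topic content rather than generated from the model's parameters alone.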

