
Benefits
- Latency reduction: Serve cached audio to minimize TTS latency.
- Improved audio quality: Reliable, high-quality TTS audio served from the cache.
- Consistency: Cached audio plays identically each time, avoiding TTS variation.
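The latency and consistency benefits both follow from serving stored audio instead of regenerating it. A minimal sketch of that idea is shown here; the `AudioCache` class, its method names, and the `synthesize` callback are hypothetical and are not part of the product's API.

```python
import hashlib
from typing import Callable, Dict


class AudioCache:
    """Hypothetical in-memory cache keyed by voice and utterance text."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self._synthesize = synthesize        # stand-in for a real TTS call
        self._audio: Dict[str, bytes] = {}   # cache key -> audio bytes
        self.hits: Dict[str, int] = {}       # cache key -> times served from cache

    @staticmethod
    def key(voice: str, text: str) -> str:
        # Collapse whitespace so trivially different spacing shares one entry.
        normalized = " ".join(text.split())
        return hashlib.sha256(f"{voice}|{normalized}".encode("utf-8")).hexdigest()

    def get_audio(self, voice: str, text: str) -> bytes:
        k = self.key(voice, text)
        if k in self._audio:
            # Cache hit: no TTS call, and the bytes are identical every time.
            self.hits[k] = self.hits.get(k, 0) + 1
            return self._audio[k]
        # Cache miss: synthesize once and store the result.
        audio = self._synthesize(voice, text)
        self._audio[k] = audio
        self.hits.setdefault(k, 0)
        return audio

    def overwrite(self, voice: str, text: str, audio: bytes) -> None:
        # Mirrors uploading a new file to replace existing cached audio.
        k = self.key(voice, text)
        self._audio[k] = audio
        self.hits.setdefault(k, 0)
```

The `hits` counter illustrates the kind of usage figure the Audio Management tab reports, and `overwrite` corresponds to uploading replacement audio.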
Getting started
To manage your agent’s audio:
- Go to the Audio Management tab.
- Review all audio saved to the cache and monitor how often it has been used by the agent.
- You can delete cached files and upload new ones to overwrite existing audio.
- Edit options
- Add IPA syntax
You can edit the stability and clarity of the agent’s voice specifically for this utterance. For more information on these settings, visit the voice feature page. The edit tab also includes sync and play buttons so you can test changes to the utterance live in the edit panel.

Interaction style
Adjust response latency to balance speed and accuracy.
Interaction style settings

- Locate the Interaction style section on the Audio management page (Channels > Voice > Audio management).
- Choose from the available modes:
- Swift Mode
- Balanced Mode
- Precise Mode
- Turbo Mode
- Click on the bubble for your preferred mode. A brief description of the mode will appear.
- Save your settings to apply changes. Your agent will adjust its behavior immediately.
Performance characteristics
Each response mode is designed for specific performance needs:

| Mode | Latency | Interruption tolerance | Best for |
|---|---|---|---|
| Turbo | 400ms | High (enable barge-in) | Ultra-responsive agents; pair with barge-in to let callers reclaim control |
| Swift | 1200ms | Moderate-high | Quick interactions where speed outweighs precision |
| Balanced | 1600ms | Moderate | General use cases balancing responsiveness and accuracy |
| Precise | 2000ms | Low | Accuracy-critical scenarios with minimal interruptions |
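To make the trade-off concrete, the sketch below encodes the latency figures from the table and picks a mode for a given latency budget. The dictionary and function names are hypothetical, and the rule that a higher-latency mode is the more precise one is an assumption inferred from the "Best for" column.

```python
from typing import Optional

# Latency figures copied from the table above, in milliseconds.
MODE_LATENCY_MS = {
    "turbo": 400,
    "swift": 1200,
    "balanced": 1600,
    "precise": 2000,
}


def most_precise_mode_within(budget_ms: int) -> Optional[str]:
    """Return the highest-latency mode that still fits the budget.

    Assumes higher latency means more precision; returns None if even
    the fastest mode exceeds the budget.
    """
    candidates = [
        (latency, name)
        for name, latency in MODE_LATENCY_MS.items()
        if latency <= budget_ms
    ]
    if not candidates:
        return None
    # max() compares latency first, so the slowest fitting mode wins.
    return max(candidates)[1]
```

For example, a budget of 1500 ms rules out Balanced and Precise, so the helper returns `"swift"`.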
Barge-in
Allow callers to interrupt the agent. This shortens VAD time and reduces response latency. Experiment with barge-in and latency modes to find the right balance. To test this feature, find “Enable barge-in” in the “Settings” menu.
Smart Turn detection
Smart Turn is an advanced end-of-turn detection feature that improves conversation quality when you use Silero VAD. When Silero VAD is enabled, Smart Turn activates automatically and detects more accurately when a caller has finished speaking.
Benefits
- Improved accuracy: Better detection of natural conversation endpoints
- Language-aware: Adapts to different languages for optimal performance
- Automatic activation: No manual configuration required when using Silero VAD
- Reduced false interruptions: Minimizes premature agent responses
How it works
Smart Turn wraps the Silero VAD system with intelligent turn-detection logic that analyzes speech patterns to determine when a caller has genuinely finished their turn. This reduces instances where the agent responds too early or waits too long, creating a more natural conversational flow. The feature is language-aware and automatically adjusts its behavior based on the detected language of the conversation, ensuring optimal performance across multilingual deployments.

Smart Turn is automatically enabled when you use Silero VAD. No additional configuration is required.
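The general shape of wrapping a VAD with turn-detection logic can be sketched as follows. This is not Smart Turn's actual algorithm and does not call the Silero VAD API: it takes per-frame speech flags as input and applies a simple silence timeout. The function name and the per-language thresholds are illustrative assumptions only.

```python
from typing import Iterable, Optional

# Illustrative thresholds only; these values are not taken from the product.
SILENCE_MS_BY_LANGUAGE = {"en": 600, "ja": 800}
DEFAULT_SILENCE_MS = 700


def end_of_turn_frame(
    speech_flags: Iterable[bool], frame_ms: int, language: str
) -> Optional[int]:
    """Return the index of the frame where the turn is considered finished.

    speech_flags holds one boolean per audio frame (True = speech), as a
    VAD might produce. Returns None if no end of turn is detected.
    """
    required_ms = SILENCE_MS_BY_LANGUAGE.get(language, DEFAULT_SILENCE_MS)
    heard_speech = False
    silence_ms = 0
    for index, is_speech in enumerate(speech_flags):
        if is_speech:
            heard_speech = True
            silence_ms = 0  # any speech resets the silence timer
        elif heard_speech:
            # Count silence only after the caller has started speaking.
            silence_ms += frame_ms
            if silence_ms >= required_ms:
                return index
    return None
```

A short pause in the middle of a sentence resets the timer rather than ending the turn, which is the failure mode (responding too early) that this kind of wrapper is meant to reduce.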


