Use the Audio Management tab to cache commonly used TTS audio elements, such as greetings or transfer messages. Caching reduces latency and ensures consistent voice quality.

Benefits

  • Latency reduction: Serve cached audio to minimize TTS latency.
  • Improved audio quality: Cached audio delivers dependable, high-quality TTS output.
  • Consistency: Cached audio plays identically each time, avoiding TTS variation.

Getting started

To manage your agent’s audio:
  1. Go to the Audio Management tab.
  2. Review all audio saved to the cache and monitor how often it has been used by the agent.
  3. You can delete cached files and upload new ones to overwrite existing audio.
When you edit a cached utterance, you can adjust the stability and clarity of the agent’s voice for that utterance only. For more information on these settings, visit the voice feature page. The edit tab also includes sync and play buttons so you can test changes to the utterance live in the edit panel.

Why am I only seeing a few cached audio files?

The audio cache stores a file only if the same TTS is generated at least twice within a 24-hour window. This helps manage cache size and performance. If a particular utterance isn’t used multiple times within that period, it won’t persist in the cache and may appear to be missing. To keep key audio cached, consider generating it repeatedly or uploading a static version manually.
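
The caching rule described above can be illustrated with a small sketch. This is a toy model rather than the platform's implementation: the class name `CacheAdmission`, its `record` method, and the choice to treat exactly 24 hours as still inside the window are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)  # window length stated in the text
MIN_GENERATIONS = 2           # "at least twice"


class CacheAdmission:
    """Hypothetical model of the rule: track when each utterance was generated."""

    def __init__(self):
        self._seen = defaultdict(deque)

    def record(self, text, when):
        """Record one TTS generation; return True if the utterance now qualifies."""
        times = self._seen[text]
        times.append(when)
        # Drop generations that fall outside the window ending at `when`.
        while times and when - times[0] > WINDOW:
            times.popleft()
        return len(times) >= MIN_GENERATIONS


cache = CacheAdmission()
start = datetime(2026, 1, 1, 9, 0)
first = cache.record("Thanks for calling.", start)
second = cache.record("Thanks for calling.", start + timedelta(hours=3))
```

Here `first` is `False` and `second` is `True`: the greeting qualifies only once it has been generated twice within the window. A second generation 30 hours later would not qualify, which matches the reason rarely used utterances may appear to be missing.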

Interaction style

Adjust response latency to balance speed and accuracy.

Interaction style settings

  1. Locate the Interaction style section on the Audio management page (Channels > Voice > Audio management).
  2. Choose from the available modes:
    • Swift Mode
    • Balanced Mode
    • Precise Mode
    • Turbo Mode
  3. Click on the bubble for your preferred mode. A brief description of the mode will appear.
  4. Save your settings to apply changes. Your agent will adjust its behavior immediately.

Performance characteristics

Each response mode is designed for specific performance needs:
| Mode | Latency | Interruption tolerance | Best for |
| --- | --- | --- | --- |
| Turbo | 400 ms | High (enable barge-in) | Ultra-responsive agents; pair with barge-in to let callers reclaim control |
| Swift | 1200 ms | Moderate-high | Quick interactions where speed outweighs precision |
| Balanced | 1600 ms | Moderate | General use cases balancing responsiveness and accuracy |
| Precise | 2000 ms | Low | Accuracy-critical scenarios with minimal interruptions |
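
To make the trade-off concrete, the sketch below encodes the latency figures listed above and picks a mode for a given latency budget. The `Mode` dataclass and the `slowest_mode_within` helper are hypothetical and not part of the product. The helper also assumes that a longer latency means a more accuracy-oriented mode, which the "Best for" descriptions suggest but do not state outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    latency_ms: int
    interruption_tolerance: str


# Values copied from the performance characteristics listed above.
MODES = [
    Mode("Turbo", 400, "High"),
    Mode("Swift", 1200, "Moderate-high"),
    Mode("Balanced", 1600, "Moderate"),
    Mode("Precise", 2000, "Low"),
]


def slowest_mode_within(budget_ms):
    """Return the highest-latency mode that still fits the budget, or None."""
    fitting = [m for m in MODES if m.latency_ms <= budget_ms]
    return max(fitting, key=lambda m: m.latency_ms) if fitting else None


choice = slowest_mode_within(1500)
```

With a 1500 ms budget, `choice` is the Swift mode; a budget below 400 ms returns `None` because no listed mode fits.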

Barge-in

Allow callers to interrupt the agent. This shortens voice activity detection (VAD) time and reduces response latency. Experiment with barge-in and the latency modes to find the right balance. To test this feature, find “Enable barge-in” in the “Settings” menu.

Smart Turn detection

Smart Turn is an advanced end-of-turn detection feature that automatically enhances conversation quality when using Silero VAD. When Silero VAD is enabled, Smart Turn is automatically activated to provide more accurate detection of when a caller has finished speaking.

Benefits

  • Improved accuracy: Better detection of natural conversation endpoints
  • Language-aware: Adapts to different languages for optimal performance
  • Automatic activation: No manual configuration required when using Silero VAD
  • Reduced false interruptions: Minimizes premature agent responses

How it works

Smart Turn wraps the Silero VAD system with intelligent turn-detection logic that analyzes speech patterns to determine when a caller has genuinely finished their turn. This reduces instances where the agent responds too early or waits too long, creating a more natural conversational flow. The feature is language-aware and automatically adjusts its behavior based on the detected language of the conversation, ensuring optimal performance across multilingual deployments.
Smart Turn is automatically enabled when you use Silero VAD. No additional configuration is required.
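
The idea of wrapping a VAD with additional turn-detection logic can be sketched generically. The code below is not how Smart Turn is implemented and does not call the Silero VAD API. `make_turn_detector`, `is_speech`, `looks_complete`, and `min_silence_frames` are hypothetical stand-ins, and it assumes a transcript is available for the completeness check. The sketch shows only the wrapper pattern: silence alone does not end the turn unless a second check agrees.

```python
from typing import Callable


def make_turn_detector(
    is_speech: Callable[[bytes], bool],      # stand-in for a VAD decision per frame
    looks_complete: Callable[[str], bool],   # stand-in for the extra turn check
    min_silence_frames: int = 10,
):
    """Wrap a VAD-style speech check with an additional end-of-turn check."""
    silent = 0

    def end_of_turn(frame: bytes, transcript_so_far: str) -> bool:
        nonlocal silent
        # Count consecutive silent frames; any speech resets the counter.
        silent = 0 if is_speech(frame) else silent + 1
        # Require both enough silence and an utterance that looks finished.
        return silent >= min_silence_frames and looks_complete(transcript_so_far)

    return end_of_turn


detector = make_turn_detector(
    is_speech=lambda frame: frame == b"speech",
    looks_complete=lambda text: text.endswith("."),
    min_silence_frames=2,
)
```

In this toy setup, a pause after "I want" does not end the turn however long it lasts, while the same pause after "I want a refund." does. That is the kind of premature response the wrapper is meant to avoid.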
Last modified on March 20, 2026