Advanced configuration – This page covers complex platform settings. We recommend completing PolyAcademy Level 1 before proceeding.
Use audio management to reduce voice latency and control how your agent sounds during key moments – greetings, transfer messages, and other frequently spoken phrases. Caching these responses means callers hear them faster and with consistent quality, instead of waiting for real-time TTS generation on every call. The Audio management tab is found under Channels > Voice > Audio management.

Getting started

To manage your agent’s audio:
  1. Go to Channels > Voice > Audio management.
  2. Review all audio saved to the cache and monitor how often it has been used by the agent.
  3. Delete cached files or upload new ones to overwrite existing audio.
For each cached utterance, you can edit the stability and clarity of the agent’s voice. For more information on these settings, visit the voice feature page. The edit tab also includes sync and play buttons so you can test changes to the utterance live in the edit panel.
Why am I only seeing a few cached audio files? The audio cache stores a file only if the same TTS is generated at least twice within a 24-hour window. This helps manage cache size and performance. If a particular utterance isn’t generated at least twice within that period, it won’t persist in the cache and may appear to be missing. To keep key audio cached, consider generating it repeatedly or uploading a static version manually.
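The caching rule above can be sketched as a small check. This is an illustration only, not the platform's implementation: the function name `would_persist` is hypothetical, and treating exactly 24 hours apart as still inside the window is an assumption.

```python
from datetime import datetime, timedelta

CACHE_WINDOW = timedelta(hours=24)  # the 24-hour window described above


def would_persist(generated_at: list[datetime]) -> bool:
    """Return True if the same utterance was generated at least twice
    within a single 24-hour window (hypothetical helper)."""
    times = sorted(generated_at)
    # If any two generations fall inside the window, some adjacent pair
    # in sorted order does too, so checking neighbours is sufficient.
    return any(
        later - earlier <= CACHE_WINDOW  # inclusive boundary is an assumption
        for earlier, later in zip(times, times[1:])
    )
```

For example, a greeting generated at 09:00 and again at 12:00 on the same day would qualify, while one generated once, or twice 30 hours apart, would not.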

Interaction style

Adjust response latency to balance speed and accuracy.

Interaction style settings

  1. Locate the Interaction style section on the Audio management page (Channels > Voice > Audio management).
  2. Choose from the available modes:
    • Turbo Mode – Fastest response time with high interruption tolerance
    • Balanced Mode – Moderate latency with good accuracy
    • Precise Mode – Longest latency for maximum accuracy
  3. Click on the bubble for your preferred mode. A brief description of the mode will appear.
  4. Save your settings to apply changes. Your agent will adjust its behavior immediately.

Performance characteristics

Each response mode is designed for specific performance needs:
| Mode | Latency | Interruption tolerance | Best for |
| --- | --- | --- | --- |
| Turbo | 400 ms | High (enable barge-in) | Ultra-responsive agents; pair with barge-in to let callers reclaim control |
| Balanced | 1600 ms | Moderate | General use cases balancing responsiveness and accuracy |
| Precise | 2000 ms | Low | Accuracy-critical scenarios with minimal interruptions |
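As a rough aid for choosing a mode, the figures above can be expressed as a lookup. The sketch below is not a platform API: `MODES` and `slowest_mode_within` are hypothetical names, and it assumes (as the mode descriptions suggest) that a longer latency buys more accuracy, so it picks the slowest mode that still fits a given latency budget.

```python
from typing import Optional

# Latency figures copied from the table above; the dict itself is illustrative.
MODES = {
    "Turbo": {"latency_ms": 400, "interruption_tolerance": "High"},
    "Balanced": {"latency_ms": 1600, "interruption_tolerance": "Moderate"},
    "Precise": {"latency_ms": 2000, "interruption_tolerance": "Low"},
}


def slowest_mode_within(budget_ms: int) -> Optional[str]:
    """Return the highest-latency mode whose latency fits the budget,
    or None if even the fastest mode is too slow (hypothetical helper)."""
    candidates = [
        (spec["latency_ms"], name)
        for name, spec in MODES.items()
        if spec["latency_ms"] <= budget_ms
    ]
    return max(candidates)[1] if candidates else None
```

With a budget of 1800 ms this returns Balanced; with a budget below 400 ms no mode fits and it returns None.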

Barge-in

Allow callers to interrupt the agent mid-utterance. When enabled, the agent stops speaking as soon as it detects caller speech, shortening VAD time and reducing response latency. To enable this feature, go to Channels > Voice > Audio management and toggle Enable barge-in.

How barge-in works

When barge-in is enabled:
  1. The agent begins speaking its response.
  2. If the caller starts talking, the agent stops its current utterance and begins listening.
  3. The agent processes the caller’s input and responds as a new turn.
This also applies to delay control responses – if the caller speaks during a filler phrase, the delay sequence is interrupted.
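The sequence above amounts to a simple state transition. The following is a minimal sketch for illustration, not the platform's internals; `AgentState` and `on_caller_speech` are hypothetical names.

```python
from enum import Enum


class AgentState(Enum):
    SPEAKING = "speaking"    # agent is playing a response or a filler phrase
    LISTENING = "listening"  # agent is capturing caller input


def on_caller_speech(state: AgentState, barge_in_enabled: bool) -> AgentState:
    """Return the agent's state after caller speech is detected."""
    if state is AgentState.SPEAKING and barge_in_enabled:
        # The current utterance (or delay filler) is cut off and the
        # caller's input is handled as a new turn.
        return AgentState.LISTENING
    # With barge-in disabled, or if the agent is already listening,
    # detected speech does not change the state.
    return state
```

With barge-in disabled, the agent in this sketch keeps speaking when the caller talks over it.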

When to use barge-in

Barge-in works well for:
  • FAQ-heavy agents where callers may already know what they need
  • Long agent responses where the caller wants to redirect the conversation
  • Turbo interaction style, where fast responsiveness is a priority

When to disable barge-in

Consider disabling barge-in (globally or per flow/step) when:
  • The agent is executing a function with external side effects (bookings, payments, form submissions) – the caller may interrupt after the action completes but before hearing the confirmation
  • The agent must deliver a mandatory disclosure or disclaimer that cannot be skipped
  • The environment is noisy, causing false barge-in triggers from background sounds

Per-flow and per-step overrides

You can configure barge-in at a granular level using the experimental JSON config. This lets you enable barge-in globally while disabling it for specific flows or steps where interruption would be problematic. Overrides follow a precedence order: step > flow > global. For example, you can have barge-in off globally, enabled for a specific flow, and disabled again for a sensitive step within that flow.
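The precedence order (step > flow > global) can be sketched as a small resolver. This does not reflect the actual schema of the experimental JSON config, which is not shown here; the function and parameter names are hypothetical, and `None` stands for "no override set at this level".

```python
from typing import Optional


def resolve_barge_in(
    global_setting: bool,
    flow_override: Optional[bool] = None,
    step_override: Optional[bool] = None,
) -> bool:
    """Return the effective barge-in setting for a step (hypothetical helper)."""
    # The most specific override wins: check the step first, then the flow.
    for override in (step_override, flow_override):
        if override is not None:
            return override
    # No overrides set: fall back to the global toggle.
    return global_setting
```

Applied to the example above: with barge-in off globally, `resolve_barge_in(False, flow_override=True)` yields enabled for the flow, and `resolve_barge_in(False, flow_override=True, step_override=False)` yields disabled for the sensitive step.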
Barge-in behavior cannot be fully tested in the chat panel. Always verify with a real phone call.

Relationship with Smart Turn detection

Smart Turn (Smart VAD) is a component of barge-in. Enabling barge-in automatically activates Smart Turn. Smart Turn can also run on its own with barge-in disabled, but running barge-in without Smart Turn is not recommended.

Smart Turn detection

Smart Turn is an end-of-turn detection feature that works on top of Silero VAD to improve conversation quality. When Silero VAD is enabled, Smart Turn activates automatically and detects more accurately when a caller has finished speaking.

How it works

Smart Turn wraps the Silero VAD system with turn-detection logic that analyzes speech patterns to determine when a caller has finished their turn. This reduces instances where the agent responds too early or waits too long. The feature is language-aware and adjusts its behavior based on the detected language of the conversation.
Smart Turn is automatically enabled when you use Silero VAD. No additional configuration is required.

Voice configuration

Configure VAD, greeting audio, and call handling settings.

Agent Voice

Adjust voice stability and clarity for your TTS provider.

Response control

Block keywords and configure pronunciations.
Last modified on April 16, 2026