# Audio management

> Optimize agent audio quality and response latency.

Use audio management to reduce voice latency and control how your agent sounds during key moments – greetings, transfer messages, and other frequently spoken phrases. Caching these responses means callers hear them faster and with consistent quality, instead of waiting for real-time TTS generation on every call.

The **Audio management** tab is under **Channels > Voice > Audio management**.

<img src="https://mintcdn.com/polyai/Qu880HppNqT19Eyr/images/audio-management/audio-management-1.png?fit=max&auto=format&n=Qu880HppNqT19Eyr&q=85&s=c7e146746b49cc3e4af937af56cd0d6f" alt="audio-management-1" width="2486" height="1218" data-path="images/audio-management/audio-management-1.png" />

## Getting started

To manage your agent's audio:

1. Go to **Channels > Voice > Audio management**.
2. Review the audio saved to the cache and how often the agent has used each file.
3. Delete cached files, or upload new ones to overwrite existing audio.

<Tabs>
  <Tab title="Edit options">
    You can edit the **stability** and **clarity** of the agent's voice for an individual cached utterance. For more information on these settings, see the [voice](/voice/introduction) feature page. The edit tab also includes sync <Icon icon="arrows-rotate" iconType="solid" /> and play <Icon icon="play" iconType="solid" /> buttons so you can test changes to the utterance live in the edit panel.

    <img src="https://mintcdn.com/polyai/Qu880HppNqT19Eyr/images/audio-management/audio-management-edit-options.png?fit=max&auto=format&n=Qu880HppNqT19Eyr&q=85&s=3cb6189c5961d1b6ada4a1d3f16f0780" alt="audio-management-edit-options" width="1412" height="1192" data-path="images/audio-management/audio-management-edit-options.png" />
  </Tab>

  <Tab title="Add IPA syntax">
    You can add [International Phonetic Alphabet (IPA)](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet) syntax to make sure your agent precisely pronounces industry-specific terms, names, and other words with unconventional pronunciations.

    <img src="https://mintcdn.com/polyai/Qu880HppNqT19Eyr/images/audio-management/audio-management-audio-complete.png?fit=max&auto=format&n=Qu880HppNqT19Eyr&q=85&s=92fb32768fdbeb1f1fb1bbd2a3c3659a" alt="audio-management-audio-complete" width="1190" height="636" data-path="images/audio-management/audio-management-audio-complete.png" />
  </Tab>
</Tabs>

<Tip>
  **Why do I see only a few cached audio files?**
  The audio cache stores a file only if the same utterance is generated by TTS **at least twice within a 24-hour window**. This helps manage cache size and performance. If an utterance isn't generated at least twice within that period, it isn't kept in the cache and may appear to be missing.
  To keep key audio cached, consider generating it repeatedly or uploading a static version manually.
</Tip>
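
The admission rule in the tip above can be modeled in a few lines. The following TypeScript is a minimal sketch for intuition only, not PolyAI's implementation: the `AudioCacheSketch` class, the string cache key, and treating the 24-hour boundary as inclusive are all assumptions, and eviction is not modeled.

```typescript
// Hypothetical sketch of the documented admission rule: an utterance is
// cached only if the same TTS output is generated at least twice within
// a 24-hour window. Names and structure are illustrative assumptions.
const WINDOW_MS = 24 * 60 * 60 * 1000; // 24 hours in milliseconds
const MIN_GENERATIONS = 2;

class AudioCacheSketch {
  // Timestamps (ms) of recent generations, keyed by utterance.
  private generations = new Map<string, number[]>();
  private cached = new Set<string>();

  // Record one TTS generation at time `now`; returns true if the utterance is cached.
  recordGeneration(key: string, now: number): boolean {
    // Keep only generations that fall inside the window (boundary assumed inclusive).
    const recent = (this.generations.get(key) ?? []).filter(
      (t) => now - t <= WINDOW_MS,
    );
    recent.push(now);
    this.generations.set(key, recent);
    if (recent.length >= MIN_GENERATIONS) {
      this.cached.add(key);
    }
    return this.cached.has(key);
  }

  isCached(key: string): boolean {
    return this.cached.has(key);
  }
}

// Usage
const HOUR = 60 * 60 * 1000;
const cache = new AudioCacheSketch();
cache.recordGeneration("greeting", 0); // false: first generation
cache.recordGeneration("greeting", 3 * HOUR); // true: second one within 24 hours
cache.recordGeneration("farewell", 0); // false
cache.recordGeneration("farewell", 30 * HOUR); // false: the first one fell outside the window
```

This is why low-traffic utterances rarely show up in the cache: they never reach two generations inside a single window.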

## Interaction style

Adjust response latency to balance speed and accuracy.

### Interaction style settings

<img src="https://mintcdn.com/polyai/Qu880HppNqT19Eyr/images/audio-management/interaction-style.png?fit=max&auto=format&n=Qu880HppNqT19Eyr&q=85&s=7528025b945789560d3a9fcc3034bfd6" alt="interaction-style" width="2102" height="1156" data-path="images/audio-management/interaction-style.png" />

1. Locate the **Interaction style** section on the **Audio management** page (**Channels > Voice > Audio management**).
2. Choose from the available modes:
   * **Turbo Mode** – Fastest response time with high interruption tolerance
   * **Balanced Mode** – Moderate latency with good accuracy
   * **Precise Mode** – Longest latency for maximum accuracy
3. Click the bubble for your preferred mode. A brief description of the mode appears.
4. Save your settings to apply changes. Your agent will adjust its behavior immediately.

### Performance characteristics

Each response mode is designed for specific performance needs:

| Mode         | Latency  | Interruption tolerance              | Best for                                                                   |
| ------------ | -------- | ----------------------------------- | -------------------------------------------------------------------------- |
| **Turbo**    | 400 ms   | High (enable [barge-in](#barge-in)) | Ultra-responsive agents; pair with barge-in to let callers reclaim control |
| **Balanced** | 1,600 ms | Moderate                            | General use cases balancing responsiveness and accuracy                    |
| **Precise**  | 2,000 ms | Low                                 | Accuracy-critical scenarios with minimal interruptions                     |
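
One way to read the table is as a trade-off: pick the most accurate mode that still fits your latency budget. The TypeScript below is a hypothetical sketch of that reasoning. The latency values come from the table. The `pickMode` helper, the type names, and the accuracy ordering (Precise, then Balanced, then Turbo, inferred from the mode descriptions) are assumptions, not a PolyAI API.

```typescript
// Latency values from the performance table; everything else is illustrative.
type InteractionStyle = "turbo" | "balanced" | "precise";

interface ModeProfile {
  latencyMs: number;
  interruptionTolerance: "high" | "moderate" | "low";
}

const MODES: Record<InteractionStyle, ModeProfile> = {
  turbo: { latencyMs: 400, interruptionTolerance: "high" },
  balanced: { latencyMs: 1600, interruptionTolerance: "moderate" },
  precise: { latencyMs: 2000, interruptionTolerance: "low" },
};

// Assumed ordering from most to least accurate, based on the mode descriptions.
const BY_ACCURACY: InteractionStyle[] = ["precise", "balanced", "turbo"];

// Return the most accurate mode whose latency fits within the budget.
function pickMode(latencyBudgetMs: number): InteractionStyle {
  for (const mode of BY_ACCURACY) {
    if (MODES[mode].latencyMs <= latencyBudgetMs) {
      return mode;
    }
  }
  // No mode fits the budget: fall back to the fastest one.
  return "turbo";
}

// Usage
pickMode(1000); // "turbo"
pickMode(1800); // "balanced"
pickMode(2500); // "precise"
```

If the result is Turbo, the table's advice still applies: enable barge-in so callers can reclaim control.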

## Barge-in

Allow callers to interrupt the agent mid-utterance. When enabled, the agent stops speaking as soon as it detects caller speech, shortening [VAD](https://en.wikipedia.org/wiki/Voice_activity_detection) time and reducing response latency.

To enable this feature, go to **Channels > Voice > Audio management** and toggle **Enable barge-in**.

### How barge-in works

When barge-in is enabled:

1. The agent begins speaking its response.
2. If the caller starts talking, the agent stops its current utterance and begins listening.
3. The agent processes the caller's input and responds as a new turn.

This also applies to [delay control](/tools/delay-control) responses – if the caller speaks during a filler phrase, the delay sequence is interrupted.
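
The steps above, including the delay control case, can be sketched as a small state machine. This TypeScript is a hypothetical model for intuition only: the `BargeInSketch` class and its fields are illustrative names, not part of any PolyAI API. The assumption that the agent keeps talking when barge-in is off is also an inference. Step 3 (handling the caller's input as a new turn) is left to normal turn processing and isn't modeled.

```typescript
// Hypothetical model of barge-in; class and field names are illustrative.
type AgentState = "speaking" | "listening";

interface Utterance {
  text: string;
  isDelayFiller: boolean; // true for a delay control filler phrase
}

class BargeInSketch {
  state: AgentState = "listening";
  current: Utterance | null = null;
  interrupted: Utterance[] = []; // utterances cut off by the caller
  readonly bargeInEnabled: boolean;

  constructor(bargeInEnabled: boolean) {
    this.bargeInEnabled = bargeInEnabled;
  }

  // Step 1: the agent begins speaking a response (or a filler phrase).
  startSpeaking(utterance: Utterance): void {
    this.state = "speaking";
    this.current = utterance;
  }

  // Step 2: caller speech is detected. Returns true if the agent stopped.
  onCallerSpeech(): boolean {
    // Assumption: with barge-in off, the agent finishes its utterance.
    if (!this.bargeInEnabled || this.state !== "speaking") {
      return false;
    }
    // Stop the current utterance (filler phrases included) and start listening.
    if (this.current !== null) {
      this.interrupted.push(this.current);
    }
    this.current = null;
    this.state = "listening";
    return true;
  }
}

// Usage: the caller speaks over a delay control filler.
const agent = new BargeInSketch(true);
agent.startSpeaking({ text: "One moment while I check that.", isDelayFiller: true });
agent.onCallerSpeech(); // true: filler interrupted, agent now listening
```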

### When to use barge-in

Barge-in works well for:

* FAQ-heavy agents where callers may already know what they need
* Long agent responses where the caller wants to redirect the conversation
* Turbo interaction style, where fast responsiveness is a priority

### When to disable barge-in

Consider disabling barge-in (globally or per flow/step) when:

* The agent is executing a function with **external side effects** (bookings, payments, form submissions). A caller who interrupts after the action completes may never hear the confirmation
* The agent must deliver a **mandatory disclosure or disclaimer** that cannot be skipped
* The environment is **noisy**, causing false barge-in triggers from background sounds

### Per-flow and per-step overrides

You can configure barge-in at a granular level using the experimental JSON config. This lets you enable barge-in globally while disabling it for specific flows or steps where interruption would be problematic.

Overrides follow a precedence order: **step > flow > global**. For example, you can have barge-in off globally, enabled for a specific flow, and disabled again for a sensitive step within that flow.
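
The precedence rule can be expressed as a lookup that checks the most specific level first. The following TypeScript is a sketch under stated assumptions. The experimental JSON config schema is not documented on this page, so the `BargeInConfig` shape, the `isBargeInEnabled` helper, and the flow and step names are all hypothetical. Only the step > flow > global ordering comes from the text.

```typescript
// Hypothetical config shape; the real experimental JSON schema may differ.
interface BargeInConfig {
  global: boolean;
  flows?: Record<string, { enabled?: boolean; steps?: Record<string, boolean> }>;
}

// Resolve the effective setting: step overrides flow, flow overrides global.
function isBargeInEnabled(config: BargeInConfig, flow: string, step: string): boolean {
  const flowConfig = config.flows?.[flow];
  const stepOverride = flowConfig?.steps?.[step];
  if (stepOverride !== undefined) return stepOverride;
  if (flowConfig?.enabled !== undefined) return flowConfig.enabled;
  return config.global;
}

// The example from the text: off globally, on for one flow,
// and off again for a sensitive step within that flow.
const config: BargeInConfig = {
  global: false,
  flows: {
    booking: {
      enabled: true,
      steps: { confirm_payment: false },
    },
  },
};

isBargeInEnabled(config, "booking", "collect_details"); // true (flow override)
isBargeInEnabled(config, "booking", "confirm_payment"); // false (step override)
isBargeInEnabled(config, "billing", "start"); // false (global default)
```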

<Note>
  Barge-in behavior cannot be fully tested in the chat panel. Always verify with a real phone call.
</Note>

## Related pages

<CardGroup cols={3}>
  <Card title="Voice configuration" icon="gear" href="/voice/voice-configuration">
    Configure VAD, greeting audio, and call handling settings.
  </Card>

  <Card title="Agent Voice" icon="user" href="/voice/agent">
    Adjust voice stability and clarity for your TTS provider.
  </Card>

  <Card title="Response control" icon="shield-halved" href="/response-control/introduction">
    Block keywords and configure pronunciations.
  </Card>
</CardGroup>
