Voice and audio updates

Your agent’s voice is a critical part of the customer experience. This page covers how to update voice settings, manage audio quality, and troubleshoot common audio issues.

Quick reference

I need to…	Action	Time estimate
Change the agent’s voice	Channels > Voice > Agent Voice → Change	5 min
Adjust voice parameters	Channels > Voice > Agent Voice → Settings gear	2 min
Fix a mispronunciation	Channels > Voice > Response Control → Pronunciations	5 min
Update cached audio	Audio Management → Edit → Sync	3 min
Change interaction style (latency)	Settings menu → Interaction style	2 min
Enable/disable barge-in	Settings menu → Enable barge-in toggle	1 min
Upload custom audio file	Audio Management → Upload	5 min

Changing your agent’s voice

When to change voices

Consider updating your agent’s voice when:

Brand refresh - Your company updates its brand identity
Customer feedback - Users report the voice is unclear or unpleasant
Language changes - You’re expanding to new languages or regions
Quality improvements - Newer voice models become available
A/B testing - You want to test different voices for effectiveness

How to change the voice

Go to Channels > Voice > Agent Voice
Click Change next to the Assistant or Disclaimer section
The Voice Library opens with Explore and Favorites tabs
Filter voices by Language, Region, and Gender
Preview a voice with the play button, or enter custom text to hear how it sounds
Click Select to apply the voice
Test in Agent Chat before publishing

For non-English projects, use a multilingual_v2 model to ensure proper language support.

Voice configuration in functions

If you’re setting the voice in functions, you can configure provider-specific parameters.

ElevenLabs:

from polyai.voice import ElevenLabsVoice

conv.set_voice(
    ElevenLabsVoice(
        provider_voice_id="a1b2C3d4E5f6G7h8I9j0",
        stability=0.5,        # 0.0-1.0: consistency of tone
        similarity_boost=0.7  # 0.0-1.0: match to original voice
    )
)

Cartesia:

from polyai.voice import CartesiaVoice, Emotion, EmotionKind, EmotionIntensity

conv.set_voice(
    CartesiaVoice(
        provider_voice_id="cartesia_ai_voice",
        speed=0.0,  # -1.0 (slowest) to 1.0 (fastest)
        emotions=[
            Emotion(EmotionKind.POSITIVITY, EmotionIntensity.HIGH)
        ],
        model_id="sonic"
    )
)

Rime:

from polyai.voice import RimeVoice

conv.set_voice(
    RimeVoice(
        provider_voice_id="s3://path/to/manifest.json",
        speech_alpha=1.0  # <1.0 faster, >1.0 slower
    )
)

See Add a voice for more provider examples.
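Before passing values to set_voice, it can help to catch out-of-range numbers early. The sketch below is a hypothetical pre-flight check (check_voice_params is not part of any PolyAI API) that uses only the ranges noted in the code comments above. Rime’s speech_alpha has no stated upper bound on this page, so the sketch assumes only that it must be positive.

```python
# Hypothetical helper, not a PolyAI API. Ranges come from the comments in the
# ElevenLabs and Cartesia examples above; they are not an official schema.
RANGES = {
    "elevenlabs": {"stability": (0.0, 1.0), "similarity_boost": (0.0, 1.0)},
    "cartesia": {"speed": (-1.0, 1.0)},
}


def check_voice_params(provider: str, **params: float) -> list[str]:
    """Return a list of problems; an empty list means every value looks valid."""
    problems = []
    limits = RANGES.get(provider, {})
    for name, value in params.items():
        if provider == "rime" and name == "speech_alpha":
            # Assumption: 1.0 is normal speed and the value must be positive.
            if value <= 0:
                problems.append(f"speech_alpha must be > 0, got {value}")
            continue
        if name not in limits:
            problems.append(f"unknown parameter for {provider}: {name}")
            continue
        low, high = limits[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} outside [{low}, {high}]")
    return problems
```

For example, check_voice_params("cartesia", speed=1.5) reports that the speed is outside [-1.0, 1.0], while the ElevenLabs values used above (0.5 and 0.7) pass.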

Adjusting speech speed and timing

Global speech rate

To adjust how fast your agent speaks:

Go to Channels > Voice > Agent Voice
Click the Settings gear icon next to the voice
Adjust the Stability slider and the Clarity and Similarity slider
Click Done to save, then test with sample phrases

Recommended ranges:

Below 1.0 (down to about 0.9) - Slightly slower, clearer for complex information
1.0 - Normal conversational speed
Above 1.0 (up to about 1.2) - Faster, more energetic
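The ranges above read as a speed multiplier where 1.0 is normal. Assuming that interpretation, this sketch labels a rate using those bands and gives a rough estimate of how long a prompt will take. The function names are illustrative only.

```python
def describe_rate(rate: float) -> str:
    """Label a speed multiplier using the bands listed above (1.0 = normal)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    if rate < 1.0:
        return "slower"
    if rate > 1.0:
        return "faster"
    return "normal"


def in_recommended_range(rate: float) -> bool:
    """True if the rate falls inside the roughly 0.9-1.2 span recommended above."""
    return 0.9 <= rate <= 1.2


def estimated_seconds(seconds_at_normal: float, rate: float) -> float:
    """Rough playback estimate: duration scales inversely with the multiplier."""
    return seconds_at_normal / rate
```

At 1.2, a prompt that takes 12 seconds at normal speed would take roughly 10 seconds. Real TTS output may not scale exactly linearly, so treat the estimate as a ballpark.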

Interaction style (response latency)

Control how quickly your agent responds after the user stops speaking:

Locate the Interaction style section in Channels > Voice > Audio management
Choose a mode:
- Turbo (400ms) - Ultra-fast, may interrupt more
- Swift (1200ms) - Fast responses, good for simple queries
- Balanced (1600ms) - Default, works for most use cases
- Precise (2000ms) - Slower but more accurate

Turbo mode increases interruptions. Enable barge-in to let callers interrupt the agent naturally.
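To see why shorter modes interrupt more often, the sketch below treats each interaction style as a fixed silence threshold, using the millisecond values listed above. This is a simplification: the platform’s actual end-of-turn detection is not documented here and may consider more than elapsed silence.

```python
# Wait times from the list above; modeling them as plain silence thresholds
# is a simplifying assumption.
INTERACTION_STYLE_MS = {
    "turbo": 400,
    "swift": 1200,
    "balanced": 1600,  # default
    "precise": 2000,
}


def should_respond(style: str, silence_ms: int) -> bool:
    """True once the caller has been silent for at least the style's wait time."""
    return silence_ms >= INTERACTION_STYLE_MS[style.lower()]


def styles_that_respond_during(pause_ms: int) -> list[str]:
    """Styles that would start speaking during a caller pause of this length."""
    return [name for name, ms in INTERACTION_STYLE_MS.items() if pause_ms >= ms]
```

Under this model, a caller who pauses for one second mid-sentence would be answered in Turbo mode but not in Swift, Balanced, or Precise, which is why Turbo pairs well with barge-in.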

Barge-in settings

Barge-in allows callers to interrupt the agent mid-sentence:

Find Enable barge-in in Channels > Voice > Audio management
Toggle it on or off
Test in Agent Chat to ensure it feels natural

When to enable:

Using Turbo or Swift interaction modes
Callers frequently try to interrupt
You want more natural, human-like conversations

When to disable:

Agent needs to deliver complete information (e.g., legal disclaimers)
Background noise causes false interruptions
Callers prefer to listen fully before responding
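The two checklists above can be read as a simple decision rule. The sketch below encodes them in a hypothetical recommend_barge_in helper; giving the "when to disable" reasons priority over the "when to enable" reasons is an assumption, not documented platform behavior.

```python
def recommend_barge_in(
    style: str,
    *,
    must_deliver_in_full: bool = False,    # e.g. legal disclaimers
    noisy_background: bool = False,        # noise causes false interruptions
    callers_prefer_listening: bool = False,
    callers_interrupt_often: bool = False,
    want_natural_flow: bool = False,
) -> bool:
    """Return True if the checklists suggest enabling barge-in."""
    # Assumption: any reason to disable outweighs every reason to enable.
    if must_deliver_in_full or noisy_background or callers_prefer_listening:
        return False
    return (
        style.lower() in {"turbo", "swift"}
        or callers_interrupt_often
        or want_natural_flow
    )
```

For example, a Turbo agent on noisy lines would still get a "disable" recommendation, in which case a slower interaction style is the safer fix.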

Managing audio quality

Audio Management tab

The Audio Management tab lets you cache and optimize frequently used audio:

Go to Channels > Voice > Audio management
Review cached audio files
Edit, delete, or upload new versions

Benefits:

Reduced latency - Cached audio plays instantly
Consistent quality - Same audio every time
Better pronunciation - Fine-tune specific phrases

Editing cached audio

To update a cached audio file:

Find the utterance in Audio Management
Click Edit
Adjust stability and clarity settings
Add IPA syntax for pronunciation corrections
Click the sync icon to regenerate
Click play to preview
Save when satisfied

Audio is cached only if the same TTS output is generated at least twice within 24 hours. For critical phrases (greetings, transfers), either trigger them repeatedly so they qualify or upload the audio manually.
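The caching rule above can be modeled as a sliding 24-hour window per phrase. This sketch is illustrative only: it keys on the text alone (the platform may also consider voice settings), treats exactly 24 hours as inside the window, and assumes generations are recorded in chronological order.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)


class CacheEligibility:
    """Tracks when each TTS text was generated and reports cache eligibility."""

    def __init__(self) -> None:
        self._seen: dict[str, list[datetime]] = {}

    def record(self, text: str, at: datetime) -> bool:
        """Record one generation; return True if the text now qualifies for caching."""
        times = self._seen.setdefault(text, [])
        times.append(at)
        # Keep only generations within the 24-hour window ending at `at`
        # (assumes calls arrive in chronological order).
        times[:] = [t for t in times if at - t <= WINDOW]
        return len(times) >= 2
```

A greeting generated at 09:00 and again at 08:00 the next day would qualify; one generated 25 hours apart would not, which is why rarely triggered critical phrases are good candidates for manual upload.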

Uploading custom audio

For maximum control, upload pre-recorded audio:

Go to Audio Management
Click Upload
Select your audio file (WAV or MP3)
Associate it with a specific utterance or trigger
Test in Agent Chat
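If you prepare several recordings at once, a quick extension check can catch files in formats other than the WAV and MP3 named above before you upload. This is a hypothetical local helper; it does not verify the file contents or any other limits the upload dialog may enforce.

```python
from pathlib import Path

# Formats named in the upload steps above. Other constraints (sample rate,
# size, duration) are not checked here.
SUPPORTED_SUFFIXES = {".wav", ".mp3"}


def unsupported_files(filenames: list[str]) -> list[str]:
    """Return the files whose extension is not WAV or MP3 (case-insensitive)."""
    return [
        name for name in filenames
        if Path(name).suffix.lower() not in SUPPORTED_SUFFIXES
    ]
```

For example, a batch containing greeting.WAV, hold_music.mp3, and disclaimer.m4a would flag only disclaimer.m4a for conversion.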

Use cases for custom audio:

Brand-specific greetings
Legal disclaimers requiring exact wording
Music or sound effects
Celebrity or executive voices

Fixing pronunciation issues

Using Pronunciations

When the agent mispronounces words, use the Pronunciations feature:

Go to Channels > Voice > Response Control and open the Pronunciations tab
Click Add pronunciation
Enter the word or phrase as it appears in text
Provide the IPA (International Phonetic Alphabet) pronunciation
Test in Agent Chat

Example:

Text: “PolyAI”
IPA: /ˈpɒli eɪ aɪ/
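The Pronunciations tab applies entries like this for you. For illustration only, an IPA entry corresponds to the standard W3C SSML phoneme element. The sketch below assumes a TTS provider that accepts SSML phoneme tags (support varies), and apply_pronunciations is a hypothetical helper, not a PolyAI function.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_pronunciations(text: str, lexicon: dict[str, str]) -> str:
    """Wrap whole-word, case-sensitive matches in SSML <phoneme> tags; escape the rest."""
    if not lexicon:
        return escape(text)
    # Longest entries first so a longer phrase wins over a word it contains.
    words = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    parts = []
    last = 0
    for match in pattern.finditer(text):
        word = match.group(1)
        parts.append(escape(text[last:match.start()]))
        parts.append(
            f'<phoneme alphabet="ipa" ph={quoteattr(lexicon[word])}>{escape(word)}</phoneme>'
        )
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


lexicon = {"PolyAI": "ˈpɒli eɪ aɪ"}  # IPA from the example above, without slashes
```

Matching is case-sensitive in this sketch, so "polyai" in lowercase is left unchanged; add variants to the lexicon if your responses use them.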

Using SSML for advanced control

You can also use SSML (Speech Synthesis Markup Language) in pronunciations:

<break time="500ms"/>  <!-- Pause for 500 milliseconds -->
<prosody rate="slow">Speak this slowly</prosody>
<emphasis level="strong">Emphasize this</emphasis>
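A common SSML mistake is an unescaped ampersand or an unclosed tag. This sketch builds the three tags shown above with escaping and checks that the result is well-formed XML. The helper names are hypothetical, and well-formed markup does not guarantee that a given provider supports every tag.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Emphasis levels defined by the W3C SSML specification.
EMPHASIS_LEVELS = {"strong", "moderate", "none", "reduced"}


def pause(ms: int) -> str:
    return f'<break time="{int(ms)}ms"/>'


def slow(text: str) -> str:
    return f'<prosody rate="slow">{escape(text)}</prosody>'


def emphasize(text: str, level: str = "strong") -> str:
    if level not in EMPHASIS_LEVELS:
        raise ValueError(f"unsupported emphasis level: {level}")
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


def check_ssml(fragment: str) -> None:
    """Raise ET.ParseError if the fragment is not well-formed once wrapped in <speak>."""
    ET.fromstring(f"<speak>{fragment}</speak>")
```

For example, emphasize("Q&A") produces Q&amp;A inside the tag, which a raw string containing "Q&A" would not.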

Common pronunciation fixes

Issue	Solution
Name mispronounced	Add IPA pronunciation
Numbers spoken too fast	Add `<break>` tags between digits
Acronym spelled out letter by letter when it should be read as a word	Add pronunciation as a word (e.g., “NASA” → “nassa”)
Domain jargon unclear	Add IPA or SSML emphasis
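For the "numbers spoken too fast" fix, this hypothetical helper inserts a break tag between each digit of a number such as a phone or reference number, ignoring spaces and dashes. The 300 ms default pause is an assumption; tune it by listening in Agent Chat.

```python
def space_digits(number: str, pause_ms: int = 300) -> str:
    """Join the digits of `number` with SSML <break> tags; other characters are dropped."""
    digits = [ch for ch in number if ch.isdigit()]  # drop spaces, dashes, etc.
    return f'<break time="{pause_ms}ms"/>'.join(digits)
```

For example, space_digits("12-3", 200) yields the three digits separated by two 200 ms breaks.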

Pronunciations affect how words are spoken. If the problem is what the agent says, use the other Response Control tools instead.

Troubleshooting audio issues

Common issues and solutions

Issue	Likely cause	Solution
Voice sounds robotic	Low-quality TTS provider	Switch to ElevenLabs or upgrade voice model
Agent speaks too fast	Rate set too high	Adjust voice settings via the gear icon in Channels > Voice > Agent Voice
Agent interrupts frequently	Turbo mode without barge-in	Enable barge-in or switch to Balanced mode
Mispronunciations	TTS doesn’t recognize word	Add pronunciation in Response Control
Inconsistent audio quality	Not using cached audio	Enable Audio Management for key phrases
High latency	Slow TTS provider or network	Switch to Cartesia or use cached audio
Audio cuts out	Network issues or buffer problems	Check interaction style settings
Background noise causes interruptions	Barge-in too sensitive	Disable barge-in or adjust VAD settings

Testing audio changes

After making audio updates:

Test in Agent Chat - Try various scenarios
Listen for quality - Is the voice clear and natural?
Check timing - Are pauses and speed appropriate?
Verify pronunciations - Are custom words spoken correctly?
Test interruptions - Does barge-in work as expected?
Review in Sandbox - Make test calls before promoting

Best practices

Voice selection

Match your brand - Choose a voice that reflects your company’s personality
Consider your audience - Age, region, and preferences matter
Test with real users - Get feedback before going live
Use multilingual voices - For non-English projects, ensure proper language support

Audio optimization

Cache common phrases - Greetings, transfers, and FAQs should be cached
Use custom audio sparingly - Only for critical brand moments
Monitor latency - Balance quality with response speed
Test across devices - Audio quality varies on different phones

Maintenance routine

Monthly voice review - Listen to recent calls and identify issues
Update pronunciations - Add new terms as your business evolves
Refresh cached audio - When voice settings change, regenerate cache
A/B test changes - Use variants to test voice updates before full rollout

Common workflows

Updating voice for a new brand

Select a new voice from the Voice Library at Channels > Voice > Agent Voice
Adjust stability and clarity via the settings gear icon
Update cached audio for key phrases
Add pronunciations for brand-specific terms
Test thoroughly in Sandbox
Promote to Pre-release for UAT
Gather feedback and refine
Promote to Live
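The workflow above moves a change through Sandbox, Pre-release, and Live in order. If you script release checks, a guard like this hypothetical one keeps a voice update from skipping Pre-release UAT. It does not model Agent Studio’s own promotion controls.

```python
from typing import Optional

# Environment order taken from the workflow above.
STAGES = ("Sandbox", "Pre-release", "Live")


def next_stage(current: str) -> Optional[str]:
    """Return the environment to promote to next, or None if already Live."""
    i = STAGES.index(current)  # raises ValueError for an unknown stage name
    return STAGES[i + 1] if i + 1 < len(STAGES) else None


def can_promote(current: str, target: str) -> bool:
    """Allow only one-step promotions, so nothing skips Pre-release."""
    return next_stage(current) == target
```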

Fixing a pronunciation issue

Identify the mispronounced word from call recordings
Go to Channels > Voice > Response Control → Pronunciations tab
Add the word with correct IPA pronunciation
Test in Agent Chat
Publish and promote if working correctly

Optimizing for speed

Switch to Cartesia or another low-latency provider
Set interaction style to Swift or Turbo
Enable barge-in
Cache all common phrases in Audio Management
Test for interruption issues
Adjust as needed

Related pages

Audio Management overview - Learn about audio caching and optimization
Response Control - Manage pronunciations and output controls
Voice Library - Browse and select voices
Agent Voice - Voice configuration options
Multi-language setup - Configure language support
