Voice & speech - PolyAI Platform

This section is about how your agent speaks and listens – the TTS voice it uses, how it handles audio, what it does mid-conversation, and how it transcribes what callers say. It is not about how customers reach your agent. For that, see Phone (Numbers and Web Calling) and Chat.

Raven produces responses that sound natural when spoken aloud, and is recommended for any voice deployment.

What lives here

Agent Voice

Pick the TTS voice and tune stability, clarity, and speed.

Voice library

Browse and compare voices across providers (ElevenLabs, Cartesia, Hume, and more).

Choosing a good voice

Match voice to brand, audience, and industry.

Multi-voice

Use multiple voices in a single project.

Voice configuration

Greeting audio, disclaimer playback, model selection, safety filters, and call handling.

Response control

Translations, pronunciations, and stop keywords.

Audio management

Audio playback and recording settings.

Speech recognition

ASR settings – how the agent transcribes what callers say.

How to think about it

There are four concerns, and they’re configured separately:

How it sounds – Agent voice, voice library, multi-voice, custom voice. Picking a TTS voice and tuning it.
How it behaves in a call – Voice configuration: greeting, disclaimer, model selection, safety filters, call handling.
How it listens – Speech recognition (ASR) settings.
How it phrases things – Response control: translations, pronunciations, stop keywords.

Start with Choosing a good voice, then configure the voice in Agent Voice, and tune call-runtime behavior in Voice configuration.

Programmatic voice configuration

You can also configure voices programmatically using the voice class inside functions – for example, selecting a voice based on conversation context, caller preferences, or other runtime variables.
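
As a rough illustration of runtime voice selection, the sketch below picks a voice from the caller's language and a speech-speed preference. It does not use the platform's voice class, whose interface is not shown on this page. `VoiceChoice`, `VOICES_BY_LANGUAGE`, `pick_voice`, the provider name, and the voice IDs are all hypothetical stand-ins; inside a real function you would pass the chosen values to the voice class as described in its reference.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceChoice:
    """Hypothetical stand-in for the settings a voice class might accept."""
    provider: str
    voice_id: str
    speed: float = 1.0


# Hypothetical catalogue: placeholder provider and voice IDs, not real ones.
VOICES_BY_LANGUAGE = {
    "en": VoiceChoice("example-provider", "english-voice-placeholder"),
    "es": VoiceChoice("example-provider", "spanish-voice-placeholder"),
}
DEFAULT_VOICE = VOICES_BY_LANGUAGE["en"]


def pick_voice(language: str, prefers_slower_speech: bool = False) -> VoiceChoice:
    """Choose a voice from runtime context (assumed inputs, for illustration)."""
    # Reduce a locale such as "es-MX" to its two-letter language code.
    code = language.lower()[:2]
    base = VOICES_BY_LANGUAGE.get(code, DEFAULT_VOICE)
    if prefers_slower_speech:
        # Return a copy with a slightly reduced speed; the base entry is unchanged.
        return replace(base, speed=0.9)
    return base
```

Keeping the selection logic in a plain function like this makes it easy to test separately from the call flow, whatever the real voice class looks like.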

Voice conversation style guide

These guidelines help your voice agent sound natural rather than robotic. They focus on the linguistic patterns that make spoken conversations feel human.

Social presence markers

Natural conversation includes patterns that acknowledge conversational history and participants. These contribute to a sense of collaboration rather than rote routine-following.

Use progressive tense for active collaboration:

“I’m not seeing any accounts under that phone number…” conveys active collaboration
“I don’t see any accounts” sounds too definitive

Reference shared context implicitly – don’t restate what both parties already know:

“How about Wednesday instead?” (not “How about Wednesday instead of Tuesday?”)
“In that case, how does Saturday at 2:30 sound?” (not “Since you said you prefer weekends…”)

Vary confirmationals – use a mix of “Great,” “Okay,” “Perfect,” and “Sure” rather than repeating the same one.

Use conversational datives for a collaborative feel:

“Could you read me your account number?” rather than “Could you read your account number aloud?”
“Can you log into your account for me?” rather than “Can you log into your account?”

Use face-saving past tense when referencing a user’s request:

“When were you trying to come in?” rather than “When are you trying to come in?”
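
Where some agent replies are assembled from fixed templates rather than generated freely, the advice to vary confirmationals can be approximated in code. The sketch below is one possible approach, not a platform feature: `ConfirmationalPicker` is a hypothetical helper that chooses at random from the words listed above while never repeating the previous choice.

```python
import random

CONFIRMATIONALS = ["Great", "Okay", "Perfect", "Sure"]


class ConfirmationalPicker:
    """Hypothetical helper: picks a confirmational, never the same one twice in a row."""

    def __init__(self, options=CONFIRMATIONALS, rng=None):
        if len(options) < 2:
            raise ValueError("need at least two options to avoid repeats")
        self._options = list(options)
        # An injectable random generator keeps the behavior reproducible in tests.
        self._rng = rng or random.Random()
        self._last = None

    def next(self) -> str:
        # Exclude the previous choice so consecutive turns always differ.
        candidates = [o for o in self._options if o != self._last]
        choice = self._rng.choice(candidates)
        self._last = choice
        return choice


picker = ConfirmationalPicker()
reply = f"{picker.next()}, what's your account number?"
```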

Avoid over-explaining

LLMs tend to justify every action in a way humans don’t. Most of the time, the important information and the request fit into a single sentence:

“No problem, what’s your account number?” rather than “To check for outages, I’ll need to look up your account. Could you tell me your account number?”

Walkthrough conversations

When giving multi-turn walkthroughs, don’t end every step with “let me know when you’ve done that.” Provide the instruction and wait – the user will confirm on their own.

Related

Phone

Voice transports – let customers reach your agent over the phone (Numbers) or from your website (Web Calling).

Chat

Add a text-based chat widget to your website.
