Pronunciations - PolyAI Platform

This page covers voice-specific TTS tuning. If you are only working with webchat, you can skip this page.

PolyAI uses Text-to-Speech (TTS) to convert text into speech. Uncommon words, domain-specific terms, and proper nouns are occasionally mispronounced. When that happens, use the TTS Pronunciations tab to embed pronunciation guidelines for those phrases in the global rules.

How it works

Each pronunciation rule pairs a regular expression with a replacement string, and the replacement can reference regex capture groups. Pronunciations can be written in the International Phonetic Alphabet (IPA), and the replacement string can also include SSML (Speech Synthesis Markup Language) tags such as <break>, <prosody>, and <emphasis>.

Multilingual pronunciation rules

For multilingual agents, pronunciation rules are organized by language. Each language has its own set of rules, displayed as separate collapsible cards. Rules within a language card only apply to responses in that language. To add a rule for a specific language:

Expand the language card.
Add the regex pattern and replacement.
The rule automatically scopes to that language.

Rules with no language specified apply globally across all languages.

Rule evaluation order

Pronunciation rules are evaluated from top to bottom. Each rule runs on the text produced by the rule above it. Because rules are applied sequentially, later rules can modify or override earlier transformations. If multiple rules affect the same text, their order determines the final spoken output. For example, if you have:

A rule that formats a phone number with pauses or separators
A rule that converts digits into words

The formatting rule should appear above the digit-to-word rule.
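To see why order matters, here is a minimal sketch that applies rules top to bottom, using Python's `re.sub` as a stand-in for the platform's regex engine (an assumption; this page does not say which engine PolyAI uses). The rules are hypothetical versions of the two above: one that splits a 10-digit number into comma-separated groups, and ten digit-to-word rules.

```python
import re

# Hypothetical rules for illustration; each is a (pattern, replacement) pair.
FORMAT_PHONE = (r"\b(\d{3})(\d{3})(\d{4})\b", r"\1, \2, \3")
WORDS = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]
# One rule per digit, e.g. "6" -> "six ".
DIGIT_TO_WORD = [(str(d), word + " ") for d, word in enumerate(WORDS)]


def apply_rules(text, rules):
    """Apply rules top to bottom; each rule sees the previous rule's output."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


format_first = apply_rules("Call 6513592923", [FORMAT_PHONE] + DIGIT_TO_WORD)
digits_first = apply_rules("Call 6513592923", DIGIT_TO_WORD + [FORMAT_PHONE])

print(format_first)
print(digits_first)
```

With the formatting rule first, the commas (and the pauses they create) survive the digit conversion. With the digit-to-word rules first, no digits remain for the formatting pattern to match, so the output has no pauses at all.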

Common pronunciation rule patterns

Most pronunciation rules follow a small set of patterns. You can copy these directly into your configuration and adjust them to fit your use case – no regex expertise required.

(Screenshot: pronunciation rules showing regex patterns for phone numbers, zip codes, and URLs.)

Simple text replacement

To replace a specific string with a spoken equivalent, enter the text in the Expression field and the replacement in Replace with.

Regex: 3\-5
Replacement: three to five

Characters like . have special meaning in regex and must be escaped with a backslash (\.) to match literally. The hyphen (-) only needs escaping inside character classes like [a-z] – outside of brackets, 3-5 and 3\-5 both match the literal string “3-5”.
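A minimal sketch of simple replacement and escaping, using Python's `re` module as a stand-in (an assumption; the platform's regex flavor may differ in details). It shows that `3\-5` and `3-5` behave the same, while an unescaped `.` matches any character.

```python
import re

# The escaped and unescaped hyphen both match the literal "3-5".
hyphen_escaped = re.sub(r"3\-5", "three to five", "Allow 3-5 days")
hyphen_plain = re.sub(r"3-5", "three to five", "Allow 3-5 days")

# An unescaped "." matches any character, so "3x5" is replaced too.
text = "Version 3.5, not 3x5"
dot_unescaped = re.sub(r"3.5", "three point five", text)
dot_escaped = re.sub(r"3\.5", "three point five", text)

print(hyphen_escaped)  # Allow three to five days
print(dot_unescaped)   # Version three point five, not three point five
print(dot_escaped)     # Version three point five, not 3x5
```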

Read a phone number digit by digit

To make TTS read each digit of a phone number separately instead of as a large number:

Regex: \b(\d)(\d)(\d)-(\d)(\d)(\d)-(\d)(\d)(\d)(\d)\b
Replacement: \1 \2 \3, \4 \5 \6, \7 \8 \9 \10

For 651-359-2923, this yields “6 5 1, 3 5 9, 2 9 2 3”, which TTS reads as “six five one, three five nine, two nine two three”. The commas insert natural pauses between groups. For phone numbers written without hyphens (like 6513592923), use the same replacement with this pattern:

Regex: \b(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)\b
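Both phone-number patterns can be checked with a short sketch, using Python's `re.sub` as a stand-in for the platform's engine (an assumption). In Python's replacement syntax, `\10` refers to group 10 rather than group 1 followed by a literal 0, which matches how the rule above uses it; other regex engines may use different replacement syntax.

```python
import re

REPLACEMENT = r"\1 \2 \3, \4 \5 \6, \7 \8 \9 \10"
HYPHENATED = r"\b(\d)(\d)(\d)-(\d)(\d)(\d)-(\d)(\d)(\d)(\d)\b"
UNHYPHENATED = r"\b(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)\b"

# Both written forms end up with the same spaced-out digits.
with_hyphens = re.sub(HYPHENATED, REPLACEMENT, "Call 651-359-2923")
without_hyphens = re.sub(UNHYPHENATED, REPLACEMENT, "Call 6513592923")

print(with_hyphens)     # Call 6 5 1, 3 5 9, 2 9 2 3
print(without_hyphens)  # Call 6 5 1, 3 5 9, 2 9 2 3
```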

Read a zip code digit by digit

Regex: \b(\d)(\d)(\d)(\d)(\d)\b
Replacement: \1, \2, \3, \4, \5

For 90210, TTS reads “nine, zero, two, one, zero” instead of “ninety thousand two hundred ten”.
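A minimal sketch of the zip code rule, with Python's `re.sub` standing in for the platform's engine (an assumption). The second call shows a side effect worth knowing: the pattern matches any standalone five-digit number, not only zip codes.

```python
import re

ZIP = r"\b(\d)(\d)(\d)(\d)(\d)\b"
SPOKEN = r"\1, \2, \3, \4, \5"

zip_code = re.sub(ZIP, SPOKEN, "Ship to 90210")
other_number = re.sub(ZIP, SPOKEN, "Order 12345")

print(zip_code)      # Ship to 9, 0, 2, 1, 0
print(other_number)  # Order 1, 2, 3, 4, 5
```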

Make a URL speakable

Regex: www\.
Replacement: W, W, W, dot
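A minimal sketch of the URL rule, with Python's `re.sub` standing in for the platform's engine (an assumption). Because the replacement has no trailing space, “dot” ends up joined to the domain name in the output text.

```python
import re

spoken = re.sub(r"www\.", "W, W, W, dot", "Visit www.example.com")
print(spoken)  # Visit W, W, W, dotexample.com
```

If the TTS voice runs the two words together, adding a trailing space to the replacement (“W, W, W, dot ”) keeps them apart, assuming the Replace with field preserves trailing whitespace.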

Format a phone number with SSML pauses

For more control over pause timing, you can use SSML <break> tags in the replacement:

Regex: \(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})
Replacement: \1 <break time="0.5s" /> \2 <break time="0.5s" /> \3
Case sensitive: FALSE

This handles phone numbers in various formats – (651) 359-2923, 651-359-2923, or 6513592923 – and inserts half-second pauses between groups. See SSML breaks for more timing options.
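A minimal sketch that runs the SSML rule against the three formats, using Python's `re.sub` as a stand-in for the platform's engine and modeling “Case sensitive: FALSE” as `re.IGNORECASE` (both assumptions).

```python
import re

PATTERN = r"\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})"
REPLACEMENT = r'\1 <break time="0.5s" /> \2 <break time="0.5s" /> \3'

# IGNORECASE has no effect on digits, but mirrors the rule's setting.
inputs = ["(651) 359-2923", "651-359-2923", "6513592923"]
results = [re.sub(PATTERN, REPLACEMENT, raw, flags=re.IGNORECASE) for raw in inputs]

for result in results:
    print(result)  # 651 <break time="0.5s" /> 359 <break time="0.5s" /> 2923
```

Unlike the digit-by-digit patterns, this one has no `\b` anchors, so it can also match the first ten digits of a longer run of digits (for example, “123456789012”).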

Regex quick reference

If you need to customize the patterns above or write your own, here are the building blocks used in pronunciation rules.

Escaping special characters

Some characters have special meaning in regex. To match them literally, prefix with a backslash (\).

Character	Regex meaning	To match literally
`.`	Any single character	`\.`
`-`	Range (only inside `[]`)	`\-` (not needed outside brackets)
`(` `)`	Capture group	`\(` `\)`
`*`	Zero or more	`\*`
`+`	One or more	`\+`

Word boundaries

\b marks the edge of a word. It prevents a pattern from matching inside a longer string. For example, \b(\d{5})\b matches the zip code 90210 on its own but does not match the first five digits of 9021043210.
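The difference is easy to see with Python's `re.findall`, used here as a stand-in for the platform's engine (an assumption): without `\b`, the five-digit pattern also finds matches inside the ten-digit number.

```python
import re

text = "90210 and 9021043210"
with_boundaries = re.findall(r"\b(\d{5})\b", text)
without_boundaries = re.findall(r"(\d{5})", text)

print(with_boundaries)     # ['90210']
print(without_boundaries)  # ['90210', '90210', '43210']
```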

Capture groups and digit patterns

Parentheses () create capture groups you can reference in the replacement as \1, \2, \3, etc.

Pattern	What it matches
`\d`	A single digit (0–9)
`(\d)`	A single digit, captured for use in the replacement
`(\d{3})`	Exactly three digits, captured as a group

Capturing each digit individually – (\d)(\d)(\d) instead of (\d{3}) – lets you insert spaces or commas between them in the replacement so TTS reads each one separately.
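A minimal sketch of the two capture styles, with Python's `re.sub` standing in for the platform's engine (an assumption). With one three-digit group, the replacement can only put the whole number back; with one group per digit, it can separate them.

```python
import re

one_group = re.sub(r"\b(\d{3})\b", r"\1", "Room 651")
per_digit = re.sub(r"\b(\d)(\d)(\d)\b", r"\1 \2 \3", "Room 651")

print(one_group)  # Room 651
print(per_digit)  # Room 6 5 1
```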

Example: IPA correction

Regex: \bLouvre\b
Replacement: /ˈluːvrə/
Case sensitive: FALSE

Because the rule is anchored with \b and is not case sensitive, every standalone occurrence of “Louvre”, in any capitalization, is now pronounced /ˈluːvrə/.
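A minimal sketch of the IPA rule, using Python's `re.sub` as a stand-in for the platform's engine and modeling “Case sensitive: FALSE” as `re.IGNORECASE` (both assumptions). Uppercase and title-case forms are replaced, while “Louvres” is left alone because `\b` requires a word edge after “Louvre”.

```python
import re

text = "Visit the LOUVRE or the Louvre museum, not Louvres."
spoken = re.sub(r"\bLouvre\b", "/ˈluːvrə/", text, flags=re.IGNORECASE)

print(spoken)  # Visit the /ˈluːvrə/ or the /ˈluːvrə/ museum, not Louvres.
```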

Related pages

Stop keywords

Block or log specific phrases in agent responses.

Translations

Override auto-translated content for multilingual agents.
