Advanced configuration – This page covers complex platform settings. We recommend completing PolyAcademy Level 1 before proceeding.
This page covers voice-specific TTS tuning. If you are only working with webchat, you can skip this page.
PolyAI uses Text-to-Speech (TTS) to convert text into spoken language. Occasionally, uncommon words, domain-specific terms, or proper nouns are mispronounced. When this happens, use the TTS Pronunciations tab to embed pronunciation guidelines for key phrases in the global rules.

How it works

Each pronunciation rule pairs a regular expression with a replacement string, and the replacement can reference regex capture groups. Replacements can use the International Phonetic Alphabet (IPA) to define pronunciations, and can also include SSML (Speech Synthesis Markup Language) tags such as <break>, <prosody>, and <emphasis>.

Multilingual pronunciation rules

For multilingual agents, pronunciation rules are organized by language. Each language has its own set of rules, displayed as separate collapsible cards. Rules within a language card only apply to responses in that language. To add a rule for a specific language:
  1. Expand the language card.
  2. Add the regex pattern and replacement.
  3. The rule automatically scopes to that language.
Rules with no language specified apply globally across all languages.
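The scoping behaviour can be pictured as a filter over the rule list. The sketch below is a hypothetical model, not PolyAI's implementation: the `Rule` class, the `rules_for` and `apply_rules` helpers, and the language codes `"en"` and `"es"` are illustrative, Python's `re` module stands in for the platform's regex engine, and keeping rules in list order is an assumption, since this page does not say how global rules are interleaved with language-specific ones.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """Hypothetical pronunciation rule: a regex pattern plus a replacement."""
    pattern: str
    replacement: str
    language: Optional[str] = None  # None means the rule applies to every language


def rules_for(language: str, rules: list[Rule]) -> list[Rule]:
    """Keep global rules and rules scoped to `language`, preserving list order."""
    return [r for r in rules if r.language is None or r.language == language]


def apply_rules(text: str, language: str, rules: list[Rule]) -> str:
    """Apply the applicable rules one after another to a response."""
    for rule in rules_for(language, rules):
        text = re.sub(rule.pattern, rule.replacement, text)
    return text


RULES = [
    Rule(r"\bkm\b", "kilometers", language="en"),
    Rule(r"\bkm\b", "kilómetros", language="es"),
    Rule(r"\bSKU\b", "S K U"),  # no language: applies globally
]

print(apply_rules("SKU 42 is 5 km away", "en", RULES))  # S K U 42 is 5 kilometers away
print(apply_rules("SKU 42 está a 5 km", "es", RULES))   # S K U 42 está a 5 kilómetros
```

A response in a language with no card of its own (say `"fr"`) would only receive the global rule in this model.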

Rule evaluation order

Pronunciation rules are evaluated from top to bottom. Each rule runs on the text produced by the rule above it. Because rules are applied sequentially, later rules can modify or override earlier transformations. If multiple rules affect the same text, their order determines the final spoken output. For example, if you have:
  1. A rule that formats a phone number with pauses or separators
  2. A rule that converts digits into words
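The formatting rule should appear above the digit-to-word rule. If the digit-to-word rule runs first, the phone number no longer contains digits, so the formatting rule has nothing left to match.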
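The ordering effect can be reproduced with Python's `re` module, used here as a stand-in for the platform's regex engine. This is a sketch: the rule names are hypothetical, and the digit-to-word rule uses a Python function as its replacement because a single replacement string cannot map each digit to its own word. Running the same two rules in both orders shows why the formatting rule must come first.

```python
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# Hypothetical rule 1: group a hyphenated phone number with commas (pauses).
FORMAT_PHONE = (r"\b(\d{3})-(\d{3})-(\d{4})\b", r"\1, \2, \3")

# Hypothetical rule 2: spell out every run of digits, one word per digit.
DIGITS_TO_WORDS = (r"\d+", lambda m: " ".join(DIGIT_WORDS[int(d)] for d in m.group()))


def apply_in_order(text, rules):
    """Run each rule on the text produced by the rule above it."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


# Formatting first: the phone pattern still sees digits, so the pauses are added.
print(apply_in_order("651-359-2923", [FORMAT_PHONE, DIGITS_TO_WORDS]))
# six five one, three five nine, two nine two three

# Digits first: no digits remain, so the formatting rule never matches.
print(apply_in_order("651-359-2923", [DIGITS_TO_WORDS, FORMAT_PHONE]))
# six five one-three five nine-two nine two three
```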

Common pronunciation rule patterns

Most pronunciation rules follow a small set of patterns. You can copy these directly into your configuration and adjust them to fit your use case – no regex expertise required.

Simple text replacement

To replace a specific string with a spoken equivalent, enter the text in the Expression field and the replacement in Replace with.
  • Regex: 3\-5
  • Replacement: three to five
Characters like . have special meaning in regex and must be escaped with a backslash (\.) to match literally. The hyphen (-) only needs escaping inside character classes like [a-z] – outside of brackets, 3-5 and 3\-5 both match the literal string “3-5”.
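A quick way to check how a pattern behaves before adding it as a rule is to try it in a regex engine. The sketch below uses Python's `re` module as a stand-in, assuming its handling of these simple escapes matches the platform's; the sample strings are made up.

```python
import re

text = "Delivery takes 3-5 days"

# Outside a character class, an escaped and an unescaped hyphen match the same text.
escaped = re.sub(r"3\-5", "three to five", text)
unescaped = re.sub(r"3-5", "three to five", text)
print(escaped)               # Delivery takes three to five days
print(escaped == unescaped)  # True

# An unescaped "." matches any character; "\." matches only a literal dot.
print(re.search(r"1.5", "size 1x5") is not None)  # True
print(re.search(r"1\.5", "size 1x5") is None)     # True
```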

Read a phone number digit by digit

To make TTS read each digit of a phone number separately instead of as a large number:
  • Regex: \b(\d)(\d)(\d)-(\d)(\d)(\d)-(\d)(\d)(\d)(\d)\b
  • Replacement: \1 \2 \3, \4 \5 \6, \7 \8 \9 \10
For 651-359-2923, this produces “6 5 1, 3 5 9, 2 9 2 3”, which TTS reads as “six five one, three five nine, two nine two three”. The commas insert natural pauses between groups. For phone numbers written without hyphens (like 6513592923), use the same replacement with this pattern:
  • Regex: \b(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)\b
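Both phone patterns can be tried out with Python's `re.sub`, used here as a stand-in for the platform's regex engine. In Python, `\10` in a replacement refers to group 10 when the pattern has ten groups; other regex flavours may need a different syntax for two-digit group numbers, so treat that detail as an assumption. The rule output is digits separated by spaces and commas, which the TTS voice then reads one digit at a time.

```python
import re

HYPHENATED = r"\b(\d)(\d)(\d)-(\d)(\d)(\d)-(\d)(\d)(\d)(\d)\b"
UNBROKEN = r"\b(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)\b"
# Ten single-digit groups; \10 is the tenth digit.
DIGIT_BY_DIGIT = r"\1 \2 \3, \4 \5 \6, \7 \8 \9 \10"

print(re.sub(HYPHENATED, DIGIT_BY_DIGIT, "Call 651-359-2923."))
# Call 6 5 1, 3 5 9, 2 9 2 3.
print(re.sub(UNBROKEN, DIGIT_BY_DIGIT, "Call 6513592923."))
# Call 6 5 1, 3 5 9, 2 9 2 3.
```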

Read a zip code digit by digit

  • Regex: \b(\d)(\d)(\d)(\d)(\d)\b
  • Replacement: \1, \2, \3, \4, \5
For 90210, TTS reads “nine, zero, two, one, zero” instead of “ninety thousand two hundred ten”.
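The zip code rule can be checked with Python's `re` module, standing in for the platform's engine; the sample strings are illustrative.

```python
import re

ZIP = r"\b(\d)(\d)(\d)(\d)(\d)\b"
ZIP_REPLACEMENT = r"\1, \2, \3, \4, \5"

print(re.sub(ZIP, ZIP_REPLACEMENT, "Beverly Hills, CA 90210"))
# Beverly Hills, CA 9, 0, 2, 1, 0

# \b stops the pattern from matching five digits inside a longer number.
print(re.sub(ZIP, ZIP_REPLACEMENT, "Account 9021043210"))
# Account 9021043210
```

Because the pattern matches any standalone five-digit number, other five-digit values in a response would also be read digit by digit.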

Make a URL speakable

  • Regex: www\.
  • Replacement: W, W, W, dot
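Trying the URL rule in Python's `re` module (a stand-in for the platform's engine) shows one detail worth checking: with the replacement exactly as written, "dot" runs straight into the domain name. Whether your TTS voice still reads that correctly is not covered on this page; adding a trailing space to the replacement is a simple precaution, shown here as an assumption rather than a documented requirement.

```python
import re

text = "Visit www.example.com"

as_written = re.sub(r"www\.", "W, W, W, dot", text)
with_space = re.sub(r"www\.", "W, W, W, dot ", text)  # trailing space added

print(as_written)  # Visit W, W, W, dotexample.com
print(with_space)  # Visit W, W, W, dot example.com
```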

Format a phone number with SSML pauses

For more control over pause timing, you can use SSML <break> tags in the replacement:
  • Regex: \(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})
  • Replacement: \1 <break time="0.5s" /> \2 <break time="0.5s" /> \3
  • Case sensitive: FALSE
This handles phone numbers in various formats – (651) 359-2923, 651-359-2923, or 6513592923 – and inserts half-second pauses between groups. See SSML breaks for more timing options.
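The SSML rule can be checked against all three formats with Python's `re` module, used as a stand-in for the platform's engine. `re.IGNORECASE` mirrors the Case sensitive: FALSE setting, although it has no effect on digits. Unlike the digit-by-digit patterns, this one has no \b anchors, so it can also match ten digits inside a longer run of digits.

```python
import re

PHONE = r"\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})"
PAUSED = r'\1 <break time="0.5s" /> \2 <break time="0.5s" /> \3'

for raw in ["(651) 359-2923", "651-359-2923", "6513592923"]:
    # Every format yields the same SSML string:
    # 651 <break time="0.5s" /> 359 <break time="0.5s" /> 2923
    print(re.sub(PHONE, PAUSED, raw, flags=re.IGNORECASE))
```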

Regex quick reference

If you need to customize the patterns above or write your own, here are the building blocks used in pronunciation rules.
Some characters have special meaning in regex. To match them literally, prefix with a backslash (\).
| Character | Regex meaning | To match literally |
| --- | --- | --- |
| . | Any single character | \. |
| - | Range (only inside []) | \- (not needed outside brackets) |
| ( ) | Capture group | \( \) |
| * | Zero or more | \* |
| + | One or more | \+ |
\b marks the edge of a word. It prevents a pattern from matching inside a longer string. For example, \b(\d{5})\b matches the zip code 90210 on its own but does not match the first five digits of 9021043210.
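The zip code example can be verified with Python's `re.findall`, assuming the platform's \b behaves like Python's for plain ASCII digits; the sample strings are illustrative.

```python
import re

WORD_EDGE = r"\b(\d{5})\b"

print(re.findall(WORD_EDGE, "Ship to 90210 today"))  # ['90210']
print(re.findall(WORD_EDGE, "Account 9021043210"))   # []

# Without \b, the same five-digit group matches inside the longer number.
print(re.findall(r"(\d{5})", "Account 9021043210"))  # ['90210', '43210']
```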
Parentheses () create capture groups you can reference in the replacement as \1, \2, \3, etc.
| Pattern | What it matches |
| --- | --- |
| \d | A single digit (0–9) |
| (\d) | A single digit, captured for use in the replacement |
| (\d{3}) | Exactly three digits, captured as a group |
Capturing each digit individually – (\d)(\d)(\d) instead of (\d{3}) – lets you insert spaces or commas between them in the replacement so TTS reads each one separately.
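The difference between one three-digit group and three one-digit groups shows up when you inspect the captured groups. This sketch uses Python's `re` module as a stand-in for the platform's engine; "Extension 651" is a made-up sample.

```python
import re

one_group = re.search(r"(\d{3})", "651")
three_groups = re.search(r"(\d)(\d)(\d)", "651")
print(one_group.groups())     # ('651',)
print(three_groups.groups())  # ('6', '5', '1')

# Three groups let the replacement put a space between every digit.
print(re.sub(r"\b(\d)(\d)(\d)\b", r"\1 \2 \3", "Extension 651"))  # Extension 6 5 1
```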

Example: IPA correction

  • Regex: \bLouvre\b
  • Replacement: /ˈluːvrə/
  • Case sensitive: FALSE
This ensures “Louvre” is pronounced correctly.
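The matching half of this rule can be sketched with Python's `re` module. How the platform turns the slash-delimited IPA string into speech is not described on this page, so the sketch stops at the text substitution. `re.IGNORECASE` stands in for Case sensitive: FALSE, and the \b anchors keep the rule away from longer words such as "louvred".

```python
import re

LOUVRE = re.compile(r"\bLouvre\b", flags=re.IGNORECASE)
IPA = "/ˈluːvrə/"

for text in ["Visit the Louvre today", "THE LOUVRE opens at 9", "louvred shutters"]:
    print(LOUVRE.sub(IPA, text))
# Visit the /ˈluːvrə/ today
# THE /ˈluːvrə/ opens at 9
# louvred shutters
```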

Last modified on April 16, 2026