Advanced configuration — This page covers complex platform settings. We recommend completing PolyAcademy Level 1 before proceeding.
This page covers voice-specific TTS tuning. If you are only working with webchat, you can skip this page.
PolyAI uses Text-to-Speech (TTS) to convert text into spoken language. Occasionally, uncommon words, domain-specific terms, or proper nouns may be mispronounced. In these cases, use the TTS Pronunciations tab to define pronunciation rules for key phrases in the global rules.

Key points

  • Enhanced pronunciation accuracy: Correctly pronounce domain-specific terms and unique phrases using IPA.
  • Streamlined workflow: Manage rules directly in Agent Studio.
  • Flexibility: Adjust pauses and pronunciation for various needs.

How it works

Pronunciations use the International Phonetic Alphabet (IPA) to define pronunciation rules. You can also use SSML (Speech Synthesis Markup Language) such as <break>, <prosody>, and <emphasis> in the replacement string. You may define pronunciation rules using regular expressions and replacements, including support for regex capture groups.

Multilingual pronunciation rules

For multilingual agents, pronunciation rules are organized by language. Each language has its own set of rules, displayed as separate collapsible cards. Rules within a language card only apply to responses in that language. To add a rule for a specific language:
  1. Expand the language card.
  2. Add the regex pattern and replacement.
  3. The rule automatically scopes to that language.
Rules with no language specified apply globally across all languages.

Rule evaluation order

Pronunciation rules are evaluated from top to bottom. Each rule runs on the text produced by the rule above it. Because rules are applied sequentially, later rules can modify or override earlier transformations. If multiple rules affect the same text, their order determines the final spoken output. For example, if you have:
  1. A rule that formats a phone number with pauses or separators
  2. A rule that converts digits into words
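The formatting rule should appear above the digit-to-word rule. Otherwise the digits have already been replaced by words by the time the formatting rule runs, and a pattern that matches digits no longer finds anything to format.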
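The effect of rule order can be sketched outside the platform with Python's re module. This is only an illustration of sequential evaluation, not PolyAI's implementation: apply_rules, FORMAT_RULE, DIGIT_RULE, and spell_digits are hypothetical names, and a comma stands in for the pause that the formatting rule would insert.

```python
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

def spell_digits(match):
    # Spell out every digit of the matched run, separated by spaces.
    return " ".join(DIGIT_WORDS[int(d)] for d in match.group(0))

# Hypothetical rules as (pattern, replacement) pairs.
FORMAT_RULE = (r"(\d{3})-(\d{4})", r"\1, \2")   # insert a separator
DIGIT_RULE = (r"\d+", spell_digits)             # digits -> words

def apply_rules(text, rules):
    # Each rule runs on the output of the rule above it.
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

# Formatting first: the separator is added, then digits are spelled out.
good = apply_rules("Call 359-2923", [FORMAT_RULE, DIGIT_RULE])

# Digits first: no digits remain, so the formatting rule never matches.
bad = apply_rules("Call 359-2923", [DIGIT_RULE, FORMAT_RULE])

print(good)  # Call three five nine, two nine two three
print(bad)   # Call three five nine-two nine two three
```

In the second ordering the hyphen survives because the formatting pattern only matches digits, which the earlier rule has already consumed.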

Using capture groups in replacements

If your regular expression uses capture groups (for example, (\d{3})), you can refer to these in the replacement string using \1, \2, etc. This allows you to reformat matched text dynamically.

Example: Formatting a phone number with pauses

To transform a number like (651) 359-2923 into:
“six five one [pause] three five nine [pause] two nine two three”
Use the following pronunciation rule:
  • Regex: \(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})
  • Replacement: \1 <break time="0.5s" /> \2 <break time="0.5s" /> \3
  • Case sensitive: FALSE
This uses SSML breaks between capture groups, allowing for natural read-back of phone numbers or similar patterns.
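To check a rule like this before adding it, the substitution can be tried locally. The sketch below uses Python's re module, which accepts the same \1, \2, \3 references in the replacement string; it assumes the pattern is meant to begin with an optional literal parenthesis, \(?(\d{3})\)?, and format_phone is a hypothetical helper rather than part of the platform. How the resulting SSML is spoken depends on the TTS engine.

```python
import re

# Optional parentheses around the area code, optional space or hyphen
# between the three digit groups. IGNORECASE mirrors "Case sensitive: FALSE"
# and has no effect on a digits-only pattern.
PHONE_PATTERN = re.compile(
    r"\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})", re.IGNORECASE
)

# \1, \2 and \3 refer to the three capture groups, in order.
REPLACEMENT = r'\1 <break time="0.5s" /> \2 <break time="0.5s" /> \3'

def format_phone(text):
    # Replace every phone number in the text with its SSML form.
    return PHONE_PATTERN.sub(REPLACEMENT, text)

print(format_phone("(651) 359-2923"))
# 651 <break time="0.5s" /> 359 <break time="0.5s" /> 2923
```

The same output is produced for 651-359-2923 and 6513592923, because every separator in the pattern is optional.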

Example: IPA correction

  • Regex: /\bLouvre\b/
  • Replacement: /ˈluːvrə/
  • Case sensitive: FALSE
This ensures “Louvre” is pronounced correctly.
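The matching behaviour of this rule can be sketched in Python as well. The sketch assumes the slashes around the regex are delimiters rather than part of the pattern, and it keeps the replacement string exactly as written above; the rule dictionary and apply_rule are hypothetical names used only for illustration.

```python
import re

# Hypothetical representation of the rule shown above.
rule = {
    "regex": r"\bLouvre\b",
    "replacement": "/ˈluːvrə/",
    "case_sensitive": False,
}

def apply_rule(text, rule):
    # "Case sensitive: FALSE" maps to a case-insensitive match.
    flags = 0 if rule["case_sensitive"] else re.IGNORECASE
    return re.sub(rule["regex"], rule["replacement"], text, flags=flags)

result = apply_rule("Visit the louvre and the Louvres", rule)
print(result)  # Visit the /ˈluːvrə/ and the Louvres
```

The word boundaries (\b) keep the rule from firing inside longer words such as "Louvres", while the case-insensitive flag lets it match "louvre" and "LOUVRE".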
Example pronunciation rules
Last modified on March 27, 2026