
# Pronunciations

> Control how your agent pronounces specific terms.

<Info>
  This page covers voice-specific TTS tuning. If you are only working with webchat, you can skip this page.
</Info>

PolyAI uses [Text-to-Speech (TTS)](https://www.nvidia.com/en-gb/glossary/text-to-speech/) to convert text into spoken language. Occasionally, uncommon words, domain-specific terms, or proper nouns are mispronounced. In these cases, use the TTS **Pronunciations** tab to add pronunciation guidelines for key phrases to the global rules.

## How it works

Pronunciation rules use the [International Phonetic Alphabet (IPA)](https://pronunciationstudio.com/english-ipa-chart/) to specify how a term should be spoken. You can also use SSML (Speech Synthesis Markup Language) tags such as `<break>`, `<prosody>`, and `<emphasis>` in the replacement string.

Each rule pairs a regular expression with a replacement string. Replacements can reference **regex capture groups**.

## Multilingual pronunciation rules

For multilingual agents, pronunciation rules are organized by language. Each language has its own set of rules, displayed as separate collapsible cards. Rules within a language card only apply to responses in that language.

To add a rule for a specific language:

1. Expand the language card.
2. Add the regex pattern and replacement.
3. The rule automatically scopes to that language.

Rules with no language specified apply globally across all languages.

### Rule evaluation order

<img src="https://mintcdn.com/polyai/Qu880HppNqT19Eyr/images/response-control/response-order.png?fit=max&auto=format&n=Qu880HppNqT19Eyr&q=85&s=612364472bf696ccb597c6b6c363c30e" alt="Rule evaluation order" width="2512" height="604" data-path="images/response-control/response-order.png" />

Pronunciation rules are evaluated **from top to bottom**. Each rule runs on the text produced by the rule above it.

Because rules are applied sequentially, later rules can modify or override earlier transformations. If multiple rules affect the same text, their order determines the final spoken output.

For example, if you have:

1. A rule that formats a phone number with pauses or separators
2. A rule that converts digits into words

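The formatting rule should appear **above** the digit-to-word rule. Once the digits have been converted to words, the formatting pattern no longer finds any digits to match.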
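The effect of rule order can be sketched in Python, assuming the rule engine behaves like a sequence of `re.sub` calls (the platform's actual engine is not specified on this page). The `apply_rules` helper, the rule tuples, and the `spell_digits` callable are hypothetical stand-ins for illustration; a real digit-to-word rule would be entered as regex and replacement pairs.

```python
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

def spell_digits(match):
    # Hypothetical stand-in for a digit-to-word rule.
    return " ".join(DIGIT_WORDS[int(d)] for d in match.group())

# Each rule is a (pattern, replacement) pair.
format_phone = (r"\b(\d{3})-(\d{3})-(\d{4})\b", r"\1, \2, \3")
digits_to_words = (r"\d+", spell_digits)

def apply_rules(text, rules):
    # Apply rules top to bottom; each rule sees the previous rule's output.
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

formatting_first = apply_rules("651-359-2923", [format_phone, digits_to_words])
words_first = apply_rules("651-359-2923", [digits_to_words, format_phone])

print(formatting_first)  # six five one, three five nine, two nine two three
print(words_first)       # six five one-three five nine-two nine two three
```

With the digit rule first, the hyphens survive because the formatting pattern never matches the converted text.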

## Common pronunciation rule patterns

Most pronunciation rules follow a small set of patterns. You can copy these directly into your configuration and adjust them to fit your use case – no regex expertise required.

<img src="https://mintcdn.com/polyai/TrzNf46Qd9qTI294/images/response-control/regex-pronunciation-examples.png?fit=max&auto=format&n=TrzNf46Qd9qTI294&q=85&s=fb2ca25d07328f494b302e8c941af5ff" alt="Pronunciation rules showing regex patterns for phone numbers, zip codes, and URLs" width="2466" height="1148" data-path="images/response-control/regex-pronunciation-examples.png" />

### Simple text replacement

To replace a specific string with a spoken equivalent, enter the text in the **Expression** field and the replacement in **Replace with**.

* **Regex:** `3\-5`
* **Replacement:** `three to five`

<Note>
  Characters like `.` have special meaning in regex and must be escaped with a backslash (`\.`) to match literally. The hyphen (`-`) only needs escaping inside character classes like `[a-z]` – outside of brackets, `3-5` and `3\-5` both match the literal string "3-5".
</Note>
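The escaping behaviour described in the note can be checked with a short Python sketch. It assumes the rule engine follows the same escaping conventions as Python's `re` module; the sample sentences are made up for illustration.

```python
import re

text = "Delivery takes 3-5 business days."

# Outside a character class, the escaped and unescaped hyphen behave the same.
with_escape = re.sub(r"3\-5", "three to five", text)
without_escape = re.sub(r"3-5", "three to five", text)
print(with_escape)  # Delivery takes three to five business days.

# An unescaped dot matches any character, so "1.5" also matches "125".
dot_unescaped = re.search(r"1.5", "125") is not None
dot_escaped = re.search(r"1\.5", "125") is not None
print(dot_unescaped, dot_escaped)  # True False
```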

### Read a phone number digit by digit

To make TTS read each digit of a phone number separately instead of as a large number:

* **Regex:** `\b(\d)(\d)(\d)-(\d)(\d)(\d)-(\d)(\d)(\d)(\d)\b`
* **Replacement:** `\1 \2 \3, \4 \5 \6, \7 \8 \9 \10`

This produces "six five one, three five nine, two nine two three" for `651-359-2923`. The commas insert natural pauses between groups.

For phone numbers written without hyphens (like `6513592923`), use the same replacement with this pattern:

* **Regex:** `\b(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)\b`
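To preview what these two patterns hand to TTS, here is a minimal Python sketch. It assumes the rule engine substitutes the way `re.sub` does, including reading `\10` as the tenth capture group; the sample sentences are illustrative only.

```python
import re

HYPHENATED = r"\b(\d)(\d)(\d)-(\d)(\d)(\d)-(\d)(\d)(\d)(\d)\b"
PLAIN = r"\b(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)\b"
REPLACEMENT = r"\1 \2 \3, \4 \5 \6, \7 \8 \9 \10"

# Both input formats end up as the same separated digits.
from_hyphenated = re.sub(HYPHENATED, REPLACEMENT, "Call 651-359-2923 today.")
from_plain = re.sub(PLAIN, REPLACEMENT, "Call 6513592923 today.")

print(from_hyphenated)  # Call 6 5 1, 3 5 9, 2 9 2 3 today.
print(from_plain)       # Call 6 5 1, 3 5 9, 2 9 2 3 today.
```

The rewritten text contains individually separated digits, which is what lets TTS read them one at a time.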

### Read a zip code digit by digit

* **Regex:** `\b(\d)(\d)(\d)(\d)(\d)\b`
* **Replacement:** `\1, \2, \3, \4, \5`

With this rule, TTS reads `90210` as "nine, zero, two, one, zero" instead of "ninety thousand two hundred ten".
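The zip code rule can be tried out in Python as a rough check, assuming `re.sub`-style matching. The sample sentences are invented; the second one shows that the pattern has no way to tell a zip code from any other standalone five-digit number.

```python
import re

ZIP = r"\b(\d)(\d)(\d)(\d)(\d)\b"
REPLACEMENT = r"\1, \2, \3, \4, \5"

zip_spoken = re.sub(ZIP, REPLACEMENT, "Our zip code is 90210.")
print(zip_spoken)  # Our zip code is 9, 0, 2, 1, 0.

# Any standalone five-digit number matches, not only zip codes.
amount_spoken = re.sub(ZIP, REPLACEMENT, "The total is 15000 points.")
print(amount_spoken)  # The total is 1, 5, 0, 0, 0 points.
```

If your agent also speaks five-digit quantities, a narrower pattern that includes surrounding words may be a safer choice.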

### Make a URL speakable

* **Regex:** `www\.`
* **Replacement:** `W, W, W, dot`
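A quick Python sketch of the URL rule, assuming `re.sub`-style substitution and using the reserved `example.com` domain as sample input. The variant with a trailing space is not part of the rule above; it is shown only as an option to test by listening if the voice runs "dot" into the next word.

```python
import re

text = "Visit www.example.com"

# The rule as written: no space is inserted after "dot".
as_written = re.sub(r"www\.", "W, W, W, dot", text)
print(as_written)  # Visit W, W, W, dotexample.com

# Hypothetical variant with a trailing space in the replacement.
with_space = re.sub(r"www\.", "W, W, W, dot ", text)
print(with_space)  # Visit W, W, W, dot example.com
```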

### Format a phone number with SSML pauses

For more control over pause timing, you can use SSML `<break>` tags in the replacement:

* **Regex:** `\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})`
* **Replacement:** `\1 <break time="0.5s" /> \2 <break time="0.5s" /> \3`
* **Case sensitive:** `FALSE`

This handles phone numbers in various formats – `(651) 359-2923`, `651-359-2923`, or `6513592923` – and inserts half-second pauses between groups. See [SSML breaks](https://cloud.google.com/text-to-speech/docs/ssml#break) for more timing options.
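The three input formats can be checked against the pattern with a short Python sketch, assuming the rule engine matches the way Python's `re` module does. The variable names are illustrative.

```python
import re

PHONE = re.compile(r"\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})")
REPLACEMENT = r'\1 <break time="0.5s" /> \2 <break time="0.5s" /> \3'

inputs = ["(651) 359-2923", "651-359-2923", "6513592923"]
outputs = [PHONE.sub(REPLACEMENT, raw) for raw in inputs]

for spoken in outputs:
    print(spoken)
# Each line prints:
# 651 <break time="0.5s" /> 359 <break time="0.5s" /> 2923
```

Because the pattern has no `\b` anchors, it can also match ten digits inside a longer number; add word boundaries if that matters for your content.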

## Regex quick reference

If you need to customize the patterns above or write your own, here are the building blocks used in pronunciation rules.

<AccordionGroup>
  <Accordion title="Escaping special characters">
    Some characters have special meaning in regex. To match them literally, prefix with a backslash (`\`).

    | Character | Regex meaning            | To match literally                 |
    | --------- | ------------------------ | ---------------------------------- |
    | `.`       | Any single character     | `\.`                               |
    | `-`       | Range (only inside `[]`) | `\-` (not needed outside brackets) |
    | `(` `)`   | Capture group            | `\(` `\)`                          |
    | `*`       | Zero or more             | `\*`                               |
    | `+`       | One or more              | `\+`                               |
  </Accordion>

  <Accordion title="Word boundaries">
    `\b` marks the edge of a word. It prevents a pattern from matching inside a longer string.

    For example, `\b(\d{5})\b` matches the zip code `90210` on its own but does **not** match the first five digits of `9021043210`.
  </Accordion>

  <Accordion title="Capture groups and digit patterns">
    Parentheses `()` create capture groups you can reference in the replacement as `\1`, `\2`, `\3`, etc.

    | Pattern   | What it matches                                     |
    | --------- | --------------------------------------------------- |
    | `\d`      | A single digit (0–9)                                |
    | `(\d)`    | A single digit, captured for use in the replacement |
    | `(\d{3})` | Exactly three digits, captured as a group           |

    Capturing each digit individually – `(\d)(\d)(\d)` instead of `(\d{3})` – lets you insert spaces or commas between them in the replacement so TTS reads each one separately.
  </Accordion>
</AccordionGroup>
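The word-boundary and capture-group behaviour from the quick reference can be verified in Python. This is a sketch under the assumption that the rule engine treats `\b` and numbered groups the same way as the `re` module.

```python
import re

ZIP = r"\b(\d{5})\b"

standalone = re.search(ZIP, "90210")              # match
embedded = re.search(ZIP, "9021043210")           # None: no boundary after the fifth digit
unbounded = re.search(r"(\d{5})", "9021043210")   # matches "90210" inside the longer number

# One group per digit lets the replacement put separators between digits.
grouped = re.sub(r"(\d{3})", r"\1", "651")
per_digit = re.sub(r"(\d)(\d)(\d)", r"\1 \2 \3", "651")

print(standalone.group(1), embedded, unbounded.group(1))  # 90210 None 90210
print(grouped, "|", per_digit)                            # 651 | 6 5 1
```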

## Example: IPA correction

* **Regex**: `\bLouvre\b`
* **Replacement**: `/ˈluːvrə/`
* **Case sensitive**: `FALSE`

With this rule, "Louvre" is pronounced correctly wherever it appears as a whole word, regardless of capitalization.

<img src="https://mintcdn.com/polyai/Qu880HppNqT19Eyr/images/response-control/tts-pronunciation.png?fit=max&auto=format&n=Qu880HppNqT19Eyr&q=85&s=ce8d5b15729ffbc2421a9deebfcafaf6" alt="Example pronunciation rules" width="2298" height="576" data-path="images/response-control/tts-pronunciation.png" />
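The Louvre rule can be approximated in Python, assuming that setting **Case sensitive** to `FALSE` corresponds to the `re.IGNORECASE` flag. How the platform passes the IPA string on to the voice is not shown here; the sketch only covers the text substitution.

```python
import re

PATTERN = r"\bLouvre\b"
IPA = "/ˈluːvrə/"

text = "The Louvre opens at nine. Is the louvre open on Tuesdays?"
spoken = re.sub(PATTERN, IPA, text, flags=re.IGNORECASE)
print(spoken)  # The /ˈluːvrə/ opens at nine. Is the /ˈluːvrə/ open on Tuesdays?

# The word boundaries keep the rule from firing inside longer words.
plural = re.sub(PATTERN, IPA, "louvres", flags=re.IGNORECASE)
print(plural)  # louvres
```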

## Related pages

<CardGroup cols={2}>
  <Card title="Stop keywords" icon="ban" href="/response-control/stop-keywords">
    Block or log specific phrases in agent responses.
  </Card>

  <Card title="Translations" icon="language" href="/response-control/translations">
    Override auto-translated content for multilingual agents.
  </Card>
</CardGroup>
