- Cached audio stores previously generated TTS so it can be replayed instantly.
- This reduces latency and keeps repeated phrases consistent.
- Audio is only cached if the same utterance is generated at least twice within a 24-hour window.
- One-off utterances will not persist in cache by default.
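The caching rule above can be sketched in a few lines of Python. This is only an illustration of the stated rule (at least two generations within 24 hours), not the product's actual implementation. The class name `CacheEligibility`, the method `record`, and the treatment of generations exactly 24 hours apart as outside the window are all assumptions.

```python
from datetime import datetime, timedelta

CACHE_WINDOW = timedelta(hours=24)  # window stated in the text
MIN_GENERATIONS = 2                 # "at least twice"


class CacheEligibility:
    """Hypothetical helper that tracks when each utterance was generated."""

    def __init__(self):
        self._seen = {}  # utterance text -> list of generation times

    def record(self, utterance, when):
        """Record one generation; return True if the utterance now qualifies."""
        cutoff = when - CACHE_WINDOW
        # Keep only earlier generations that fall inside the 24-hour window.
        recent = [t for t in self._seen.get(utterance, []) if t > cutoff]
        recent.append(when)
        self._seen[utterance] = recent
        return len(recent) >= MIN_GENERATIONS
```

Under these assumptions, a greeting generated at 09:00 and again at 12:00 qualifies on the second call, while a sentence generated once, or twice 30 hours apart, does not.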
- Open Audio Management.
- Review the list of cached utterances:
  - Greeting
  - Transfer / handoff language
  - SMS offer phrasing
  - Closings and confirmations
- For any high-frequency utterance:
  - Open it and review how often it has been used.
  - Adjust stability and clarity for that utterance only if needed.
  - Use the play button to preview changes.
- If an utterance must remain stable:
  - Generate it at least twice within a 24-hour window so that it is cached, or
  - Upload a static audio file to overwrite the cached version.
- Turbo (~400ms): Extremely fast, higher interruption risk.
- Swift (~1200ms): Prioritises speed.
- Balanced (~1600ms): Default for most use cases.
- Precise (~2000ms): Slower, more deliberate, fewer interruptions.
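For reference, the four modes can be written down as a small lookup table. The sketch below is illustrative only: the approximate latencies come from the list above, but the dictionary name `LATENCY_MODES_MS`, the helper `slowest_mode_within`, and the idea of choosing the most deliberate mode that fits a wait budget are assumptions, not product features.

```python
# Approximate latency per mode, in milliseconds, as listed above.
LATENCY_MODES_MS = {
    "Turbo": 400,
    "Swift": 1200,
    "Balanced": 1600,
    "Precise": 2000,
}


def slowest_mode_within(budget_ms):
    """Return the most deliberate mode whose latency fits the budget, or None."""
    fitting = [(ms, name) for name, ms in LATENCY_MODES_MS.items() if ms <= budget_ms]
    # Slower modes interrupt less, so prefer the largest latency that still fits.
    return max(fitting)[1] if fitting else None
```

With a budget of 1500 ms this picks Swift; with no mode under budget it returns `None`, leaving the choice to the operator.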
- Barge-in is useful for fast modes (Turbo/Swift).
- It can feel chaotic if enabled without careful phrasing and latency tuning.
- After any voice or phrasing change, start a new call session.
- Confirm you are hearing updated audio, not a cached version.
- Validate that turn-taking still feels natural after changing latency or barge-in.
- Brand names or product names that are mispronounced.
- Proper nouns (locations, people, departments).
- Numbers or IDs that need structured read-back.
- Any phrase where pacing matters for comprehension.
- Matching is done using regular expressions.
- Replacements can be:
  - IPA (International Phonetic Alphabet)
  - SSML, such as `<break>` for pauses
  - Regex capture groups (`\1`, `\2`, etc.) for reformatting.
- Regex: `\bLouvre\b`
- Replacement: `/ˈluːvrə/`
- Case sensitive: FALSE
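To see what this rule does to a sentence before adding it, it can be approximated with Python's `re` module. This is a rough stand-in, assuming the product's regex engine treats `\b` and case-insensitive matching the way Python does. The names `LOUVRE_RULE`, `LOUVRE_IPA`, and `apply_louvre_rule` are hypothetical.

```python
import re

# "Case sensitive: FALSE" is modelled here with re.IGNORECASE.
LOUVRE_RULE = re.compile(r"\bLouvre\b", re.IGNORECASE)
LOUVRE_IPA = "/ˈluːvrə/"


def apply_louvre_rule(text):
    """Replace every whole-word match of 'Louvre' with its IPA transcription."""
    # A function replacement inserts the IPA string literally, with no escape processing.
    return LOUVRE_RULE.sub(lambda match: LOUVRE_IPA, text)
```

The word boundaries keep the rule from firing inside longer words such as "Louvres", and the case-insensitive flag catches lowercase transcripts.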
- Regex: `(\d{3})[ -]?(\d{3})[ -]?(\d{4})`
- Replacement: `\1 <break time="0.5s" /> \2 <break time="0.5s" /> \3`
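The capture-group rule can be checked offline in the same way. The sketch below runs the pattern and replacement through Python's `re.sub`, assuming the product interprets `\1`, `\2`, and `\3` as Python does. The names `PHONE_PATTERN`, `PHONE_REPLACEMENT`, and `add_phone_pauses` are hypothetical.

```python
import re

# Three digits, three digits, four digits, each optionally separated by a space or hyphen.
PHONE_PATTERN = re.compile(r"(\d{3})[ -]?(\d{3})[ -]?(\d{4})")

# Each captured group is read back with a half-second SSML pause between groups.
PHONE_REPLACEMENT = r'\1 <break time="0.5s" /> \2 <break time="0.5s" /> \3'


def add_phone_pauses(text):
    """Insert SSML pauses between the three groups of a 10-digit number."""
    return PHONE_PATTERN.sub(PHONE_REPLACEMENT, text)
```

For example, `add_phone_pauses("555-123-4567")` returns `555 <break time="0.5s" /> 123 <break time="0.5s" /> 4567`. Because the pattern has no word boundaries, it also matches the first ten digits of a longer digit run, which is worth testing against real IDs before relying on it.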
- Add pronunciations incrementally.
- Test each change in Call before adding more.
- Prefer clarity over cleverness—overly complex regex is hard to maintain.
- Mispronounced terms are corrected consistently.
- Pauses improve comprehension rather than slowing the call excessively.

