
# Tutorial: Audio management and the cache

> PolyAcademy Level 2 – Manage cached audio, interaction styles, and pronunciation corrections.

export const LessonMeta = ({level, difficulty, time}) => {
  const levelConfig = {
    1: {
      badge: 'bg-green-100 text-green-800 dark:bg-green-900 dark:text-green-200',
      label: 'Level 1'
    },
    2: {
      badge: 'bg-amber-100 text-amber-800 dark:bg-amber-900 dark:text-amber-200',
      label: 'Level 2'
    },
    3: {
      badge: 'bg-red-100 text-red-800 dark:bg-red-900 dark:text-red-200',
      label: 'Level 3'
    }
  };
  const difficultyConfig = {
    Beginner: 'bg-green-100 text-green-800 dark:bg-green-900 dark:text-green-200',
    Intermediate: 'bg-amber-100 text-amber-800 dark:bg-amber-900 dark:text-amber-200',
    Advanced: 'bg-red-100 text-red-800 dark:bg-red-900 dark:text-red-200'
  };
  const lvl = levelConfig[level] || levelConfig[1];
  const diffColor = difficultyConfig[difficulty] || difficultyConfig['Beginner'];
  return <div className="flex flex-wrap items-center gap-2 my-4 not-prose">
      <span className={`inline-flex items-center rounded-full px-2.5 py-0.5 text-xs font-semibold ${lvl.badge}`}>
        {lvl.label}
      </span>
      <span className={`inline-flex items-center rounded-full px-2.5 py-0.5 text-xs font-semibold ${diffColor}`}>
        {difficulty}
      </span>
      {time && <span className="inline-flex items-center gap-1 text-xs text-gray-500 dark:text-gray-400">
          <svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
            <path strokeLinecap="round" strokeLinejoin="round" d="M12 6v6h4.5m4.5 0a9 9 0 11-18 0 9 9 0 0118 0z" />
          </svg>
          {time}
        </span>}
    </div>;
};

export const Quiz = ({questions = []}) => {
  const [selected, setSelected] = useState({});
  const [resetCount, setResetCount] = useState(0);
  const letters = ['A', 'B', 'C', 'D'];
  const handleSelect = (qIdx, optIdx) => {
    if (selected[qIdx] !== undefined) return;
    setSelected(prev => ({
      ...prev,
      [qIdx]: optIdx
    }));
  };
  const handleReset = () => {
    setSelected({});
    setResetCount(c => c + 1);
  };
  if (!questions?.length) return null;
  const getOptionClasses = ({hasAnswered, isThisCorrect, isThisSelected}) => {
    if (!hasAnswered) {
      return {
        btn: 'flex w-full items-center gap-3 py-2.5 px-4 rounded-xl text-sm leading-normal transition-all duration-150 text-left border cursor-pointer border-gray-200 bg-white text-gray-700 hover:border-gray-300 hover:bg-gray-50 hover:shadow-sm dark:border-gray-600 dark:bg-gray-800 dark:text-gray-200 dark:hover:border-gray-500 dark:hover:bg-gray-700',
        badge: 'w-6 h-6 rounded-full text-xs font-bold flex items-center justify-center shrink-0 leading-none transition-all duration-150 bg-gray-100 text-gray-500 dark:bg-gray-700 dark:text-gray-300',
        icon: null
      };
    }
    if (isThisCorrect) {
      return {
        btn: 'flex w-full items-center gap-3 py-2.5 px-4 rounded-xl text-sm leading-normal transition-all duration-150 text-left border cursor-default border-green-400 bg-green-50 text-green-900 font-medium dark:border-green-500 dark:bg-green-950 dark:text-green-100',
        badge: 'w-6 h-6 rounded-full text-xs font-bold flex items-center justify-center shrink-0 leading-none transition-all duration-150 bg-green-500 text-white dark:bg-green-500',
        icon: <svg className="shrink-0 w-4 h-4 text-green-500 dark:text-green-400 ml-auto" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2.5}>
            <path strokeLinecap="round" strokeLinejoin="round" d="M4.5 12.75l6 6 9-13.5" />
          </svg>
      };
    }
    if (isThisSelected) {
      return {
        btn: 'flex w-full items-center gap-3 py-2.5 px-4 rounded-xl text-sm leading-normal transition-all duration-150 text-left border cursor-default border-red-400 bg-red-50 text-red-900 dark:border-red-500 dark:bg-red-950 dark:text-red-100',
        badge: 'w-6 h-6 rounded-full text-xs font-bold flex items-center justify-center shrink-0 leading-none transition-all duration-150 bg-red-500 text-white dark:bg-red-500',
        icon: <svg className="shrink-0 w-4 h-4 text-red-400 dark:text-red-400 ml-auto" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2.5}>
            <path strokeLinecap="round" strokeLinejoin="round" d="M6 18L18 6M6 6l12 12" />
          </svg>
      };
    }
    return {
      btn: 'flex w-full items-center gap-3 py-2.5 px-4 rounded-xl text-sm leading-normal transition-all duration-150 text-left border cursor-default border-gray-100 bg-white text-gray-400 dark:border-gray-700 dark:bg-gray-800 dark:text-gray-500',
      badge: 'w-6 h-6 rounded-full text-xs font-bold flex items-center justify-center shrink-0 leading-none transition-all duration-150 bg-gray-100 text-gray-500 dark:bg-gray-700 dark:text-gray-500',
      icon: null
    };
  };
  return <div key={resetCount} className="my-6">
      {questions.map((q, qIdx) => {
    const answer = selected[qIdx];
    const hasAnswered = answer !== undefined;
    const isCorrect = answer === q.correct;
    return <div key={String(qIdx)} className="mb-8">
            <p className="flex items-start gap-2.5 font-semibold text-sm mb-3 mt-0 leading-relaxed text-gray-900 dark:text-gray-100">
              <span className="inline-flex items-center justify-center w-5 h-5 rounded-full bg-gray-800 dark:bg-gray-200 text-white dark:text-gray-900 text-xs font-bold shrink-0 mt-px leading-none">
                {qIdx + 1}
              </span>
              {q.q}
            </p>

            <div className="flex flex-col gap-2">
              {q.options.map((opt, i) => {
      const isThisCorrect = i === q.correct;
      const isThisSelected = i === answer;
      const {btn, badge, icon} = getOptionClasses({
        hasAnswered,
        isThisCorrect,
        isThisSelected
      });
      return <button key={String(i)} type="button" onClick={() => handleSelect(qIdx, i)} className={btn}>
                    <span className={badge}>{letters[i]}</span>
                    <span className="flex-1">{opt}</span>
                    {icon}
                  </button>;
    })}
            </div>

            {hasAnswered ? <div className={`mt-3 py-3 pl-4 pr-3.5 rounded-r-xl text-sm leading-relaxed border-l-4 ${isCorrect ? 'border-green-500 bg-green-50 dark:bg-green-950 dark:border-green-500' : 'border-red-500 bg-red-50 dark:bg-red-950 dark:border-red-500'}`}>
                <span className={`font-semibold ${isCorrect ? '!text-green-800 dark:!text-green-200' : '!text-red-800 dark:!text-red-200'}`}>
                  {isCorrect ? 'Correct.' : 'Not quite.'}
                </span>{' '}
                <span className="!text-gray-700 dark:!text-gray-300">{q.explanation}</span>
              </div> : null}
          </div>;
  })}

      <button type="button" onClick={handleReset} className="mt-1 text-xs text-gray-400 hover:text-gray-600 dark:hover:text-gray-300 underline underline-offset-2 cursor-pointer transition-colors duration-150">
        Reset quiz
      </button>
    </div>;
};

export const ProgressTracker = ({lessonNum, totalLessons, level}) => {
  const [checked, setChecked] = useState(false);
  return <div onClick={() => setChecked(prev => !prev)} className={checked ? 'flex items-center gap-3 p-4 rounded-lg border-2 border-green-600 bg-green-50 dark:bg-green-950 cursor-pointer select-none transition-all' : 'flex items-center gap-3 p-4 rounded-lg border-2 border-gray-200 dark:border-gray-600 bg-gray-50 dark:bg-gray-800 cursor-pointer select-none transition-all'}>
      <div className={checked ? 'w-5 h-5 rounded border-2 border-green-600 bg-green-600 flex items-center justify-center shrink-0 transition-all' : 'w-5 h-5 rounded border-2 border-gray-400 dark:border-gray-500 bg-white dark:bg-gray-800 flex items-center justify-center shrink-0 transition-all'}>
        {checked ? <svg width="10" height="8" viewBox="0 0 10 8" fill="none">
            <path d="M1 4L3.5 6.5L9 1" stroke="white" strokeWidth="2" strokeLinecap="round" strokeLinejoin="round" />
          </svg> : null}
      </div>
      <div>
        <div className={checked ? 'font-semibold text-sm text-green-700 dark:text-green-300' : 'font-semibold text-sm text-gray-700 dark:text-gray-200'}>
          {checked ? 'Lesson complete' : 'Mark lesson complete'}
        </div>
        {lessonNum && totalLessons ? <div className="text-xs text-gray-500 dark:text-gray-400 mt-0.5">
            {level ? level + ' - ' : ''}Lesson {lessonNum} of {totalLessons}
          </div> : null}
      </div>
    </div>;
};

**Level 2 – Lesson 5 of 8** – Understand and manage the audio cache for optimal performance.

<LessonMeta level={2} difficulty="Intermediate" time="15 min" />

[Audio Management](/voice-channel/audio-library) controls how TTS output is generated, cached, and replayed. Teams that don't account for caching often assume a change "didn't apply" when they are actually hearing old, cached audio.

## Understanding the audio cache

<CardGroup cols={2}>
  <Card title="What caching is" icon="database">
    Cached audio stores previously generated TTS so it can be replayed instantly, reducing latency and keeping repeated phrases consistent.
  </Card>

  <Card title="Cache requirements" icon="clock">
    Audio is only cached if the **same utterance is generated at least twice in a 24-hour window**.
  </Card>
</CardGroup>

One-off utterances will *not* persist in cache by default.
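
The caching rule above can be pictured with a small sketch. It is a hypothetical model, not the platform's implementation: the function name, the inclusive 24-hour boundary, and the idea of tracking generation timestamps per utterance are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical model of the rule described above; the platform's real
# caching logic is not documented in this lesson.
CACHE_WINDOW = timedelta(hours=24)


def is_cache_eligible(generated_at: list) -> bool:
    """Return True if any two generations fall within one 24-hour window."""
    times = sorted(generated_at)
    # Adjacent timestamps in sorted order are the closest pairs, so
    # checking neighbours is enough to find two generations in a window.
    for earlier, later in zip(times, times[1:]):
        if later - earlier <= CACHE_WINDOW:
            return True
    return False


start = datetime(2024, 1, 1, 9, 0)
one_off = [start]
repeated = [start, start + timedelta(hours=3)]
too_far_apart = [start, start + timedelta(hours=30)]

print(is_cache_eligible(one_off))        # False
print(is_cache_eligible(repeated))       # True
print(is_cache_eligible(too_far_apart))  # False
```

Under this model, a greeting played on every call would become eligible almost immediately, while a phrase spoken once a week would never persist.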

## Managing cached audio

<Steps>
  <Step title="Open Audio Management">
    Navigate to **Voice > Audio library** in the platform.
  </Step>

  <Step title="Review cached utterances">
    Check the list of cached utterances, looking in particular for high-frequency phrases such as:

    * Greeting
    * Transfer / handoff language
    * SMS offer phrasing
    * Closings and confirmations
  </Step>

  <Step title="Adjust individual utterances">
    For any high-frequency utterance:

    * Open it and review how often it has been used
    * Adjust **stability** and **clarity** *for that utterance only* if needed
    * Use the **play** button to preview changes
  </Step>

  <Step title="Ensure stability for critical phrases">
    If an utterance must remain stable:

    * Generate it at least twice within a 24-hour window so that it is cached, or
    * Upload a static audio file to overwrite the cached version
  </Step>
</Steps>

## Check your understanding

<Quiz
  questions={[
{
q: "You updated your agent's greeting text and published the change, but callers still hear the old greeting. What is the most likely cause?",
options: [
  "The change wasn't saved before publishing",
  "The old greeting is still cached – start a new call session to hear the updated audio",
  "The voice model doesn't support greeting changes",
  "You need to re-select the agent voice first",
],
correct: 1,
explanation: "Cached audio persists until refreshed. After any voice or phrasing change, start a new call session to confirm you're hearing the updated audio, not a cached version.",
}
]}
/>

## Interaction style (response latency)

Interaction style controls how quickly the agent responds after detecting user speech. This directly affects interruption rate and perceived naturalness.

<Tabs>
  <Tab title="Turbo">
    **\~400ms latency**

    Extremely fast, higher interruption risk.
  </Tab>

  <Tab title="Swift">
    **\~1200ms latency**

    Prioritizes speed.
  </Tab>

  <Tab title="Balanced">
    **\~1600ms latency**

    Default for most use cases.
  </Tab>

  <Tab title="Precise">
    **\~2000ms latency**

    Slower, more deliberate, fewer interruptions.
  </Tab>
</Tabs>
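
One way to reason about the trade-off is to start from a latency budget and pick the most deliberate style that still fits. The sketch below is a hypothetical helper for that reasoning only: the latencies are the approximate values from the tabs above, and `slowest_style_within` is not a platform function.

```python
from typing import Optional

# Approximate latencies from the tabs above, in milliseconds.
INTERACTION_STYLES = {
    "Turbo": 400,
    "Swift": 1200,
    "Balanced": 1600,
    "Precise": 2000,
}


def slowest_style_within(budget_ms: int) -> Optional[str]:
    """Pick the most deliberate style whose latency fits the budget.

    Slower styles interrupt less, so this prefers the slowest option
    that still meets the latency target (an illustrative heuristic).
    """
    candidates = {
        name: ms for name, ms in INTERACTION_STYLES.items() if ms <= budget_ms
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


print(slowest_style_within(1500))  # Swift
print(slowest_style_within(300))   # None
print(slowest_style_within(5000))  # Precise
```

Whatever the heuristic suggests, the choice still needs to be validated on real calls, since perceived naturalness depends on the callers and the phrasing.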

## Barge-in

<AccordionGroup>
  <Accordion title="What is barge-in?" icon="hand">
    Barge-in determines whether callers can interrupt the agent mid-speech.
  </Accordion>

  <Accordion title="When to use it" icon="lightbulb">
    * Useful with the Turbo interaction style
    * Can feel chaotic if enabled without careful phrasing and latency tuning
  </Accordion>
</AccordionGroup>

## Pronunciations

Ensure domain-specific terms are spoken clearly and correctly in Call.

Pronunciations are defined in the **Pronunciations** tab under **Voice > [Response Control](/voice-channel/advanced/pronunciations)** and applied globally. They modify how text is converted to speech, without changing the underlying text.

### When to use pronunciations

<CardGroup cols={2}>
  <Card title="Brand names" icon="trademark">
    Product names that are mispronounced
  </Card>

  <Card title="Proper nouns" icon="location-dot">
    Locations, people, departments
  </Card>

  <Card title="Numbers or IDs" icon="hashtag">
    Structured read-back requirements
  </Card>

  <Card title="Pacing" icon="gauge">
    Phrases where pacing matters for comprehension
  </Card>
</CardGroup>

### How pronunciations work

Matching is done using **regular expressions**. Replacements can be:

<Tabs>
  <Tab title="IPA">
    **International Phonetic Alphabet**

    For precise pronunciation control
  </Tab>

  <Tab title="SSML">
    **Speech Synthesis Markup Language**

    Such as `<break>` for pauses
  </Tab>

  <Tab title="Regex capture groups">
    **Pattern matching**

    `\1`, `\2`, etc. for reformatting
  </Tab>
</Tabs>
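
The mechanism can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the platform's engine: `PronunciationRule` and `apply_rules` are hypothetical names, the order ID rule is invented for the example, and it assumes that rules run in order and that the regex dialect behaves like Python's `re` module (which uses the same `\1`, `\2` backreference syntax).

```python
import re
from dataclasses import dataclass


@dataclass
class PronunciationRule:
    """Hypothetical stand-in for one row in the Pronunciations tab."""
    pattern: str
    replacement: str
    case_sensitive: bool = False


def apply_rules(text: str, rules: list) -> str:
    """Return the text that would be sent to TTS after applying each rule in order."""
    for rule in rules:
        flags = 0 if rule.case_sensitive else re.IGNORECASE
        text = re.sub(rule.pattern, rule.replacement, text, flags=flags)
    return text


rules = [
    # Invented example: spell out a two-letter prefix, pause, then read the digits.
    PronunciationRule(
        pattern=r"\b([A-Z])([A-Z])-(\d{4})\b",
        replacement=r'\1 \2 <break time="0.3s" /> \3',
        case_sensitive=True,
    ),
]

original = "Your order AB-1234 has shipped."
spoken = apply_rules(original, rules)

print(spoken)    # Your order A B <break time="0.3s" /> 1234 has shipped.
print(original)  # Your order AB-1234 has shipped.
```

The second print shows the point made above: only the text handed to speech synthesis changes, while the underlying text stays as written.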

### Examples

<AccordionGroup>
  <Accordion title="IPA correction" icon="language">
    **Regex:** `\bLouvre\b`

    **Replacement:** `/ˈluːvrə/`

    **Case sensitive:** `FALSE`
  </Accordion>

  <Accordion title="Phone number formatting with pauses" icon="phone">
    **Regex:** `(\d{3})[ -]?(\d{3})[ -]?(\d{4})`

    **Replacement:** `\1 <break time="0.5s" /> \2 <break time="0.5s" /> \3`
  </Accordion>
</AccordionGroup>
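
Before adding rules like these, it can help to try them against sample text locally. The sketch below copies the two patterns above and runs them through Python's `re` module, on the assumption that the platform's regex dialect behaves similarly; the helper names `format_phone` and `fix_louvre` are illustrative only.

```python
import re

# Patterns and replacements copied from the examples above.
PHONE = re.compile(r"(\d{3})[ -]?(\d{3})[ -]?(\d{4})")
PHONE_REPLACEMENT = r'\1 <break time="0.5s" /> \2 <break time="0.5s" /> \3'
LOUVRE = re.compile(r"\bLouvre\b", re.IGNORECASE)  # Case sensitive: FALSE
LOUVRE_REPLACEMENT = "/ˈluːvrə/"


def format_phone(text: str) -> str:
    """Insert SSML pauses between the three digit groups."""
    return PHONE.sub(PHONE_REPLACEMENT, text)


def fix_louvre(text: str) -> str:
    """Swap the whole word 'Louvre' for its IPA form."""
    return LOUVRE.sub(LOUVRE_REPLACEMENT, text)


for sample in ["555-123-4567", "555 123 4567", "5551234567"]:
    print(format_phone(sample))
# each line prints: 555 <break time="0.5s" /> 123 <break time="0.5s" /> 4567

print(fix_louvre("Tickets for the louvre"))  # Tickets for the /ˈluːvrə/
print(fix_louvre("Louvres are slats"))       # Louvres are slats
```

Because the phone pattern has no `\b` anchors, it would also match the first ten digits of a longer digit run, so it may be worth trying account numbers or similar strings as well.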

### Best practices

<CardGroup cols={3}>
  <Card title="Incremental" icon="stairs">
    Add pronunciations one at a time
  </Card>

  <Card title="Test thoroughly" icon="vial">
    Test each change in Call before adding more
  </Card>

  <Card title="Keep it simple" icon="lightbulb">
    Prefer clarity over complexity – overly complex regex is hard to maintain
  </Card>
</CardGroup>

## Check your understanding

<Quiz
  questions={[
{
q: "When does Agent Studio cache an audio utterance?",
options: [
  "After every conversation",
  "When the same utterance appears 5+ times in a single session",
  "When the same utterance is generated at least twice in a 24-hour window",
  "Only for greetings and sign-off phrases",
],
correct: 2,
explanation: "One-off utterances don't persist in cache by default. Repetition in the 24-hour window is what triggers caching.",
}
]}
/>

## Verification checklist

<Check>
  **After any voice or phrasing change:**

  * Start a *new* call session
  * Confirm you are hearing updated audio, not a cached version
  * Validate that turn-taking still feels natural after changing latency or barge-in
  * Confirm that mispronounced terms are corrected consistently
  * Check that pauses improve comprehension rather than slowing the call excessively
</Check>

## Try it yourself

<Steps>
  <Step title="Challenge: Fix a mispronounced brand name">
    Your agent consistently mispronounces the brand name "Hopper" (it sounds like "Hooper"). You also want phone numbers read back with a natural pause between each segment.

    Write both pronunciation configurations:

    1. IPA correction for "Hopper"
    2. Phone number formatting with 0.5s pauses

    <Accordion title="Hint">
      For the IPA, write out what "Hopper" sounds like phonetically. For the phone number, use regex capture groups to split the digits and insert SSML `<break>` tags.
    </Accordion>

    <Accordion title="Example solution">
      **Brand name correction:**

      * **Regex:** `\bHopper\b`
      * **Replacement:** `/ˈhɒpər/`
      * **Case sensitive:** `FALSE`

      **Phone number with pauses:**

      * **Regex:** `(\d{3})[ -]?(\d{3})[ -]?(\d{4})`
      * **Replacement:** `\1 <break time="0.5s" /> \2 <break time="0.5s" /> \3`
    </Accordion>
  </Step>
</Steps>

## Check your understanding

<Quiz
  questions={[
{
q: "You want the agent to respond as quickly as possible, even if it means a higher risk of interrupting users. Which interaction style should you choose?",
options: [
  "Balanced (~1600ms) – the default for most use cases",
  "Swift (~1200ms) – faster responses, lower interruption risk than Turbo",
  "Turbo (~400ms) – fastest response, highest interruption risk",
  "Precise (~2000ms) – most deliberate, fewest interruptions",
],
correct: 2,
explanation: "Turbo has the lowest latency at ~400ms, making the agent respond almost immediately. The trade-off is a higher risk of cutting the user off mid-sentence. Use it with care and thorough call testing.",
}
]}
/>

<CardGroup cols={2}>
  <Card title="← Previous: Response Control" icon="arrow-left" href="/learn/guides/advanced/response-control">
    Lesson 4 of 8
  </Card>

  <Card title="Next: Global ASR →" icon="arrow-right" href="/learn/guides/advanced/global-asr">
    Lesson 6 of 8
  </Card>
</CardGroup>

<ProgressTracker lessonKey="l2-5-audio" lessonNum={5} totalLessons={8} level="Level 2" />
