# Tutorial: Conversation reviews (advanced)

> PolyAcademy Level 2 – Use advanced diagnostics to trace behavior to its source and identify system-level improvements.

export const LessonMeta = ({level, difficulty, time}) => {
  const levelConfig = {
    1: {
      badge: 'bg-green-100 text-green-800 dark:bg-green-900 dark:text-green-200',
      label: 'Level 1'
    },
    2: {
      badge: 'bg-amber-100 text-amber-800 dark:bg-amber-900 dark:text-amber-200',
      label: 'Level 2'
    },
    3: {
      badge: 'bg-red-100 text-red-800 dark:bg-red-900 dark:text-red-200',
      label: 'Level 3'
    }
  };
  const difficultyConfig = {
    Beginner: 'bg-green-100 text-green-800 dark:bg-green-900 dark:text-green-200',
    Intermediate: 'bg-amber-100 text-amber-800 dark:bg-amber-900 dark:text-amber-200',
    Advanced: 'bg-red-100 text-red-800 dark:bg-red-900 dark:text-red-200'
  };
  const lvl = levelConfig[level] || levelConfig[1];
  const diffColor = difficultyConfig[difficulty] || difficultyConfig['Beginner'];
  return <div className="flex flex-wrap items-center gap-2 my-4 not-prose">
      <span className={`inline-flex items-center rounded-full px-2.5 py-0.5 text-xs font-semibold ${lvl.badge}`}>
        {lvl.label}
      </span>
      <span className={`inline-flex items-center rounded-full px-2.5 py-0.5 text-xs font-semibold ${diffColor}`}>
        {difficulty}
      </span>
      {time && <span className="inline-flex items-center gap-1 text-xs text-gray-500 dark:text-gray-400">
          <svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
            <path strokeLinecap="round" strokeLinejoin="round" d="M12 6v6h4.5m4.5 0a9 9 0 11-18 0 9 9 0 0118 0z" />
          </svg>
          {time}
        </span>}
    </div>;
};

export const Quiz = ({questions = []}) => {
  const [selected, setSelected] = useState({});
  const [resetCount, setResetCount] = useState(0);
  const letters = ['A', 'B', 'C', 'D'];
  const handleSelect = (qIdx, optIdx) => {
    if (selected[qIdx] !== undefined) return;
    setSelected(prev => ({
      ...prev,
      [qIdx]: optIdx
    }));
  };
  const handleReset = () => {
    setSelected({});
    setResetCount(c => c + 1);
  };
  if (!questions?.length) return null;
  const getOptionClasses = ({hasAnswered, isThisCorrect, isThisSelected}) => {
    if (!hasAnswered) {
      return {
        btn: 'flex w-full items-center gap-3 py-2.5 px-4 rounded-xl text-sm leading-normal transition-all duration-150 text-left border cursor-pointer border-gray-200 bg-white text-gray-700 hover:border-gray-300 hover:bg-gray-50 hover:shadow-sm dark:border-gray-600 dark:bg-gray-800 dark:text-gray-200 dark:hover:border-gray-500 dark:hover:bg-gray-700',
        badge: 'w-6 h-6 rounded-full text-xs font-bold flex items-center justify-center shrink-0 leading-none transition-all duration-150 bg-gray-100 text-gray-500 dark:bg-gray-700 dark:text-gray-300',
        icon: null
      };
    }
    if (isThisCorrect) {
      return {
        btn: 'flex w-full items-center gap-3 py-2.5 px-4 rounded-xl text-sm leading-normal transition-all duration-150 text-left border cursor-default border-green-400 bg-green-50 text-green-900 font-medium dark:border-green-500 dark:bg-green-950 dark:text-green-100',
        badge: 'w-6 h-6 rounded-full text-xs font-bold flex items-center justify-center shrink-0 leading-none transition-all duration-150 bg-green-500 text-white dark:bg-green-500',
        icon: <svg className="shrink-0 w-4 h-4 text-green-500 dark:text-green-400 ml-auto" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2.5}>
            <path strokeLinecap="round" strokeLinejoin="round" d="M4.5 12.75l6 6 9-13.5" />
          </svg>
      };
    }
    if (isThisSelected) {
      return {
        btn: 'flex w-full items-center gap-3 py-2.5 px-4 rounded-xl text-sm leading-normal transition-all duration-150 text-left border cursor-default border-red-400 bg-red-50 text-red-900 dark:border-red-500 dark:bg-red-950 dark:text-red-100',
        badge: 'w-6 h-6 rounded-full text-xs font-bold flex items-center justify-center shrink-0 leading-none transition-all duration-150 bg-red-500 text-white dark:bg-red-500',
        icon: <svg className="shrink-0 w-4 h-4 text-red-400 dark:text-red-400 ml-auto" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2.5}>
            <path strokeLinecap="round" strokeLinejoin="round" d="M6 18L18 6M6 6l12 12" />
          </svg>
      };
    }
    return {
      btn: 'flex w-full items-center gap-3 py-2.5 px-4 rounded-xl text-sm leading-normal transition-all duration-150 text-left border cursor-default border-gray-100 bg-white text-gray-400 dark:border-gray-700 dark:bg-gray-800 dark:text-gray-500',
      badge: 'w-6 h-6 rounded-full text-xs font-bold flex items-center justify-center shrink-0 leading-none transition-all duration-150 bg-gray-100 text-gray-500 dark:bg-gray-700 dark:text-gray-500',
      icon: null
    };
  };
  return <div key={resetCount} className="my-6">
      {questions.map((q, qIdx) => {
    const answer = selected[qIdx];
    const hasAnswered = answer !== undefined;
    const isCorrect = answer === q.correct;
    return <div key={String(qIdx)} className="mb-8">
            <p className="flex items-start gap-2.5 font-semibold text-sm mb-3 mt-0 leading-relaxed text-gray-900 dark:text-gray-100">
              <span className="inline-flex items-center justify-center w-5 h-5 rounded-full bg-gray-800 dark:bg-gray-200 text-white dark:text-gray-900 text-xs font-bold shrink-0 mt-px leading-none">
                {qIdx + 1}
              </span>
              {q.q}
            </p>

            <div className="flex flex-col gap-2">
              {q.options.map((opt, i) => {
      const isThisCorrect = i === q.correct;
      const isThisSelected = i === answer;
      const {btn, badge, icon} = getOptionClasses({
        hasAnswered,
        isThisCorrect,
        isThisSelected
      });
      return <button key={String(i)} type="button" onClick={() => handleSelect(qIdx, i)} className={btn}>
                    <span className={badge}>{letters[i]}</span>
                    <span className="flex-1">{opt}</span>
                    {icon}
                  </button>;
    })}
            </div>

            {hasAnswered ? <div className={`mt-3 py-3 pl-4 pr-3.5 rounded-r-xl text-sm leading-relaxed border-l-4 ${isCorrect ? 'border-green-500 bg-green-50 dark:bg-green-950 dark:border-green-500' : 'border-red-500 bg-red-50 dark:bg-red-950 dark:border-red-500'}`}>
                <span className={`font-semibold ${isCorrect ? '!text-green-800 dark:!text-green-200' : '!text-red-800 dark:!text-red-200'}`}>
                  {isCorrect ? 'Correct.' : 'Not quite.'}
                </span>{' '}
                <span className="!text-gray-700 dark:!text-gray-300">{q.explanation}</span>
              </div> : null}
          </div>;
  })}

      <button type="button" onClick={handleReset} className="mt-1 text-xs text-gray-400 hover:text-gray-600 dark:hover:text-gray-300 underline underline-offset-2 cursor-pointer transition-colors duration-150">
        Reset quiz
      </button>
    </div>;
};

export const ProgressTracker = ({lessonNum, totalLessons, level}) => {
  const [checked, setChecked] = useState(false);
  return <div onClick={() => setChecked(prev => !prev)} className={checked ? 'flex items-center gap-3 p-4 rounded-lg border-2 border-green-600 bg-green-50 dark:bg-green-950 cursor-pointer select-none transition-all' : 'flex items-center gap-3 p-4 rounded-lg border-2 border-gray-200 dark:border-gray-600 bg-gray-50 dark:bg-gray-800 cursor-pointer select-none transition-all'}>
      <div className={checked ? 'w-5 h-5 rounded border-2 border-green-600 bg-green-600 flex items-center justify-center shrink-0 transition-all' : 'w-5 h-5 rounded border-2 border-gray-400 dark:border-gray-500 bg-white dark:bg-gray-800 flex items-center justify-center shrink-0 transition-all'}>
        {checked ? <svg width="10" height="8" viewBox="0 0 10 8" fill="none">
            <path d="M1 4L3.5 6.5L9 1" stroke="white" strokeWidth="2" strokeLinecap="round" strokeLinejoin="round" />
          </svg> : null}
      </div>
      <div>
        <div className={checked ? 'font-semibold text-sm text-green-700 dark:text-green-300' : 'font-semibold text-sm text-gray-700 dark:text-gray-200'}>
          {checked ? 'Lesson complete' : 'Mark lesson complete'}
        </div>
        {lessonNum && totalLessons ? <div className="text-xs text-gray-500 dark:text-gray-400 mt-0.5">
            {level ? level + ' - ' : ''}Lesson {lessonNum} of {totalLessons}
          </div> : null}
      </div>
    </div>;
};

<Info>
  **Level 2 – Lesson 8 of 8** – Master advanced diagnostics to understand exactly why your agent behaves the way it does.
</Info>

<LessonMeta level={2} difficulty="Intermediate" time="15 min" />

At this stage, use [Conversation Review](/analytics/conversations/review) to answer questions like:

> Why did Variant A behave differently from Variant B?
>
> Was this failure caused by ASR, retrieval, rules, response control, or phrasing?
>
> Why did the agent *not* call a function it was allowed to call?

If you can't point to a specific system layer and say *"this is where the decision was made"*, the agent isn't under control yet.

## Beyond the transcript

At Level 1, the transcript was enough. At Level 2, the transcript is only the **symptom**. The real work happens in the [diagnosis](/analytics/conversations/diagnosis) layers, function traces, variant attribution, and latency signals, so review conversations with the relevant diagnosis toggles switched on.

## Tracing a problem to its source

```mermaid theme={"theme":{"light":"github-light","dark":"github-dark"}}
flowchart TD
    A[Unexpected agent behavior] --> B{What does the transcript show?}
    B -->|Wrong word transcribed| C[ASR layer – check Transcript Corrections / Keyphrase Boosting]
    B -->|Wrong topic retrieved| D[KB layer – check topic name, sample questions]
    B -->|Right topic, wrong response| E[Behavior / Response Controls layer]
    B -->|Action didn't fire| F[Function trace – check call order and parameters]
    B -->|Variant A ≠ Variant B| G[Variant layer – check which variant handled each turn]
    B -->|Response too long for voice| H[Latency / interruption layer – tune pacing]
```

## Check your understanding

<Quiz
  questions={[
{
q: "An agent answers a billing question correctly, but you notice the function trace shows no call to `start_sms_flow` even though the topic is configured to offer SMS. What are the most likely causes?",
options: [
  "The SMS integration is down",
  "A missing action branch in the KB, a Response Control interrupting output, or a rules conflict",
  "The caller's phone number is invalid",
  "The variant doesn't support SMS",
],
correct: 1,
explanation: "When the agent says the right thing but doesn't execute the expected action, the issue is usually structural: a missing action branch, a Response Control halting output before the function fires, or a rules conflict preventing execution.",
}
]}
/>

## Advanced use of the Conversations table

Before opening individual conversations, shape the table itself.

Add these columns:

* **Variant**
* **Environment**
* **Tool call**
* **Handoff reason**
* **Duration**

Use this to:

* Compare variants side by side
* Spot regressions after promotion
* Identify behavior that only occurs in Live

> Example:
> Calls with Variant = B have longer durations and more handoffs.
>
> This is a signal before you even open a transcript.
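
To make this kind of comparison concrete, the sketch below groups conversation rows by variant and computes a handoff rate and average duration for each. It assumes the rows have already been copied out of the table into a list of dictionaries; the field names `variant`, `duration_s`, and `handoff_reason` are illustrative and not a documented export schema.

```python
from collections import defaultdict


def summarize_by_variant(rows):
    """Group conversation rows by variant and compute simple aggregates."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["variant"]].append(row)

    summary = {}
    for variant, items in grouped.items():
        # A row counts as a handoff when its handoff reason has any value.
        handoffs = sum(1 for r in items if r.get("handoff_reason"))
        summary[variant] = {
            "conversations": len(items),
            "handoff_rate": handoffs / len(items),
            "avg_duration_s": sum(r["duration_s"] for r in items) / len(items),
        }
    return summary


# Hypothetical sample rows, for illustration only.
rows = [
    {"variant": "A", "duration_s": 60, "handoff_reason": None},
    {"variant": "A", "duration_s": 80, "handoff_reason": None},
    {"variant": "B", "duration_s": 150, "handoff_reason": "speak_to_agent"},
    {"variant": "B", "duration_s": 130, "handoff_reason": None},
]
summary = summarize_by_variant(rows)
```

A gap between variants in `handoff_rate` or `avg_duration_s` tells you where to start opening transcripts. It does not explain the cause.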

<img src="https://mintcdn.com/polyai/rSFEoR-hN0C8DJ2I/images/analytics/conv-review-columns-visibility.png?fit=max&auto=format&n=rSFEoR-hN0C8DJ2I&q=85&s=f70df791a157d53f0b0adbc6df972003" alt="Columns Visibility panel showing toggleable pre-built metric columns for the conversations table" width="3014" height="1624" data-path="images/analytics/conv-review-columns-visibility.png" />

## Comparative review patterns

At Level 2, you should rarely inspect a single conversation in isolation.

Common patterns:

* Same intent, different variants
* Same KB topic, different phrasing
* Same user request across Chat and Call
* Same flow before and after a KB change

Conversation Review supports this by exposing **environment, variant, and function data together**.

## Diagnosis layers (deep use)

Toggle diagnosis layers selectively. Each answers a different class of question.

### Topic citations (advanced)

At this level, topic citations are not just about *correct vs incorrect*.

Use them to detect:

* Topic competition
* Overly generic topic names
* Sample question leakage across intents

> Example:
> Three topics are cited repeatedly for "late checkout":
>
> * late\_checkout
> * checkout\_policy
> * general\_stay\_questions
>
> This indicates retrieval ambiguity. The fix is structural, not textual.
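
One way to surface topic competition across many conversations is to count which topics are cited for the same kind of request. The sketch below assumes you have collected `(request label, cited topic)` pairs by hand or from your own notes; the pair format is an assumption for illustration, not something Conversation Review produces directly.

```python
from collections import Counter, defaultdict


def competing_topics(citations, min_topics=2):
    """Return requests for which at least `min_topics` distinct topics were cited."""
    by_request = defaultdict(Counter)
    for request, topic in citations:
        by_request[request][topic] += 1
    return {
        request: dict(counts)
        for request, counts in by_request.items()
        if len(counts) >= min_topics
    }


# Hypothetical citation log built from the "late checkout" example.
citations = [
    ("late checkout", "late_checkout"),
    ("late checkout", "checkout_policy"),
    ("late checkout", "general_stay_questions"),
    ("late checkout", "late_checkout"),
    ("parking", "parking"),
]
result = competing_topics(citations)
```

Requests that appear in `result` are candidates for merging, renaming, or re-scoping topics rather than rewording answers.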

### Tool calls (advanced)

Tool call traces show **what the agent committed to doing**, not just what it said.

Inspect:

* Call order
* Conditional execution
* Parameters passed
* Calls that *should* have happened but didn't

> Example:
> The agent asks for SMS consent but never calls `start_sms_flow`.
>
> This usually indicates:
>
> * A missing action branch in the KB
> * A response control interrupting output
> * A rules conflict preventing execution
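
Checking for calls that should have happened amounts to comparing an expected list against the trace. The sketch below assumes the trace has been written down as a list of dictionaries with a `name` key; that shape and the `lookup_account` function are hypothetical, and only `start_sms_flow` comes from the example above.

```python
def missing_calls(expected, trace):
    """Return expected function names that never appear in the trace, in order."""
    called = {call["name"] for call in trace}
    return [name for name in expected if name not in called]


# Hypothetical trace: the account lookup fired, the SMS flow did not.
trace = [
    {"name": "lookup_account", "args": {"account_id": "123"}},
]
gaps = missing_calls(["lookup_account", "start_sms_flow"], trace)
```

An empty result means every expected call fired at least once. It says nothing about call order or parameters, which still need to be read from the trace itself.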

### Flows and steps

Flows expose **decision paths**.

Use them when:

* Multiple conditions exist
* Behavior depends on prior turns
* The agent appears to "jump" topics

> Example:
> A billing question enters a reservation flow.
>
> This is often caused by:
>
> * Early entity capture
> * Over-eager routing rules
> * Poorly scoped flow entry conditions

### Variants

Variants let you attribute behavior to configuration, not chance.

Use this layer to:

* Confirm A/B test intent
* Validate rollout sequencing
* Identify variant-specific failures

> Example:
> Variant A answers directly.
> Variant B always clarifies first.
>
> Conversation Review lets you confirm this per turn, not anecdotally.

### Entities

Entities are where ASR, NLU, and logic meet.

Inspect entities to:

* Confirm values were actually captured
* Detect silent failures (nulls)
* Spot hallucinated structure

> Example:
> User says "tomorrow morning"
>
> Entity captured: date = today
>
> This is not a KB issue – it's extraction or phrasing.
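
Silent failures are easy to scan for once entity values are laid out per turn. The sketch below assumes each turn is a dictionary with an `entities` mapping; this structure and the sample values are illustrative assumptions rather than the platform's data model.

```python
def silent_entity_failures(turns, required):
    """List (turn index, entity name) pairs where a required entity is missing or None."""
    failures = []
    for index, turn in enumerate(turns):
        entities = turn.get("entities", {})
        for name in required:
            if entities.get(name) is None:
                failures.append((index, name))
    return failures


# Hypothetical turns: the first capture failed silently, the second succeeded.
turns = [
    {"text": "tomorrow morning", "entities": {"date": None}},
    {"text": "the fifth of January", "entities": {"date": "2025-01-05"}},
]
failures = silent_entity_failures(turns, required=["date"])
```

This only catches nulls. A wrong but non-null value, such as `date = today` for "tomorrow morning", still has to be checked against what the user actually said.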

### Turn latency and interruptions

These layers reveal **experience quality**, not correctness.

Use them to:

* Identify responses that are too long for voice
* Detect places users consistently interrupt
* Tune pacing and verbosity

> Example:
> High interruption rate during policy explanations usually means the response is technically correct but poorly shaped for audio.
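
To find where users consistently interrupt, it can help to compute an interruption rate per topic. The sketch below assumes each agent turn has been recorded with a `topic` and an `interrupted` flag; both field names and the sample topics are hypothetical.

```python
from collections import defaultdict


def interruption_rate_by_topic(turns):
    """Return the share of turns per topic that the user interrupted."""
    totals = defaultdict(int)
    interrupted = defaultdict(int)
    for turn in turns:
        totals[turn["topic"]] += 1
        if turn["interrupted"]:
            interrupted[turn["topic"]] += 1
    return {topic: interrupted[topic] / totals[topic] for topic in totals}


# Hypothetical turns: policy explanations are interrupted far more often.
turns = [
    {"topic": "checkout_policy", "interrupted": True},
    {"topic": "checkout_policy", "interrupted": True},
    {"topic": "checkout_policy", "interrupted": False},
    {"topic": "checkout_policy", "interrupted": True},
    {"topic": "opening_hours", "interrupted": False},
    {"topic": "opening_hours", "interrupted": False},
]
rates = interruption_rate_by_topic(turns)
```

Topics with a high rate are the first candidates for shorter, better-paced responses.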

## Audio analysis (calls)

At Level 2, audio review is not optional.

Use split audio to:

* Isolate ASR failures
* Hear barge-in timing
* Compare spoken length vs transcript length

This often explains why "perfectly fine" text responses fail in voice.

## Annotations as a system, not notes

At this stage, annotations should be **patterned**, not occasional.

Use them to:

* Track recurring KB gaps
* Justify ASR tuning
* Support decisions to split or retire topics

> Example:
> Five "Missing topic" annotations around refunds in one day is enough evidence to create a dedicated refund topic.

Annotations turn subjective impressions into actionable signals.
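
As a rough illustration of treating annotations as a signal, the sketch below counts annotations by label and subject and keeps only those that recur. The `label` and `subject` fields are assumed names, and the threshold of five simply mirrors the refund example above rather than any recommended value.

```python
from collections import Counter


def recurring_annotations(annotations, threshold=5):
    """Return (label, subject) pairs annotated at least `threshold` times."""
    counts = Counter((a["label"], a["subject"]) for a in annotations)
    return {key: n for key, n in counts.items() if n >= threshold}


# Hypothetical annotations collected in one day of review.
annotations = [{"label": "Missing topic", "subject": "refunds"}] * 5
annotations.append({"label": "Wrong transcription", "subject": "postcode"})
patterns = recurring_annotations(annotations)
```

Anything returned here has enough repetition behind it to justify a configuration change. One-off annotations stay out of the result.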

## Check your understanding

<Quiz
  questions={[
{
q: "The agent gave a wrong answer. You read the transcript but can't see why. What should you check next?",
options: [
  "Re-read the transcript – you must have missed something",
  "The function trace, topic citations, and diagnosis layers – the transcript shows symptoms, not root causes",
  "The agent's personality field – it may be overriding the response",
  "The voice settings – audio issues can cause transcript inaccuracies",
],
correct: 1,
explanation: "The transcript shows what happened, not why. At Level 2, root cause analysis uses diagnosis layers: function traces to see what fired, topic citations to see which topics were retrieved, and variant data to see which configuration was applied.",
}
]}
/>

## What good looks like

A strong review session ends with **specific changes**, not general feelings:

> Split topic X into two intents. Remove sample question Y. Add response control to suppress filler.

You can say what changed, where, and why that layer is responsible.

## Readiness standard

Before treating an agent as stable:

* You can trace any response back to configuration
* You can distinguish ASR, KB, rules, and variant causes
* You can predict how a change affects behavior
* You can verify impact in Conversation Review

## Try it yourself

<Steps>
  <Step title="Challenge: Investigate a variant discrepancy">
    Looking at your Conversations table, you notice that Variant A has a 40% handoff rate and Variant B has a 15% handoff rate – for the same types of customer queries.

    Describe your investigation:

    1. What is your first hypothesis?
    2. Which diagnosis layers would you check first?
    3. What specific data would confirm or rule out each hypothesis?

    <Accordion title="Hint">
      Think systematically: what could cause two variants to behave differently for the same query? Consider: variant-specific fields, KB topic overrides, response controls, and function logic.
    </Accordion>

    <Accordion title="Example solution">
      1. **First hypothesis:** Variant A has a handoff action wired to trigger more broadly – perhaps its SMS flow fails more often, or its fallback routing is more aggressive.

      2. **Layers to check first:**
         * **Function traces** – compare whether `transfer_call` is being called after different triggers in A vs B
         * **Variant fields** – check if A has different escalation language or action overrides
         * **Topic citations** – confirm the same KB topics are being retrieved for both variants

      3. **Confirming data:**
         * If function traces show `transfer_call` firing after different events → KB action branch issue
         * If topic citations differ between A and B → variant-specific KB override or sample question difference
         * If function traces are identical → check variant fields for different routing thresholds or transfer conditions
    </Accordion>
  </Step>
</Steps>

## Check your understanding

<Quiz
  questions={[
{
q: "Variant A has a 40% handoff rate and Variant B has 15% – for the same query types. What should you check first?",
options: [
  "The greeting text in each variant",
  "Function traces, variant fields, and topic citations to find where the two configurations diverge",
  "Whether Variant A has fewer sample questions",
  "The call duration for each variant",
],
correct: 1,
explanation: "When two variants behave differently for the same queries, compare function traces (are different actions firing?), variant fields (are escalation thresholds different?), and topic citations (are the same topics being retrieved?).",
}
]}
/>

## Metrics and dashboards

Beyond individual conversation review, you can use metrics and [dashboards](/analytics/dashboards/introduction) to identify patterns across many conversations.

### Filtering conversations

The **Conversations** page supports filtering by both built-in and custom metrics. Built-in metrics include environment, call duration, variant, and handoff reason. Custom metrics are values you log from your functions – for example, `cancel_initiated`, `id_v_successful`, or the brand the user asked about.

**Useful filter combinations:**

* **All handoffs** – filter by handoff reason "has any value" to see every transferred call
* **Specific handoff reason** – filter by a reason like "speak\_to\_agent" to find deflection opportunities
* **Custom metric** – filter by `cancel_initiated` to review all cancellation flows
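
The logic behind these three filters can be sketched in a few lines. The example assumes conversation rows held as dictionaries; the `id` field, the row shape, and the `billing_dispute` reason are hypothetical, while `speak_to_agent` and `cancel_initiated` come from the examples above.

```python
def has_any_value(row, field):
    """Mimic a "has any value" filter: the field is present and not empty."""
    return row.get(field) not in (None, "")


# Hypothetical rows, for illustration only.
rows = [
    {"id": "c1", "handoff_reason": "speak_to_agent", "cancel_initiated": False},
    {"id": "c2", "handoff_reason": None, "cancel_initiated": True},
    {"id": "c3", "handoff_reason": "billing_dispute", "cancel_initiated": False},
    {"id": "c4", "handoff_reason": "", "cancel_initiated": False},
]

# All handoffs: handoff reason has any value.
all_handoffs = [r["id"] for r in rows if has_any_value(r, "handoff_reason")]

# Specific handoff reason.
asked_for_agent = [r["id"] for r in rows if r.get("handoff_reason") == "speak_to_agent"]

# Custom metric logged from a function.
cancellations = [r["id"] for r in rows if r.get("cancel_initiated")]
```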

<img src="https://mintcdn.com/polyai/rSFEoR-hN0C8DJ2I/images/analytics/conv-review-filters.png?fit=max&auto=format&n=rSFEoR-hN0C8DJ2I&q=85&s=98b8011e1bf4f67b8f8e8e4fc5b9c7fd" alt="Filter builder panel with multiple active conditions" width="2508" height="1628" data-path="images/analytics/conv-review-filters.png" />

<img src="https://mintcdn.com/polyai/rSFEoR-hN0C8DJ2I/images/analytics/conv-review-filter-chips.png?fit=max&auto=format&n=rSFEoR-hN0C8DJ2I&q=85&s=8f686f29e189d35113863aa61f7cb68d" alt="Active filter chips above the conversations table summarising applied conditions" width="2492" height="1484" data-path="images/analytics/conv-review-filter-chips.png" />

### QA metrics

The QA metric identifies which knowledge topic the agent used to answer each query:

* **Raven (voice)** – the LLM determines the QA metric directly by matching its response to the most relevant topic. This is accurate because the LLM has full context.
* **GPT-based agents (chat)** – the system encodes the user utterance, finds the closest topics by embedding similarity, generates a response, then matches the response back to topics. This can be less accurate when responses blend multiple topics.

A conversation can match more than one topic across turns. When that happens, the **QA** column in the conversations table shows every matched topic for that call, joined by commas (for example, `billing, handoff`), so you can see the full set of topics at a glance without opening each conversation. The same comma-joined format is used for any other custom metric that is logged multiple times on a single conversation.
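
If you copy QA values out of the table for further analysis, the comma-joined format needs splitting before you can count topics. The sketch below is a minimal version that assumes topic names never contain commas themselves; the sample cell values are illustrative.

```python
from collections import Counter


def split_joined_metric(value):
    """Split a comma-joined metric cell such as 'billing, handoff' into a list."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


# Hypothetical QA column values from three conversations.
qa_cells = ["billing, handoff", "billing", ""]
topic_counts = Counter(
    topic for cell in qa_cells for topic in split_joined_metric(cell)
)
```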

### Using dashboards for improvement

A well-built dashboard tracks your key metrics (containment, transfer rate, call duration, authentication success) over time. Focus on:

1. **Containment trends** – are your improvements actually moving the number?
2. **Top queries** – what are users asking about most? Are there unhandled intents?
3. **Handoff reasons** – which reasons have the highest volume? Can you add flows or topics to reduce transfers?

For example, if "make an order" is a top query with a high transfer rate, building an order troubleshooting flow could directly improve containment.

<CardGroup cols={2}>
  <Card title="← Previous: Variants" icon="arrow-left" href="/learn/guides/advanced/variants">
    Lesson 7 of 8
  </Card>

  <Card title="Level 2 complete →" icon="trophy" href="/learn/guides/advanced/finished">
    Recap and next steps
  </Card>
</CardGroup>

<ProgressTracker lessonKey="l2-8-conv-review" lessonNum={8} totalLessons={8} level="Level 2" />
