
# Safety

> Monitor risky conversations and content filtering.

Monitor flagged conversations and evaluate how your agent handles harmful content. Without regular safety reviews, harmful content can reach users or trigger compliance violations you only discover after the fact.

<img src="https://mintcdn.com/polyai/ZVzKq3G72RK1ScpE/images/release-notes/2503/new-safety.png?fit=max&auto=format&n=ZVzKq3G72RK1ScpE&q=85&s=438fda60cc3743a3fbf017d1610ed22d" alt="Safety dashboard" width="3208" height="2300" data-path="images/release-notes/2503/new-safety.png" />

## Metrics

| Metric                            | What it shows                                                       |
| --------------------------------- | ------------------------------------------------------------------- |
| **Caller utterance risk level**   | How risky incoming messages are and how well the agent manages them |
| **Total calls**                   | Total call count during the selected period                         |
| **Calls managed for risk**        | How often safety filters were triggered (count and percentage)      |
| **Distribution of flagged calls** | Trends in flagged calls over time                                   |
| **Caller utterance category**     | Breakdown by hate, self-harm, sexual content, and violence          |
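The **Calls managed for risk** metric reports both a count and a percentage. The sketch below shows that arithmetic, assuming the percentage is taken relative to **Total calls** for the same period and that a call counts once no matter how many filters fire during it. `CallRecord` and `calls_managed_for_risk` are hypothetical names for illustration, not a PolyAI export format or API.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    """Hypothetical per-call record; not an actual PolyAI export format."""
    call_id: str
    filter_triggered: bool  # True if any safety filter fired during the call


def calls_managed_for_risk(calls: list[CallRecord]) -> tuple[int, float]:
    """Return (count, percentage) of calls in which a safety filter triggered.

    Assumes the percentage is relative to the total number of calls passed in.
    """
    if not calls:
        return 0, 0.0
    flagged = sum(1 for c in calls if c.filter_triggered)
    return flagged, 100 * flagged / len(calls)


calls = [CallRecord("a", False), CallRecord("b", True),
         CallRecord("c", False), CallRecord("d", False)]
print(calls_managed_for_risk(calls))  # (1, 25.0)
```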

## Editing safety filters

Safety filters are configured on a **per-channel basis**. Each channel (voice and chat) can have its own filter settings that override the project-wide defaults.

* **Voice channel** – go to **Channels > Voice > Voice configuration**
* **Chat channel** – go to **Channels > Chat > Chat configuration**

<img src="https://mintcdn.com/polyai/eRhJIdFOz7BK3Q_g/images/voice/safety-filters-voice.png?fit=max&auto=format&n=eRhJIdFOz7BK3Q_g&q=85&s=e8a43a9ea1b1107c4e455aea5a85554a" alt="Safety filter settings for the voice channel" width="2978" height="1614" data-path="images/voice/safety-filters-voice.png" />
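One way to picture the override rule is as a merge: channel settings win, and anything the channel leaves unset falls back to the project-wide default. The Python sketch below assumes overrides apply per category; the dictionary layout, the `effective_settings` helper, and the example severity values are all hypothetical and are not PolyAI's actual defaults or configuration format.

```python
# Hypothetical project-wide defaults: category -> severity level.
# Example values only, not PolyAI's real defaults.
PROJECT_DEFAULTS = {
    "hate": "moderate",
    "sexual": "moderate",
    "violence": "moderate",
    "self_harm": "strict",
}


def effective_settings(project_defaults: dict[str, str],
                       channel_overrides: dict[str, str]) -> dict[str, str]:
    """Merge channel-level overrides over project-wide defaults.

    Assumes per-category overrides: any category the channel does not set
    falls back to the project default.
    """
    return {**project_defaults, **channel_overrides}


voice = effective_settings(PROJECT_DEFAULTS, {"violence": "strict"})
chat = effective_settings(PROJECT_DEFAULTS, {})
print(voice["violence"], chat["violence"])  # strict moderate
```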

PolyAI content filters catch harmful input from users and prevent inappropriate output from your agent. Filters combine PolyAI's models with third-party services like [Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/content-safety/).

Filters run on both sides of the conversation in real time:

* **User input** – catches toxic or inappropriate speech before it reaches the agent
* **AI output** – prevents the agent from responding with anything unsafe
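The two checkpoints can be sketched as a single conversational turn: the user's text is checked before the agent sees it, and the agent's draft reply is checked before it is spoken or sent. In this Python sketch, `handle_turn`, `FALLBACK_REPLY`, and the keyword-based `toy_filter` are hypothetical stand-ins; the real filters use PolyAI and third-party models, and what the agent actually says when content is blocked is not specified here.

```python
from typing import Callable

# A classifier returns True if the text should be blocked.
# Stand-in for the real content filters, which this sketch does not model.
Classifier = Callable[[str], bool]

FALLBACK_REPLY = "Sorry, I can't help with that."  # hypothetical canned response


def handle_turn(user_text: str,
                generate_reply: Callable[[str], str],
                is_unsafe: Classifier) -> str:
    """Run one conversational turn with filtering on both sides."""
    # 1. User input: stop unsafe text before it reaches the agent.
    if is_unsafe(user_text):
        return FALLBACK_REPLY
    # 2. AI output: check the agent's draft before it reaches the user.
    reply = generate_reply(user_text)
    if is_unsafe(reply):
        return FALLBACK_REPLY
    return reply


def toy_filter(text: str) -> bool:
    """Toy keyword check used only for this example."""
    return "forbidden" in text.lower()


def echo(text: str) -> str:
    """Toy agent that repeats the user's message."""
    return f"You said: {text}"


print(handle_turn("hello", echo, toy_filter))            # You said: hello
print(handle_turn("Forbidden words", echo, toy_filter))  # Sorry, I can't help with that.
```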

## Filtering categories and severity levels

You can set each category below to its own severity level:

| Severity     | Behavior                                     |
| ------------ | -------------------------------------------- |
| **Lenient**  | Block high severity content only             |
| **Moderate** | Block medium and high severity content       |
| **Strict**   | Block low, medium, and high severity content |
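The severity levels behave as thresholds: each stricter level blocks everything the more lenient one does, plus the next lower severity band. The Python sketch below encodes the table as a lookup of the lowest severity each level blocks. The `Severity` scale, the lowercase level names, and `is_blocked` are illustrative assumptions, not a PolyAI API; in practice the levels are set in the channel configuration, not in code.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity assigned to a piece of content (assumed scale)."""
    SAFE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Lowest detected severity that each filter level blocks, per the table above.
BLOCK_THRESHOLD = {
    "lenient": Severity.HIGH,     # block high only
    "moderate": Severity.MEDIUM,  # block medium and high
    "strict": Severity.LOW,       # block low, medium, and high
}


def is_blocked(detected: Severity, level: str) -> bool:
    """Return True if content at `detected` severity is blocked at `level`."""
    return detected >= BLOCK_THRESHOLD[level]


for level in ("lenient", "moderate", "strict"):
    blocked = [s.name for s in Severity if is_blocked(s, level)]
    print(level, blocked)
# lenient ['HIGH']
# moderate ['MEDIUM', 'HIGH']
# strict ['LOW', 'MEDIUM', 'HIGH']
```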

<AccordionGroup>
  <Accordion title="Hate" icon="ban">
    Content that attacks or discriminates based on race, ethnicity, nationality, religion, gender identity, sexual orientation, disability, or appearance. Includes bullying, harassment, and slurs.
  </Accordion>

  <Accordion title="Sexual" icon="triangle-exclamation">
    Content involving explicit anatomy, sexual acts, or romantic/erotic themes – including abusive or exploitative content.
  </Accordion>

  <Accordion title="Violence" icon="skull-crossbones">
    Physical harm, threats, weapons, terrorism, and other violent acts or intimidation.
  </Accordion>

  <Accordion title="Self-harm" icon="heart-crack">
    Mentions of suicide, self-injury, eating disorders, or content about hurting oneself.
  </Accordion>
</AccordionGroup>

Filters also include **jailbreak risk detection**, which watches for attempts to bypass or disable safety features.

## Language support

Filters have been trained and tested in English, German, Japanese, Spanish, French, Italian, Portuguese, and Chinese. Other languages are supported, but performance may vary – test thoroughly in your target language.

## Best practices

* **Test thoroughly**: Always run your own tests to validate how filters behave with your content.
* **Use the right level**: Don't default to Strict – find a balance that avoids both harm and over-filtering.
* **Be consistent**: If you manage multiple agents or variants, use consistent flow and tool names across them to simplify reporting and comparison.

## Related pages

<CardGroup cols={3}>
  <Card title="Standard dashboard" icon="chart-line" href="./standard">
    Day-to-day performance monitoring: containment, call volume, and duration.
  </Card>

  <Card title="General settings" icon="gear" href="/settings/introduction">
    Configure the project-wide default severity levels for safety filters.
  </Card>

  <Card title="Conversation review" icon="magnifying-glass" href="/analytics/conversations/review">
    Inspect individual flagged conversations in full transcript view.
  </Card>
</CardGroup>
