Monitor flagged conversations and evaluate how your agent handles harmful content. Without regular safety reviews, unsafe exchanges can reach users or trigger compliance violations you only discover after the fact.
Safety dashboard

Metrics

Metric | What it shows
Caller utterance risk level | How risky incoming messages are and how well the agent manages them
Total calls | Total call count during the selected period
Calls managed for risk | How often safety filters were triggered (count and percentage)
Distribution of flagged calls | Trends in flagged calls over time
Caller utterance category | Breakdown by hate, self-harm, sexual content, and violence
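
For illustration, "Calls managed for risk" pairs a raw count with a share of all calls. A minimal sketch of that computation — the function name and inputs are hypothetical, not part of any PolyAI API:

```python
def managed_for_risk(total_calls: int, flagged_calls: int) -> tuple[int, float]:
    """Return the flagged-call count and its percentage of all calls.

    Hypothetical helper: illustrates the count-plus-percentage shape of
    the "Calls managed for risk" metric; not a PolyAI API.
    """
    if total_calls == 0:
        return 0, 0.0
    return flagged_calls, 100.0 * flagged_calls / total_calls

# e.g. 1,200 calls in the selected period, 30 of them flagged
print(managed_for_risk(1200, 30))  # → (30, 2.5)
```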

Editing safety filters

To manage your filters, go to Configure > General in the sidebar.
Safety filter settings
PolyAI content filters catch harmful input from users and prevent inappropriate output from your agent. Filters combine PolyAI’s models with third-party services like Azure OpenAI. Filters run on both sides of the conversation in real time:
  • User input — catches toxic or inappropriate speech before it reaches the agent
  • AI output — prevents the agent from responding with anything unsafe
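
The two checkpoints above can be pictured as a simple pipeline. Everything below is an illustrative sketch — the classifier stub, category names, and threshold are assumptions, not the PolyAI or Azure implementation:

```python
CATEGORIES = ("hate", "self_harm", "sexual", "violence")

def classify(text: str) -> dict[str, int]:
    """Placeholder classifier returning a 0-7 severity per category.
    A real deployment calls a moderation model; this stub only flags
    one obvious keyword so the pipeline can be demonstrated."""
    severities = {c: 0 for c in CATEGORIES}
    if "attack" in text.lower():
        severities["violence"] = 6
    return severities

def is_safe(text: str, max_severity: int = 3) -> bool:
    """True if every category's scored severity is within the limit."""
    return all(s <= max_severity for s in classify(text).values())

def handle_turn(user_input: str, generate) -> str:
    # Checkpoint 1: user input — block before it reaches the agent.
    if not is_safe(user_input):
        return "I'm sorry, I can't help with that."
    reply = generate(user_input)
    # Checkpoint 2: AI output — block before it reaches the caller.
    if not is_safe(reply):
        return "I'm sorry, I can't help with that."
    return reply

print(handle_turn("What time do you open?", lambda t: "We open at 9am."))
```

The key property is symmetry: the same check guards both directions, so an unsafe agent reply is suppressed even when the user's input was benign.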

Filtering categories and severity levels

Each category can be set to a different severity level:
Severity | Behavior
Safe | Label only — no filtering
Low | Most content allowed
Medium | Balanced filtering
High | Strict filtering
Hate: Content that attacks or discriminates based on race, ethnicity, nationality, religion, gender identity, sexual orientation, disability, or appearance. Includes bullying, harassment, and slurs.
Sexual content: Content involving explicit anatomy, sexual acts, or romantic/erotic themes — including abusive or exploitative content.
Violence: Physical harm, threats, weapons, terrorism, and other violent acts or intimidation.
Self-harm: Mentions of suicide, self-injury, eating disorders, or content about hurting oneself.
Filters also include jailbreak risk detection — watching for attempts to bypass or disable safety features.
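
One way to picture the severity settings above: each configured level caps how high a model-scored severity may go before content is blocked. The numeric thresholds here are illustrative assumptions (a 0–7 scoring scale), not PolyAI's actual values:

```python
# Hypothetical mapping from configured filter level to the highest
# model-scored severity (assumed 0-7 scale) still allowed through.
# "Safe" labels content but never blocks it.
ALLOWED_MAX = {"safe": 7, "low": 5, "medium": 3, "high": 1}

def decision(level: str, scored_severity: int) -> str:
    """Return "allow" or "block" for one category under one setting."""
    return "allow" if scored_severity <= ALLOWED_MAX[level] else "block"

# A Medium setting blocks a severity-4 utterance but allows severity-2:
print(decision("medium", 4), decision("medium", 2))  # → block allow
```

Because each category gets its own level, the same utterance can be blocked for violence while passing every other category's check.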

Language support

Filters have been trained and tested in: English, German, Japanese, Spanish, French, Italian, Portuguese, and Chinese. Other languages are supported but performance may vary — test thoroughly in your target language.

Best practices

  • Test thoroughly: Always run your own tests to validate how filters behave with your content.
  • Use the right level: Don’t default to High — find a balance that avoids both harm and over-filtering.
  • Be consistent: If you manage multiple agents or variants, use consistent flow and function names across them to simplify reporting and comparison.
For more technical background on Microsoft’s content filtering service, see Azure OpenAI safety documentation.

Standard dashboard

Day-to-day performance monitoring: containment, call volume, and duration.

General settings

Configure safety filter severity levels for your project.

Conversation review

Inspect individual flagged conversations in full transcript view.
Last modified on March 26, 2026