Ensure your agent handles risky conversations effectively.
The safety dashboard helps you monitor safety-related metrics, track risky conversations, and evaluate how well your agent is handling harmful content. It’s essential for making sure your agent complies with brand standards and safety expectations.
To manage your filters, go to Settings in the sidebar.
PolyAI content filters are designed to catch harmful input from users and prevent inappropriate output from your agent. Filters combine PolyAI’s models with third-party services like Azure OpenAI to keep conversations safe.
Content filters run on both sides of the conversation:

- **User input**: what callers say to your agent.
- **Agent output**: what your agent says in response.
Filtering happens in real time and targets specific categories of risky content.
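As a rough mental model, the sketch below shows where the two checks sit in a single conversation turn. It is illustrative only: `classify_text`, the severity labels, and the fallback message are assumptions, not the PolyAI or Azure OpenAI API.

```python
# Hypothetical sketch of two-sided filtering. classify_text, the category
# names, and the severity scale are illustrative assumptions.
BLOCKED_SEVERITIES = {"medium", "high"}  # assumed blocking thresholds

def classify_text(text: str) -> dict[str, str]:
    """Placeholder for a moderation call that returns a severity per category,
    e.g. {"hate": "safe", "sexual": "safe", "violence": "low", "self_harm": "safe"}."""
    raise NotImplementedError

def is_blocked(text: str) -> bool:
    # Block the turn if any category reaches a blocked severity.
    return any(sev in BLOCKED_SEVERITIES for sev in classify_text(text).values())

def handle_turn(user_input: str, generate_reply) -> str:
    # Input filter: screen what the user said before the agent acts on it.
    if is_blocked(user_input):
        return "Sorry, I can't help with that."
    reply = generate_reply(user_input)
    # Output filter: screen the agent's draft response before it is returned.
    if is_blocked(reply):
        return "Sorry, I can't help with that."
    return reply
```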
Filters target four core risk categories, described in the table below. Each category has four severity levels, and you can choose a different level per category depending on your risk appetite (a configuration sketch follows the table). Safe-level content is always labeled but never blocked.
| Category | Description |
|---|---|
| Hate | Content that attacks or discriminates based on race, ethnicity, nationality, religion, gender identity, sexual orientation, disability, or appearance. Includes bullying, harassment, and slurs. |
| Sexual | Content involving explicit anatomy, sexual acts, or romantic/erotic themes, including abusive or exploitative content. Includes vulgar language, nudity, child exploitation, and grooming. |
| Violence | Content describing physical harm, threats, weapons, terrorism, and other violent acts or intimidation. Includes mentions of guns, attacks, or stalking. |
| Self-harm | Mentions of suicide, self-injury, eating disorders, or any content about hurting oneself. |
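To make the per-category idea concrete, here is a minimal sketch of how such thresholds could be expressed in code. The category keys, the safe/low/medium/high scale, and `FILTER_CONFIG` are assumptions for illustration, not PolyAI's actual configuration format.

```python
# Hypothetical per-category configuration; the keys, severity scale, and
# FILTER_CONFIG format are assumptions, not PolyAI's settings schema.
SEVERITY_ORDER = ["safe", "low", "medium", "high"]

# Block content at or above the configured severity for each category.
FILTER_CONFIG = {
    "hate": "low",         # strict: block anything rated low or above
    "sexual": "low",
    "violence": "medium",  # more permissive: allow low-severity mentions
    "self_harm": "low",
}

def should_block(category: str, severity: str) -> bool:
    if severity == "safe":
        return False  # safe-level content is labeled but never blocked
    threshold = FILTER_CONFIG[category]
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(threshold)
```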
Content filters have been trained and tested in the following languages:
Other languages are supported, but performance may vary. Always test thoroughly in your target language to ensure filters behave as expected.
For more technical background on Microsoft’s content filtering service, see Azure OpenAI safety documentation.