
Metrics
- Caller utterance risk level: Shows how risky incoming messages are and how well the agent manages them.
- Total calls: Total number of calls during the selected period.
- Number of calls managed for risk: Number of calls in which the safety filters were triggered.
- Percentage of calls managed for risk: Share of total calls in which the safety filters were triggered.
- Distribution of flagged calls: Highlights trends in flagged calls over time.
- Distribution count of flagged calls: Shows peaks in flagged call volume.
- Caller utterance category distribution:
  - Broken down into hate, self-harm, sexual content, and violence.
  - Uses color-coded visuals for easy tracking.
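As a rough illustration of how the call-level metrics relate to each other, the sketch below derives them from a list of per-call records. The `Call` record, its `flagged_category` field, and the `summarize` function are hypothetical names chosen for this example; they are not part of the product.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Call:
    call_id: str
    # Hypothetical field: the category that triggered a filter, or None if no filter fired.
    flagged_category: Optional[str]


def summarize(calls):
    """Compute the dashboard-style metrics from a list of Call records."""
    total = len(calls)
    flagged = [c for c in calls if c.flagged_category is not None]
    # Guard against division by zero when the selected period has no calls.
    percent = 100.0 * len(flagged) / total if total else 0.0
    return {
        "total_calls": total,
        "calls_managed_for_risk": len(flagged),
        "percent_managed_for_risk": percent,
        "category_distribution": dict(Counter(c.flagged_category for c in flagged)),
    }
```

Under these assumptions, the percentage is always measured against the total number of calls in the selected period, not against the flagged calls alone.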
Editing safety filters
To manage your filters, go to Settings in the sidebar.
How filters work
Content filters run on both sides of the conversation:
- User input: Catches toxic or inappropriate speech before it reaches the agent.
- AI output: Prevents the agent from responding with anything unsafe or non-compliant.
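The two checkpoints described above can be pictured with a minimal sketch. Everything here is assumed for illustration: `is_flagged` is a stand-in for the real classifier, and `handle_turn`, `BLOCKED_REPLY`, and the `agent` callable are hypothetical names, not the product's API.

```python
from typing import Callable

# Hypothetical fallback message returned whenever a filter fires.
BLOCKED_REPLY = "Sorry, I can't help with that."


def is_flagged(text: str) -> bool:
    """Stand-in classifier: flags text that contains a placeholder term."""
    return "unsafe-example" in text.lower()


def handle_turn(user_text: str, agent: Callable[[str], str]) -> str:
    """Run one conversation turn with a filter on each side."""
    # Input filter: the caller's utterance is checked before the agent sees it.
    if is_flagged(user_text):
        return BLOCKED_REPLY
    reply = agent(user_text)
    # Output filter: the agent's reply is checked before it reaches the caller.
    if is_flagged(reply):
        return BLOCKED_REPLY
    return reply
```

In this sketch a flagged input never reaches the agent at all, which mirrors the "before it reaches the agent" behavior described for user input.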
Filtering categories and severity levels
Filters target four core risk categories:
- Hate
- Sexual
- Violence
- Self-harm
Each category can be set to one of four severity levels:
- Safe (label only, no filtering)
- Low (most content allowed)
- Medium (balanced filtering)
- High (strict filtering)
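One way to think about these settings is as a mapping from each category to a severity level. The sketch below validates such a mapping. The key names, the `Level` enum, and `validate_settings` are assumptions made for this example and do not reflect the actual settings format.

```python
from enum import Enum


class Level(Enum):
    SAFE = "safe"      # label only, no filtering
    LOW = "low"        # most content allowed
    MEDIUM = "medium"  # balanced filtering
    HIGH = "high"      # strict filtering


# Hypothetical keys for the four risk categories.
CATEGORIES = ("hate", "sexual", "violence", "self_harm")


def validate_settings(settings: dict) -> dict:
    """Check that every category has exactly one valid level and return it as Level values."""
    missing = [c for c in CATEGORIES if c not in settings]
    unknown = [c for c in settings if c not in CATEGORIES]
    if missing or unknown:
        raise ValueError(f"missing categories: {missing}; unknown categories: {unknown}")
    # Level(...) raises ValueError for any value outside the four levels.
    return {c: Level(settings[c]) for c in CATEGORIES}
```

For example, `validate_settings({"hate": "high", "sexual": "medium", "violence": "medium", "self_harm": "low"})` returns the same mapping with `Level` values, while a missing category or an unrecognized level raises `ValueError`.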
Category details
Category | Description |
---|---|
Hate | Covers content that attacks or discriminates based on race, ethnicity, nationality, religion, gender identity, sexual orientation, disability, or appearance. Includes bullying, harassment, and slurs. |
Sexual | Covers explicit anatomy, sexual acts, and romantic or erotic themes, including abusive or exploitative content. Includes vulgar language, nudity, child exploitation, and grooming. |
Violence | Covers physical harm, threats, weapons, terrorism, and other violent acts or intimidation. Includes mentions of guns, attacks, or stalking. |
Self-harm | Covers mentions of suicide, self-injury, eating disorders, and any other content about hurting oneself. |
Additional filtering
- Jailbreak risk detection: Filters also watch for attempts to bypass or disable safety features.
Language support
Content filters have been trained and tested in the following languages:
- English
- German
- Japanese
- Spanish
- French
- Italian
- Portuguese
- Chinese
Best practices
- Test thoroughly: Always run your own tests to validate how filters behave with your content.
- Use the right level: Don’t default to High — find a balance that avoids both harm and over-filtering.
- Standardize features: If you’re using filters in templates or shared projects, try to use the same flows and function names across them.