System overview
| Field | Details |
|---|---|
| System Name | Raven |
| Developer | PolyAI |
| Release Date | 16 September 2025 |
| Version | v3 |

Dataset summary

| Category | Description |
|---|---|
| Source or Owner | Data is sourced from PolyAI customers to the extent contractually authorised by customers and permitted by applicable law, or otherwise generated by PolyAI. |
| Purchased or Licensed | Licensed or otherwise owned by PolyAI. |
| Time Period of Data Collection | November 2024 – August 2025 |
| Date of First Use in Development | December 2024 |
| Scale of Dataset | Hundreds of thousands of conversational turns across tens of thousands of conversations. |
| Entirely Public Domain | No |

Intellectual property considerations

| Category | Description |
|---|---|
| Copyright, Trademark, or Patent Protection | The dataset may include information protected by copyright or trademark law belonging to PolyAI customers or PolyAI. |
| Ownership and Rights | All data used is licensed to or owned by PolyAI in accordance with contractual agreements and applicable law. |

Personal and consumer data

| Category | Description |
|---|---|
| Contains Personal Information | PolyAI takes all reasonable steps to redact personal information from the dataset prior to use. |
| Contains Aggregate Consumer Information | No |

Synthetic data usage

| Category | Description |
|---|---|
| Use of Synthetic Data | Yes. PolyAI augments real-world data with synthetic data where necessary to broaden coverage or improve specific system capabilities. |

Data processing and preparation

The dataset used for Raven v3 has undergone multiple processing steps to ensure quality, safety, and suitability for training customer service agents.

| Processing Step | Description |
|---|---|
| Redaction | Removal of personal information. |
| Translation | Support for multilingual customer service use cases. |
| Filtering | Selection of desired data distributions to improve specific system capabilities. |
| Labelling | Annotation to provide efficient learning signals during system training and evaluation. |

Types of data used

| Category | Description |
|---|---|
| Data Format | Conversational logs. |
| Labelling Methodology | Conversations are labelled as positive and/or preferred customer service interactions and/or assigned graded preference scores. |
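
As an illustration only, the labelling scheme described above could be represented by a record like the one sketched below. This is not PolyAI's actual schema: the class names, field names, and the `has_label` helper are all hypothetical, and the sketch assumes only what the table states, namely that a conversation may carry a positive label, a preferred label, a graded preference score, or any combination of these.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """One conversational turn (hypothetical structure)."""
    speaker: str  # e.g. "caller" or "agent"; the role names are assumptions
    text: str     # assumed to be redacted before use


@dataclass
class LabelledConversation:
    """A conversation log with optional labels (hypothetical field names)."""
    turns: List[Turn]
    is_positive: bool = False                 # labelled as a positive interaction
    is_preferred: bool = False                # labelled as a preferred interaction
    preference_score: Optional[float] = None  # graded score; scale not specified in the source

    def has_label(self) -> bool:
        """Return True if at least one kind of label is present."""
        return (
            self.is_positive
            or self.is_preferred
            or self.preference_score is not None
        )
```

Keeping the three label kinds as independent optional fields mirrors the "and/or" wording of the methodology: a record may hold one label, several, or none.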

Purpose and intended use

| Category | Description |
|---|---|
| Purpose in Relation to the System | The dataset supports Raven’s intended purpose of powering agentic customer service conversations by providing real-world and synthetic examples of high-quality customer service interactions. |

