GPT-4o Mini
The lightweight version of GPT-4o, offering faster performance and cost-efficiency. Best for simple queries and use-cases that prioritise speed and affordability.

GPT-4o
An optimised version of GPT-4 with high-quality responses and reduced latency. Ideal when you need both accuracy and responsiveness.

GPT-4
The most advanced model for generating detailed, high-quality responses. Recommended for complex tasks requiring precision and context.

GPT-3.5 (0125)
An enhanced build of GPT-3.5 with stability improvements for specific workloads. Balances performance and cost.

GPT-3.5
A reliable, cost-efficient option when speed and affordability are the main concerns. Good for straightforward interactions and real-time responses.

Bedrock Claude 3.5 Haiku
A lightweight version of Anthropic’s Claude model, hosted on AWS Bedrock. Suitable for simple, predictable tasks.

Raven
PolyAI’s proprietary model, optimised for real-time voice interactions.

Gemini 1.5 (coming soon)
Google’s next-generation LLM focused on reasoning and long context windows. Currently being integrated.

Mistral (coming soon)
An open-weight model designed for high-performance reasoning and coding tasks. Integration planned for a future release.

Configuring the model

- Open Agent Settings → Large Language Model.
- Select the desired model from the dropdown.
- Click Save to apply your changes.

Available model providers:
- OpenAI models
- Anthropic (Claude)
- Google DeepMind (Gemini)
- Mistral
- Amazon Nova Micro

Contact PolyAI for information about Raven, PolyAI’s proprietary LLM.
Bring Your Own Model (BYOM)
PolyAI supports bring-your-own-model (BYOM) via a simple API integration. If you run your own LLM, expose an endpoint that follows the OpenAI `chat/completions` schema and PolyAI will treat it like any other provider.
Overview
- Expose an API endpoint that accepts and returns data in the OpenAI `chat/completions` format.
- Provide authentication: PolyAI can send either an `x-api-key` header or a Bearer token.
- (Optional) Support streaming responses using `stream: true`.
API endpoint
Request format
PolyAI sends a JSON body in the OpenAI `chat/completions` format, including standard optional parameters such as `frequency_penalty`, `presence_penalty`, etc.
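A representative request body is shown below; the model name and field values are illustrative, not fixed by PolyAI:

```json
{
  "model": "my-custom-model",
  "messages": [
    {"role": "system", "content": "You are a helpful voice assistant."},
    {"role": "user", "content": "What are your opening hours?"}
  ],
  "temperature": 0.7,
  "max_tokens": 256,
  "frequency_penalty": 0.0,
  "presence_penalty": 0.0,
  "stream": false
}
```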
Response format
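Your endpoint should reply with a JSON body mirroring OpenAI’s non-streaming response shape. A minimal illustrative example (IDs and token counts are placeholders):

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "my-custom-model",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "We are open 9am to 5pm, Monday to Friday."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 14, "total_tokens": 39}
}
```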
Streaming support (optional)
If `stream` is `true`, send Server-Sent Events (SSE) mirroring OpenAI’s format:
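In OpenAI’s streaming format, each event is a `data:` line carrying a `chat.completion.chunk` delta, and the stream ends with `data: [DONE]`. An illustrative sequence (chunk contents are placeholders):

```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"We are"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" open"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```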
Authentication
| Method | Header sent by PolyAI |
|---|---|
| API Key | `x-api-key: YOUR_API_KEY` |
| Bearer | `Authorization: Bearer YOUR_TOKEN` |
Sample implementation (Python / Flask)
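A minimal sketch of such an endpoint using Flask. The route path, API key, and the `call_your_model` hook are illustrative assumptions; replace them with your own model call and credentials:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
API_KEY = "YOUR_API_KEY"  # the credential you share with PolyAI


def call_your_model(messages):
    """Hypothetical hook into your own LLM; replace with a real inference call."""
    return "Hello from my custom model."


def make_completion_response(model, content):
    """Build a response body in the OpenAI chat/completions shape."""
    return {
        "id": "chatcmpl-custom-1",
        "object": "chat.completion",
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": content},
                "finish_reason": "stop",
            }
        ],
    }


@app.route("/v1/chat/completions", methods=["POST"])
def chat_completions():
    # Accept either authentication header PolyAI may send.
    key = request.headers.get("x-api-key")
    bearer = request.headers.get("Authorization", "")
    if key != API_KEY and bearer != f"Bearer {API_KEY}":
        return jsonify({"error": "unauthorized"}), 401

    body = request.get_json()
    content = call_your_model(body["messages"])
    model = body.get("model", "my-custom-model")
    return jsonify(make_completion_response(model, content))
```

Run it with `app.run(port=8000)` (or `flask run`) and point PolyAI at the resulting URL.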
Final checklist
- Endpoint reachable via POST.
- Request/response match the OpenAI `chat/completions` schema.
- Authentication header configured (API Key or Bearer token).
- (Optional) Streaming supported if needed.

Once ready, provide PolyAI with:
- Endpoint URL
- Model ID
- Auth method & credential