Model
Determine which large language model to use with your agent, or connect your own LLM endpoint.
The Model section in Agent Settings lets you choose the Large Language Model (LLM) that best fits your agent’s needs — or plug in your own.
Below is an overview of the models currently available:
GPT-4o Mini
The lightweight version of GPT-4o, offering faster performance and cost-efficiency. Best for simple queries and use cases that prioritise speed and affordability.
GPT-4o
An optimised version of GPT-4 with high-quality responses and reduced latency. Ideal when you need both accuracy and responsiveness.
GPT-4
The most advanced model for generating detailed, high-quality responses. Recommended for complex tasks requiring precision and context.
GPT-3.5 (0125)
An enhanced build of GPT-3.5 with stability improvements for specific workloads. Balances performance and cost.
GPT-3.5
A reliable, cost-efficient option when speed and affordability are the main concerns. Good for straightforward interactions and real-time responses.
Bedrock Claude 3.5 Haiku
A lightweight version of Anthropic’s Claude model, hosted on AWS Bedrock. Suitable for simple, predictable tasks.
PolyLLM
PolyAI’s proprietary model, optimised for real-time voice interactions.
Gemini 1.5 (coming soon)
Google’s next-generation LLM focused on reasoning and long context windows. Currently being integrated.
Mistral (coming soon)
An open-weight model designed for high-performance reasoning and coding tasks. Integration planned for a future release.
Configuring the model
- Open Agent Settings → Large Language Model.
- Select the desired model from the dropdown.
- Click Save to apply your changes.
For more details on each provider, see:
- OpenAI Models
- Anthropic (Claude)
- Google DeepMind (Gemini)
- Mistral
- Contact PolyAI for information about PolyLLM.
Bring Your Own Model (BYOM)
PolyAI supports bring-your-own-model (BYOM) via a simple API integration. If you run your own LLM, expose an endpoint that follows the OpenAI `chat/completions` schema and PolyAI will treat it like any other provider.
Overview
- Expose an API endpoint that accepts and returns data in the OpenAI `chat/completions` format.
- Provide authentication: PolyAI can send either an `x-api-key` header or a Bearer token.
- (Optional) Support streaming responses using `stream: true`.
API endpoint
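PolyAI calls your endpoint with an HTTP POST. The URL below is purely illustrative; any reachable HTTPS endpoint works:

```
POST https://your-llm.example.com/v1/chat/completions
```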
Request format
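An illustrative request body is shown below; the exact parameters and values PolyAI sends depend on your agent's configuration:

```json
{
  "model": "your-model-id",
  "messages": [
    {"role": "system", "content": "You are a helpful voice assistant."},
    {"role": "user", "content": "What time do you open on Saturdays?"}
  ],
  "temperature": 0.7,
  "max_tokens": 256,
  "stream": false
}
```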
You might receive extra OpenAI-style fields such as `frequency_penalty`, `presence_penalty`, etc.
Response format
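Return a standard OpenAI-style `chat.completion` object. A minimal illustrative example (IDs and token counts are placeholders):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1719400000,
  "model": "your-model-id",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "We open at 9am on Saturdays."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 42, "completion_tokens": 12, "total_tokens": 54}
}
```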
Streaming support (optional)
If `stream` is `true`, send Server-Sent Events (SSE) mirroring OpenAI’s format:
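A minimal sketch of such a stream is below; IDs and content are illustrative. Each event carries a `chat.completion.chunk` delta, and the stream ends with `data: [DONE]`:

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":"We"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" open at 9am."},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```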
Authentication
| Method | Header sent by PolyAI |
| --- | --- |
| API Key | `x-api-key: YOUR_API_KEY` |
| Bearer | `Authorization: Bearer YOUR_TOKEN` |
Configure your server to accept one of the above.
Sample implementation (Python / Flask)
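The following is a minimal sketch, not a production server: `generate` is a placeholder for your own model call, and `BYOM_API_KEY` is an assumed environment variable holding the credential you share with PolyAI.

```python
import os
import time

from flask import Flask, abort, jsonify, request

app = Flask(__name__)
API_KEY = os.environ.get("BYOM_API_KEY", "change-me")  # credential shared with PolyAI


def generate(messages):
    """Placeholder: call your own LLM here and return the assistant's reply text."""
    return "Hello from my model!"


@app.post("/v1/chat/completions")
def chat_completions():
    # Accept either header from the Authentication table above.
    bearer = request.headers.get("Authorization", "")
    if request.headers.get("x-api-key") != API_KEY and bearer != f"Bearer {API_KEY}":
        abort(401)

    body = request.get_json(force=True)
    content = generate(body["messages"])

    # Respond with an OpenAI-style chat.completion object.
    return jsonify({
        "id": "chatcmpl-byom-1",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": body.get("model", "my-model"),
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": content},
            "finish_reason": "stop",
        }],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    })


if __name__ == "__main__":
    app.run(port=8000)
```

Streaming is omitted here for brevity; see the SSE format above if you enable `stream: true`.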
Final checklist
- Endpoint reachable via POST.
- Request/response match the OpenAI `chat/completions` schema.
- Authentication header configured (API Key or Bearer token).
- (Optional) Streaming supported if needed.
Send to your PolyAI contact:
- Endpoint URL
- Model ID
- Auth method & credential