Model
Determine which large language model to use with your agent, or connect your own LLM endpoint.
The Model section in Agent Settings lets you choose the Large Language Model (LLM) that best fits your agent’s needs — or plug in your own.
Below is an overview of the models currently available:
GPT-4o Mini
The lightweight version of GPT-4o, offering faster performance and cost-efficiency. Best for simple queries and use cases that prioritise speed and affordability.
GPT-4o
An optimised version of GPT-4 with high-quality responses and reduced latency. Ideal when you need both accuracy and responsiveness.
GPT-4
The most advanced model for generating detailed, high-quality responses. Recommended for complex tasks requiring precision and context.
GPT-3.5 (0125)
An enhanced build of GPT-3.5 with stability improvements for specific workloads. Balances performance and cost.
GPT-3.5
A reliable, cost-efficient option when speed and affordability are the main concerns. Good for straightforward interactions and real-time responses.
Bedrock Claude 3.5 Haiku
A lightweight version of Anthropic’s Claude model, hosted on AWS Bedrock. Suitable for simple, predictable tasks.
PolyLLM
PolyAI’s proprietary model, optimised for real-time voice interactions.
Gemini 1.5 (coming soon)
Google’s next-generation LLM focused on reasoning and long context windows. Currently being integrated.
Mistral (coming soon)
An open-weight model designed for high-performance reasoning and coding tasks. Integration planned for a future release.
Configuring the model
- Open Agent Settings → Large Language Model.
- Select the desired model from the dropdown.
- Click Save to apply your changes.
For more details on each provider, see:
- OpenAI Models
- Anthropic (Claude)
- Google DeepMind (Gemini)
- Mistral
- Contact PolyAI for information about PolyLLM.
Bring Your Own Model (BYOM)
PolyAI supports bring-your-own-model (BYOM) via a simple API integration. If you run your own LLM, expose an endpoint that follows the OpenAI `chat/completions` schema and PolyAI will treat it like any other provider.
Overview
- Expose an API endpoint that accepts and returns data in the OpenAI `chat/completions` format.
- Provide authentication: PolyAI can send either an `x-api-key` header or a Bearer token.
- (Optional) Support streaming responses using `stream: true`.
API endpoint
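PolyAI calls your endpoint with an HTTP POST. The URL below is purely illustrative; any reachable HTTPS endpoint works:

```
POST https://your-llm.example.com/v1/chat/completions
```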
Request format
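An illustrative request body is shown below; the exact parameters and values PolyAI sends depend on your agent's configuration:

```json
{
  "model": "your-model-id",
  "messages": [
    {"role": "system", "content": "You are a helpful voice assistant."},
    {"role": "user", "content": "What time do you open on Saturdays?"}
  ],
  "temperature": 0.7,
  "max_tokens": 256,
  "stream": false
}
```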
You might receive extra OpenAI-style fields such as `frequency_penalty`, `presence_penalty`, etc.
Response format
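Return a standard OpenAI-style `chat.completion` object. A minimal illustrative example (IDs and token counts are placeholders):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1719400000,
  "model": "your-model-id",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "We open at 9am on Saturdays."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 42, "completion_tokens": 12, "total_tokens": 54}
}
```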
Streaming support (optional)
If `stream` is `true`, send Server-Sent Events (SSE) mirroring OpenAI’s format:
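A minimal sketch of such a stream is below; IDs and content are illustrative. Each event carries a `chat.completion.chunk` delta, and the stream ends with `data: [DONE]`:

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":"We"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" open at 9am."},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```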
Authentication
| Method | Header sent by PolyAI |
| --- | --- |
| API Key | `x-api-key: YOUR_API_KEY` |
| Bearer | `Authorization: Bearer YOUR_TOKEN` |
Configure your server to accept one of the above.
Sample implementation (Python / Flask)
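The following is a minimal sketch, not a production server: `generate` is a placeholder for your own model call, and `BYOM_API_KEY` is an assumed environment variable holding the credential you share with PolyAI.

```python
import os
import time

from flask import Flask, abort, jsonify, request

app = Flask(__name__)
API_KEY = os.environ.get("BYOM_API_KEY", "change-me")  # credential shared with PolyAI


def generate(messages):
    """Placeholder: call your own LLM here and return the assistant's reply text."""
    return "Hello from my model!"


@app.post("/v1/chat/completions")
def chat_completions():
    # Accept either header from the Authentication table above.
    bearer = request.headers.get("Authorization", "")
    if request.headers.get("x-api-key") != API_KEY and bearer != f"Bearer {API_KEY}":
        abort(401)

    body = request.get_json(force=True)
    content = generate(body["messages"])

    # Respond with an OpenAI-style chat.completion object.
    return jsonify({
        "id": "chatcmpl-byom-1",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": body.get("model", "my-model"),
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": content},
            "finish_reason": "stop",
        }],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    })


if __name__ == "__main__":
    app.run(port=8000)
```

Streaming is omitted here for brevity; see the SSE format above if you enable `stream: true`.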
Final checklist
- Endpoint reachable via POST.
- Request/response match the OpenAI `chat/completions` schema.
- Authentication header configured (API Key or Bearer token).
- (Optional) Streaming supported if needed.
Send to your PolyAI contact:
- Endpoint URL
- Model ID
- Auth method & credential