The Model section in Agent Settings lets you choose the Large Language Model (LLM) that powers your agent — or plug in your own through a custom endpoint. Below is an overview of all models currently available in Agent Studio.

PolyAI Models

Our default, proprietary models.

PolyAI Raven V2

A production-hardened PolyAI model optimised for real-time voice interactions and high retrieval precision.

PolyAI Raven V3

The latest Raven model with improved grounding, paraphrasing, and robustness for enterprise voice use cases.

OpenAI Models

GPT-5

The newest general-purpose model with strong reasoning and conversational ability. Best for high-quality interactions requiring nuance.

GPT-5 chat

Optimised for extended dialogue and conversational stability.

GPT-5 mini

A lighter version of GPT-5, offering lower latency and reduced cost for mid-complexity use cases.

GPT-5 nano

A highly efficient variant suitable for simple tasks and fast-response workloads.

GPT-4o

A powerful, versatile model balancing reasoning, speed, and cost.

GPT-4o mini

A smaller, faster version ideal for everyday queries and high-volume deployments.

GPT-4.1

A refined GPT-4 generation with strong reasoning and improved performance across tasks.

GPT-4.1 mini

A cost-effective, latency-focused variant for lighter workloads.

GPT-4.1 nano

The most lightweight option in the GPT-4.1 family, designed for minimal compute and high throughput.

Amazon Bedrock Models

Bedrock Claude 3.5 Haiku

A fast, lightweight Claude variant suitable for simple, predictable tasks with strong safety alignment.

Bedrock Nova Micro

Amazon’s compact LLM optimised for efficiency while maintaining strong general-purpose performance.

Configuring the model

  1. Open Agent Settings → Large Language Model.
  2. Select the desired model from the dropdown.
  3. Click Save to apply your changes.

Bring Your Own Model (BYOM)

PolyAI supports bring-your-own-model (BYOM) via a simple API integration. If you run your own LLM, expose an endpoint that follows the OpenAI chat/completions schema and PolyAI will treat it like any other provider.

Overview

  1. Expose an API endpoint that accepts/returns data in the OpenAI chat/completions format.
  2. Provide authentication — PolyAI can send either an x-api-key header or a Bearer token.
  3. (Optional) Support streaming responses using stream: true.

API endpoint

Request format

    {
      "model": "your-model-id",
      "messages": [
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": "What's the weather today?" }
      ],
      "temperature": 0.7,
      "top_p": 1.0,
      "stream": false
    }
Your endpoint may also receive additional OpenAI-style fields such as frequency_penalty and presence_penalty; it should ignore any fields it does not support.

Response format

    {
      "id": "chatcmpl-abc123",
      "object": "chat.completion",
      "created": 1712345678,
      "model": "your-model-id",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "It’s sunny today in London."
          },
          "finish_reason": "stop"
        }
      ]
    }

Streaming support (optional)

If the request sets stream to true, respond with Server-Sent Events (SSE) that mirror OpenAI’s chunk format:
    data: {
      "id": "...",
      "object": "chat.completion.chunk",
      "choices": [{
        "delta": { "content": "Hello" },
        "index": 0,
        "finish_reason": null
      }]
    }

    data: {
      "choices": [{
        "delta": {},
        "index": 0,
        "finish_reason": "stop"
      }]
    }

    data: [DONE]
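The chunk sequence above can be produced by a small generator. Below is a minimal sketch in Python; the sse_chunks helper and its arguments are illustrative, not part of the PolyAI or OpenAI spec:

```python
import json

def sse_chunks(chunk_id, model, text_pieces):
    """Yield SSE lines mirroring OpenAI's chat.completion.chunk format."""
    for piece in text_pieces:
        payload = {
            "id": chunk_id,
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [
                {"delta": {"content": piece}, "index": 0, "finish_reason": None}
            ],
        }
        yield f"data: {json.dumps(payload)}\n\n"
    # The final chunk carries an empty delta and the finish reason,
    # followed by the [DONE] sentinel.
    final = {
        "id": chunk_id,
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"delta": {}, "index": 0, "finish_reason": "stop"}],
    }
    yield f"data: {json.dumps(final)}\n\n"
    yield "data: [DONE]\n\n"
```

With Flask, returning Response(sse_chunks(...), mimetype="text/event-stream") when the request sets stream to true would deliver these events to PolyAI as they are generated.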

Authentication

Method        Header sent by PolyAI
API Key       x-api-key: YOUR_API_KEY
Bearer token  Authorization: Bearer YOUR_TOKEN
Configure your server to accept one of the above.
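A server-side check that accepts either header can be a few lines; this is a sketch under the assumption that credentials are compared against a single shared secret (Flask's request.headers is case-insensitive, so the same .get calls work there):

```python
def extract_credential(headers):
    """Return the credential from either supported auth header, or None."""
    api_key = headers.get("x-api-key")
    if api_key:
        return api_key
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    return None

def is_authorized(headers, expected_secret):
    """True if the request carries the expected secret via either method."""
    return extract_credential(headers) == expected_secret
```

In the Flask sample below, you could call is_authorized(request.headers, YOUR_SECRET) at the top of the handler and return a 401 response when it fails.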

Sample implementation (Python / Flask)

    from flask import Flask, request, jsonify
    import time, uuid

    app = Flask(__name__)

    @app.route('/chat/completions', methods=['POST'])
    def chat_completions():
        data = request.json
        messages = data.get('messages', [])
        user_input = messages[-1]['content'] if messages else ''

        # TODO: insert your model inference here
        reply = f'You said: {user_input}'

        return jsonify({
            'id': f'chatcmpl-{uuid.uuid4().hex}',
            'object': 'chat.completion',
            'created': int(time.time()),
            'model': 'my-llm',
            'choices': [{
                'index': 0,
                'message': { 'role': 'assistant', 'content': reply },
                'finish_reason': 'stop'
            }]
        })

Final checklist

  • Endpoint reachable via POST.
  • Request/response match OpenAI chat/completions schema.
  • Authentication header configured (API Key or Bearer token).
  • (Optional) Streaming supported if needed.
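Before handing over your details, you can sanity-check a response body against the schema shown earlier. The required-key list below is a deliberate simplification of the full OpenAI schema, intended only as a quick self-test:

```python
def check_completion_response(body):
    """Return a list of problems found in a chat.completion response dict."""
    problems = []
    for key in ("id", "object", "created", "model", "choices"):
        if key not in body:
            problems.append(f"missing top-level key: {key}")
    if body.get("object") != "chat.completion":
        problems.append("object should be 'chat.completion'")
    for i, choice in enumerate(body.get("choices", [])):
        msg = choice.get("message", {})
        if msg.get("role") != "assistant":
            problems.append(f"choices[{i}].message.role should be 'assistant'")
        if "content" not in msg:
            problems.append(f"choices[{i}].message has no content")
        if "finish_reason" not in choice:
            problems.append(f"choices[{i}] has no finish_reason")
    return problems
```

Running this against the example response in the "Response format" section should return an empty list; any string it returns points at a field to fix before go-live.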
Send to your PolyAI contact:
  • Endpoint URL
  • Model ID
  • Auth method & credential