The Model section in Agent Settings lets you choose the Large Language Model (LLM) that powers your agent — or plug in your own through a custom endpoint. Below is an overview of all models currently available in Agent Studio.

OpenAI Models

GPT-5

The newest general-purpose model with strong reasoning and conversational ability. Best for high-quality interactions requiring nuance.

GPT-5 chat

Optimised for extended dialogue and conversational stability.

GPT-5 mini

A lighter version of GPT-5, offering lower latency and reduced cost for mid-complexity use cases.

GPT-5 nano

A highly efficient variant suitable for simple tasks and fast-response workloads.

GPT-4o

A powerful, versatile model balancing reasoning, speed, and cost.

GPT-4o mini

A smaller, faster version ideal for everyday queries and high-volume deployments.

GPT-4.1

A refined GPT-4 generation with strong reasoning and improved performance across tasks.

GPT-4.1 mini

A cost-effective, latency-focused variant for lighter workloads.

GPT-4.1 nano

The most lightweight option in the GPT-4.1 family, designed for minimal compute and high throughput.

PolyAI Models

PolyAI Raven V2

A production-hardened PolyAI model optimised for real-time voice interactions and high retrieval precision.

PolyAI Raven V3

The latest Raven model with improved grounding, paraphrasing, and robustness for enterprise voice use cases.

Amazon Bedrock Models

Bedrock Claude 3.5 Haiku

A fast, lightweight Claude variant suitable for simple, predictable tasks with strong safety alignment.

Bedrock Nova Micro

Amazon’s compact LLM optimised for efficiency while maintaining strong general-purpose performance.

Configuring the model

  1. Open Agent Settings → Large Language Model.
  2. Select the desired model from the dropdown.
  3. Click Save to apply your changes.

Bring Your Own Model (BYOM)

PolyAI supports bring-your-own-model (BYOM) via a simple API integration. If you run your own LLM, expose an endpoint that follows the OpenAI chat/completions schema and PolyAI will treat it like any other provider.

Overview

  1. Expose an API endpoint that accepts/returns data in the OpenAI chat/completions format.
  2. Provide authentication — PolyAI can send either an x-api-key header or a Bearer token.
  3. (Optional) Support streaming responses using stream: true.

API endpoint

Request format

    {
      "model": "your-model-id",
      "messages": [
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": "What's the weather today?" }
      ],
      "temperature": 0.7,
      "top_p": 1.0,
      "stream": false
    }
Your endpoint may also receive extra OpenAI-style fields such as frequency_penalty and presence_penalty; it can safely ignore any it does not support.
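As an illustration, a minimal sketch of server-side payload validation against this shape (the function name and the set of accepted roles are assumptions; adjust both to your model's needs):

```python
def validate_payload(data):
    """Check that an incoming request matches the expected
    chat/completions shape before running inference."""
    messages = data.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    for msg in messages:
        if msg.get("role") not in ("system", "user", "assistant"):
            raise ValueError(f"unexpected role: {msg.get('role')}")
        if "content" not in msg:
            raise ValueError("each message needs a content field")
    # Extra OpenAI-style fields (frequency_penalty, presence_penalty, ...)
    # can simply be ignored rather than rejected.
    return True
```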

Response format

    {
      "id": "chatcmpl-abc123",
      "object": "chat.completion",
      "created": 1712345678,
      "model": "your-model-id",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "It’s sunny today in London."
          },
          "finish_reason": "stop"
        }
      ]
    }

Streaming support (optional)

If the request sets stream: true, respond with Server-Sent Events (SSE) mirroring OpenAI's chunk format. The payloads below are pretty-printed for readability; in actual SSE, each data: payload is sent as a single line, with events separated by a blank line:
    data: {
      "id": "...",
      "object": "chat.completion.chunk",
      "choices": [{
        "delta": { "content": "Hello" },
        "index": 0,
        "finish_reason": null
      }]
    }

    data: {
      "choices": [{
        "delta": {},
        "index": 0,
        "finish_reason": "stop"
      }]
    }

    data: [DONE]
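The event sequence above can be sketched as a plain generator; this is an illustrative outline, not PolyAI-provided code, and the function name and placeholder chunk id are assumptions:

```python
import json


def sse_stream(tokens, model_id="your-model-id"):
    """Yield Server-Sent Events mirroring OpenAI's streaming chunk
    format: one content chunk per token, a final chunk with
    finish_reason "stop", then the [DONE] sentinel."""
    for token in tokens:
        chunk = {
            "id": "chatcmpl-demo",  # placeholder; use a real id per request
            "object": "chat.completion.chunk",
            "model": model_id,
            "choices": [{"delta": {"content": token},
                         "index": 0,
                         "finish_reason": None}],
        }
        yield f"data: {json.dumps(chunk)}\n\n"
    final = {"choices": [{"delta": {}, "index": 0, "finish_reason": "stop"}]}
    yield f"data: {json.dumps(final)}\n\n"
    yield "data: [DONE]\n\n"
```

In a Flask server like the sample below, you would wrap this generator in a streaming response, e.g. Response(sse_stream(tokens), mimetype='text/event-stream').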

Authentication

  Method     Header sent by PolyAI
  API Key    x-api-key: YOUR_API_KEY
  Bearer     Authorization: Bearer YOUR_TOKEN

Configure your server to accept one of the above.
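A hedged sketch of checking either header (a plain dict stands in for the request headers here; note that real Flask request.headers are case-insensitive, which this simple version does not handle):

```python
def check_auth(headers, expected_credential):
    """Return True if the request carries either of the headers
    PolyAI can send: x-api-key or a Bearer token."""
    if headers.get("x-api-key") == expected_credential:
        return True
    if headers.get("Authorization") == f"Bearer {expected_credential}":
        return True
    return False
```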

Sample implementation (Python / Flask)

    from flask import Flask, request, jsonify
    import time, uuid

    app = Flask(__name__)

    @app.route('/chat/completions', methods=['POST'])
    def chat_completions():
        # NOTE: in production, validate the x-api-key or Authorization
        # header here before processing the request.
        data = request.json
        messages = data.get('messages', [])
        user_input = messages[-1]['content'] if messages else ''

        # TODO: insert your model inference here
        reply = f'You said: {user_input}'

        return jsonify({
            'id': f'chatcmpl-{uuid.uuid4().hex}',
            'object': 'chat.completion',
            'created': int(time.time()),
            'model': 'my-llm',
            'choices': [{
                'index': 0,
                'message': { 'role': 'assistant', 'content': reply },
                'finish_reason': 'stop'
            }]
        })

Final checklist

  • Endpoint reachable via POST.
  • Request/response match OpenAI chat/completions schema.
  • Authentication header configured (API Key or Bearer token).
  • (Optional) Streaming supported if needed.
Send to your PolyAI contact:
  • Endpoint URL
  • Model ID
  • Auth method & credential