# Voice

> Configure TTS providers like ElevenLabs and Cartesia programmatically using voice classes.

**This page requires Python familiarity.** It covers programmatic voice configuration from Python functions.

The PolyAI platform supports flexible voice selection for external providers such as ElevenLabs, Cartesia, Rime, PlayHT, Minimax, Hume, and Google TTS.

## Provider classes

To pick a model, adjust stability, or use a third-party provider, use the provider-specific `TTSVoice` classes.

### Example: ElevenLabs

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
from polyai.voice import ElevenLabsVoice

conv.set_voice(
    ElevenLabsVoice(
        provider_voice_id="gDnGxUcsitTxRiGHr904",
        model_id="eleven_turbo_v2_5",
        stability=1.0,          # Recommended starting point (Robust); eleven_v3 only supports 0.0, 0.5, 1.0
        similarity_boost=0.7,
        speed=1.0,              # Optional: 0.7–1.2, adjusts speech rate
    )
)
```

Available ElevenLabs model IDs: `eleven_monolingual_v1`, `eleven_multilingual_v1`, `eleven_turbo_v2`, `eleven_turbo_v2_5`, `eleven_flash_v2_5`, and `eleven_v3`. The default is `eleven_turbo_v2_5`. See the [ElevenLabs documentation](https://elevenlabs.io/docs) for details on each model.

<Warning>
  **`eleven_v3` limitations:**

  * **Stability:** The `eleven_v3` model only supports discrete `stability` values: `0.0` (Creative), `0.5` (Natural), and `1.0` (Robust). Values between these are not supported and may produce unexpected results. This differs from earlier models where `stability` accepts a continuous range.
  * **Streaming latency:** Do not set `optimize_streaming_latency` when using `eleven_v3` – this parameter is not supported by the v3 model and will cause an error.
</Warning>
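
Because `eleven_v3` accepts only three stability values, it can help to normalize a configured value before building the voice. The following is a minimal sketch: `snap_v3_stability` is a hypothetical helper, not part of the PolyAI library, and snapping to the nearest allowed value is an assumed policy (raising an error instead may suit your project better).

```python
# Hypothetical helper (not part of polyai.voice): snap a stability value
# to the nearest value that eleven_v3 supports.
V3_STABILITY_VALUES = (0.0, 0.5, 1.0)  # Creative, Natural, Robust


def snap_v3_stability(model_id: str, stability: float) -> float:
    """Return a stability value that is safe for the given model."""
    if model_id != "eleven_v3":
        # Earlier models accept a continuous range, so leave the value as-is.
        return stability
    # Pick the allowed value with the smallest absolute distance.
    return min(V3_STABILITY_VALUES, key=lambda allowed: abs(allowed - stability))


print(snap_v3_stability("eleven_v3", 0.62))          # 0.5
print(snap_v3_stability("eleven_turbo_v2_5", 0.62))  # 0.62
```

The returned value can then be passed as `stability` when constructing `ElevenLabsVoice`.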

### Example: Cartesia

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
from polyai.voice import CartesiaVoice, Emotion, EmotionKind, EmotionIntensity

conv.set_voice(
    CartesiaVoice(
        provider_voice_id="a1b2c3d4",
        speed=0.0,  # -1.0 (slowest) to 1.0 (fastest)
        emotions=[
            Emotion(EmotionKind.POSITIVITY, EmotionIntensity.HIGH)
        ],
        model_id="sonic-3"  # or "sonic-3.5", "sonic-preview", or any Cartesia-compatible identifier e.g. "sonic-3-2025-10-27"
    )
)
```

<Note>Some Cartesia voices are faster than expected at the default speed. Test your chosen voice at `speed=0.0` before deploying, and adjust toward `-1.0` if the output is too fast.</Note>

**Emotion options (legacy models):**

* `EmotionKind`: `ANGER`, `POSITIVITY`, `SURPRISE`
* `EmotionIntensity`: `LOWEST`, `LOW`, `HIGH`, `HIGHEST`

**Sonic 3 and Sonic 3.5 parameters:** With a Sonic 3 or Sonic 3.5 model ID, the following additional parameters are supported:

* `volume` (float, optional) – controls output volume (e.g. 0.5–2.0).
* `emotion` (str, optional) – emotion string (e.g. `"happy"`).
* `language` (str, optional) – language code (e.g. `"en"`).
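
One way to keep these parameters away from other models is to build the keyword arguments conditionally. This sketch assumes that Sonic 3 and Sonic 3.5 model IDs start with `sonic-3`, as the IDs shown in the example above do. `cartesia_kwargs` is a hypothetical helper, not a PolyAI function, and how other models react to these parameters is not stated on this page.

```python
from typing import Any, Dict, Optional


def cartesia_kwargs(
    provider_voice_id: str,
    model_id: str,
    volume: Optional[float] = None,
    emotion: Optional[str] = None,
    language: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments, adding Sonic 3 options only when applicable."""
    kwargs: Dict[str, Any] = {
        "provider_voice_id": provider_voice_id,
        "model_id": model_id,
    }
    # Assumption: every Sonic 3 / Sonic 3.5 model ID starts with "sonic-3".
    if model_id.startswith("sonic-3"):
        optional = {"volume": volume, "emotion": emotion, "language": language}
        # Skip options the caller did not set.
        kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


print(cartesia_kwargs("a1b2c3d4", "sonic-3", volume=1.2, language="en"))
```

The resulting dictionary could be unpacked into the constructor, for example `CartesiaVoice(**kwargs)`, assuming the class accepts these names as keyword arguments as the parameter list above suggests.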

### Example: Rime

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
from polyai.voice import RimeVoice

conv.set_voice(
    RimeVoice(
        provider_voice_id="voice_id",
        speech_alpha=1.0,  # <1.0 faster, >1.0 slower
        model_id="mistv2"  # or "mist"
    )
)
```

### Example: Minimax

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
from polyai.voice import MinimaxVoice

conv.set_voice(
    MinimaxVoice(
        model_id="speech-02-hd",  # or speech-02-turbo, speech-01-hd, speech-01-turbo
        voice_id="voice_id",
        speed=1.0,      # 0.5-2.0
        vol=1.0,        # 0-10
        pitch=0,        # -12 to 12
        emotion="happy" # happy, sad, angry, fearful, disgusted, surprised, neutral
    )
)
```

### Example: Hume

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
from polyai.voice import HumeVoice

conv.set_voice(
    HumeVoice(
        provider_voice_id="voice_uuid_or_name",
        voice_description="patient, empathetic counselor",  # Optional
        version="2",        # "1" for octave-1, "2" for octave-2
        instant_mode=False, # Ultra-low latency mode
        provider="HUME_AI"  # "CUSTOM_VOICE" or "HUME_AI"
    )
)
```

### Example: Google TTS

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
from polyai.voice import GoogleVoice

conv.set_voice(
    GoogleVoice(
        provider_voice_id="ja-JP-Neural2-B",
        gender="male"  # "male", "female", or "neutral"
    )
)
```

### Example: Custom provider

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
from polyai.voice import CustomVoice

conv.set_voice(
    CustomVoice(
        provider="MY_PROVIDER",
        provider_voice_id="voice_id",
        custom_param="value"  # Any additional kwargs
    )
)
```

## Voice randomization

Use `VoiceWeighting` to randomly select a voice based on weighted probabilities:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
from polyai.voice import VoiceWeighting, ElevenLabsVoice

conv.randomize_voice([
    VoiceWeighting(
        voice=ElevenLabsVoice(provider_voice_id="voice1"),
        weight=0.7
    ),
    VoiceWeighting(
        voice=ElevenLabsVoice(provider_voice_id="voice2"),
        weight=0.3
    ),
])
```

* When every voice has an explicit weight, the weights must sum to 1.0.
* Voices without explicit weights share the remaining probability equally.
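
To illustrate how the two rules interact, here is a rough sketch of weighted selection in plain Python. It is not the platform's implementation: `resolve_weights` and `pick_voice` are hypothetical names, voices are represented by their IDs, and a weight of `None` stands for a voice without an explicit weight.

```python
import random
from typing import List, Optional, Sequence, Tuple


def resolve_weights(weights: Sequence[Optional[float]]) -> List[float]:
    """Fill in missing weights so that the full list sums to 1.0."""
    explicit_total = sum(w for w in weights if w is not None)
    missing = sum(1 for w in weights if w is None)
    if explicit_total > 1.0 + 1e-9:
        raise ValueError("explicit weights exceed 1.0")
    if missing == 0:
        if abs(explicit_total - 1.0) > 1e-9:
            raise ValueError("weights must sum to 1.0")
        return [float(w) for w in weights]
    # Unweighted voices split whatever probability is left over.
    share = (1.0 - explicit_total) / missing
    return [share if w is None else float(w) for w in weights]


def pick_voice(
    options: Sequence[Tuple[str, Optional[float]]], rng: random.Random
) -> str:
    """Pick one voice ID at random according to the resolved weights."""
    voice_ids = [voice_id for voice_id, _ in options]
    weights = resolve_weights([weight for _, weight in options])
    return rng.choices(voice_ids, weights=weights, k=1)[0]


rng = random.Random(42)  # seeded only to make the sketch repeatable
print(resolve_weights([0.5, None, None]))  # [0.5, 0.25, 0.25]
print(pick_voice([("voice1", 0.7), ("voice2", 0.3)], rng))
```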

## Cache behavior

* Changing `model_id` does not automatically invalidate cached audio.
* To reset [cached audio](/voice-channel/audio-library):
  * Go to **Voice > Audio library** and delete existing cache entries.
  * Or, create a new voice entry with a different voice ID.

<Tip>
  Prepend the model ID to the voice ID (e.g. `eleven_turbo_v2_5/a1b2c3...`) to isolate cache entries per model. This is the most reliable way to ensure the correct model is used after a switch.
</Tip>
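
The prefixing convention from the tip can be captured in a small helper so that every voice entry follows the same pattern. This is only a sketch: `cache_isolated_voice_id` is a hypothetical name, and where the combined ID is entered is outside its scope.

```python
def cache_isolated_voice_id(model_id: str, voice_id: str) -> str:
    """Prefix the voice ID with the model ID, in the form '<model_id>/<voice_id>'."""
    if not model_id or not voice_id:
        raise ValueError("model_id and voice_id must be non-empty")
    return f"{model_id}/{voice_id}"


print(cache_isolated_voice_id("eleven_turbo_v2_5", "voice_id"))
# eleven_turbo_v2_5/voice_id
```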

## Language codes

When configuring a voice, make sure the language code in the `provider_voice_id` matches your deployment's locale. An incorrect language code (e.g. `en-GB` instead of `en-IE`) can cause the TTS provider to render a different accent or voice than expected, even when the correct voice ID is set.
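
A quick check along these lines can catch a locale mismatch before deployment. The sketch assumes the voice ID begins with a language code of the form `xx-XX`, as in the Google TTS example `ja-JP-Neural2-B`. Other providers may use opaque IDs without a language prefix, in which case the check does not apply. `voice_locale` and `locale_matches` are hypothetical helpers.

```python
import re
from typing import Optional

# Matches a leading language code such as "ja-JP" or "en-GB".
_LOCALE_PREFIX = re.compile(r"^([a-z]{2,3})-([A-Z]{2})(?:-|$)")


def voice_locale(provider_voice_id: str) -> Optional[str]:
    """Return the leading language code of a voice ID, or None if there is none."""
    match = _LOCALE_PREFIX.match(provider_voice_id)
    return f"{match.group(1)}-{match.group(2)}" if match else None


def locale_matches(provider_voice_id: str, deployment_locale: str) -> bool:
    """True when the voice ID's language code equals the deployment locale."""
    return voice_locale(provider_voice_id) == deployment_locale


print(voice_locale("ja-JP-Neural2-B"))             # ja-JP
print(locale_matches("ja-JP-Neural2-B", "ja-JP"))  # True
```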

## Additional options

* **stability** – controls tone variability across runs (ElevenLabs).
* **speed** – adjusts speech rate (ElevenLabs: `0.7`–`1.2`; PlayHT: `0.1`–`5.0`; other providers may differ).
* **randomize\_voice()** – performs weighted selection across voices, including voices from external providers.
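
Since the valid `speed` range differs by provider, a lookup table can keep values in bounds. The sketch below uses only the ranges quoted on this page (ElevenLabs, PlayHT, Cartesia, Minimax). The provider keys and the `clamp_speed` helper are hypothetical, and clamping rather than rejecting out-of-range values is an assumed policy.

```python
# Speed ranges as documented on this page; the dictionary keys are illustrative.
SPEED_RANGES = {
    "elevenlabs": (0.7, 1.2),
    "playht": (0.1, 5.0),
    "cartesia": (-1.0, 1.0),
    "minimax": (0.5, 2.0),
}


def clamp_speed(provider: str, speed: float) -> float:
    """Clamp a speed value into the documented range for the provider."""
    if provider not in SPEED_RANGES:
        raise KeyError(f"no documented speed range for provider {provider!r}")
    low, high = SPEED_RANGES[provider]
    return max(low, min(high, speed))


print(clamp_speed("elevenlabs", 1.5))  # 1.2
print(clamp_speed("cartesia", -2.0))   # -1.0
```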