Voxtral-4B-TTS-2603 - Norman SDK

Audio

Voxtral-4B-TTS-2603 (Mistral AI)

Lightweight, high-quality text-to-speech (TTS) model built for fast, natural voice generation.

Small but Capable. A 4B-parameter model designed to deliver strong speech quality while staying cheap and fast to run. It strikes a good balance between latency, cost, and output quality.
Natural Speech Output. Produces clear, expressive audio with good pacing and pronunciation, suitable for real-world applications such as voice assistants, narration, and UI voice-over.
Low Latency. Optimized for fast inference, making it practical for real-time or near-real-time use cases.
Simple TTS Pipeline. Takes text as input and outputs speech directly, with no complex multi-stage pipeline required.
Production Friendly. Lightweight enough to deploy on modest hardware while still delivering consistent, usable voice output.
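As a rough, back-of-envelope check on the "modest hardware" point, the sketch below estimates how much memory the model weights alone occupy at common numeric precisions. It assumes the nominal 4B parameter count and ignores activations, caches, and other runtime overhead; the precision the hosted model actually runs at is not stated on this page.

```python
# Back-of-envelope estimate of memory needed just to store model weights.
# Assumptions: nominal 4B parameters; activations, caches and other runtime
# overhead are NOT included, so real usage will be higher.
BYTES_PER_PARAM = {
    "fp32": 4,
    "bf16": 2,
    "int8": 1,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_memory_gb(4e9, dtype):.1f} GB")
```

At 16-bit precision this works out to roughly 8 GB of weights, which is why a 4B-parameter model can fit on a single mid-range GPU, while quantized variants shrink further.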

Why pick it for Norman AI?

Voxtral-4B-TTS is a solid default for adding voice to your product. It’s fast, cheap, and good enough for most use cases, from voice assistants to reading responses aloud. Use it when you need reliable TTS without overengineering or heavy infrastructure.

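# Assumes `norman` is an initialized Norman SDK client and that this code runs inside
# an async function (or an environment with top-level `await`, such as a Jupyter notebook).
response = await norman.invoke(
    {
        "model_name": "voxtral-4b-tts-2603",
        "inputs": [
            {
                # Natural-language description of the desired voice and delivery style
                "display_title": "Prompt",
                "data": "A female speaker delivers a slightly expressive and animated speech with a high-pitched voice in a clear audio environment."
            },
            {
                # The text to be spoken aloud
                "display_title": "Text",
                "data": "Hello! Your order has shipped and should arrive within three business days."
            }
        ]
    }
)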
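When the same model is called from several places, it can help to build the request in one spot. The sketch below is a hypothetical helper (build_tts_request is not part of the Norman SDK) that returns a dictionary in the same shape as the example above: the model name plus a "Prompt" input describing the voice and a "Text" input holding the words to speak. It only assembles the payload; sending it is still done with norman.invoke.

```python
from typing import Any

MODEL_NAME = "voxtral-4b-tts-2603"


def build_tts_request(voice_prompt: str, text: str) -> dict[str, Any]:
    """Assemble a request payload mirroring the documented example.

    Hypothetical helper for illustration; not part of the Norman SDK.
    """
    # Reject empty input early instead of sending a request with nothing to speak.
    if not text.strip():
        raise ValueError("text to synthesize must not be empty")
    return {
        "model_name": MODEL_NAME,
        "inputs": [
            {"display_title": "Prompt", "data": voice_prompt},
            {"display_title": "Text", "data": text},
        ],
    }
```

Inside an async function with an initialized client, usage would look like: response = await norman.invoke(build_tts_request("A calm, low-pitched male voice.", "Your meeting starts in five minutes.")).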

2026