Audio

Voxtral-4B-TTS-2603 (Mistral AI)

Lightweight, high-quality text-to-speech (TTS) model built for fast, natural voice generation.

  • Small but Capable. A 4B-parameter model designed to deliver strong speech quality while staying cheap and fast to run. It balances latency, cost, and output quality.

  • Natural Speech Output. Produces clear, expressive audio with good pacing and pronunciation, suitable for real-world applications like assistants, narration, and UI voice.

  • Low Latency. Optimized for fast inference, making it practical for real-time or near-real-time use cases.

  • Simple TTS Pipeline. Takes text as input and outputs speech directly, with no complex multi-stage pipeline required.

  • Production Friendly. Lightweight enough to deploy on modest hardware while still delivering consistent, usable voice output.
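For real-time use, one common way to keep perceived latency low is to send long text in sentence-sized pieces, so the first audio can start playing while later pieces are still being generated. The sketch below shows only the text-splitting step. The function name `split_into_chunks` and the 200-character default are illustrative assumptions, not part of the model or of any Norman API.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Split text into sentence-aligned chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut
    mid-word. The 200-character default is an arbitrary illustrative value.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = [p.strip() for p in parts if p.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("Hello there. How are you? Fine!", max_chars=15):
        print(chunk)
```

Each chunk could then be submitted as its own TTS request and the resulting audio played back in order.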

Why pick it for Norman AI?

Voxtral-4B-TTS is a solid default for adding voice to your product. It’s fast, cheap, and good enough for most use cases, from voice assistants to reading out responses. Use it when you need reliable TTS without overengineering or heavy infrastructure.

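# Assumes `norman` is an already-initialized Norman client and that this call
# runs inside an async function (or another context where `await` is valid).
response = await norman.invoke(
    {
        "model_name": "voxtral-4b-tts-2603",
        "inputs": [
            {
                # Describes the desired voice and delivery style.
                "display_title": "Prompt",
                "data": "A female speaker delivers a slightly expressive and animated speech with a high-pitched voice in a clear audio environment."
            },
            {
                # The text to be spoken (placeholder sentence).
                "display_title": "Text",
                "data": "Hello! This is a short sample sentence for speech synthesis."
            }
        ]
    }
)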
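
The request in the example above is a plain dictionary, so it can be assembled by a small helper before being passed to `norman.invoke`. This is a minimal sketch: `build_tts_request` is a hypothetical helper, not part of any SDK, and it assumes the "Text" input carries the string to be spoken while the "Prompt" input carries the voice description.

```python
def build_tts_request(
    text: str,
    voice_prompt: str,
    model_name: str = "voxtral-4b-tts-2603",
) -> dict:
    """Build a request dictionary in the shape used by the example above.

    Hypothetical helper: the field names are copied from the example, and
    the meaning of each input is an assumption.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "model_name": model_name,
        "inputs": [
            # Assumed: a description of the voice and delivery style.
            {"display_title": "Prompt", "data": voice_prompt},
            # Assumed: the text to synthesize.
            {"display_title": "Text", "data": text},
        ],
    }


if __name__ == "__main__":
    request = build_tts_request(
        text="Your order has shipped.",
        voice_prompt="A calm speaker with a clear, neutral voice.",
    )
    print(request["model_name"])
    # Inside an async function, the request would then be sent with:
    #     response = await norman.invoke(request)
```

Keeping the payload construction separate from the network call makes it easy to validate inputs and to unit-test request shapes without contacting the service.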
