NVIDIA-Nemotron-3-Nano-30B-A3B (NVFP4)

All Models

Text

NVIDIA-Nemotron-3-Nano-30B-A3B (NVFP4)

Ultra-fast, hybrid text model designed for low-latency agentic tasks and high-throughput edge deployment.

Hybrid Efficiency. A 30B parameter "mixture" model that only activates 3B parameters per token. It uses a Mamba-2 (SSM) and Transformer hybrid architecture to provide 30B-level reasoning at 3B-level speeds.
Massive 1M Context. Engineered to ingest up to 1,000,000 tokens. It maintains high retrieval accuracy ("Needle In A Haystack") over massive documents where standard small models usually fail.
Native 4-bit Precision. Trained from scratch in NVFP4. This allows the model to run with a tiny memory footprint on NVIDIA hardware while retaining nearly 100% of its original "dense" accuracy.
Built for Real-Time Agents. Optimized specifically for tool-calling, structured data extraction (JSON), and multi-step reasoning traces. It is significantly more reliable for autonomous tasks than typical 3B–8B models.
Extreme Throughput. Incorporates Multi-Token Prediction (MTP), allowing it to generate text at blistering speeds, making it ideal for real-time chat, live summarization, and high-volume API services.

Why pick it for Norman AI?

Nemotron-3-Nano is the "speed king" for long-context intelligence. It is the perfect choice when you need to process massive amounts of text (like legal archives or entire codebases) instantly, or when building responsive AI agents that need to call tools and "think" without the lag of a heavy model.

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant",
     "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

response = await norman.invoke(
    {
        "model_name": "granite-4.0-micro",
        "inputs": [
            {
                "display_title": "Prompt",
                "data": messages
            }
        ]
    }
)

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant",
     "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

response = await norman.invoke(
    {
        "model_name": "granite-4.0-micro",
        "inputs": [
            {
                "display_title": "Prompt",
                "data": messages
            }
        ]
    }
)

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant",
     "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

response = await norman.invoke(
    {
        "model_name": "granite-4.0-micro",
        "inputs": [
            {
                "display_title": "Prompt",
                "data": messages
            }
        ]
    }
)

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant",
     "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

response = await norman.invoke(
    {
        "model_name": "granite-4.0-micro",
        "inputs": [
            {
                "display_title": "Prompt",
                "data": messages
            }
        ]
    }
)

2026