All Models

Audio
Whisper Medium (769 M params, Apache-2.0)
Mid-sized ASR that nails ~3 % WER and runs on a 3 GB GPU.
Spec sheet. 769 M-param encoder-decoder Transformer, 30 s audio window, trained on 680 k h of speech across 99 languages.
Real-world accuracy. Self-reported 2.9 % WER on LibriSpeech clean and 5.9 % on other—state-of-the-art for its size.
Runs light. FP16 weights ≈1.5 GB; end-to-end inference is happy on cards with ~3 GB VRAM—no H100 needed.
Multilingual + translate. Auto-detects 99 languages and can spit out either same-language transcripts or English translations.
Why pick it for Norman AI?
Whisper Medium gives us near-large accuracy, full multilingual coverage, and Apache freedom—all in a package light enough for edge GPUs or co-located micro-services. Perfect for call transcripts, video captions, or voice-note features without new hardware or license headaches.