All Models

Audio
Distil-Whisper small.en (166 M params, MIT)
Lean English ASR that keeps near-Whisper accuracy in a phone-friendly footprint.
6× faster, 49 % smaller. Distilled from Whisper-small: same encoder, trimmed decoder, landing within 1 % WER of its teacher on out-of-distribution test sets at a fraction of the compute.
Mini spec sheet. 4-layer decoder, 30 s audio window, English-only. Checkpoint ≈ 350 MB in FP16; 4-bit quantization drops it below 100 MB, so real-time inference runs on CPUs or 1 GB GPUs.
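For the quantized path, a minimal sketch of a 4-bit load through Transformers' bitsandbytes integration (an assumption on our part: it needs a CUDA GPU plus the bitsandbytes package; whisper.cpp's own quantized formats are the equivalent route for CPU-only boxes):

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, BitsAndBytesConfig

# 4-bit weight quantization via bitsandbytes; compute still runs in FP16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "distil-whisper/distil-small.en",
    quantization_config=bnb_config,
)

# Rough in-memory footprint of the quantized weights.
print(f"~{model.get_memory_footprint() / 1e6:.0f} MB")
```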
Accuracy in numbers. Short-form WER 12.1 %, long-form 12.8 %, just a few points behind Whisper-large-v3 while decoding 5-6× faster.
Chunk + batch friendly. Built-in 15 s chunking plus batched decoding make hour-long transcripts up to 9× quicker than OpenAI Whisper's sequential long-form loop (sketch below).
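A minimal sketch of that chunked, batched pipeline using the Transformers ASR pipeline API (the file name is a placeholder; batch_size=16 is an assumption to tune per device):

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "distil-whisper/distil-small.en"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=dtype, low_cpu_mem_usage=True, use_safetensors=True
).to(device)
processor = AutoProcessor.from_pretrained(model_id)

# chunk_length_s=15 matches the model's recommended long-form chunking;
# batch_size controls how many chunks are decoded in parallel.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=15,
    batch_size=16,
    torch_dtype=dtype,
    device=device,
)

result = pipe("meeting.mp3")  # placeholder path; any local audio file works
print(result["text"])
```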
Plugs in wherever Whisper does. Supported in transformers ≥ 4.35 and whisper.cpp, and usable as a speculative-decoding assistant for bigger Whisper models; a drop-in swap of the model ID is all it takes.
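A hedged sketch of that speculative-decoding setup: distil-small.en drafts tokens cheaply and a larger Whisper verifies them, so the output matches the big model exactly. Pairing it with whisper-medium.en is our assumption; the two checkpoints just need to share a tokenizer, which the English-only Whisper variants do. The audio array is a placeholder.

```python
import numpy as np
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

device = "cuda:0" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Main model: the Whisper tier we already serve (assumed pairing).
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-medium.en", torch_dtype=dtype
).to(device)

# Assistant: distil-small.en drafts candidate tokens for the main model
# to verify, so transcripts are identical to decoding with it alone.
assistant = AutoModelForSpeechSeq2Seq.from_pretrained(
    "distil-whisper/distil-small.en", torch_dtype=dtype
).to(device)

processor = AutoProcessor.from_pretrained("openai/whisper-medium.en")

audio = np.zeros(16_000 * 5, dtype=np.float32)  # placeholder: 5 s of 16 kHz mono
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
input_features = inputs.input_features.to(device, dtype)

generated = model.generate(input_features, assistant_model=assistant)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```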
Why pick it for Norman AI?
This gives us near-Whisper accuracy for English calls, demos, or edge devices without spinning up a beefy GPU. Use it as a standalone ASR tier, or as an “assistant” model to halve latency for our existing Whisper pipelines: same API, lower bill.