All Models

Text
Gemma-2 2B (≈2B params, Gemma license)
Tiny transformer that stays within its 8K-token context yet delivers solid reasoning.
Spec sheet. Decoder-only, RoPE positional encodings, 8,192-token context window; pre-trained on ~2T tokens of web, code, and math text.
Punchy accuracy for its size. Scores 51.3 on MMLU (5-shot), 73.0 on HellaSwag (10-shot), and 77.8 on PIQA (0-shot), beating many 3-7B open models.
Runs on almost any box. Float-16 weights sip ≈3.7 GB of VRAM; an int-4 quant fits in under 2 GB, so laptops or low-end cloud GPUs are fine.
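
A minimal sketch of that low-memory path, assuming the bitsandbytes 4-bit integration in transformers; the bf16 compute dtype is an illustrative choice, not a requirement:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# int-4 weight quantization via bitsandbytes; compute stays in bf16 for quality.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b",
    quantization_config=quant_config,
    device_map="auto",  # place layers on whatever GPU/CPU is available
)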
Fast path available. torch.compile can deliver up to a 6× throughput gain once two warm-up calls finish.
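
A sketch of that recipe, assuming a CUDA GPU and a transformers version with static KV-cache support (needed so the compiled graph sees fixed tensor shapes); the prompt and token counts are placeholders:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b", torch_dtype=torch.float16
).to("cuda")

# A static KV cache keeps tensor shapes fixed so the compiled graph is reused.
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("Warm-up prompt", return_tensors="pt").to("cuda")
for _ in range(2):  # the two warm-up calls pay the compilation cost
    model.generate(**inputs, max_new_tokens=16)

output = model.generate(**inputs, max_new_tokens=64)  # now runs at compiled speed
print(tokenizer.decode(output[0], skip_special_tokens=True))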
Tool-ready. Drop-in with transformers, vLLM, llama.cpp, Ollama, or the lightweight local-gemma CLI: just from_pretrained("google/gemma-2-2b") and go.
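
For reference, the plain transformers route; this assumes you have accepted the Gemma terms on the Hugging Face Hub and authenticated locally, and the prompt is a placeholder (the base model does completion, not chat):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b",
    torch_dtype=torch.float16,  # ≈3.7 GB of weights in fp16
    device_map="auto",
)

prompt = "The key idea behind rotary position embeddings is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(output[0], skip_special_tokens=True))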
Why pick it for Norman AI?
Gemma-2 2B gives us open weights, long-context chats, and sub-4 GB footprints: perfect for edge deployments, per-tenant fine-tunes, or a “budget” tier in our inference stack without sacrificing quality.