All Models

Text
Phi-2 (2.7 B params, MIT)
Pocket-sized transformer that thinks like a bigger model.
Spec sheet. 32-layer decoder-only network, 2,048-token context window, trained on 1.4T "textbook-quality" tokens of filtered web + synthetic data.
Hits above its size. Matches or beats many 7-13 B models on reasoning and coding benchmarks, making it one of the strongest sub-13 B open LLMs.
Runs light. FP16 needs ~5.2 GB VRAM; 4-bit quant fits in ~1.3 GB—good for laptops, edge GPUs, even some phones.
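The footprint numbers above come from simple arithmetic on the parameter count. A back-of-envelope sketch (weights only; real usage adds KV cache and activation overhead, and the exact parameter count is slightly above 2.7 B):

```python
# Back-of-envelope VRAM estimate for Phi-2's weights alone.
# Assumption: 2.7e9 parameters, the round figure quoted above; actual
# usage is higher once KV cache and activations are counted.
PARAMS = 2.7e9

def weight_footprint_gb(bytes_per_param: float) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return PARAMS * bytes_per_param / 1e9

fp16_gb = weight_footprint_gb(2.0)   # FP16: 2 bytes per parameter
int4_gb = weight_footprint_gb(0.5)   # 4-bit quant: 0.5 bytes per parameter
print(f"FP16 ~ {fp16_gb:.1f} GB, 4-bit ~ {int4_gb:.2f} GB")
```

This yields roughly 5.4 GB for FP16 and 1.35 GB for 4-bit, in line with the observed ~5.2 GB / ~1.3 GB figures once loader overhead and rounding are accounted for.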
Base model, no RLHF. Works with plain QA / chat / code prompts, but expect “textbook” verbosity until you fine-tune.
Plug-and-play. Load with transformers >= 4.37 (from_pretrained("microsoft/phi-2") and go), or drop into vLLM, llama.cpp (GGUF), Ollama, etc.
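A minimal loading sketch for the transformers path, assuming transformers >= 4.37, a recent torch, and a GPU with ~6 GB free; the generate helper and its parameters are illustrative, not part of any official API:

```python
# Sketch: load microsoft/phi-2 with Hugging Face transformers and generate text.
# Assumes transformers >= 4.37 and torch installed; device_map="auto" places
# the model on GPU if one is available, otherwise CPU.
MODEL_ID = "microsoft/phi-2"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

For the GGUF route (llama.cpp, Ollama) no Python is needed; community quantizations of Phi-2 can be pulled and run directly from those tools.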
Why pick it for Norman AI?
MIT license, transparent training logs, and a sub-6 GB footprint let us spin up cheap prototypes, edge demos, or tenant-level fine-tunes without touching our heavier Llama tiers.