Phi-2 (2.7B params, MIT)

Pocket-sized transformer that thinks like a bigger model.

  • Spec sheet. 32-layer decoder-only transformer, 2,048-token context window, trained on 1.4T “textbook-quality” tokens of filtered web + synthetic data.

  • Punches above its weight. Scores rival many 7-13B models on reasoning and code tasks, topping the charts among sub-13B LLMs at its release.

  • Runs light. FP16 needs ~5.2 GB VRAM; a 4-bit quant fits in ~1.3 GB, small enough for laptops, edge GPUs, even some phones (see the quantized-load sketch after this list).

  • Base model, no RLHF. Works with plain QA / chat / code prompts, but expect “textbook” verbosity until you fine-tune.

  • Plug-and-play. Load with transformers >= 4.37 (native Phi support, no trust_remote_code), or drop into vLLM, llama.cpp (GGUF), Ollama, etc.; call from_pretrained("microsoft/phi-2") and go, as in the sketch below.
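
A minimal loading sketch for the plain transformers path (not Norman-specific): it assumes torch, transformers >= 4.37, and accelerate are installed, and uses the Instruct:/Output: prompt format from the Phi-2 model card.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype=torch.float16,  # ~5.2 GB of weights in FP16
    device_map="auto",          # requires accelerate; places weights on GPU if available
)

# Phi-2 is a base model; its card suggests an "Instruct: ... Output:" format for QA.
prompt = "Instruct: Solve the equation 2x + 3 = 7.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))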
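
And the quantized-load sketch behind the ~1.3 GB figure above: a 4-bit load via transformers' BitsAndBytesConfig, assuming bitsandbytes is installed and a CUDA GPU is available.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4-bit NF4 (~1.3 GB for 2.7B params)
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in FP16
)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
    device_map="auto",
)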

Why pick it for Norman AI?

MIT license, transparent training logs, and a sub-6 GB footprint let us spin up cheap prototypes, edge demos, or tenant-level fine-tunes without touching our heavier Llama tiers.


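Example: sending a multi-turn chat history through our async invoke API. The norman object below is assumed to be an already-initialized Norman client; the request shape (model_name, inputs, display_title, data) follows our standard invoke payload.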
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant",
     "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},
    {"role": "user", "content": "What about solving a 2x + 3 = 7 equation?"},
]

# invoke() is a coroutine, so this call must run inside an async function.
response = await norman.invoke(
    {
        "model_name": "phi-2",  # this page's model
        "inputs": [
            {
                "display_title": "Prompt",
                "data": messages  # chat history in role/content format
            }
        ]
    }
)
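
Because invoke() is awaited, wrap the call in an async function and drive it with asyncio.run(); the exact shape of the returned response object depends on the Norman SDK version you have installed.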