pythia-2.8b

Pythia-2.8B (2.8B params, Apache-2.0)

EleutherAI’s open, research-first transformer; small enough for one card, good enough to beat OPT/GPT-Neo peers.

  • Spec sheet. 32 layers · 2560 hidden dim · 32 heads · 2048-token context; trained on The Pile with the GPT-NeoX codebase.

  • Baseline accuracy. Outperforms similar-sized OPT and GPT-Neo models: 60.7 HellaSwag, 36.3 ARC (25-shot), 26.8 MMLU (5-shot).

  • Runs light. FP16 weights need ≈6 GB VRAM, 8-bit ≈3 GB, 4-bit ≈1.5 GB, so a laptop GPU or a single A10G is plenty.

  • Tool-ready. Drop-in with transformers (GPT-NeoX class), vLLM, llama.cpp (GGUF), Ollama, etc.: from_pretrained("EleutherAI/pythia-2.8b") and go (see the sketch after this list).

  • Hack-friendly. 154 training checkpoints expose the model's entire learning trajectory, great for interpretability research or custom fine-tunes (loading a checkpoint by revision is sketched below).
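
A minimal transformers quickstart, as referenced above (a sketch: the prompt and generation settings are illustrative, and it assumes transformers, torch, and accelerate are installed):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-2.8b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-2.8b",
    torch_dtype=torch.float16,  # ≈6 GB VRAM; a 4-bit quantization config brings this to ≈1.5 GB
    device_map="auto",
)

# Pythia is a base (non-chat) model, so prompt it with plain text to continue.
inputs = tokenizer("The Pile is a dataset that", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))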

Why pick it for Norman AI?

Apache license, tiny VRAM needs, and a full training timeline make Pythia-2.8B the perfect sandbox for fast experiments—whether you’re prototyping edge demos, benchmarking quantization pipelines, or digging into model internals without burning GPUs.
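
Each of the 154 checkpoints is published as a branch on the Hugging Face repo, so any point on the training trajectory loads via the revision argument (a sketch; step3000 is just an illustrative step):

from transformers import GPTNeoXForCausalLM, AutoTokenizer

# Checkpoints cover step0, log-spaced steps up to step512, then every 1,000 steps to step143000.
revision = "step3000"  # illustrative; any published step works
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-2.8b", revision=revision)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-2.8b", revision=revision)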


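And calling the model through Norman's invoke API (the snippet assumes an already-initialized async norman client):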
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant",
     "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

response = await norman.invoke(
    {
        "model_name": "qwen3-4b",
        "inputs": [
            {
                "display_title": "Prompt",
                "data": messages
            }
        ]
    }
)