Core Concepts
This page introduces the two fundamental ideas behind the Norman platform: AI Model Deployment and AI Model Inference — what they mean in general, and how Norman makes them radically simpler.
1. AI Model Deployment
What is AI model deployment?
AI model deployment is the process of taking a trained model (e.g., a .pt, .safetensors, .bin, or custom asset) and making it usable in the real world.
Typical deployment includes:
packaging model files
provisioning compute resources
handling Python environments and dependencies
exposing an API endpoint
managing versions
scaling the model for multiple users
handling retries, logging, timeouts, and errors
securing access
storing assets
monitoring usage
In most platforms, deployment is the hardest and most DevOps-heavy part of running AI.
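To make that DevOps burden concrete, here is a minimal sketch of what "exposing an API endpoint" alone already involves when done by hand: a bare standard-library HTTP server wrapping a stub model. The `DummyModel` class and the route shape are illustrative assumptions, not part of any real stack — and note how much from the list above (auth, retries, scaling, versioning, monitoring) is still missing.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class DummyModel:
    """Stand-in for a real trained model (illustrative only)."""
    def predict(self, xs):
        return [x * 2 for x in xs]

MODEL = DummyModel()  # loaded once at process startup

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Parse the request body, run the model, serialize the output.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        result = MODEL.predict(payload.get("inputs", []))
        body = json.dumps({"outputs": result}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve (blocking), you would still need to add auth, logging,
# timeouts, retries, and scaling on top of this:
# HTTPServer(("127.0.0.1", 8080), InferenceHandler).serve_forever()
```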
How does deployment work in Norman?
Norman automates the entire deployment lifecycle.
When you upload a model, Norman handles:
Asset ingestion → stores your model files securely
Configuration parsing → inputs, outputs, encodings, metadata
Automatic versioning
Dependency environment setup (per model)
Containerization & preparation
Routing into the Compute service
API availability → instantly invokable
Management & visibility in your Models Library
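Conceptually, the upload reduces to shipping the model artifact plus a small configuration describing inputs, outputs, and encodings. The sketch below assembles such a config as a plain dict; every field name here is an assumption for illustration, not Norman's actual schema.

```python
import hashlib

def build_deploy_config(model_bytes: bytes, name: str, version: int) -> dict:
    """Assemble the metadata a deploy call might carry (hypothetical schema)."""
    return {
        "name": name,
        "version": version,                 # automatic versioning would derive this
        "sha256": hashlib.sha256(model_bytes).hexdigest(),  # integrity check on ingestion
        "inputs": {"image": "base64"},      # illustrative input encoding
        "outputs": {"label": "string"},     # illustrative output encoding
    }
```

Hashing the artifact up front is a common pattern: it lets the platform verify the upload and deduplicate identical model versions.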
You never configure servers, build images, or write YAML. You simply upload the model; Norman deploys it.
2. AI Model Inference
What is AI model inference?
AI inference is the process of running a deployed model on input data to produce an output.
Common examples:
Sending an image → getting a classification
Sending text → receiving a completion
Uploading audio → receiving a transcription
Sending tensors → receiving tensors
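Each of these examples follows the same shape: encode the input into a transport-safe payload, invoke the endpoint, decode the output. A minimal sketch of the encode/decode halves, using base64 for binary inputs such as images or audio (the payload fields are assumptions, not a real wire format):

```python
import base64
import json

def encode_request(raw: bytes, input_type: str) -> str:
    """Wrap binary input (image, audio, ...) as a JSON-safe payload."""
    return json.dumps({
        "type": input_type,
        "data": base64.b64encode(raw).decode("ascii"),
    })

def decode_request(payload: str) -> bytes:
    """Recover the original bytes on the serving side."""
    return base64.b64decode(json.loads(payload)["data"])
```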
Inference must be fast, reliable, reproducible, scalable, and traceable.
Most real-world systems require queues, workers, file storage, and output routing.
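The queues-and-workers plumbing mentioned above can be sketched with the standard library alone. This is a toy thread pool for fanning inference jobs out and collecting results by job ID, not Norman's actual architecture:

```python
import queue
import threading

def run_worker_pool(jobs, handler, workers=4):
    """Fan (job_id, payload) jobs out to worker threads; return {job_id: output}."""
    q = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = q.get()
            if item is None:  # sentinel: no more work
                break
            job_id, payload = item
            out = handler(payload)  # the "inference" step
            with lock:
                results[job_id] = out

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for job in jobs:
        q.put(job)
    for _ in threads:
        q.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

A production system would add persistence, retries, timeouts, and output routing on top — exactly the pieces the platform is meant to absorb.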