All Models

Image
Stable Diffusion v2-base (≈ 1 B params, CreativeML Open RAIL++-M)
Latent-diffusion upgrade that swaps the old CLIP ViT-L text encoder for the larger OpenCLIP-ViT/H and retrains on a cleaner dataset.
Sharper prompt understanding. The new OpenCLIP-H text embedding makes prompts more literal and gives better edge detail than v1.5; just note that old v1.x prompt tricks won't map one-to-one.
512 × 512 by default, scales up. The base checkpoint is trained at 512×512; a 768×768 "v-pred" sibling is finetuned from the same weights with no extra parameters (see the 768 sketch after this list).
Same lightweight U-Net. Roughly the same ~860 M U-Net params as v1.5, so hardware needs don’t spike.
Runs on everyday GPUs. FP16 inference fits in about 6 GB of VRAM; attention slicing and CPU offload can cut peak usage well below that, and pure CPU inference needs no GPU at all (see the memory-saving sketch after this list).
Cleaned-up dataset. Trained from scratch on a LAION-5B subset after aggressive NSFW and aesthetic filtering, so fewer unwanted outputs out of the box.
Plug-and-play tooling. Loads in a few lines with diffusers (quickstart below) and is fully supported in Automatic1111, ComfyUI, SD.Next, etc.; same workflow as any SD-1.x model.
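
A minimal diffusers quickstart for the base checkpoint, as a sketch; the prompt and output filename are placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the 512x512 base checkpoint in half precision.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-base",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Prompt and output path are illustrative placeholders.
image = pipe("isometric illustration of a city at dusk").images[0]
image.save("hero.png")
```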
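The 768×768 v-prediction sibling loads the same way; diffusers reads the v_prediction setting from the checkpoint's scheduler config, so only the model id and resolution change:

```python
import torch
from diffusers import StableDiffusionPipeline

# "stabilityai/stable-diffusion-2" is the 768x768 v-pred checkpoint;
# its scheduler config already carries prediction_type="v_prediction".
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("isometric illustration of a city at dusk",
             height=768, width=768).images[0]
```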
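For the low-VRAM claim above, a sketch of the standard diffusers memory savers (CPU offload needs the accelerate package installed; exact savings depend on the GPU):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-base",
    torch_dtype=torch.float16,
)
# Skip pipe.to("cuda") here: offload moves each submodule to the GPU
# only while it runs, keeping the rest in system RAM.
pipe.enable_model_cpu_offload()   # requires `accelerate`
pipe.enable_attention_slicing()   # chunks attention to cut peak VRAM

image = pipe("studio photo of a ceramic mug").images[0]
```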
Why pick it for Norman AI?
SD v2-base gives us cleaner images, better prompt control, and "fits-on-a-laptop" VRAM, all without touching our existing diffusion pipeline. Use it for quick hero art, social banners, or on-device generative features without re-architecting the stack.