Name: Stable Diffusion XL
Author: Stability AI

Stability AI's flagship open-weights text-to-image generation model. It pairs a 3.5B-parameter base model with a refiner model, for a 6.6B-parameter ensemble pipeline in total. Native resolution is 1024x1024, twice the 512x512 side length of SD 1.5 (four times the pixel count), with improved rendering of limbs, faces, legible text, and overall image quality. It uses two text encoders (OpenCLIP ViT-bigG/14 and CLIP ViT-L/14) instead of SD 1.5's single CLIP encoder, for better semantic understanding of prompts. Reported prompt adherence is 89% vs 71% for SD 1.5. Supports image-to-image, inpainting, and outpainting workflows. Runs on consumer GPUs with 8GB of VRAM or more (RTX 3060 class and up).
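The base-plus-refiner ensemble described above can be sketched with the Hugging Face diffusers library. This is a minimal sketch, not an official recipe: it assumes torch, diffusers, and accelerate are installed, that a CUDA GPU is available, and that the weights are pulled from the stabilityai/stable-diffusion-xl-base-1.0 and stabilityai/stable-diffusion-xl-refiner-1.0 repositories. The names split_steps and generate, the 40-step default, and the 0.8 handoff fraction are illustrative choices and not part of the listing.

```python
def split_steps(total_steps: int, high_noise_frac: float) -> tuple[int, int]:
    """Approximate how many denoising steps go to the base and the refiner.

    Hypothetical helper for illustration only; diffusers computes the
    actual split internally from denoising_end / denoising_start.
    """
    if not 0.0 <= high_noise_frac <= 1.0:
        raise ValueError("high_noise_frac must be between 0 and 1")
    base_steps = round(total_steps * high_noise_frac)
    return base_steps, total_steps - base_steps


def generate(prompt: str, total_steps: int = 40, high_noise_frac: float = 0.8):
    """Run the base model on the high-noise steps, then hand the latents
    to the refiner for the remaining low-noise steps."""
    # Imported lazily so the helper above works without a GPU stack installed.
    import torch
    from diffusers import DiffusionPipeline

    base = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    )
    # The refiner reuses the base model's second text encoder and VAE
    # so those weights are not loaded twice.
    refiner = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        text_encoder_2=base.text_encoder_2,
        vae=base.vae,
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    )
    # Assumption: offloading idle submodules to CPU (requires accelerate)
    # to reduce peak VRAM use on smaller consumer cards.
    base.enable_model_cpu_offload()
    refiner.enable_model_cpu_offload()

    # The base model stops early and returns latents instead of a decoded image.
    latents = base(
        prompt=prompt,
        num_inference_steps=total_steps,
        denoising_end=high_noise_frac,
        output_type="latent",
    ).images
    # The refiner resumes denoising from the same point in the schedule.
    return refiner(
        prompt=prompt,
        num_inference_steps=total_steps,
        denoising_start=high_noise_frac,
        image=latents,
    ).images[0]
```

Calling generate("a lighthouse at dusk, oil painting").save("out.png") would write a 1024x1024 image, since that is the pipeline's default size. The first call downloads several gigabytes of weights. Lowering high_noise_frac gives the refiner a larger share of the steps.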
