Local AI · Intermediate · 45 min to complete · 16 min read

How to Run Stable Diffusion Locally: ComfyUI and AUTOMATIC1111 Guide

Run Stable Diffusion locally with ComfyUI or AUTOMATIC1111. Complete setup guide for Windows, Linux, and macOS with NVIDIA, AMD, and Apple Silicon GPUs.

By Amara | Published 17 March 2026
ComfyUI node-based workflow generating an AI image locally

Stable Diffusion is an open-source image generation model you can run on your own GPU without sending prompts or images to an external server. Unlike Midjourney or DALL-E 3, local Stable Diffusion has no usage fees, no content filters, and no rate limits. You can generate thousands of images per day at zero marginal cost once the hardware is set up.

Two frontends dominate local Stable Diffusion setups in 2026: **ComfyUI** and **AUTOMATIC1111 (A1111)**. ComfyUI uses a node-based workflow canvas similar to visual programming tools. A1111 uses a traditional web form interface that is easier for beginners but less flexible. This guide covers both so you can choose based on your workflow.

Hardware requirements vary by model size. A GPU with 6 GB of VRAM runs Stable Diffusion XL comfortably, and 4 GB is enough for SD 1.5 models with optimisations enabled. CPU-only generation is possible but takes 5-10 minutes per image, compared to 2-15 seconds on a mid-range GPU.

Prerequisites

  • NVIDIA GPU with 4+ GB VRAM (recommended), AMD GPU with ROCm support, or Apple Silicon Mac
  • Python 3.10 or 3.11 installed (3.12 has compatibility issues with some extensions)
  • Git installed
  • 20-30 GB free disk space (models are 2-7 GB each)
  • NVIDIA CUDA Toolkit 11.8+ installed (Windows/Linux NVIDIA GPU setups)

Hardware Requirements and GPU Compatibility

Before installing, verify your GPU meets the minimum requirements. Generation speed scales significantly with VRAM.

| GPU VRAM | Compatible Models | Generation Speed | Notes |
| --- | --- | --- | --- |
| 4 GB | SD 1.5, SDXL with medvram | 10-30 sec/image | Enable `--medvram` flag |
| 6 GB | SD 1.5, SDXL, Flux (lite) | 4-15 sec/image | Comfortable for most use |
| 8 GB | SDXL, Flux Dev | 2-8 sec/image | Recommended minimum for SDXL |
| 12 GB+ | All models including Flux Pro | 1-4 sec/image | Optimal for production |
| CPU only | SD 1.5 (slow) | 5-15 min/image | Only for testing; not practical |

Check Your GPU

```bash
# NVIDIA
nvidia-smi

# AMD (Linux)
rocm-smi

# Apple Silicon (Metal support is built in; unified memory means no separate VRAM is listed)
system_profiler SPDisplaysDataType
```

Operating System Compatibility

| OS | NVIDIA | AMD | Apple Silicon |
| --- | --- | --- | --- |
| Windows 10/11 | Full CUDA support | DirectML (slower) | N/A |
| Ubuntu 20.04+ | Full CUDA support | ROCm support | N/A |
| macOS 12.3+ | N/A | N/A | Metal (MPS) support |
ℹ️ Note: AMD GPU support on Windows uses DirectML, which is slower than CUDA; expect roughly 30-50% lower performance than an equivalent NVIDIA GPU. On Linux with ROCm properly configured, AMD GPUs achieve near-CUDA performance.

Install ComfyUI (Recommended)

ComfyUI is a node-based workflow editor for Stable Diffusion. It has a steeper initial learning curve but is more powerful and resource-efficient than A1111. Most advanced users and professionals use ComfyUI for production workflows.

Step 1: Clone the ComfyUI Repository

```bash
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
```

Step 2: Create a Python Virtual Environment

```bash
python -m venv venv

# Activate on Linux/macOS
source venv/bin/activate

# Activate on Windows
venv\Scripts\activate
```

Step 3: Install PyTorch

Install the correct PyTorch version for your hardware. Go to pytorch.org/get-started for the exact command for your system, or use one of these:

```bash
# NVIDIA GPU (CUDA 12.1)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# AMD GPU (ROCm 6.0, Linux only)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0

# Apple Silicon / CPU
pip install torch torchvision torchaudio
```
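Once PyTorch is installed, confirm it actually detects your accelerator before launching ComfyUI; silent CPU fallback is the most common cause of painfully slow generation. A minimal sketch of the fallback order (`pick_device` is an illustrative helper, not part of ComfyUI):

```python
def pick_device(cuda_available, mps_available):
    """Return the device string the frontends will run on."""
    if cuda_available:
        return "cuda"  # NVIDIA; ROCm builds of PyTorch also report as CUDA
    if mps_available:
        return "mps"   # Apple Silicon Metal backend
    return "cpu"       # fallback: expect minutes per image

# With PyTorch installed, you would check:
#   import torch
#   print(pick_device(torch.cuda.is_available(),
#                     torch.backends.mps.is_available()))
```

If this prints `cpu` on a machine with a GPU, reinstall PyTorch with the index URL matching your hardware before going any further.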

Step 4: Install ComfyUI Dependencies

```bash
pip install -r requirements.txt
```

Step 5: Download a Model

ComfyUI reads models from the `models/checkpoints/` directory. Download a base model:

```bash
# Create checkpoints directory
mkdir -p models/checkpoints

# Download SDXL base model (~6.5 GB) -- replace with your preferred download method
# Option 1: Using wget
wget -O models/checkpoints/sd_xl_base_1.0.safetensors \
  https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors

# Option 2: Using the Hugging Face CLI (pip install huggingface_hub)
huggingface-cli download stabilityai/stable-diffusion-xl-base-1.0 \
  sd_xl_base_1.0.safetensors --local-dir models/checkpoints/
```

Step 6: Start ComfyUI

```bash
python main.py
```

Open your browser at `http://127.0.0.1:8188`. You should see the ComfyUI canvas with a default text-to-image workflow already loaded.

💡 Tip: On low-VRAM systems (4-6 GB), start with `python main.py --lowvram`. This reduces VRAM usage at the cost of slightly slower generation.
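ComfyUI also exposes an HTTP API on the same port, which is useful for scripting batch generation: POST a workflow graph (exported via the "Save (API Format)" menu option, visible after enabling dev mode in settings) to `/prompt` and it is queued. A minimal sketch assuming a local instance on the default port; `build_payload` and `queue_workflow` are illustrative names:

```python
import json
import urllib.request
import uuid

def build_payload(workflow: dict) -> dict:
    # ComfyUI expects the workflow graph under the "prompt" key;
    # client_id lets you correlate progress events later
    return {"prompt": workflow, "client_id": str(uuid.uuid4())}

def queue_workflow(workflow: dict, server: str = "127.0.0.1:8188") -> dict:
    data = json.dumps(build_payload(workflow)).encode("utf-8")
    req = urllib.request.Request(
        f"http://{server}/prompt",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())  # contains the queued prompt id
```

The response contains an id you can use to poll the `/history` endpoint for finished outputs.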

Install AUTOMATIC1111 (WebUI)

AUTOMATIC1111 (A1111) provides a traditional web form interface for Stable Diffusion. It has a large extension ecosystem and is easier for beginners who prefer settings panels over node graphs.

Step 1: Clone the Repository

```bash
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
```

Step 2: Place Your Model

Copy or move your `.safetensors` or `.ckpt` model file into:

```
stable-diffusion-webui/models/Stable-diffusion/
```

Step 3: Run the WebUI

**Linux/macOS:**

```bash
./webui.sh
```

**Windows:**

```batch
webui-user.bat
```

The first launch takes 5-15 minutes to download dependencies and set up the virtual environment automatically. Subsequent launches are much faster.

Once started, open: `http://127.0.0.1:7860`

Step 4: Generate Your First Image

1. Type a prompt in the "Positive prompt" box: `a photorealistic cat sitting on a wooden table, natural light, 4k`
2. Type a negative prompt: `blurry, low quality, cartoon, watermark`
3. Set Width and Height to 1024x1024 for SDXL models
4. Click **Generate**

| Setting | SDXL Recommended | SD 1.5 Recommended |
| --- | --- | --- |
| Width x Height | 1024 x 1024 | 512 x 512 |
| Sampling Steps | 20-30 | 20-30 |
| CFG Scale | 7-9 | 7-9 |
| Sampler | DPM++ 2M | Euler a |
ℹ️ Note: AUTOMATIC1111 has a slower startup time than ComfyUI but a more approachable UI for new users. Most advanced features (ControlNet, LoRA fine-tuning, inpainting) are available as extensions installable from the Extensions tab.
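When launched with the `--api` flag, A1111 additionally serves a REST API, so the form settings above can be scripted. A hedged sketch against the `/sdapi/v1/txt2img` endpoint, using the SDXL values from the table (`txt2img_payload` and `generate` are illustrative names):

```python
import base64
import json
import urllib.request

def txt2img_payload(prompt: str, negative: str = "") -> dict:
    # Field names follow A1111's /sdapi/v1/txt2img schema;
    # values mirror the SDXL recommendations in the table above
    return {
        "prompt": prompt,
        "negative_prompt": negative,
        "steps": 25,
        "cfg_scale": 7,
        "width": 1024,
        "height": 1024,
        "sampler_name": "DPM++ 2M",
    }

def generate(prompt: str, server: str = "127.0.0.1:7860") -> bytes:
    req = urllib.request.Request(
        f"http://{server}/sdapi/v1/txt2img",
        data=json.dumps(txt2img_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read())
    # Images come back base64-encoded in the "images" list
    return base64.b64decode(result["images"][0])
```

Writing the returned bytes to a `.png` file gives you the generated image.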

Finding and Downloading Models

Stable Diffusion models are checkpoint files (.safetensors) that define the visual style and capabilities of the generated images. The main repositories are Hugging Face and CivitAI.

| Model | Size | Best For | Download |
| --- | --- | --- | --- |
| SDXL Base 1.0 | 6.5 GB | High-quality photorealistic images, 1024px | Hugging Face |
| SD 1.5 | 2.1 GB | Faster generation, more fine-tuned models available | Hugging Face |
| Flux.1 Dev | 23 GB | Highest quality (requires 16+ GB VRAM) | Hugging Face |
| Juggernaut XL | 6.7 GB | Photorealistic portraits and scenes | CivitAI |
| DreamShaper XL | 6.6 GB | Versatile: photos, art, fantasy | CivitAI |

Where to Put Downloaded Models

For ComfyUI, place models in these directories by type:

```
ComfyUI/
  models/
    checkpoints/   ← main model files (.safetensors)
    loras/         ← LoRA fine-tuning weights
    vae/           ← VAE files
    controlnet/    ← ControlNet models
    embeddings/    ← textual inversion files
```

For A1111:

```
stable-diffusion-webui/
  models/
    Stable-diffusion/  ← checkpoint files
    Lora/              ← LoRA files
    VAE/               ← VAE files
```

LoRA Models (Style Modifiers)

LoRA models are small weight files (50-300 MB) that apply a specific style on top of a base model. Popular uses: specific artistic styles, consistent character faces, product photography styles.

In A1111, activate a LoRA by adding `<lora:filename:0.8>` to your prompt (the number sets the weight), plus the LoRA's trigger keyword if it has one. In ComfyUI, use the Load LoRA node and connect it between the model loader and the sampler.

💡 Tip: Always download models in `.safetensors` format rather than `.ckpt`. The safetensors format cannot execute arbitrary code during loading, making it the safer choice. Most CivitAI and Hugging Face models now offer safetensors downloads.
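The safety guarantee follows from the file layout: a safetensors file is an 8-byte little-endian header length, a JSON header describing tensor names, dtypes, and shapes, then raw tensor bytes, so loading never deserialises executable code the way unpickling a `.ckpt` can. A short sketch that inspects a checkpoint's header with no ML libraries installed:

```python
import json
import struct

def read_safetensors_header(path: str) -> dict:
    # Layout: 8-byte little-endian header length, JSON header, raw tensor data.
    # Parsing this never executes code, unlike unpickling a .ckpt file.
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))

# Example: list tensor names and shapes in a downloaded checkpoint
# header = read_safetensors_header("models/checkpoints/sd_xl_base_1.0.safetensors")
# for name, info in header.items():
#     if name != "__metadata__":
#         print(name, info["dtype"], info["shape"])
```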

Writing Effective Prompts

Stable Diffusion prompts follow a different structure than conversational AI tools. The model responds to weighted keyword phrases, not natural sentences.

Positive Prompt Structure

A good positive prompt layers descriptors from general to specific:

[subject], [medium/style], [lighting], [composition], [quality keywords]

Example for a portrait:

a young woman with red hair, professional headshot photography, soft studio lighting, shallow depth of field, 85mm lens, sharp focus, 8k resolution

Negative Prompt — What to Exclude

Negative prompts tell the model what to avoid. A standard negative prompt for photorealistic images:

blurry, out of focus, low quality, jpeg artifacts, watermark, signature, text, ugly, deformed, bad anatomy, extra limbs, missing fingers, cropped

Prompt Weighting

Both ComfyUI and A1111 support weight adjustments:

| Syntax | Effect |
| --- | --- |
| `(keyword)` | Multiplies weight by 1.1 |
| `(keyword:1.5)` | Sets weight to 1.5 |
| `[keyword]` | Divides weight by 1.1 (≈0.91x) |
| `(keyword:0.5)` | Sets weight to 0.5 (deprioritise) |

Example: `(photorealistic:1.3), (sharp focus:1.2), woman, garden, [cartoon:0.3]`

💡 Tip: Do not over-weight keywords (above 1.5); high weights cause visual artifacts. Start with default weights and only boost the 2-3 most critical elements. Test with different seeds to find the right balance before adjusting weights.
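The parenthesis syntax compounds multiplicatively, which is why stacked parentheses escalate quickly. A sketch of the arithmetic as A1111 applies it (`attention_weight` is an illustrative name; each `( )` layer multiplies attention by 1.1 and each `[ ]` layer divides by it):

```python
def attention_weight(paren_layers=0, bracket_layers=0, explicit=None):
    # An explicit value, as in "(keyword:1.5)", overrides layered weighting
    if explicit is not None:
        return explicit
    # Each "( )" multiplies by 1.1, each "[ ]" divides by 1.1
    return round(1.1 ** paren_layers / 1.1 ** bracket_layers, 4)

print(attention_weight(1))             # (keyword)     -> 1.1
print(attention_weight(2))             # ((keyword))   -> 1.21
print(attention_weight(0, 1))          # [keyword]     -> 0.9091
print(attention_weight(explicit=0.5))  # (keyword:0.5) -> 0.5
```

Three nested parentheses already push a keyword past 1.33x, around the point where artifacts typically start to appear.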

Troubleshooting

CUDA out of memory error when generating

Cause: The model or image resolution requires more VRAM than available

Fix: Reduce image resolution to 768x768 (SDXL) or 512x512 (SD 1.5). Start ComfyUI with `--lowvram` or `--medvram` flags. In A1111, enable Settings > Optimizations > medvram. Close other GPU-intensive applications before generating.

Black image output with no error message

Cause: Wrong VAE file, or the base model requires a specific VAE that is not loaded

Fix: Download the correct VAE for your model. For SDXL: `sdxl_vae.safetensors` from Stability AI on Hugging Face. In A1111: Settings > Stable Diffusion > SD VAE. In ComfyUI, add a VAE Loader node and connect it to the KSampler.

Very slow generation (5+ minutes per image) on GPU

Cause: PyTorch is using CPU instead of GPU due to a driver or installation issue

Fix: Verify PyTorch sees the GPU: `python -c "import torch; print(torch.cuda.is_available())"`. Should print `True`. If `False`, reinstall PyTorch with the correct CUDA version for your GPU. Check `nvidia-smi` to confirm the driver is working.

ComfyUI shows "model not found" when loading a workflow

Cause: The workflow references a model filename that does not match what you have in models/checkpoints/

Fix: Open the checkpoint node in ComfyUI and click the model selector dropdown. It shows all models in the checkpoints folder. Select the model you have downloaded. The workflow filename must exactly match the file on disk.

A1111 webui.sh fails with Python version errors

Cause: Python 3.12 has compatibility issues with some A1111 dependencies

Fix: Use Python 3.10 or 3.11. Install pyenv to manage Python versions: `pyenv install 3.10.14 && pyenv local 3.10.14`. Then re-run webui.sh.

Alternatives to Consider

| Tool | Type | Price | Best For |
| --- | --- | --- | --- |
| Midjourney | Cloud SaaS | $10/mo (200 images) to $60/mo (unlimited) | High quality without setup; best results for artistic and commercial images with minimal prompting |
| Fooocus | Self-hosted (Python) | Free (open source) | Simplest local Stable Diffusion setup: minimal UI, auto-optimises settings, no manual configuration |
| InvokeAI | Self-hosted (Docker/Python) | Free (open source) | Professional local workflow with a polished UI, built-in canvas, and team features |
| Replicate | Cloud API | $0.0046 per image (SDXL) | Developers who want Stable Diffusion via API without managing local infrastructure |

Frequently Asked Questions

What GPU do I need to run Stable Diffusion locally?

The minimum practical GPU for Stable Diffusion is one with 6 GB VRAM. An NVIDIA RTX 3060 (12 GB VRAM) or RTX 4060 (8 GB VRAM) handles SDXL at 1024px resolution in 3-8 seconds per image and costs under $300.

For 4 GB VRAM GPUs (GTX 1650, RTX 3050), SD 1.5 models work with the `--medvram` flag. SDXL is possible but slow. AMD GPUs with 8+ GB VRAM work on Linux with ROCm. Apple Silicon Macs use the Metal backend and achieve good speeds even without a dedicated GPU.

What is the difference between ComfyUI and AUTOMATIC1111?

AUTOMATIC1111 uses a traditional web form interface with sliders, dropdowns, and buttons. It is easier to learn for new users and has a massive extension library built up since 2022.

ComfyUI uses a node-based canvas where you connect nodes representing each step of the generation pipeline. It is harder to learn initially but much more flexible. Advanced techniques like multi-pass generation, ControlNet workflows, and custom pipelines are easier to build and share in ComfyUI. Most professional Stable Diffusion users have moved to ComfyUI.

Can I run Stable Diffusion without a GPU?

Yes, but it is extremely slow. CPU-only generation on Stable Diffusion XL takes 5-15 minutes per image on a modern desktop CPU. SD 1.5 takes 2-5 minutes per image.

For CPU users, use Fooocus which is optimised for low-resource setups. Apple Silicon Macs are the exception — they use the Metal Performance Shaders (MPS) backend which gives GPU-like speeds through the unified memory architecture.

Where do I download Stable Diffusion models?

The two main sources are Hugging Face and CivitAI. Hugging Face hosts official base models (SDXL Base 1.0, SD 1.5, Flux.1). CivitAI has thousands of community fine-tuned models for specific styles, characters, and use cases.

Always download `.safetensors` format rather than `.ckpt`. Safetensors cannot execute code during loading, making it safer. Check model ratings and comments on CivitAI before downloading — community feedback indicates whether a model actually works as advertised.

What is a LoRA model in Stable Diffusion?

LoRA (Low-Rank Adaptation) models are small weight files (50-300 MB) that modify a base model to add a specific style, subject, or concept without replacing the entire checkpoint. You load a LoRA alongside your base model and control its influence with a weight value between 0 and 1.

Common uses: consistent character faces across multiple images, specific artistic styles (watercolor, oil painting, anime), product photography styles, and architectural rendering styles. LoRAs are additive — you can combine multiple LoRAs in one generation, though stacking too many (3+) often causes visual artifacts.

How do I use Stable Diffusion for inpainting (editing parts of an image)?

Both ComfyUI and A1111 support inpainting. In A1111, switch to the img2img tab, upload your image, click "Inpaint", paint over the area you want to change, describe the replacement in the prompt, and click Generate.

In ComfyUI, use the InpaintModelConditioning node with a mask. Draw the mask in the Load Image node's mask editor or use the WAS Suite's mask tools. The model fills only the masked region while preserving the rest of the image.

Use a denoising strength of 0.7-0.85 for inpainting — lower values preserve more of the original, higher values allow more creative deviation.
