Local AI · Intermediate · 45 min to complete · 16 min read

How to Run Stable Diffusion Locally: ComfyUI and AUTOMATIC1111 Guide

Run Stable Diffusion locally with ComfyUI or AUTOMATIC1111. Complete setup guide for Windows, Linux, and macOS with NVIDIA, AMD, and Apple Silicon GPUs.

By Amara | Published 17 March 2026
ComfyUI node-based workflow generating an AI image locally

Stable Diffusion is an open-source image generation model you can run on your own GPU without sending prompts or images to an external server. Unlike Midjourney or DALL-E 3, local Stable Diffusion has no usage fees, no content filters, and no rate limits. You can generate thousands of images per day at zero marginal cost once the hardware is set up.

Two frontends dominate local Stable Diffusion setups in 2026: **ComfyUI** and **AUTOMATIC1111 (A1111)**. ComfyUI uses a node-based workflow canvas similar to visual programming tools. A1111 uses a traditional web form interface that is easier for beginners but less flexible. This guide covers both so you can choose based on your workflow.

Hardware requirements vary by model size. A GPU with 6 GB of VRAM runs Stable Diffusion XL comfortably, and 4 GB is enough for SD 1.5 models with optimisations enabled. CPU-only generation is possible but takes 5-10 minutes per image, compared to 2-15 seconds on a mid-range GPU.

Prerequisites

  • NVIDIA GPU with 4+ GB VRAM (recommended), AMD GPU with ROCm support, or Apple Silicon Mac
  • Python 3.10 or 3.11 installed (3.12 has compatibility issues with some extensions)
  • Git installed
  • 20-30 GB free disk space (models are 2-7 GB each)
  • NVIDIA CUDA Toolkit 11.8+ installed (Windows/Linux NVIDIA GPU setups)

Hardware Requirements and GPU Compatibility

Before installing, verify your GPU meets the minimum requirements. Generation speed scales significantly with VRAM.

| GPU VRAM | Compatible Models | Generation Speed | Notes |
| --- | --- | --- | --- |
| 4 GB | SD 1.5, SDXL with medvram | 10-30 sec/image | Enable `--medvram` flag |
| 6 GB | SD 1.5, SDXL, Flux (lite) | 4-15 sec/image | Comfortable for most use |
| 8 GB | SDXL, Flux Dev | 2-8 sec/image | Recommended minimum for SDXL |
| 12 GB+ | All models including Flux Pro | 1-4 sec/image | Optimal for production |
| CPU only | SD 1.5 (slow) | 5-15 min/image | Only for testing; not practical |

Check Your GPU

```bash
# NVIDIA
nvidia-smi

# AMD (Linux)
rocm-smi

# Apple Silicon (Metal support is built in; unified memory means no separate VRAM is listed)
system_profiler SPDisplaysDataType
```

Operating System Compatibility

| OS | NVIDIA | AMD | Apple Silicon |
| --- | --- | --- | --- |
| Windows 10/11 | Full CUDA support | DirectML (slower) | N/A |
| Ubuntu 20.04+ | Full CUDA support | ROCm support | N/A |
| macOS 12.3+ | N/A | N/A | Metal (MPS) support |
ℹ️ Note: AMD GPU support on Windows uses DirectML, which is slower than CUDA; expect roughly 30-50% lower performance than an equivalent NVIDIA GPU. On Linux with ROCm properly configured, AMD GPUs achieve near-CUDA performance.

Install ComfyUI (Recommended)

ComfyUI is a node-based workflow editor for Stable Diffusion. It has a steeper initial learning curve but is more powerful and resource-efficient than A1111. Most advanced users and professionals use ComfyUI for production workflows.

Step 1: Clone the ComfyUI Repository

```bash
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
```

Step 2: Create a Python Virtual Environment

```bash
python -m venv venv

# Activate on Linux/macOS
source venv/bin/activate

# Activate on Windows
venv\Scripts\activate
```

Step 3: Install PyTorch

Install the correct PyTorch version for your hardware. Go to pytorch.org/get-started for the exact command for your system, or use one of these:

```bash
# NVIDIA GPU (CUDA 12.1)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# AMD GPU (ROCm 6.0, Linux only)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0

# Apple Silicon / CPU
pip install torch torchvision torchaudio
```
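Once PyTorch is installed, confirm it actually detects your accelerator before launching ComfyUI; silent CPU fallback is the most common cause of painfully slow generation. A minimal sketch of the fallback order (`pick_device` is an illustrative helper, not part of ComfyUI):

```python
def pick_device(cuda_available, mps_available):
    """Return the device string the frontends will run on."""
    if cuda_available:
        return "cuda"  # NVIDIA; ROCm builds of PyTorch also report as CUDA
    if mps_available:
        return "mps"   # Apple Silicon Metal backend
    return "cpu"       # fallback: expect minutes per image

# With PyTorch installed, you would check:
#   import torch
#   print(pick_device(torch.cuda.is_available(),
#                     torch.backends.mps.is_available()))
```

If this prints `cpu` on a machine with a GPU, reinstall PyTorch with the index URL matching your hardware before going any further.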

Step 4: Install ComfyUI Dependencies

```bash
pip install -r requirements.txt
```

Step 5: Download a Model

ComfyUI reads models from the `models/checkpoints/` directory. Download a base model:

```bash
# Create checkpoints directory
mkdir -p models/checkpoints

# Download SDXL base model (~6.5 GB) -- replace with your preferred download method
# Option 1: Using wget
wget -O models/checkpoints/sd_xl_base_1.0.safetensors \
  https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors

# Option 2: Using the Hugging Face CLI (pip install huggingface_hub)
huggingface-cli download stabilityai/stable-diffusion-xl-base-1.0 \
  sd_xl_base_1.0.safetensors --local-dir models/checkpoints/
```

Step 6: Start ComfyUI

```bash
python main.py
```

Open your browser at `http://127.0.0.1:8188`. You should see the ComfyUI canvas with a default text-to-image workflow already loaded.

💡 Tip: On low-VRAM systems (4-6 GB), start with `python main.py --lowvram`. This reduces VRAM usage at the cost of slightly slower generation.
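ComfyUI also exposes an HTTP API on the same port, which is useful for scripting batch generation: POST a workflow graph (exported via the "Save (API Format)" menu option, visible after enabling dev mode in settings) to `/prompt` and it is queued. A minimal sketch assuming a local instance on the default port; `build_payload` and `queue_workflow` are illustrative names:

```python
import json
import urllib.request
import uuid

def build_payload(workflow: dict) -> dict:
    # ComfyUI expects the workflow graph under the "prompt" key;
    # client_id lets you correlate progress events later
    return {"prompt": workflow, "client_id": str(uuid.uuid4())}

def queue_workflow(workflow: dict, server: str = "127.0.0.1:8188") -> dict:
    data = json.dumps(build_payload(workflow)).encode("utf-8")
    req = urllib.request.Request(
        f"http://{server}/prompt",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())  # contains the queued prompt id
```

The response contains an id you can use to poll the `/history` endpoint for finished outputs.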

Install AUTOMATIC1111 (WebUI)

AUTOMATIC1111 (A1111) provides a traditional web form interface for Stable Diffusion. It has a large extension ecosystem and is easier for beginners who prefer settings panels over node graphs.

Step 1: Clone the Repository

```bash
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
```

Step 2: Place Your Model

Copy or move your `.safetensors` or `.ckpt` model file into:

```
stable-diffusion-webui/models/Stable-diffusion/
```

Step 3: Run the WebUI

**Linux/macOS:**

```bash
./webui.sh
```

**Windows:**

```batch
webui-user.bat
```

The first launch takes 5-15 minutes to download dependencies and set up the virtual environment automatically. Subsequent launches are much faster.

Once started, open: `http://127.0.0.1:7860`

Step 4: Generate Your First Image

1. Type a prompt in the "Positive prompt" box: `a photorealistic cat sitting on a wooden table, natural light, 4k`
2. Type a negative prompt: `blurry, low quality, cartoon, watermark`
3. Set Width and Height to 1024x1024 for SDXL models
4. Click **Generate**

| Setting | SDXL Recommended | SD 1.5 Recommended |
| --- | --- | --- |
| Width x Height | 1024 x 1024 | 512 x 512 |
| Sampling Steps | 20-30 | 20-30 |
| CFG Scale | 7-9 | 7-9 |
| Sampler | DPM++ 2M | Euler a |
ℹ️ Note: AUTOMATIC1111 has a slower startup time than ComfyUI but a more approachable UI for new users. Most advanced features (ControlNet, LoRA fine-tuning, inpainting) are available as extensions installable from the Extensions tab.
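When launched with the `--api` flag, A1111 additionally serves a REST API, so the form settings above can be scripted. A hedged sketch against the `/sdapi/v1/txt2img` endpoint, using the SDXL values from the table (`txt2img_payload` and `generate` are illustrative names):

```python
import base64
import json
import urllib.request

def txt2img_payload(prompt: str, negative: str = "") -> dict:
    # Field names follow A1111's /sdapi/v1/txt2img schema;
    # values mirror the SDXL recommendations in the table above
    return {
        "prompt": prompt,
        "negative_prompt": negative,
        "steps": 25,
        "cfg_scale": 7,
        "width": 1024,
        "height": 1024,
        "sampler_name": "DPM++ 2M",
    }

def generate(prompt: str, server: str = "127.0.0.1:7860") -> bytes:
    req = urllib.request.Request(
        f"http://{server}/sdapi/v1/txt2img",
        data=json.dumps(txt2img_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read())
    # Images come back base64-encoded in the "images" list
    return base64.b64decode(result["images"][0])
```

Writing the returned bytes to a `.png` file gives you the generated image.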

Finding and Downloading Models

Stable Diffusion models are checkpoint files (.safetensors) that define the visual style and capabilities of the generated images. The main repositories are Hugging Face and CivitAI.

| Model | Size | Best For | Download |
| --- | --- | --- | --- |
| SDXL Base 1.0 | 6.5 GB | High-quality photorealistic images, 1024px | Hugging Face |
| SD 1.5 | 2.1 GB | Faster generation, more fine-tuned models available | Hugging Face |
| Flux.1 Dev | 23 GB | Highest quality (requires 16+ GB VRAM) | Hugging Face |
| Juggernaut XL | 6.7 GB | Photorealistic portraits and scenes | CivitAI |
| DreamShaper XL | 6.6 GB | Versatile: photos, art, fantasy | CivitAI |

Where to Put Downloaded Models

For ComfyUI, place models in these directories by type:

```
ComfyUI/
  models/
    checkpoints/   ← main model files (.safetensors)
    loras/         ← LoRA fine-tuning weights
    vae/           ← VAE files
    controlnet/    ← ControlNet models
    embeddings/    ← textual inversion files
```

For A1111:

```
stable-diffusion-webui/
  models/
    Stable-diffusion/  ← checkpoint files
    Lora/              ← LoRA files
    VAE/               ← VAE files
```

LoRA Models (Style Modifiers)

LoRA models are small weight files (50-300 MB) that apply a specific style on top of a base model. Popular uses: specific artistic styles, consistent character faces, product photography styles.

In A1111, activate a LoRA by adding `<lora:filename:0.8>` to your prompt (the number sets the weight), plus the LoRA's trigger keyword if it has one. In ComfyUI, use the Load LoRA node and connect it between the model loader and the sampler.

💡 Tip: Always download models in `.safetensors` format rather than `.ckpt`. The safetensors format cannot execute arbitrary code during loading, making it the safer choice. Most CivitAI and Hugging Face models now offer safetensors downloads.
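The safety guarantee follows from the file layout: a safetensors file is an 8-byte little-endian header length, a JSON header describing tensor names, dtypes, and shapes, then raw tensor bytes, so loading never deserialises executable code the way unpickling a `.ckpt` can. A short sketch that inspects a checkpoint's header with no ML libraries installed:

```python
import json
import struct

def read_safetensors_header(path: str) -> dict:
    # Layout: 8-byte little-endian header length, JSON header, raw tensor data.
    # Parsing this never executes code, unlike unpickling a .ckpt file.
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))

# Example: list tensor names and shapes in a downloaded checkpoint
# header = read_safetensors_header("models/checkpoints/sd_xl_base_1.0.safetensors")
# for name, info in header.items():
#     if name != "__metadata__":
#         print(name, info["dtype"], info["shape"])
```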

Writing Effective Prompts

Stable Diffusion prompts follow a different structure than conversational AI tools. The model responds to weighted keyword phrases, not natural sentences.

Positive Prompt Structure

A good positive prompt layers descriptors from general to specific:

[subject], [medium/style], [lighting], [composition], [quality keywords]

Example for a portrait:

a young woman with red hair, professional headshot photography, soft studio lighting, shallow depth of field, 85mm lens, sharp focus, 8k resolution

Negative Prompt — What to Exclude

Negative prompts tell the model what to avoid. A standard negative prompt for photorealistic images:

blurry, out of focus, low quality, jpeg artifacts, watermark, signature, text, ugly, deformed, bad anatomy, extra limbs, missing fingers, cropped

Prompt Weighting

Both ComfyUI and A1111 support weight adjustments:

| Syntax | Effect |
| --- | --- |
| `(keyword)` | Multiplies weight by 1.1 |
| `(keyword:1.5)` | Sets weight to 1.5 |
| `[keyword]` | Divides weight by 1.1 (≈0.91x) |
| `(keyword:0.5)` | Sets weight to 0.5 (deprioritise) |

Example: `(photorealistic:1.3), (sharp focus:1.2), woman, garden, [cartoon:0.3]`

💡 Tip: Do not over-weight keywords (above 1.5); high weights cause visual artifacts. Start with default weights and only boost the 2-3 most critical elements. Test with different seeds to find the right balance before adjusting weights.
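The parenthesis syntax compounds multiplicatively, which is why stacked parentheses escalate quickly. A sketch of the arithmetic as A1111 applies it (`attention_weight` is an illustrative name; each `( )` layer multiplies attention by 1.1 and each `[ ]` layer divides by it):

```python
def attention_weight(paren_layers=0, bracket_layers=0, explicit=None):
    # An explicit value, as in "(keyword:1.5)", overrides layered weighting
    if explicit is not None:
        return explicit
    # Each "( )" multiplies by 1.1, each "[ ]" divides by 1.1
    return round(1.1 ** paren_layers / 1.1 ** bracket_layers, 4)

print(attention_weight(1))             # (keyword)     -> 1.1
print(attention_weight(2))             # ((keyword))   -> 1.21
print(attention_weight(0, 1))          # [keyword]     -> 0.9091
print(attention_weight(explicit=0.5))  # (keyword:0.5) -> 0.5
```

Three nested parentheses already push a keyword past 1.33x, around the point where artifacts typically start to appear.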

Troubleshooting

CUDA out of memory error when generating

Cause: The model or image resolution requires more VRAM than available

Fix: Reduce image resolution to 768x768 (SDXL) or 512x512 (SD 1.5). Start ComfyUI with `--lowvram` or `--medvram` flags. In A1111, enable Settings > Optimizations > medvram. Close other GPU-intensive applications before generating.

Black image output with no error message

Cause: Wrong VAE file, or the base model requires a specific VAE that is not loaded

Fix: Download the correct VAE for your model. For SDXL: `sdxl_vae.safetensors` from Stability AI on Hugging Face. In A1111: Settings > Stable Diffusion > SD VAE. In ComfyUI, add a VAE Loader node and connect it to the KSampler.

Very slow generation (5+ minutes per image) on GPU

Cause: PyTorch is using CPU instead of GPU due to a driver or installation issue

Fix: Verify PyTorch sees the GPU: `python -c "import torch; print(torch.cuda.is_available())"`. Should print `True`. If `False`, reinstall PyTorch with the correct CUDA version for your GPU. Check `nvidia-smi` to confirm the driver is working.

ComfyUI shows "model not found" when loading a workflow

Cause: The workflow references a model filename that does not match what you have in models/checkpoints/

Fix: Open the checkpoint node in ComfyUI and click the model selector dropdown. It shows all models in the checkpoints folder. Select the model you have downloaded. The workflow filename must exactly match the file on disk.

A1111 webui.sh fails with Python version errors

Cause: Python 3.12 has compatibility issues with some A1111 dependencies

Fix: Use Python 3.10 or 3.11. Install pyenv to manage Python versions: `pyenv install 3.10.14 && pyenv local 3.10.14`. Then re-run webui.sh.

Alternatives to Consider

| Tool | Type | Price | Best For |
| --- | --- | --- | --- |
| Midjourney | Cloud SaaS | $10/mo (200 images) to $60/mo (unlimited) | High quality without setup; best results for artistic and commercial images with minimal prompting |
| Fooocus | Self-hosted (Python) | Free (open source) | Simplest local Stable Diffusion setup: minimal UI, auto-optimises settings, no manual configuration |
| InvokeAI | Self-hosted (Docker/Python) | Free (open source) | Professional local workflow with a polished UI, built-in canvas, and team features |
| Replicate | Cloud API | $0.0046 per image (SDXL) | Developers who want Stable Diffusion via API without managing local infrastructure |

Frequently Asked Questions

What GPU do I need to run Stable Diffusion locally?

The minimum practical GPU for Stable Diffusion is one with 6 GB VRAM. An NVIDIA RTX 3060 (12 GB VRAM) or RTX 4060 (8 GB VRAM) handles SDXL at 1024px resolution in 3-8 seconds per image and costs under $300.

For 4 GB VRAM GPUs (GTX 1650, RTX 3050), SD 1.5 models work with the `--medvram` flag. SDXL is possible but slow. AMD GPUs with 8+ GB VRAM work on Linux with ROCm. Apple Silicon Macs use the Metal backend and achieve good speeds even without a dedicated GPU.

What is the difference between ComfyUI and AUTOMATIC1111?

AUTOMATIC1111 uses a traditional web form interface with sliders, dropdowns, and buttons. It is easier to learn for new users and has a massive extension library built up since 2022.

ComfyUI uses a node-based canvas where you connect nodes representing each step of the generation pipeline. It is harder to learn initially but much more flexible. Advanced techniques like multi-pass generation, ControlNet workflows, and custom pipelines are easier to build and share in ComfyUI. Most professional Stable Diffusion users have moved to ComfyUI.

Can I run Stable Diffusion without a GPU?

Yes, but it is extremely slow. CPU-only generation on Stable Diffusion XL takes 5-15 minutes per image on a modern desktop CPU. SD 1.5 takes 2-5 minutes per image.

For CPU users, use Fooocus which is optimised for low-resource setups. Apple Silicon Macs are the exception — they use the Metal Performance Shaders (MPS) backend which gives GPU-like speeds through the unified memory architecture.

Where do I download Stable Diffusion models?

The two main sources are Hugging Face and CivitAI. Hugging Face hosts official base models (SDXL Base 1.0, SD 1.5, Flux.1). CivitAI has thousands of community fine-tuned models for specific styles, characters, and use cases.

Always download `.safetensors` format rather than `.ckpt`. Safetensors cannot execute code during loading, making it safer. Check model ratings and comments on CivitAI before downloading — community feedback indicates whether a model actually works as advertised.

What is a LoRA model in Stable Diffusion?

LoRA (Low-Rank Adaptation) models are small weight files (50-300 MB) that modify a base model to add a specific style, subject, or concept without replacing the entire checkpoint. You load a LoRA alongside your base model and control its influence with a weight value between 0 and 1.

Common uses: consistent character faces across multiple images, specific artistic styles (watercolor, oil painting, anime), product photography styles, and architectural rendering styles. LoRAs are additive — you can combine multiple LoRAs in one generation, though stacking too many (3+) often causes visual artifacts.

How do I use Stable Diffusion for inpainting (editing parts of an image)?

Both ComfyUI and A1111 support inpainting. In A1111, switch to the img2img tab, upload your image, click "Inpaint", paint over the area you want to change, describe the replacement in the prompt, and click Generate.

In ComfyUI, use the InpaintModelConditioning node with a mask. Draw the mask in the Load Image node's mask editor or use the WAS Suite's mask tools. The model fills only the masked region while preserving the rest of the image.

Use a denoising strength of 0.7-0.85 for inpainting — lower values preserve more of the original, higher values allow more creative deviation.
