How to Set Up AnythingLLM with Ollama (2026 Guide)
Install AnythingLLM Desktop or Docker, connect Ollama, and set up a local RAG workspace. Covers the Ollama URL fix, nomic-embed-text embeddings, and agents.
AnythingLLM is an open-source app that wraps RAG, workspaces, agents, and multi-user access around your local model setup. It passed 54,000 GitHub stars in early 2026. The difference from Open-WebUI is simple: Open-WebUI is a chat frontend, AnythingLLM is built around document Q&A. You upload files and query them privately, with nothing leaving your machine.
The setup needs two things from Ollama: a chat model and an embedding model. The embedding model converts your documents into searchable vectors at upload time. Skip it and document uploads fail silently, which makes the problem hard to diagnose.
The other common failure is the Ollama URL. Inside a Docker container, `localhost` refers to the container, not the host machine where Ollama runs. One wrong URL in the connection settings and the whole thing looks broken. This guide covers that fix specifically, along with both installation methods, workspace setup, and agents.
Prerequisites
- Ollama installed and running (follow the Ollama setup guide if you have not set it up yet)
- A chat model pulled in Ollama (recommended: llama3.3:8b or mistral:7b)
- The nomic-embed-text embedding model pulled in Ollama (required for document Q&A)
- Docker Engine 24.x+ installed (for the Docker method only)
- Port 3001 free on your machine
- 4 GB free disk space for the AnythingLLM Docker image
- 8 GB RAM minimum when running a 7B chat model alongside AnythingLLM
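The command-line prerequisites above can be checked before you start. This is a minimal sketch in POSIX shell; `have_cmd` and `missing_cmds` are hypothetical helper names, not part of Ollama or AnythingLLM, and the check only confirms that the commands are on your `PATH`, not that the services are running.

```shell
# Preflight helpers for the prerequisites above (hypothetical names).

# have_cmd NAME: succeeds if NAME is found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# missing_cmds NAME...: prints each command that is not installed, one per line.
missing_cmds() {
  for cmd in "$@"; do
    have_cmd "$cmd" || printf '%s\n' "$cmd"
  done
}

# Example (run manually): list anything missing for the Docker method.
#   missing_cmds ollama docker curl
```

An empty result means all listed commands were found. Drop `docker` from the list if you plan to use the Desktop App.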
Install AnythingLLM
AnythingLLM offers two installation paths. The Desktop App is the fastest option for personal use on a single machine. Docker is the right choice for server deployments, always-on access, or when you want multiple users sharing the same instance.
Desktop App (Recommended for Single Users)
The Desktop App bundles everything into a standalone installer. No Docker or Node.js required.
Download the installer for your operating system from the official docs:
| OS | File | Download |
|---|---|---|
| Windows | AnythingLLMDesktop.exe | docs.useanything.com/installation/desktop/windows |
| macOS (Apple Silicon) | AnythingLLM.dmg | docs.useanything.com/installation/desktop/mac |
| Linux | AnythingLLM.AppImage | docs.useanything.com/installation/desktop/linux |
After installation, launch the app. It opens a browser window at `http://localhost:3001` automatically. The Desktop App includes its own embedded vector database and storage, so no external configuration is required for the app itself.
Docker (Recommended for Servers and Multi-User Access)
Create a directory for persistent storage, then run the official image:
# Create a storage directory on the host
mkdir -p ~/anythingllm/storage
# Run AnythingLLM
docker run -d \
-p 3001:3001 \
--cap-add SYS_ADMIN \
-v ~/anythingllm/storage:/app/server/storage \
-e STORAGE_DIR=/app/server/storage \
--name anythingllm \
--restart unless-stopped \
mintplexlabs/anythingllm
The image is approximately 2 GB. Wait for the download and startup to complete, then verify:
docker ps
# Expected output:
# CONTAINER ID IMAGE STATUS PORTS
# a1b2c3d4e5f6 mintplexlabs/anythingllm Up 2 minutes 0.0.0.0:3001->3001/tcp
Open `http://localhost:3001` in your browser.
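If you script the install, you may prefer to wait for the web UI instead of checking by hand. The sketch below polls a URL until it answers; `wait_for_http` is a hypothetical helper, and it assumes the UI responds to a plain HTTP request on port 3001 once startup finishes.

```shell
# wait_for_http URL [TRIES] [DELAY]: polls URL until it answers with a
# non-error HTTP status, or gives up after TRIES attempts (default 30),
# sleeping DELAY seconds (default 2) between attempts. Requires curl.
wait_for_http() {
  url="$1"
  tries="${2:-30}"
  delay="${3:-2}"
  while [ "$tries" -gt 0 ]; do
    curl -fs --max-time 3 -o /dev/null "$url" && return 0
    tries=$(( $tries - 1 ))
    sleep "$delay"
  done
  return 1
}

# Example (run manually after `docker run`):
#   wait_for_http http://localhost:3001 && echo "AnythingLLM is up"
```

If the function gives up, `docker logs anythingllm` is the next place to look.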
Docker Compose (Optional)
If you prefer to manage everything in one file, create a `docker-compose.yml`:
version: '3.8'
volumes:
  anythingllm_storage:
services:
  anythingllm:
    image: mintplexlabs/anythingllm
    restart: unless-stopped
    ports:
      - "3001:3001"
    cap_add:
      - SYS_ADMIN
    volumes:
      - anythingllm_storage:/app/server/storage
    environment:
      - STORAGE_DIR=/app/server/storage
Start it with:
docker compose up -d
Pull Required Ollama Models
AnythingLLM needs two types of models from Ollama: a chat model that generates answers, and an embedding model that converts your documents into vectors for search.
Chat Models
Pull one of these depending on your hardware. For a deeper comparison of model quality and hardware requirements, see the best local LLM models guide.
| Model | Pull Command | Disk Size | RAM Required | Best For |
|---|---|---|---|---|
| llama3.3:8b | `ollama pull llama3.3:8b` | 4.9 GB | 8 GB | General use, balanced quality |
| mistral:7b | `ollama pull mistral:7b` | 4.1 GB | 8 GB | Faster inference, lower RAM |
| qwen2.5:7b | `ollama pull qwen2.5:7b` | 4.7 GB | 8 GB | Coding and multilingual docs |
| phi4 | `ollama pull phi4` | 9.1 GB | 16 GB | Strong reasoning on technical docs |
Embedding Model (Required for Document Q&A)
The embedding model converts your uploaded documents into vectors that AnythingLLM searches at query time. Without it, document uploads fail silently.
# Pull the embedding model
ollama pull nomic-embed-text
# Expected output:
# pulling manifest
# pulling 970aa74c0a90... 100% 274 MB
# verifying sha256 digest
# success
`nomic-embed-text` is 274 MB and runs on CPU without GPU memory. It is the standard embedding model for Ollama-based setups and works well for English and multilingual documents.
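To confirm the embedding model actually produces vectors, and is not just downloaded, you can call Ollama's HTTP API directly. This is a sketch that assumes an Ollama version that provides the `/api/embed` endpoint; `embed_payload` and `embed_smoke_test` are hypothetical helper names, and the fixed test sentence is only a placeholder.

```shell
# embed_payload MODEL: prints a JSON request body for Ollama's /api/embed.
embed_payload() {
  printf '{"model":"%s","input":"hello world"}\n' "$1"
}

# embed_smoke_test [BASE_URL] [MODEL]: posts the payload and succeeds only
# if the response contains an "embeddings" field. Requires curl.
embed_smoke_test() {
  base_url="${1:-http://localhost:11434}"
  model="${2:-nomic-embed-text}"
  curl -fsS --max-time 30 "$base_url/api/embed" -d "$(embed_payload "$model")" |
    grep -q '"embeddings"'
}

# Example (run manually on the machine where Ollama runs):
#   embed_smoke_test && echo "embedding model responds"
```

A failure here points at Ollama or the model, not at AnythingLLM, which narrows down where to look.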
Verify Both Models Are Available
ollama list
# Expected output (your versions may differ):
# NAME ID SIZE MODIFIED
# llama3.3:8b a6eb4748fd29 4.9 GB 2 minutes ago
# nomic-embed-text:latest 0a109f422b47 274 MB 1 minute ago
Both models must appear in this list before you configure AnythingLLM.
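The same check can be scripted by reading the first column of the `ollama list` output. A minimal sketch, assuming the column layout shown above; `has_model` is a hypothetical helper name.

```shell
# has_model NAME: reads `ollama list` output on stdin and succeeds if NAME
# appears in the first column, with or without an explicit :latest tag.
has_model() {
  awk -v want="$1" '
    NR > 1 && ($1 == want || $1 == (want ":latest")) { found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Example (run manually):
#   ollama list | has_model nomic-embed-text || echo "pull nomic-embed-text first"
```

The header row is skipped, so a model literally named `NAME` would not be matched, which seems an acceptable trade-off for a quick check.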
Connect AnythingLLM to Ollama
The Ollama connection URL is the most common source of setup failures. The correct URL depends on how AnythingLLM is running.
Which URL to Use
| AnythingLLM Installation | Ollama Location | URL to Enter |
|---|---|---|
| Desktop App | Same machine | `http://localhost:11434` |
| Docker (Windows or macOS) | Same host machine | `http://host.docker.internal:11434` |
| Docker (Linux) | Same host machine | `http://172.17.0.1:11434` |
| Docker | Another server | `http:// |
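The same-machine rows of the table amount to a small lookup. The sketch below encodes them in a hypothetical helper, `ollama_base_url`, which is not part of AnythingLLM or Ollama; the remote-server case is left out because the URL depends on your own server address.

```shell
# ollama_base_url INSTALL [OS]: prints the Ollama URL for a same-machine setup.
#   INSTALL: desktop | docker
#   OS:      linux | macos | windows (only used when INSTALL is docker)
ollama_base_url() {
  case "${1:-}:${2:-}" in
    desktop:*)                   echo "http://localhost:11434" ;;
    docker:macos|docker:windows) echo "http://host.docker.internal:11434" ;;
    docker:linux)                echo "http://172.17.0.1:11434" ;;
    *) echo "unknown combination: ${1:-} ${2:-}" >&2; return 1 ;;
  esac
}

# Example:
#   ollama_base_url docker linux
```

Note that `172.17.0.1` is the usual address of Docker's default bridge; if your Docker network is configured differently, the Linux value will differ too.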
Setup Wizard Steps
The first time you open `http://localhost:3001`, a setup wizard walks you through the configuration:
1. Create an admin account (username and password)
2. On the LLM configuration screen, select "Ollama" from the provider list
3. Enter the Ollama base URL for your deployment (see table above)
4. Select your chat model from the dropdown (it fetches the list from Ollama automatically)
5. On the embedding configuration screen, select "Ollama" again
6. Select `nomic-embed-text:latest` as the embedding model
7. Complete the wizard
Verify the Connection After Setup
In the AnythingLLM interface, go to Settings (gear icon) > LLM Provider. The status indicator next to the Ollama URL should show green. If it shows red, the URL is wrong or Ollama is not running.
Test Ollama directly to confirm it is reachable:
# If using Desktop App or checking from the host machine
curl http://localhost:11434
# Expected: Ollama is running
# If using Docker on Linux, test the bridge IP
curl http://172.17.0.1:11434
# Expected: Ollama is running
If Ollama is not running, start it:
# Linux (systemd service)
sudo systemctl start ollama
# macOS or manual start
ollama serve
Create a Workspace and Upload Documents
Workspaces are the core concept in AnythingLLM. Each workspace is a separate document container with its own vector store, chat history, and model settings. Documents uploaded to one workspace are not visible to other workspaces. If you came from Open-WebUI, think of workspaces as separate chat sessions that each have their own private document library.
Create Your First Workspace
1. Click the "+" button or "New Workspace" in the left sidebar
2. Enter a name for the workspace (for example, "Research Notes" or "Company Docs")
3. Click Create
Upload Documents
Click the paperclip icon in the chat input area, or use the Document Manager (the folder icon in the sidebar). Supported file types include:
- PDF, DOCX, TXT, Markdown, HTML
- CSV, XLSX, PPTX
- JSON files and most code file types (.py, .js, .ts, .go, etc.)
- GitHub repository URLs (imports the full repo content)
- YouTube video URLs (imports the transcript)
After selecting files, AnythingLLM processes each document and shows a progress indicator. Processing time depends on file size. A 100-page PDF typically takes 10-30 seconds on a modern CPU.
Processing: research-paper.pdf
Chunking document...
Embedding 47 chunks with nomic-embed-text...
Done. 47 vectors stored.
Chat Mode vs Query Mode
Each workspace has two response modes, controlled by the toggle in the workspace settings:
| Mode | Behavior | Best For |
|---|---|---|
| Chat | Uses uploaded documents as context alongside the model's general knowledge | Mixed document + general Q&A |
| Query | Returns answers only from uploaded documents; refuses off-topic questions | Strict document retrieval |
Switch modes via the settings icon inside the workspace. For research and document Q&A, Query mode gives more focused answers and prevents the model from hallucinating information not in your files.
Test the Setup
After uploading a document, ask a question about its content in the chat input. A working RAG response includes a "Sources" section below the answer showing which document chunks were retrieved.
Enable Agents and MCP Tools
AnythingLLM includes an agent system that lets the model take actions beyond answering questions. Agents can browse the web, run code, call external APIs, and use MCP (Model Context Protocol) servers.
Activate an Agent
Type `@agent` at the start of any chat message to activate agent mode in that conversation:
@agent Search the web for the latest AI papers from May 2026 and summarize the top 3.
The agent works through the task step by step, showing each action in the chat window. Default built-in agent capabilities include:
- Web browsing and URL fetching
- Basic calculation and data analysis
- Reading documents already in the workspace
Enable or Disable Agent Capabilities
Go to Settings > Agents to see which capabilities are active. You can toggle individual capabilities on or off. For example, disable web browsing if you want agents restricted to local documents only.
Add MCP Servers
AnythingLLM has built-in MCP support as of 2026. To add an MCP server:
1. Go to Settings > Agent Tools > MCP Servers
2. Click "Add MCP Server"
3. Paste the server configuration in JSON format
Example: adding a local filesystem MCP server:
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/Users/yourname/Documents"
      ]
    }
  }
}
4. Click Save and restart AnythingLLM if prompted
After adding a server, the agent can use its tools automatically when relevant. You can also call a specific tool by naming it in your `@agent` message. For programmatic access to Ollama models outside the UI, see the Ollama Python guide.
Update AnythingLLM
For Docker deployments, pull the latest image and restart:
docker pull mintplexlabs/anythingllm
docker stop anythingllm
docker rm anythingllm
# Re-run with the same flags as the original install
docker run -d \
-p 3001:3001 \
--cap-add SYS_ADMIN \
-v ~/anythingllm/storage:/app/server/storage \
-e STORAGE_DIR=/app/server/storage \
--name anythingllm \
--restart unless-stopped \
mintplexlabs/anythingllm
Your documents, chat history, and settings are preserved in the `~/anythingllm/storage` directory. The Docker container itself is stateless.
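Because all state lives in the storage directory, archiving it before an update gives you a rollback point. This is a minimal sketch: `backup_storage` is a hypothetical helper, the `~/anythingllm/backups` destination is an assumed default, and you may need elevated permissions if the container wrote files as a different user.

```shell
# backup_storage [STORAGE_DIR] [DEST_DIR]: writes a timestamped .tar.gz of
# STORAGE_DIR into DEST_DIR and prints the path of the archive.
backup_storage() {
  storage_dir="${1:-$HOME/anythingllm/storage}"
  dest_dir="${2:-$HOME/anythingllm/backups}"
  [ -d "$storage_dir" ] || { echo "no such directory: $storage_dir" >&2; return 1; }
  mkdir -p "$dest_dir" || return 1
  archive="$dest_dir/storage-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C makes paths inside the archive relative to the parent of STORAGE_DIR.
  tar -czf "$archive" -C "$(dirname "$storage_dir")" "$(basename "$storage_dir")" || return 1
  printf '%s\n' "$archive"
}

# Example (run manually, ideally after `docker stop anythingllm`):
#   backup_storage
```

Stopping the container first avoids archiving files while they are being written.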
For the Desktop App, check for updates from the Help menu inside the app.
Troubleshooting
"Ollama is not available" or empty model dropdown
Cause: AnythingLLM cannot reach the Ollama API. In Docker deployments, localhost in the URL resolves to the container itself, not the host machine.
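Fix: Change the Ollama URL in Settings > LLM Provider. Use `http://host.docker.internal:11434` on Windows or macOS and `http://172.17.0.1:11434` on Linux. Use `http://localhost:11434` only for the Desktop App. On Linux, also check that Ollama accepts connections from the Docker bridge: by default it listens only on 127.0.0.1, so set `OLLAMA_HOST=0.0.0.0` for the Ollama service and restart it.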
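When you are unsure which address works, probing the candidates is quicker than editing the setting repeatedly. The sketch below uses a hypothetical helper, `first_reachable`; keep in mind that a probe run on the host shows what the host can reach, which is not necessarily what the container can reach.

```shell
# first_reachable URL...: prints the first URL that answers an HTTP request
# within 3 seconds, or returns 1 if none of them do. Requires curl.
first_reachable() {
  for url in "$@"; do
    if curl -fs --max-time 3 -o /dev/null "$url"; then
      printf '%s\n' "$url"
      return 0
    fi
  done
  return 1
}

# Example (run manually):
#   first_reachable http://localhost:11434 http://172.17.0.1:11434
```

If `localhost` answers but the bridge address does not, Ollama is most likely bound to the loopback interface only.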
Document upload completes but queries return no sources
Cause: The embedding model is not configured or nomic-embed-text is not pulled in Ollama.
Fix: Run `ollama pull nomic-embed-text` on the host running Ollama. Then go to Settings > Embedding Provider in AnythingLLM, select Ollama, and choose nomic-embed-text:latest. Re-upload the documents to re-embed them.
Document embedding hangs or crashes partway through
Cause: Insufficient RAM when embedding a large file, or a corrupt PDF that Chromium cannot parse.
Fix: Split large PDFs into smaller files (under 50 pages each) before uploading. For corrupt files, convert to plain text first using a tool like pdftotext. Check the container logs with `docker logs anythingllm` for specific error messages.
@agent command is not recognized or does nothing
Cause: Agent mode is disabled or no agent provider is configured.
Fix: Go to Settings > Agents and confirm agent mode is enabled. The agent uses the same LLM configured under Settings > LLM Provider. Make sure a chat model is selected and the Ollama connection is working.
AnythingLLM container exits immediately on start
Cause: The --cap-add SYS_ADMIN flag is missing, or the storage volume path does not exist.
Fix: Confirm the docker run command includes `--cap-add SYS_ADMIN`. Create the storage directory before running: `mkdir -p ~/anythingllm/storage`. Check logs with `docker logs anythingllm` to see the specific exit reason.
Alternatives to Consider
| Tool | Type | Price | Best For |
|---|---|---|---|
| Open-WebUI | Self-hosted | Free | A clean ChatGPT-style interface for Ollama with conversation history and model switching. No RAG or agents, but easier to set up. |
| PrivateGPT | Self-hosted | Free | API-first RAG server for developers who want to integrate document Q&A into their own applications rather than use a web UI. |
| LibreChat | Self-hosted | Free | Multi-model chat with support for OpenAI, Anthropic, Ollama, and 10+ other providers in one interface. Better for multi-model comparison than document RAG. |
| Flowise | Self-hosted | Free | Visual drag-and-drop builder for RAG pipelines and AI agents. More flexible than AnythingLLM for custom flows but requires more configuration. |
Frequently Asked Questions
Is AnythingLLM free to use?
The self-hosted version is completely free and open-source under the MIT license. The Desktop App is also free to download and use. Mintplex Labs offers a paid cloud version (AnythingLLM Cloud) for teams that do not want to manage their own server, but self-hosting has no cost beyond your hardware or VPS.
What is the difference between AnythingLLM and Open-WebUI?
Open-WebUI is a chat frontend for Ollama. It provides a clean ChatGPT-style interface with model switching and conversation history, but it does not have built-in RAG, workspaces, or multi-user management beyond basic accounts.
AnythingLLM adds document upload and Q&A (RAG), isolated workspaces per project or team, AI agents that can browse the web and call tools, and role-based user management. If you only need to chat with local models, Open-WebUI is simpler to set up. If you need to query your own documents privately, AnythingLLM is the better fit.
Does AnythingLLM work without an internet connection?
Yes, the core functionality works fully offline. Chat, document upload, embedding, and Q&A all run locally using Ollama models. The only features that require internet are agent web browsing and any external MCP server that fetches data from the web.
To use AnythingLLM in an air-gapped environment, pull all required Ollama models while you have internet access, then disconnect. The models stay on disk and do not need to be re-downloaded.
What file types does AnythingLLM support?
AnythingLLM supports PDF, DOCX, TXT, Markdown (.md), HTML, CSV, XLSX, PPTX, and JSON files. It also supports over 50 code file types including .py, .js, .ts, .go, .java, and .cpp.
Beyond local files, you can import content from GitHub repository URLs (it clones and indexes the full repo) and YouTube video URLs (it fetches the transcript). The document processor uses Chromium for HTML and PDF rendering, which is why the Docker image requires the SYS_ADMIN capability.
What is a workspace in AnythingLLM?
A workspace is an isolated document container with its own vector store and chat history. Documents uploaded to one workspace are invisible to other workspaces. This lets you maintain separate contexts for different projects or clients without them interfering with each other.
Each workspace has its own settings including response mode (Chat or Query), context window configuration, and which model to use if you want different workspaces to run different models.
Which embedding model should I use with Ollama?
nomic-embed-text is the standard recommendation for Ollama-based AnythingLLM setups. It is 274 MB, runs on CPU without requiring GPU memory, handles documents in most languages, and produces strong retrieval quality for general content.
If you have a larger budget of RAM and want better multilingual support, mxbai-embed-large (670 MB) or snowflake-arctic-embed:l (1.2 GB) are alternatives available in the Ollama library. Pull them with `ollama pull mxbai-embed-large` and select them in AnythingLLM's embedding settings.
Can multiple users share one AnythingLLM instance?
Yes, multi-user mode is available in the Docker deployment. Go to Settings > Multi-User Mode to enable it. You can create accounts with three roles: Admin (full access), Manager (can create workspaces and invite users), and Default (can only access workspaces assigned to them).
The Desktop App is designed for single-user use only. For teams, deploy AnythingLLM via Docker on a server and expose it through an Nginx reverse proxy with SSL.
How do I activate and use agents in AnythingLLM?
Type `@agent` at the beginning of a chat message in any workspace. The model enters agent mode and works through the task step by step. Default capabilities include web browsing, calculation, and workspace document access.
To configure or disable specific capabilities, go to Settings > Agents. To add external tools, go to Settings > Agent Tools > MCP Servers and add a server configuration in JSON format. Agents require a chat model that is capable of tool use; Llama 3.3 8B, Qwen2.5 7B, and Mistral 7B all support it.
How much RAM does AnythingLLM need?
AnythingLLM itself uses around 512 MB of RAM for the Node.js process and vector database. The real memory requirement comes from the Ollama models running alongside it.
A typical setup with a 7B chat model (llama3.3:8b or mistral:7b) requires 7-8 GB of RAM just for the model, plus 512 MB for AnythingLLM, plus OS overhead. A machine with 16 GB RAM handles this comfortably. For 13B models, plan for at least 16 GB RAM total.
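The arithmetic above is easy to redo for your own machine. In this sketch the 512 MB figure comes from the text, while the 2048 MB default for OS overhead is an assumed placeholder; `ram_budget_mb` and `fits_in_ram` are hypothetical helper names.

```shell
# ram_budget_mb MODEL_MB [OS_MB]: model RAM + 512 MB for AnythingLLM + OS
# overhead, all in MB. OS_MB defaults to 2048, which is an assumption.
ram_budget_mb() {
  model_mb="$1"
  os_mb="${2:-2048}"
  echo $(( $model_mb + 512 + $os_mb ))
}

# fits_in_ram TOTAL_MB MODEL_MB [OS_MB]: succeeds if the budget fits in TOTAL_MB.
fits_in_ram() {
  [ "$(ram_budget_mb "$2" "${3:-2048}")" -le "$1" ]
}

# Example: an 8 GB model on a 16 GB machine.
#   ram_budget_mb 8192          # prints 10752
#   fits_in_ram 16384 8192 && echo "fits"
```

With these numbers an 8 GB model needs about 10.5 GB in total, which is consistent with a 16 GB machine handling it comfortably.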
How do I update AnythingLLM to the latest version?
For Docker deployments, pull the latest image and recreate the container. Your data is safe in the named volume or host directory mount, so the update is non-destructive:
docker pull mintplexlabs/anythingllm
docker stop anythingllm && docker rm anythingllm
# Re-run the original docker run command
For the Desktop App, open the app and check Help > Check for Updates. The updater downloads and installs the new version automatically while preserving your data.
Related Guides
How to Run Ollama Locally: Complete Setup Guide (2026)
How to Set Up Open-WebUI with Ollama (Docker Guide)
Best Local LLM Models to Run in 2026 (Benchmarks + Use Cases)
How to Run OpenClaw with Ollama Local Models (2026 Guide)
How to Use Ollama with Python: API Integration Tutorial (2026)