How to Install Hermes Agent with Ollama Local Models (2026)
Connect Hermes Agent by Nous Research to Ollama local models. Step-by-step install, 64K context fix, SOUL.md setup, and model selection for private AI automation.
Hermes Agent is an open-source autonomous AI agent from Nous Research, released under the MIT license (current version: v0.14.0). Unlike chatbots that respond to one message at a time, Hermes runs continuously, executes multi-step tasks across your filesystem and the web, connects to messaging platforms like Telegram and Discord, and learns from every session through a persistent memory system stored in `~/.hermes/`.
By default Hermes connects to cloud AI providers such as Anthropic and OpenAI. This guide replaces those cloud endpoints with Ollama, giving you an autonomous agent that runs entirely on your own hardware with zero per-message costs, no rate limits, and no data leaving your machine.
One critical detail catches most people in the Hermes + Ollama setup: Ollama defaults to a 4,096-token context window, which is far too small for Hermes to function. The agent requires at least 64,000 tokens to maintain its working memory across multi-step tool-calling sequences. This guide addresses that fix first, before any other configuration.
By the end you will have Hermes Agent running against a local Qwen3 or Llama 4 model, with persistent memory in SOUL.md, terminal and web tool access, and optionally a Telegram or Discord gateway for remote agent control.
Prerequisites
- Linux (Ubuntu 22.04+), macOS 12+, or Windows 10/11 with WSL2 enabled
- Minimum 16 GB RAM for 7B-8B models; 24 GB RAM for 13B-14B models
- Python 3.11 or higher (the Hermes installer manages this via uv if not already present)
- Git installed (the only manually required dependency)
- 10-20 GB free disk space for Ollama model files
- Ollama installed and running (the next section covers this if you have not set it up yet)
- (Optional) A VPS for 24/7 agent operation without your local machine staying on
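A quick pre-flight check can confirm the two prerequisites that are easiest to get wrong. The sketch below is an illustration only: `version_at_least` is a hypothetical helper name, and it assumes `git` and `python3` are the command names on your system.

```shell
# Hypothetical helper: succeed if VERSION (e.g. "3.11.4") is >= MIN_MAJOR.MIN_MINOR.
version_at_least() (
  version="$1"; min_major="$2"; min_minor="$3"
  major="${version%%.*}"   # text before the first dot
  rest="${version#*.}"     # text after the first dot
  minor="${rest%%.*}"      # second version component
  [ "$major" -gt "$min_major" ] ||
    { [ "$major" -eq "$min_major" ] && [ "$minor" -ge "$min_minor" ]; }
)

# Report on Git and Python without aborting if either is missing.
if command -v git >/dev/null 2>&1; then
  echo "git: found"
else
  echo "git: missing (install it before running the Hermes installer)"
fi

if command -v python3 >/dev/null 2>&1; then
  py_version="$(python3 -c 'import platform; print(platform.python_version())')"
  if version_at_least "$py_version" 3 11; then
    echo "python: $py_version is new enough"
  else
    echo "python: $py_version is older than 3.11 (the installer can provide a newer one via uv)"
  fi
else
  echo "python: not found (the installer can provide it via uv)"
fi
```

A missing or old Python is not fatal here, since the installer manages Python through uv; a missing Git is the one item you need to fix by hand.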
In This Guide
1. What Hermes Agent Does and Why Local Models Make Sense
2. Install Ollama and Configure the 64K Context Window
3. Install Hermes Agent
4. Configure Hermes Agent to Use Ollama
5. First Run and Verification
6. Configure SOUL.md: The Agent Identity File
7. Connect Hermes to Telegram or Discord (Optional)
8. Configuration Reference
9. Troubleshooting
10. FAQ
What Hermes Agent Does and Why Local Models Make Sense
Hermes Agent (GitHub: NousResearch/hermes-agent, MIT license) is an autonomous AI agent that runs as a persistent process on your machine. Instead of answering one question at a time, it executes multi-step tasks: browsing the web, running terminal commands, reading and writing files, sending messages through connected platforms, and scheduling recurring automations in natural language.
The core architectural components:
- SOUL.md: A Markdown file at `~/.hermes/SOUL.md` that stores your identity and preferences. Loaded at every session startup as the primary agent context.
- memories/ directory: Fact-based memories at `~/.hermes/memories/` extracted automatically from sessions and retrieved by relevance.
- skills/ directory: Step-by-step procedures the agent writes for itself and reuses across sessions.
- Gateway: Connections to Telegram, Discord, Slack, WhatsApp, Signal, and Email for remote task submission.
Why replace the cloud API with Ollama:
| Factor | Cloud API (Claude/GPT-4o) | Ollama Local Models |
|---|---|---|
| Cost per message | $0.003-$0.015 | $0 |
| Data privacy | Processed by third-party servers | Stays on your hardware |
| Rate limits | Yes (varies by plan) | None |
| Context window | Up to 200K (Claude) | 64K+ (configurable) |
| Reasoning quality | High | Medium-High (Qwen3, Llama 4) |
| Setup complexity | API key only | This guide |
For tasks involving sensitive data or sustained personal automation, local Ollama inference costs nothing beyond electricity and keeps every token on your own hardware. A well-configured 27B model like Qwen3.5 handles most agent workloads that previously required GPT-4o.
Compared to OpenClaw: OpenClaw has a web dashboard UI and a larger third-party skill marketplace (molthub). Hermes supports Docker, SSH, and cloud sandbox terminal backends, which makes it more predictable for technical developer workflows where the execution environment matters.
Install Ollama and Configure the 64K Context Window
Most Hermes + Ollama setups fail at the context window. Ollama defaults to 4,096 tokens. Hermes requires 64,000. The gap makes the agent behave erratically after the first few tool calls, and the symptom looks like a model quality problem rather than a configuration one. Fix this before anything else.
Install Ollama
# Linux and macOS: one-command installer
curl -fsSL https://ollama.com/install.sh | sh
Verify the installation:
ollama --version
# Expected: ollama version 0.6.x
On Linux, Ollama installs as a systemd service and starts automatically. Confirm it is running:
systemctl status ollama
# Expected: active (running)
Choose a Model for Hermes Agent
Hermes Agent's tool-calling and multi-step reasoning require a model with reliable function-calling support and a minimum 64K context window. These models work well:
| Model | RAM Required | Pull Command | Tool Calling | Best For |
|---|---|---|---|---|
| Qwen3 8B | 8 GB | `ollama pull qwen3:8b` | Reliable | General tasks, 8-12 GB machines |
| Qwen3.5 27B | 16 GB | `ollama pull qwen3.5:27b` | Strong | Best free local model for Hermes |
| Mistral Small | 10 GB | `ollama pull mistral-small` | Solid | 128K context, fast responses |
| Llama 4 Maverick | 24 GB | `ollama pull llama4:maverick` | Best-in-class | Complex multi-step agent tasks |
Pull your chosen model. Qwen3 8B is the recommended starting point for most machines:
ollama pull qwen3:8b
# Expected output:
# pulling manifest
# pulling e... 100% ████████ 5.2 GB / 5.2 GB
# success
Also check the best local LLM models guide for a full hardware-to-model matching table.
Set the Context Window to 64K (Required)
Hermes Agent requires at least 64,000 tokens of context. Ollama defaults to 4,096. You must change this before running Hermes, or it will behave erratically after the first 3-4 tool calls.
Choose one of these three approaches:
**Option 1: Environment variable (simplest, current terminal session)**
OLLAMA_CONTEXT_LENGTH=64000 ollama serve
**Option 2: Systemd service override (persists across reboots on Linux)**
# Open the systemd override editor
sudo systemctl edit ollama.service
In the editor that opens, add:
[Service]
Environment="OLLAMA_CONTEXT_LENGTH=64000"
Save and reload:
sudo systemctl daemon-reload && sudo systemctl restart ollama
**Option 3: Per-model Modelfile (most explicit, controls a single named model)**
# Create a Modelfile setting 64K context for qwen3:8b
cat > Modelfile << 'EOF'
FROM qwen3:8b
PARAMETER num_ctx 64000
EOF
# Build the modified model under a new name
ollama create qwen3-8b-64k -f Modelfile
# Expected output:
# transferring model data
# using existing layer sha256...
# creating new layer sha256...
# writing manifest
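# success
Verify the context length is active:
# Load the model once so it appears in `ollama ps`
# (use qwen3:8b here instead if you chose Option 1 or Option 2)
ollama run qwen3-8b-64k "reply with just: OK"
ollama ps
# Check the CONTEXT column: it should show 64000
Install Hermes Agent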
Hermes Agent installs via a one-command script that handles all dependencies: Python 3.11 (via uv), Node.js v22, ripgrep, ffmpeg, and Playwright for browser automation. Git is the only prerequisite you need to install manually.
Linux, macOS, and Windows WSL2
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
The script takes 2-5 minutes on a 100 Mbps connection. After it finishes, reload your shell:
# Linux (bash)
source ~/.bashrc
# macOS (zsh)
source ~/.zshrc
Verify the installation:
hermes --version
# Expected: hermes-agent X.X.X
Alternative: pip install
pip install hermes-agent
# Install optional system dependencies (Node.js, ripgrep, ffmpeg, Playwright)
hermes postinstall
Windows Native PowerShell (Early Beta)
iex (irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1)
The Windows installer downloads a portable Git distribution and installs Hermes to `%LOCALAPPDATA%\hermes\hermes-agent`. No admin rights required. This path is in early beta as of v0.14.0. WSL2 provides a more stable experience on Windows.
Run the Diagnostic Check
After installation, confirm all components are present:
hermes doctor
# Expected output:
# ✓ Python 3.11.x
# ✓ Node.js v22.x.x
# ✓ ripgrep x.x.x
# ✓ ffmpeg x.x.x
# ✓ playwright chromium
# ✓ hermes-agent x.x.x
Any failed checks appear with an X and a suggested fix command. Run `hermes postinstall` if any optional dependencies are missing.
Configure Hermes Agent to Use Ollama
With Ollama running at 64K context and Hermes installed, point Hermes at the local endpoint.
Interactive Setup (Recommended)
hermes model
The interactive menu asks for your provider. Select:
1. "Custom endpoint (self-hosted / VLLM / etc.)"
2. Enter URL: `http://localhost:11434/v1`
3. API key: press Enter to skip (Ollama requires no key)
4. Model name: type the exact name shown in `ollama list`, for example `qwen3:8b` or `qwen3-8b-64k` if you used the Modelfile method
Hermes writes these values to `~/.hermes/config.yaml` automatically.
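Because the model name has to match `ollama list` exactly, a small check can catch typos before Hermes starts. This is a minimal sketch, not part of Ollama or Hermes: `model_in_list` is a hypothetical helper that reads the `ollama list` table from standard input, and it assumes the model name is the first column and that untagged models are listed with a `:latest` suffix.

```shell
# Hypothetical helper: succeed if the given model name appears in the
# first column of `ollama list` output supplied on stdin.
model_in_list() (
  wanted="$1"
  # Skip the header row. Accept an exact match, or the same name with the
  # ":latest" tag that Ollama shows for models created without a tag.
  awk -v m="$wanted" '
    NR > 1 && ($1 == m || $1 == m ":latest") { found = 1 }
    END { exit found ? 0 : 1 }
  '
)

# Typical use (requires Ollama to be installed and running):
#   ollama list | model_in_list "qwen3:8b" && echo "model name matches"
```

Reading from standard input keeps the helper independent of a running Ollama instance, so it can also be tried against a saved copy of the table.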
Manual Configuration via config.yaml
Open the config file directly:
hermes config edit
Set these values under the `model` key:
model:
  default: qwen3:8b
  provider: custom
  base_url: http://localhost:11434/v1
  context_length: 64000
Windows WSL2: Fix the Localhost Issue
If Ollama runs on your Windows host and Hermes runs inside WSL2, `localhost:11434` inside WSL2 refers to the WSL2 virtual machine, not your Windows machine. Three steps fix this:
**Step 1: Tell Ollama on Windows to listen on all interfaces**
# Run in Windows PowerShell before starting Ollama
$env:OLLAMA_HOST = "0.0.0.0"
ollama serve
**Step 2: Find your Windows host IP from inside WSL2**
# Inside your WSL2 terminal
cat /etc/resolv.conf | grep nameserver
# The IP shown (e.g., 172.29.192.1) is your Windows host IP
**Step 3: Use that IP as the base URL in config.yaml**
model:
  base_url: http://172.29.192.1:11434/v1
This is the same networking issue that affects OpenClaw and AnythingLLM Docker setups. Any tool running in an isolated environment needs the actual host IP rather than `localhost`.
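The lookup and the URL can be combined into one step. The sketch below is illustrative: `wsl_host_base_url` is a hypothetical helper, and it assumes the first `nameserver` entry in `/etc/resolv.conf` is the Windows host, as described above.

```shell
# Hypothetical helper: print an Ollama base URL built from the first
# nameserver entry in a resolv.conf-style file.
wsl_host_base_url() (
  resolv_file="${1:-/etc/resolv.conf}"
  # Take the address from the first "nameserver" line only.
  host_ip="$(awk '$1 == "nameserver" { print $2; exit }' "$resolv_file")"
  if [ -z "$host_ip" ]; then
    echo "no nameserver entry found in $resolv_file" >&2
    exit 1
  fi
  # Port 11434 and the /v1 suffix match the base_url used in this guide.
  printf 'http://%s:11434/v1\n' "$host_ip"
)
```

Running `wsl_host_base_url` inside WSL2 prints a value you can paste into `model.base_url`. The address can change between Windows restarts, so it is worth re-running if the connection starts failing.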
First Run and Verification
Start Hermes Agent:
# Standard CLI interface
hermes
# Modern TUI interface (recommended: shows tool activity in real time)
hermes --tui
The startup banner confirms the model and context configuration:
_ _
| | | | ___ _ __ _ __ ___ ___ ___
| |__| |/ _ \ '__| '_ ` _ \ / _ \/ __|
| __ | __/ | | | | | | | __/\__ \
|_| |_|\___|_| |_| |_| |_|\___||___/
Model: qwen3:8b (custom endpoint)
Context: 64000 tokens
Tools: terminal, browser, file, web_search
Test with a task that uses the terminal tool:
List the 5 largest files in my home directory by size
Hermes calls the terminal tool to run `find ~ -maxdepth 1 -type f -printf '%s %f\n' | sort -rn | head -5`, shows the command for approval (in manual mode), and returns the results. If you see a tool call followed by real output, the Ollama connection is working.
Test a multi-step task to verify the 64K context holds:
Search the web for the latest Qwen3 benchmark results, summarise the findings in 3 bullet points, and save them to a file called qwen3-notes.txt in my home directory
This task requires web browsing, reasoning, and file writing across multiple turns. If Hermes completes it without a context overflow error, the 64,000-token configuration is correct.
Manage Long Sessions with /compress
Hermes auto-compresses the context when it reaches 50% of the 64K limit (around 32,000 tokens). Trigger it manually in long sessions:
/compress
This summarises the session history into a compact representation while keeping the last 20 messages intact. On a 64K context window with Qwen3 8B, a multi-hour coding or research session stays within limits with one or two manual compress calls.
Configure SOUL.md: The Agent Identity File
SOUL.md is stored at `~/.hermes/SOUL.md`. Hermes reads it at every startup as its primary identity context, the first slot in the system prompt. It tells the agent who you are, your working preferences, timezone, active projects, and any standing instructions.
Open it for editing:
# Open via Hermes command
hermes config edit soul
# Or edit directly
nano ~/.hermes/SOUL.md
Add your context in plain Markdown:
# Identity
My name is [Your name]. I work as a [role] in [timezone].
## Working Preferences
- Prefer concise output unless asked for detail
- When writing code, use Python 3.11+ syntax and type hints
- Default file format: Markdown
- Primary OS: Ubuntu 22.04
## Active Projects
- [List current projects so Hermes maintains context across sessions]
## Recurring Tasks
- Daily morning brief: weekdays at 9 AM
- Weekly report: Fridays before 4 PM
## Directories and Paths
- Projects: ~/projects/
- Notes: ~/notes/
- Downloads: ~/Downloads/
Hermes also writes to SOUL.md and the separate `~/.hermes/memories/` folder automatically as it learns your patterns across sessions. The memory system maintains two separate stores:
| Store | Location | Type | Updated by |
|---|---|---|---|
| SOUL.md | `~/.hermes/SOUL.md` | Identity and preferences | Human-editable + auto-updated |
| Memories | `~/.hermes/memories/` | Facts extracted from sessions | Automatic only |
You do not need to update SOUL.md manually beyond the initial setup. But providing your name, timezone, and key preferences at the start saves several sessions of inference time compared to waiting for the agent to infer your context automatically.
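Since SOUL.md is edited both by you and by the agent, it can be worth keeping a copy before making manual changes. The following is a minimal sketch under that assumption: `backup_soul` is a hypothetical helper, not a Hermes command, and the backup naming scheme is arbitrary.

```shell
# Hypothetical helper: copy SOUL.md to a timestamped backup next to it
# and print the path of the copy.
backup_soul() (
  soul_file="${1:-$HOME/.hermes/SOUL.md}"
  [ -f "$soul_file" ] || { echo "not found: $soul_file" >&2; exit 1; }
  backup="$soul_file.$(date +%Y%m%d-%H%M%S).bak"
  # -p preserves the original timestamps and permissions on the copy.
  cp -p "$soul_file" "$backup" && printf '%s\n' "$backup"
)
```

Calling `backup_soul` with no arguments targets `~/.hermes/SOUL.md`; pass a path to back up a different file.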
Connect Hermes to Telegram or Discord (Optional)
The Hermes gateway connects the running agent to Telegram, Discord, Slack, WhatsApp, Signal, or Email. You send tasks from your phone without opening a terminal, and the agent can message you with updates or task completions while it works.
Run the gateway setup wizard:
hermes gateway setup
The wizard walks through your chosen platform. For Telegram, you need a bot token from @BotFather.
Set Up a Telegram Gateway
1. Open Telegram and search for @BotFather
2. Send `/newbot`, follow the prompts, and copy the token
3. Run `hermes gateway setup` and select Telegram
4. Paste your token when prompted
Test the gateway in the foreground:
hermes gateway run
Send a message to your bot. Hermes responds using the same local Ollama model. To run the gateway persistently:
# Using tmux (recommended over systemd on WSL2)
tmux new-session -d -s hermes-gateway 'hermes gateway run'
# Check gateway status
tmux attach -t hermes-gateway
On a Linux server with working systemd:
sudo systemctl enable --now hermes-gateway
sudo journalctl -fu hermes-gateway # view live logs
Check gateway status at any time:
hermes gateway status
Configuration Reference
All Hermes settings live in `~/.hermes/config.yaml`. Secrets and API keys go in `~/.hermes/.env`. The most relevant settings for an Ollama-backed setup:
| Setting | Default | Purpose |
|---|---|---|
| `model.default` | (set via `hermes model`) | Model name, must match `ollama list` output exactly |
| `model.base_url` | (set via `hermes model`) | Ollama endpoint: `http://localhost:11434/v1` |
| `model.context_length` | auto-detected | Set explicitly to match your Ollama `num_ctx` value |
| `approvals.mode` | `manual` | `manual` (prompt each time), `smart` (auto-approve safe ops), `off` (fully autonomous) |
| `terminal.backend` | `local` | Execution environment: `local`, `docker`, `ssh`, or `modal` |
| `memory.memory_enabled` | `true` | Toggle persistent memory across sessions |
| `compression.enabled` | `true` | Auto-compress context at the threshold below |
| `compression.threshold` | `0.50` | Compress when context reaches this fraction of the limit |
| `delegation.max_concurrent_children` | `3` | Maximum parallel subagents |
| `display.show_reasoning` | `false` | Show model thinking traces in output |
| `HERMES_STREAM_READ_TIMEOUT` | `120` | Streaming timeout in seconds (increase for slow local models) |
Set values via CLI without opening config.yaml:
# Switch terminal commands to run inside Docker containers
hermes config set terminal.backend docker
# Show model reasoning traces (useful when debugging Qwen3 thinking mode)
hermes config set display.show_reasoning true
# Extend streaming timeout for slower models or CPU-only inference
hermes config set HERMES_STREAM_READ_TIMEOUT 1800
# Enable smart approval mode for less interruption
hermes config set approvals.mode smart
Install and Manage Skills
Hermes ships with 70+ built-in skills (capabilities for file management, web search, Python scripting, image generation, and more). Search and install additional community skills:
# Search available skills
hermes skills search web
# List installed skills
hermes skills list
# Install a skill
hermes skills install web-scraper
Skills are stored in `~/.hermes/skills/` and persist across updates.
Update Hermes Agent
# pip install method
pip install --upgrade hermes-agent
# curl install method: re-run the installer
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
Your `~/.hermes/` directory (config.yaml, SOUL.md, memories, and skills) is never modified by an update.
Troubleshooting
`hermes: command not found` after installation
Cause: Shell PATH not updated with the new `~/.local/bin` entry added by the installer
Fix: Run `source ~/.bashrc` (Linux) or `source ~/.zshrc` (macOS). If it still fails, add `export PATH="$HOME/.local/bin:$PATH"` to your shell profile manually, save, and reload.
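If you script the manual PATH fix, it helps to make it idempotent so the line is not appended on every run. A minimal sketch, where `ensure_local_bin_on_path` is a hypothetical helper name and the profile file defaults to `~/.bashrc`:

```shell
# Hypothetical helper: append the ~/.local/bin PATH entry to a shell
# profile only if that exact line is not already present.
ensure_local_bin_on_path() (
  profile="${1:-$HOME/.bashrc}"
  # Single quotes keep $HOME and $PATH literal so they expand at login time.
  line='export PATH="$HOME/.local/bin:$PATH"'
  touch "$profile"
  # -F: fixed string, -x: match the whole line, -q: no output.
  if ! grep -Fxq "$line" "$profile"; then
    printf '%s\n' "$line" >> "$profile"
  fi
)
```

On macOS, pass `~/.zshrc` as the argument, then reload the profile with `source` as described in the fix.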
Hermes connects but gives incoherent or incomplete responses after a few tool calls
Cause: Ollama context window still at 4,096 tokens. Hermes fills the context after the first 3-4 tool call outputs and starts losing earlier steps
Fix: Run `ollama ps` and check the CONTEXT column. If it shows 4096, apply the OLLAMA_CONTEXT_LENGTH=64000 environment variable or create a Modelfile variant as shown in the install section. Then set `model.context_length: 64000` in `~/.hermes/config.yaml` to match.
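For the Modelfile variant, the configured value can also be read back from the model itself with `ollama show --parameters`. The sketch below assumes that command prints one `name value` pair per line; `num_ctx_from_parameters` is a hypothetical helper that parses that output from standard input.

```shell
# Hypothetical helper: print the num_ctx value found in
# `ollama show --parameters <model>` output supplied on stdin.
# Exits non-zero when no num_ctx parameter is set on the model.
num_ctx_from_parameters() {
  awk '
    $1 == "num_ctx" { print $2; found = 1; exit }
    END { if (!found) exit 1 }
  '
}

# Typical use (requires Ollama and the Modelfile variant from this guide):
#   ctx="$(ollama show --parameters qwen3-8b-64k | num_ctx_from_parameters)"
#   [ "$ctx" -ge 64000 ] && echo "context is large enough for Hermes"
```

A non-zero exit here only means the model carries no explicit `num_ctx`. If you used the environment variable approach instead, `ollama ps` remains the check to rely on.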
Tool calls hang indefinitely; Hermes appears frozen mid-task
Cause: Streaming read timeout is too short for local inference, or the model is doing CPU-only inference and generating very slowly
Fix: Add `HERMES_STREAM_READ_TIMEOUT=1800` to `~/.hermes/.env`. Check whether inference is using GPU or CPU with `ollama ps`. GPU inference shows VRAM usage. CPU inference on a 7B model can take 15-30 seconds per turn on an 8-core machine.
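Appending the variable by hand can leave duplicate entries if you later change the value. One way to avoid that is to replace any existing assignment; `set_env_var` below is a hypothetical helper, shown as a sketch rather than a Hermes feature, and it assumes plain `KEY=value` lines.

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing any
# existing assignment for KEY. KEY is expected to contain only letters,
# digits, and underscores.
set_env_var() (
  env_file="$1"; key="$2"; value="$3"
  touch "$env_file"
  tmp="$(mktemp)"
  # Keep every line except an existing assignment for this key.
  # grep exits 1 when nothing is kept, which is not an error here.
  grep -v "^${key}=" "$env_file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the file keeps its permissions.
  cat "$tmp" > "$env_file"
  rm -f "$tmp"
)
```

For example, `set_env_var "$HOME/.hermes/.env" HERMES_STREAM_READ_TIMEOUT 1800` leaves exactly one timeout line in the file, however many times it is run.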
WSL2: "Connection refused" when Hermes tries to reach Ollama
Cause: `localhost` inside WSL2 refers to the WSL2 container, not the Windows host machine where Ollama is running
Fix: On Windows, set `$env:OLLAMA_HOST = "0.0.0.0"` before starting Ollama. Find your Windows IP from WSL2 with `cat /etc/resolv.conf | grep nameserver`. Set `model.base_url: http://
Model produces wrong tool-call structure or skips steps entirely
Cause: The selected model has weak function-calling reliability at the configured context length
Fix: Switch to Qwen3 8B or Qwen3.5 27B, which have the most reliable tool-calling among local models as of mid-2026. For coding tasks, `qwen2.5-coder:32b` performs better. Update `model.default` in config.yaml and run `hermes model` to confirm the new selection.
`hermes doctor` shows Playwright missing
Cause: Browser automation dependency was not installed during `postinstall`
Fix: Run `hermes postinstall` to retry. If that fails: `pip install playwright && playwright install chromium`. Playwright is required for web browsing tasks but optional if you only need terminal, file, and web search tools.
Alternatives to Consider
| Tool | Type | Price | Best For |
|---|---|---|---|
| OpenClaw | Self-hosted | Free (self-hosted) | Users who want an autonomous agent with a web dashboard UI and a large third-party skill marketplace (molthub). OpenClaw gained 113,000+ GitHub stars in January 2026 and has wider community skill coverage. The setup complexity and security considerations are similar to Hermes. |
| AnythingLLM | Desktop app / Self-hosted | Free | Chat-with-documents (RAG) workflows. AnythingLLM excels at private document Q&A with local models. Less capable as an autonomous agent than Hermes, but far simpler to configure for document-heavy use cases. |
| Open-WebUI | Self-hosted | Free | Interactive chat and basic RAG with local models. Open-WebUI is a ChatGPT-style frontend, not an autonomous agent. Better when you want to stay in control of every step rather than delegating multi-step tasks. |
| Contabo Cloud VPS 30 | Cloud (self-managed) | From €16.95/month | Running Hermes Agent 24/7 on a cloud server without local hardware requirements. The VPS stays connected to Telegram and Discord even when your laptop is off, and 24 GB RAM handles Hermes alongside Qwen3.5 27B continuously. |
Frequently Asked Questions
Can I run Hermes Agent on a laptop?
Yes. A laptop with 16 GB RAM handles Qwen3 8B (needs 8 GB for the model) alongside Hermes Agent without issues. With 8 GB total RAM, the machine runs at its memory ceiling. Hermes itself uses around 300-500 MB for its Node.js and Python processes, leaving barely enough headroom for Ollama to load a 7B model.
The practical constraint on a laptop is sustained CPU load and battery drain during inference. CPU-only inference on a 7B model discharges a laptop battery in 2-3 hours under normal agent usage. For sustained or 24/7 agent work, a VPS or desktop machine is more practical than a laptop.
For laptop users who want Hermes always accessible, a Contabo Cloud VPS 30 at €16.95/month runs Hermes as a background service and keeps the Telegram gateway connected without any local machine running.
Why does Hermes Agent need 64,000 tokens of context?
Hermes orchestrates multi-step tasks where each tool call and its output stays in the context window so the model can reason across the full sequence. A single agentic task might involve 10-30 tool calls: web searches, terminal commands, file reads, and intermediate reasoning steps, each consuming hundreds to thousands of tokens.
At Ollama's default 4,096-token window, Hermes fills its context after the first 3-4 tool calls and starts losing earlier steps. The agent then repeats work it already completed, skips steps, or produces outputs that contradict earlier findings. 64K is the tested minimum for reliable operation. For complex multi-hour tasks, 128K performs noticeably better. Mistral Small's 128K context makes it a strong choice on machines with 10+ GB RAM.
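The gap is easy to see with rough numbers. In the sketch below, the 1,000-token overhead and the 900 tokens per tool call are illustrative assumptions chosen to fall within the "hundreds to thousands of tokens" range above, not measurements; `tool_calls_that_fit` is a hypothetical helper.

```shell
# Hypothetical helper: estimate how many tool calls fit in a context window.
tool_calls_that_fit() (
  window="$1"     # context window size in tokens
  overhead="$2"   # system prompt, SOUL.md, tool schemas (assumed figure)
  per_call="$3"   # average tokens per tool call plus its output (assumed figure)
  echo $(( (window - overhead) / per_call ))
)

tool_calls_that_fit 4096 1000 900    # prints 3
tool_calls_that_fit 64000 1000 900   # prints 70
```

With these assumptions, the default window runs out after about three tool calls, while 64K leaves room for the 10-30 calls a typical task needs. Substitute your own figures to see how the headroom changes.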
Which Ollama model works best with Hermes Agent?
Qwen3 8B is the best starting point for machines with 8-12 GB RAM. It has reliable tool-calling support, handles multi-step instructions, and fits in 8 GB RAM at Q4 quantization. Ollama pulls it in under 10 minutes on a standard connection.
For better performance on complex tasks: Qwen3.5 27B (needs 16 GB RAM) offers the best quality among free local models as of mid-2026. Llama 4 Maverick (needs 24 GB RAM) provides the strongest tool-calling accuracy among all open-source models, with a 1M-token context window that removes the 64K constraint entirely.
Avoid models smaller than 7B parameters. Phi-3 Mini (3.8B), Gemma 2B, and similar compact models fail at the structured JSON tool-call format that Hermes depends on for every task it executes.
Is Hermes Agent the same as OpenClaw?
No. Hermes Agent and OpenClaw are separate projects. Both are autonomous AI agents with persistent memory files, messaging gateways, and skill systems, but they were built by different teams with different priorities.
OpenClaw gained 113,000 GitHub stars in January 2026 from a viral moment. It provides a web dashboard UI and a large third-party skill ecosystem (molthub) that Hermes lacks.
Hermes Agent by Nous Research focuses more on developer and technical workflows: it supports Docker, SSH, and cloud sandbox terminal backends, has more granular approval controls, and is generally considered more stable for multi-step coding and file-system tasks. The Hermes gateway also supports more platforms (Signal, Email) out of the box.
For personal automation with a visual interface, OpenClaw is easier to get started with. For technical agent workflows and developer tasks, Hermes is the better fit.
How do I update Hermes Agent to the latest version?
For pip installs:
pip install --upgrade hermes-agent
For curl installs, re-run the original installer:
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
Re-running the installer updates to the latest version. Your `~/.hermes/` directory (config.yaml, SOUL.md, the memories folder, and all skills) is never modified by an update. Installed skills may need a version bump after major releases; check the Hermes changelog before updating if you have custom skills.
Can Hermes Agent run 24/7 without my laptop?
Not on a laptop that sleeps or gets closed. For 24/7 operation, you need a server or VPS running continuously.
On a VPS, run the Hermes gateway as a background process with tmux:
tmux new-session -d -s hermes 'hermes gateway run'
Or as a systemd service on Ubuntu:
sudo systemctl enable --now hermes-gateway
A Contabo Cloud VPS 30 at €16.95/month (24 GB RAM) handles Hermes plus Ollama running Qwen3 8B or Qwen3.5 27B continuously. The agent stays connected to Telegram or Discord at all times, processes tasks while you are away, and sends you results through the messaging gateway.
How much does running Hermes Agent with Ollama cost?
With Ollama, the per-message cost is $0. You pay only for the hardware.
On your own machine, the cost is zero beyond electricity. On a VPS, a Contabo Cloud VPS 30 at €16.95/month provides 24 GB RAM for Hermes plus a 7B-27B model running 24/7.
Compare to cloud APIs: GPT-4o charges $0.005-$0.015 per 1,000 tokens. An active Hermes user running 30-50 agent tasks per day, each consuming 5,000-15,000 tokens across tool calls, processes 150,000-750,000 tokens and spends roughly $0.75-$11.25 per day in API costs, or about $22-$340 per month. At the upper end of that range, the €16.95/month VPS pays for itself within the first two days of active use; at the lower end, the two monthly costs are roughly comparable.
What is SOUL.md and do I need to edit it manually?
SOUL.md (`~/.hermes/SOUL.md`) is a plain Markdown file that serves as Hermes's primary identity context. Loaded at every startup as the first slot in the system prompt, it tells the agent who you are, your timezone, active projects, working preferences, and any standing instructions.
You do not need to edit it manually after the initial setup. Hermes writes to it automatically as it learns your patterns. But adding your name, timezone, and key preferences at setup time significantly improves output quality from the first session rather than waiting for the agent to infer your context over several interactions.
The SOUL.md approach is similar to OpenClaw's soul.md system, with one key difference: Hermes stores it globally in `~/.hermes/` so it carries across all projects and directories automatically, rather than being project-specific.