AI AgentsIntermediate30 min to complete16 min read

How to Install Hermes Agent with Ollama Local Models (2026)

Q: Can I run Hermes Agent on a laptop?

Yes, with 16 GB RAM use Qwen3 8B (8 GB for model). With 8 GB total RAM the machine runs at its memory limit. CPU inference drains a laptop battery in 2-3 hours of active use. A VPS is better for sustained or 24/7 agent work.

Q: Why does Hermes Agent need 64,000 tokens of context?

Hermes runs multi-step tasks where every tool call output stays in context. At Ollama's 4,096-token default, context fills after 3-4 tool calls, causing repeated work and errors. 64K is the tested minimum. 128K (Mistral Small) performs better on complex tasks.

Q: Which Ollama model works best with Hermes Agent?

Qwen3 8B is best for 8-12 GB RAM: reliable tool-calling, fits 8 GB. For complex tasks: Qwen3.5 27B (16 GB RAM) or Llama 4 Maverick (24 GB RAM, best tool accuracy). Avoid models smaller than 7B. They fail at the structured JSON tool calls Hermes requires.

Q: Is Hermes Agent the same as OpenClaw?

No. Both are autonomous agents with similar features (persistent memory, messaging gateway, skills) but built by different teams. OpenClaw has a web UI and large skill marketplace. Hermes Agent by Nous Research has better terminal backends (Docker/SSH) and is more stable for technical developer workflows.

Q: How do I update Hermes Agent to the latest version?

Run `pip install --upgrade hermes-agent` or re-run the curl installer script. Your ~/.hermes/ directory (config, SOUL.md, memories, skills) is never touched by updates.

Q: Can Hermes Agent run 24/7 without my laptop?

Not on a sleeping laptop. For 24/7 operation, use a VPS. Run: `tmux new-session -d -s hermes "hermes gateway run"`. A Contabo VPS 30 (24 GB RAM, €16.95/month) handles Hermes + Ollama Qwen3 8B continuously with Telegram/Discord gateway always active.

Q: How much does running Hermes Agent with Ollama cost?

Ollama: $0 per message, pay only for hardware. Contabo VPS 30 is €16.95/month (24 GB RAM) for 24/7 operation. Cloud APIs: GPT-5.4 costs $0.005-$0.015 per 1K tokens, ~$45-$300/month for active agent use. Ollama pays back the VPS cost within 3-6 active days.

Q: What is SOUL.md and do I need to edit it manually?

SOUL.md (~/.hermes/SOUL.md) is a Markdown file loaded as Hermes's identity context at every startup. Hermes writes to it automatically, but manually adding your name, timezone, and preferences at setup improves output quality from day one instead of waiting for auto-learning.

Connect Hermes Agent by Nous Research to Ollama local models. Step-by-step install, 64K context fix, SOUL.md setup, and model selection for private AI automation.

By Amara|Updated 5 July 2026

Hermes Agent is an open-source autonomous AI agent from Nous Research, released under the MIT license (current version: v0.14.0). Unlike chatbots that respond to one message at a time, Hermes runs continuously, executes multi-step tasks across your filesystem and the web, connects to messaging platforms like Telegram and Discord, and learns from every session through a persistent memory system stored in `~/.hermes/`.

By default Hermes connects to cloud AI providers such as Anthropic and OpenAI. This guide replaces those cloud endpoints with Ollama, giving you an autonomous agent that runs entirely on your own hardware with zero per-message costs, no rate limits, and no data leaving your machine.

One critical detail that catches most people in the Hermes + Ollama setup: Ollama defaults to a 4,096-token context window, which is far too small for Hermes to function. The agent requires at least 64,000 tokens to maintain its working memory across multi-step tool-calling sequences. This guide addresses that fix first, before any other configuration. By the end you will have Hermes Agent running against a local Qwen3 or Llama 4 model, with persistent memory in SOUL.md, terminal and web tool access, and optionally a Telegram or Discord gateway for remote agent control.

Prerequisites

Linux (Ubuntu 22.04+), macOS 12+, or Windows 10/11 with WSL2 enabled
Minimum 16 GB RAM for 7B-8B models; 24 GB RAM for 13B-14B models
Python 3.11 or higher (the Hermes installer manages this via uv if not already present)
Git installed (the only manually required dependency)
10-20 GB free disk space for Ollama model files
Ollama installed and running (the next section covers this if you have not set it up yet)
(Optional) A VPS for 24/7 agent operation without your local machine staying on

🖥️

Need a VPS?

Run this on a Contabo Cloud VPS 30 starting at €16.95/mo. Reliable Linux VPS with NVMe storage, ideal for self-hosted AI workloads.

In This Guide

1What Hermes Agent Does and Why Local Models Make Sense
2Install Ollama and Configure the 64K Context Window
3Install Hermes Agent
4Configure Hermes Agent to Use Ollama
5First Run and Verification
6Configure SOUL.md: The Agent Identity File
7Connect Hermes to Telegram or Discord (Optional)
8Configuration Reference
9Troubleshooting
10FAQ

What Hermes Agent Does and Why Local Models Make Sense

Hermes Agent (GitHub: NousResearch/hermes-agent, MIT license) is an autonomous AI agent that runs as a persistent process on your machine. Instead of answering one question at a time, it executes multi-step tasks: browsing the web, running terminal commands, reading and writing files, sending messages through connected platforms, and scheduling recurring automations in natural language.

The core architectural components:

SOUL.md: A Markdown file at `~/.hermes/SOUL.md` that stores your identity and preferences. Loaded at every session startup as the primary agent context.
memories/ directory: Fact-based memories at `~/.hermes/memories/` extracted automatically from sessions and retrieved by relevance.
skills/ directory: Step-by-step procedures the agent writes for itself and reuses across sessions.
Gateway: Connections to Telegram, Discord, Slack, WhatsApp, Signal, and Email for remote task submission.

Why replace the cloud API with Ollama:

Factor	Cloud API (Claude/GPT-5.4)	Ollama Local Models
Cost per message	$0.003-$0.015	$0
Data privacy	Processed by third-party servers	Stays on your hardware
Rate limits	Yes (varies by plan)	None
Context window	Up to 200K (Claude)	64K+ (configurable)
Reasoning quality	High	Medium-High (Qwen3, Llama 4)
Setup complexity	API key only	This guide

For tasks involving sensitive data or sustained personal automation, local Ollama inference costs nothing beyond electricity and keeps every token on your own hardware. A well-configured 27B model like Qwen3.5 handles most agent workloads that previously required GPT-5.4.

ℹ️

Note:Hermes Agent connects to Ollama through an OpenAI-compatible API layer. Ollama exposes this at `http://localhost:11434/v1`, accepting the same request format as the OpenAI API. Hermes sends all inference requests to this endpoint rather than to api.openai.com.

Compared to OpenClaw: OpenClaw has a web dashboard UI and a larger third-party skill marketplace (molthub). Hermes supports Docker, SSH, and cloud sandbox terminal backends, which makes it more predictable for technical developer workflows where the execution environment matters.

Install Ollama and Configure the 64K Context Window

Most Hermes + Ollama setups fail at the context window. Ollama defaults to 4,096 tokens. Hermes requires 64,000. The gap makes the agent behave erratically after the first few tool calls, and the symptom looks like a model quality problem rather than a configuration one. Fix this before anything else.

Install Ollama

# Linux and macOS — one-command installer
curl -fsSL https://ollama.com/install.sh | sh

Verify the installation:

ollama --version
# Expected: ollama version 0.6.x

On Linux, Ollama installs as a systemd service and starts automatically. Confirm it is running:

systemctl status ollama
# Expected: active (running)

Choose a Model for Hermes Agent

Hermes Agent's tool-calling and multi-step reasoning require a model with reliable function-calling support and a minimum 64K context window. These models work well:

Model	RAM Required	Pull Command	Tool Calling	Best For
Qwen3 8B	8 GB	`ollama pull qwen3:8b`	Reliable	General tasks, 8-12 GB machines
Qwen3.5 27B	16 GB	`ollama pull qwen3.5:27b`	Strong	Best free local model for Hermes
Mistral Small	10 GB	`ollama pull mistral-small`	Solid	128K context, fast responses
Llama 4 Maverick	24 GB	`ollama pull llama4:maverick`	Best-in-class	Complex multi-step agent tasks

Pull your chosen model. Qwen3 8B is the recommended starting point for most machines:

ollama pull qwen3:8b

# Expected output:
# pulling manifest
# pulling e...  100% ████████ 5.2 GB / 5.2 GB
# success

Also check the best local LLM models guide for a full hardware-to-model matching table.

Set the Context Window to 64K (Required)

Hermes Agent requires at least 64,000 tokens of context. Ollama defaults to 4,096. You must change this before running Hermes, or it will behave erratically after the first 3-4 tool calls.

Choose one of these three approaches:

Option 1: Environment variable (simplest, current terminal session)

OLLAMA_CONTEXT_LENGTH=64000 ollama serve

Option 2: Systemd service override (persists across reboots on Linux)

# Open the systemd override editor
sudo systemctl edit ollama.service

In the editor that opens, add:

ini

[Service]
Environment="OLLAMA_CONTEXT_LENGTH=64000"

Save and reload:

sudo systemctl daemon-reload && sudo systemctl restart ollama

Option 3: Per-model Modelfile (most explicit, controls a single named model)

# Create a Modelfile setting 64K context for qwen3:8b
cat > Modelfile << 'EOF'
FROM qwen3:8b
PARAMETER num_ctx 64000
EOF

# Build the modified model under a new name
ollama create qwen3-8b-64k -f Modelfile

# Expected output:
# transferring model data
# using existing layer sha256...
# creating new layer sha256...
# writing manifest
# success

Verify the context length is active:

ollama run qwen3:8b --num_ctx 64000 "reply with just: OK"
ollama ps
# Check the CONTEXT column — should show 64000

⚠️

Warning:If `ollama ps` still shows CONTEXT as 4096 after applying Option 1 or 2, the environment variable did not take effect in the running service. Use Option 3 (the Modelfile method) as a reliable fallback. It bakes the context setting into the model variant you reference by name.

Install Hermes Agent

Hermes Agent installs via a one-command script that handles all dependencies: Python 3.11 (via uv), Node.js v22, ripgrep, ffmpeg, and Playwright for browser automation. Git is the only prerequisite you need to install manually.

Linux, macOS, and Windows WSL2

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

The script takes 2-5 minutes on a 100 Mbps connection. After it finishes, reload your shell:

# Linux (bash)
source ~/.bashrc

# macOS (zsh)
source ~/.zshrc

Verify the installation:

hermes --version
# Expected: hermes-agent X.X.X

Alternative: pip install

pip install hermes-agent

# Install optional system dependencies (Node.js, ripgrep, ffmpeg, Playwright)
hermes postinstall

Windows Native PowerShell (Early Beta)

powershell

iex (irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1)

The Windows installer downloads a portable Git distribution and installs Hermes to `%LOCALAPPDATA%\hermes\hermes-agent`. No admin rights required. This path is in early beta as of v0.14.0. WSL2 provides a more stable experience on Windows.

Run the Diagnostic Check

After installation, confirm all components are present:

hermes doctor

# Expected output:
# ✓ Python 3.11.x
# ✓ Node.js v22.x.x
# ✓ ripgrep x.x.x
# ✓ ffmpeg x.x.x
# ✓ playwright chromium
# ✓ hermes-agent x.x.x

Any failed checks appear with an X and a suggested fix command. Run `hermes postinstall` if any optional dependencies are missing.

ℹ️

Note:Playwright (browser automation) is required for any task involving web browsing. If `hermes doctor` shows it missing, run: `pip install playwright && playwright install chromium`. This adds about 400 MB for the Chromium binary but enables the full web automation capability.

Configure Hermes Agent to Use Ollama

With Ollama running at 64K context and Hermes installed, point Hermes at the local endpoint.

Interactive Setup (Recommended)

hermes model

The interactive menu asks for your provider. Select:

1. "Custom endpoint (self-hosted / VLLM / etc.)" 2. Enter URL: `http://localhost:11434/v1` 3. API key: press Enter to skip (Ollama requires no key) 4. Model name: type the exact name shown in `ollama list`, for example `qwen3:8b` or `qwen3-8b-64k` if you used the Modelfile method

Hermes writes these values to `~/.hermes/config.yaml` automatically.

Manual Configuration via config.yaml

Open the config file directly:

hermes config edit

Set these values under the `model` key:

yaml

model:
  default: qwen3:8b
  provider: custom
  base_url: http://localhost:11434/v1
  context_length: 64000

ℹ️

Note:[!IMPORTANT] Set `context_length` explicitly in config.yaml to match your Ollama `num_ctx` setting. Without this, Hermes queries Ollama's `/api/show` endpoint, which reports the model's theoretical maximum context rather than the effective `num_ctx` you configured. Mismatched values cause context overflow errors mid-session on longer agent tasks.

Windows WSL2: Fix the Localhost Issue

If Ollama runs on your Windows host and Hermes runs inside WSL2, `localhost:11434` inside WSL2 refers to the WSL2 container, not your Windows machine. Two steps to fix this:

Step 1: Tell Ollama on Windows to listen on all interfaces

powershell

# Run in Windows PowerShell before starting Ollama
$env:OLLAMA_HOST = "0.0.0.0"
ollama serve

Step 2: Find your Windows host IP from inside WSL2

# Inside your WSL2 terminal
cat /etc/resolv.conf | grep nameserver
# The IP shown (e.g., 172.29.192.1) is your Windows host IP

Step 3: Use that IP as the base URL in config.yaml

yaml

model:
  base_url: http://172.29.192.1:11434/v1

This is the same networking issue that affects OpenClaw and AnythingLLM Docker setups. Any tool running in an isolated environment needs the actual host IP rather than `localhost`.

First Run and Verification

Start Hermes Agent:

# Standard CLI interface
hermes

# Modern TUI interface (recommended — shows tool activity in real time)
hermes --tui

The startup banner confirms the model and context configuration:

  _    _
 | |  | |  ___ _ __ _ __ ___   ___  ___
 | |__| |/ _ \ '__| '_ ` _ \ / _ \/ __|
 |  __  |  __/ |  | | | | | |  __/\__ \
 |_|  |_|\___|_|  |_| |_| |_|\___||___/

Model: qwen3:8b (custom endpoint)
Context: 64000 tokens
Tools: terminal, browser, file, web_search

Test with a task that uses the terminal tool:

List the 5 largest files in my home directory by size

Hermes calls the terminal tool to run `find ~ -maxdepth 1 -type f -printf '%s %f\n' | sort -rn | head -5`, shows the command for approval (in manual mode), and returns the results. If you see a tool call followed by real output, the Ollama connection is working.

Test a multi-step task to verify the 64K context holds:

Search the web for the latest Qwen3 benchmark results, summarise the findings in 3 bullet points, and save them to a file called qwen3-notes.txt in my home directory

This task requires web browsing, reasoning, and file writing across multiple turns. If Hermes completes it without a context overflow error, the 64,000-token configuration is correct.

💡

Tip:In manual approval mode (the default), Hermes shows each terminal command and asks "Approve?" before executing. Type `y` to approve, `n` to skip. Switch to automatic approval for common tasks with `hermes config set approvals.mode smart`. It uses heuristics to auto-approve safe operations while still prompting for destructive ones like file deletion.

Manage Long Sessions with /compress

Hermes auto-compresses the context when it reaches 50% of the 64K limit (around 32,000 tokens). Trigger it manually in long sessions:

/compress

This summarises the session history into a compact representation while keeping the last 20 messages intact. On a 64K context window with Qwen3 8B, a multi-hour coding or research session stays within limits with one or two manual compress calls.

Configure SOUL.md: The Agent Identity File

SOUL.md is stored at `~/.hermes/SOUL.md`. Hermes reads it at every startup as its primary identity context, the first slot in the system prompt. It tells the agent who you are, your working preferences, timezone, active projects, and any standing instructions.

Open it for editing:

# Open via Hermes command
hermes config edit soul

# Or edit directly
nano ~/.hermes/SOUL.md

Add your context in plain Markdown:

markdown

# Identity

My name is [Your name]. I work as a [role] in [timezone].

## Working Preferences

- Prefer concise output unless asked for detail
- When writing code, use Python 3.11+ syntax and type hints
- Default file format: Markdown
- Primary OS: Ubuntu 22.04

## Active Projects

- [List current projects so Hermes maintains context across sessions]

## Recurring Tasks

- Daily morning brief: weekdays at 9 AM
- Weekly report: Fridays before 4 PM

## Directories and Paths

- Projects: ~/projects/
- Notes: ~/notes/
- Downloads: ~/Downloads/

Hermes also writes to SOUL.md and the separate `~/.hermes/memories/` folder automatically as it learns your patterns across sessions. The memory system maintains two separate stores:

Store	Location	Type	Updated by
SOUL.md	`~/.hermes/SOUL.md`	Identity and preferences	Human-editable + auto-updated
Memories	`~/.hermes/memories/`	Facts extracted from sessions	Automatic only

You do not need to update SOUL.md manually beyond the initial setup. But providing your name, timezone, and key preferences at the start saves several sessions of inference time compared to waiting for the agent to infer your context automatically.

ℹ️

Note:The SOUL.md approach is similar to OpenClaw's soul.md file, but stored globally in `~/.hermes/` rather than per-project. This means your identity context carries across all projects and working directories automatically.

Connect Hermes to Telegram or Discord (Optional)

The Hermes gateway connects the running agent to Telegram, Discord, Slack, WhatsApp, Signal, or Email. You send tasks from your phone without opening a terminal, and the agent can message you with updates or task completions while it works.

Run the gateway setup wizard:

hermes gateway setup

The wizard walks through your chosen platform. For Telegram, you need a bot token from @BotFather.

Set Up a Telegram Gateway

1. Open Telegram and search for @BotFather 2. Send `/newbot`, follow the prompts, and copy the token 3. Run `hermes gateway setup` and select Telegram 4. Paste your token when prompted

Test the gateway in the foreground:

hermes gateway run

Send a message to your bot. Hermes responds using the same local Ollama model. To run the gateway persistently:

# Using tmux (recommended over systemd on WSL2)
tmux new-session -d -s hermes-gateway 'hermes gateway run'

# Check gateway status
tmux attach -t hermes-gateway

On a Linux server with working systemd:

sudo systemctl enable --now hermes-gateway
sudo journalctl -fu hermes-gateway  # view live logs

Check gateway status at any time:

hermes gateway status

ℹ️

Note:On WSL2, systemd support is incomplete. Use tmux or nohup instead of systemd for persistent gateway operation: `nohup hermes gateway run &> ~/.hermes/gateway.log &`. For 24/7 Telegram or Discord access without local machine constraints, a VPS running Hermes is more reliable than a WSL2 setup.

Configuration Reference

All Hermes settings live in `~/.hermes/config.yaml`. Secrets and API keys go in `~/.hermes/.env`. The most relevant settings for an Ollama-backed setup:

Setting	Default	Purpose
`model.default`	(set via `hermes model`)	Model name, must match `ollama list` output exactly
`model.base_url`	(set via `hermes model`)	Ollama endpoint: `http://localhost:11434/v1`
`model.context_length`	auto-detected	Set explicitly to match your Ollama `num_ctx` value
`approvals.mode`	`manual`	`manual` (prompt each time), `smart` (auto-approve safe ops), `off` (fully autonomous)
`terminal.backend`	`local`	Execution environment: `local`, `docker`, `ssh`, or `modal`
`memory.memory_enabled`	`true`	Toggle persistent memory across sessions
`compression.enabled`	`true`	Auto-compress context at the threshold below
`compression.threshold`	`0.50`	Compress when context reaches this fraction of the limit
`delegation.max_concurrent_children`	`3`	Maximum parallel subagents
`display.show_reasoning`	`false`	Show model thinking traces in output
`HERMES_STREAM_READ_TIMEOUT`	`120`	Streaming timeout in seconds (increase for slow local models)

Set values via CLI without opening config.yaml:

# Switch terminal commands to run inside Docker containers
hermes config set terminal.backend docker

# Show model reasoning traces (useful when debugging Qwen3 thinking mode)
hermes config set display.show_reasoning true

# Extend streaming timeout for slower models or CPU-only inference
hermes config set HERMES_STREAM_READ_TIMEOUT 1800

# Enable smart approval mode for less interruption
hermes config set approvals.mode smart

Install and Manage Skills

Hermes ships with 70+ built-in skills (capabilities for file management, web search, Python scripting, image generation, and more). Search and install additional community skills:

# Search available skills
hermes skills search web

# List installed skills
hermes skills list

# Install a skill
hermes skills install web-scraper

Skills are stored in `~/.hermes/skills/` and persist across updates.

Update Hermes Agent

# pip install method
pip install --upgrade hermes-agent

# curl install method — re-run the installer
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

Your `~/.hermes/` directory (config.yaml, SOUL.md, memories, and skills) is never modified by an update.

Troubleshooting

`hermes: command not found` after installation

Cause: Shell PATH not updated with the new `~/.local/bin` entry added by the installer

Fix: Run `source ~/.bashrc` (Linux) or `source ~/.zshrc` (macOS). If it still fails, add `export PATH="$HOME/.local/bin:$PATH"` to your shell profile manually, save, and reload.

Hermes connects but gives incoherent or incomplete responses after a few tool calls

Cause: Ollama context window still at 4,096 tokens. Hermes fills the context after the first 3-4 tool call outputs and starts losing earlier steps

Fix: Run `ollama ps` and check the CONTEXT column. If it shows 4096, apply the OLLAMA_CONTEXT_LENGTH=64000 environment variable or create a Modelfile variant as shown in the install section. Then set `model.context_length: 64000` in `~/.hermes/config.yaml` to match.

Tool calls hang indefinitely; Hermes appears frozen mid-task

Cause: Streaming read timeout is too short for local inference, or the model is doing CPU-only inference and generating very slowly

Fix: Add `HERMES_STREAM_READ_TIMEOUT=1800` to `~/.hermes/.env`. Check whether inference is using GPU or CPU with `ollama ps`. GPU inference shows VRAM usage. CPU inference on a 7B model can take 15-30 seconds per turn on an 8-core machine.

WSL2: "Connection refused" when Hermes tries to reach Ollama

Cause: `localhost` inside WSL2 refers to the WSL2 container, not the Windows host machine where Ollama is running

Fix: On Windows, set `$env:OLLAMA_HOST = "0.0.0.0"` before starting Ollama. Find your Windows IP from WSL2 with `cat /etc/resolv.conf | grep nameserver`. Set `model.base_url: http://:11434/v1` in config.yaml.

Model produces wrong tool-call structure or skips steps entirely

Cause: The selected model has weak function-calling reliability at the configured context length

Fix: Switch to Qwen3 8B or Qwen3.5 27B, which have the most reliable tool-calling among local models as of mid-2026. For coding tasks, `qwen2.5-coder:32b` performs better. Update `model.default` in config.yaml and run `hermes model` to confirm the new selection.

`hermes doctor` shows Playwright missing

Cause: Browser automation dependency was not installed during `postinstall`

Fix: Run `hermes postinstall` to retry. If that fails: `pip install playwright && playwright install chromium`. Playwright is required for web browsing tasks but optional if you only need terminal, file, and web search tools.

Alternatives to Consider

Tool	Type	Price	Best For
OpenClaw	Self-hosted	Free (self-hosted)	Users who want an autonomous agent with a web dashboard UI and a large third-party skill marketplace (molthub). OpenClaw gained 113,000+ GitHub stars in January 2026 and has wider community skill coverage. The setup complexity and security considerations are similar to Hermes.
AnythingLLM	Desktop app / Self-hosted	Free	Chat-with-documents (RAG) workflows. AnythingLLM excels at private document Q&A with local models. Less capable as an autonomous agent than Hermes, but far simpler to configure for document-heavy use cases.
Open-WebUI	Self-hosted	Free	Interactive chat and basic RAG with local models. Open-WebUI is a ChatGPT-style frontend, not an autonomous agent. Better when you want to stay in control of every step rather than delegating multi-step tasks.
Contabo Cloud VPS 30	Cloud (self-managed)	From €16.95/month	Running Hermes Agent 24/7 on a cloud server without local hardware requirements. The VPS stays connected to Telegram and Discord even when your laptop is off, and 24 GB RAM handles Hermes alongside Qwen3.5 27B continuously.

Frequently Asked Questions

Can I run Hermes Agent on a laptop?

Yes. A laptop with 16 GB RAM handles Qwen3 8B (needs 8 GB for the model) alongside Hermes Agent without issues. With 8 GB total RAM, the machine runs at its memory ceiling. Hermes itself uses around 300-500 MB for its Node.js and Python processes, leaving barely enough headroom for Ollama to load a 7B model.

The practical constraint on a laptop is sustained CPU load and battery drain during inference. CPU-only inference on a 7B model discharges a laptop battery in 2-3 hours under normal agent usage. For sustained or 24/7 agent work, a VPS or desktop machine is more practical than a laptop.

For laptop users who want Hermes always accessible, a Contabo Cloud VPS 30 at €16.95/month runs Hermes as a background service and keeps the Telegram gateway connected without any local machine running.

Why does Hermes Agent need 64,000 tokens of context?

Hermes orchestrates multi-step tasks where each tool call and its output stays in the context window so the model can reason across the full sequence. A single agentic task might involve 10-30 tool calls: web searches, terminal commands, file reads, and intermediate reasoning steps, each consuming hundreds to thousands of tokens.

At Ollama's default 4,096-token window, Hermes fills its context after the first 3-4 tool calls and starts losing earlier steps. The agent then repeats work it already completed, skips steps, or produces outputs that contradict earlier findings. 64K is the tested minimum for reliable operation. For complex multi-hour tasks, 128K performs noticeably better. Mistral Small's 128K context makes it a strong choice on machines with 10+ GB RAM.

Which Ollama model works best with Hermes Agent?

Qwen3 8B is the best starting point for machines with 8-12 GB RAM. It has reliable tool-calling support, handles multi-step instructions, and fits in 8 GB RAM at Q4 quantization. Ollama pulls it in under 10 minutes on a standard connection.

For better performance on complex tasks: Qwen3.5 27B (needs 16 GB RAM) offers the best quality among free local models as of mid-2026. Llama 4 Maverick (needs 24 GB RAM) provides the strongest tool-calling accuracy among all open-source models, with a 1M-token context window that removes the 64K constraint entirely.

Avoid models smaller than 7B parameters. Phi-3 Mini (3.8B), Gemma 2B, and similar compact models fail at the structured JSON tool-call format that Hermes depends on for every task it executes.

Is Hermes Agent the same as OpenClaw?

No. Hermes Agent and OpenClaw are separate projects. Both are autonomous AI agents with persistent memory files, messaging gateways, and skill systems, but they were built by different teams with different priorities.

OpenClaw gained 113,000 GitHub stars in January 2026 from a viral moment. It provides a web dashboard UI and a large third-party skill ecosystem (molthub) that Hermes lacks.

Hermes Agent by Nous Research focuses more on developer and technical workflows: it supports Docker, SSH, and cloud sandbox terminal backends, has more granular approval controls, and is generally considered more stable for multi-step coding and file-system tasks. The Hermes gateway also supports more platforms (Signal, Email) out of the box.

For personal automation with a visual interface, OpenClaw is easier to get started with. For technical agent workflows and developer tasks, Hermes is the better fit.

How do I update Hermes Agent to the latest version?

For pip installs:

pip install --upgrade hermes-agent

For curl installs, re-run the original installer:

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

Re-running the installer updates to the latest version. Your `~/.hermes/` directory (config.yaml, SOUL.md, the memories folder, and all skills) is never modified by an update. Installed skills may need a version bump after major releases; check the Hermes changelog before updating if you have custom skills.

Can Hermes Agent run 24/7 without my laptop?

Not on a laptop that sleeps or gets closed. For 24/7 operation, you need a server or VPS running continuously.

On a VPS, run the Hermes gateway as a background process with tmux:

tmux new-session -d -s hermes 'hermes gateway run'

Or as a systemd service on Ubuntu:

sudo systemctl enable --now hermes-gateway

A Contabo Cloud VPS 30 at €16.95/month (24 GB RAM) handles Hermes plus Ollama running Qwen3 8B or Qwen3.5 27B continuously. The agent stays connected to Telegram or Discord at all times, processes tasks while you are away, and sends you results through the messaging gateway.

How much does running Hermes Agent with Ollama cost?

With Ollama, the per-message cost is $0. You pay only for the hardware.

On your own machine, the cost is zero beyond electricity. On a VPS, a Contabo Cloud VPS 30 at €16.95/month provides 24 GB RAM for Hermes plus a 7B-27B model running 24/7.

Compare to cloud APIs: GPT-5.4 charges $0.005-$0.015 per 1,000 tokens. An active Hermes user running 30-50 agent tasks per day, each consuming 5,000-15,000 tokens across tool calls, spends $1.50-$10.00 per day in API costs, or $45-$300 per month. The VPS cost of €16.95/month pays for itself within the first 3-6 active usage days.

What is SOUL.md and do I need to edit it manually?

SOUL.md (`~/.hermes/SOUL.md`) is a plain Markdown file that serves as Hermes's primary identity context. Loaded at every startup as the first slot in the system prompt, it tells the agent who you are, your timezone, active projects, working preferences, and any standing instructions.

You do not need to edit it manually after the initial setup. Hermes writes to it automatically as it learns your patterns. But adding your name, timezone, and key preferences at setup time significantly improves output quality from the first session rather than waiting for the agent to infer your context over several interactions.

The SOUL.md approach is similar to OpenClaw's soul.md system, with one key difference: Hermes stores it globally in `~/.hermes/` so it carries across all projects and directories automatically, rather than being project-specific.

Related Guides

Beginner20 min

How to Run Ollama Locally: Complete Setup Guide (2026)

Intermediate35 min

How to Run OpenClaw with Ollama Local Models (2026 Guide)

Intermediate25 min

How to Set Up AnythingLLM with Ollama (2026 Guide)

Beginner10 min

Best Local LLM Models to Run in 2026 (Benchmarks + Use Cases)

Beginner15 min

How to Run Kimi K2 on Ollama: Cloud Setup Guide (2026)