AI AgentsIntermediate30 min to complete15 min read

How to Install Hermes Agent with LM Studio (2026)

Q: Is Hermes Agent free to use with LM Studio?

Yes. Hermes Agent is open source (MIT license) with no per-message charge. LM Studio is a free desktop app for personal use. All inference runs on your own hardware. Cost is zero beyond electricity unless you add an optional cloud provider to Hermes.

Q: Does Hermes Agent work on Windows without WSL2?

Yes. Hermes Agent installs via a native PowerShell script and LM Studio has a native Windows app. No WSL2 is needed. This is the main practical advantage of the LM Studio backend over Ollama, which typically requires WSL2 on Windows.

Q: What LM Studio version do I need for Hermes Agent?

LM Studio 0.3.6 or later is required. Earlier builds do not handle function-call schemas reliably, causing Hermes to stall on tool-calling tasks. Check your version under Help and update from lmstudio.ai/download if needed.

Q: How much RAM do I need for Hermes Agent with LM Studio?

Hermes itself needs ~300-500 MB. RAM is determined by model size and context: Gemma 4 E4B Instruct needs ~6 GB at Q4 with 64K context, Qwen3 8B needs ~8 GB, Qwen3.5 27B needs ~18 GB. Add 2-3 GB OS overhead. 16 GB total handles the 7B-8B class comfortably.

Q: Which models work best with Hermes Agent in LM Studio?

Best choices as of mid-2026: Gemma 4 E4B Instruct (8-12 GB machines), Qwen3 8B Instruct (best tool-calling at 7B-8B size), Qwen3.5 27B Instruct (32+ GB RAM, best overall). GLM 4.7 Flash for fast inference. Always choose the Instruct/Chat variant, not the base model.

Q: What is the difference between Hermes Agent with LM Studio vs Ollama?

Inference quality and capabilities are the same. LM Studio is better for Windows without WSL2, GUI model management, and Apple Silicon MLX. Ollama is better for Linux CLI workflows, headless servers, and scripted context configuration. Switching only requires updating base_url in config.yaml.

Q: Why does Hermes Agent need 64,000 tokens of context?

Hermes keeps every tool call and its output in context for multi-step reasoning. At 4K tokens (LM Studio default), context fills after 3-4 tool calls, causing repeated work and contradictions. 64K is the tested minimum. Set it to 65536 in LM Studio model settings.

Q: What is SOUL.md and what should I put in it?

SOUL.md (~/.hermes/SOUL.md) is a Markdown file loaded as Hermes's system context at every startup. It stores your identity, preferences, and standing instructions. Hermes updates it automatically, but adding your name, timezone, and primary use case at setup improves output quality from day one.

Set up Hermes Agent with LM Studio as the local inference backend. Covers model loading, 64K context config, provider setup, SOUL.md, and tool-calling models.

By Amara|Updated 2 July 2026

LM Studio running Gemma 4 E4B at 65536 context, with Hermes Agent CLI executing filesystem and web tool calls

Hermes Agent (MIT license, v0.16.0) is an open-source autonomous AI agent that runs as a persistent process on your machine. It executes multi-step tasks across your filesystem and the web, connects to Telegram, Discord, and other messaging platforms, and builds memory between sessions in a local `~/.hermes/` directory. By default it calls Anthropic and OpenAI. This guide replaces those cloud endpoints with LM Studio, running the full agent stack locally at no per-message cost.

LM Studio brings two things the Ollama backend setup cannot match. On Windows, the combination is fully native: LM Studio ships a Windows app, and Hermes installs via a PowerShell one-liner, with no WSL2 required. On every platform, the LM Studio GUI handles model downloads, context length, and server toggle in four clicks rather than through CLI commands and environment variables. The API it exposes is OpenAI-compatible at `http://localhost:1234/v1`, and Hermes has a built-in LM Studio provider option in its setup wizard.

One configuration step trips up most new setups: the context window. LM Studio reads the default from the model's GGUF metadata, which is typically 2,048 or 4,096 tokens. Hermes needs at least 64,000 tokens to hold its working memory across multi-step tool calls. Set this before loading the model. The guide covers that fix in the second section.

Prerequisites

LM Studio 0.3.6 or later installed on Linux, macOS, or Windows (download from lmstudio.ai)
At minimum 8 GB RAM for a 7B-8B model with 64K context; 16-18 GB RAM for Qwen3.5 27B
5-18 GB free disk space depending on model size (Gemma 4 E4B is about 5 GB, Qwen3.5 27B is about 16 GB)
Git installed (only needed if cloning the Hermes repo manually; the one-line installer handles it otherwise)
(Optional) A GPU with 8 GB or more of VRAM for GPU-accelerated inference on 13B models and larger
(Optional) A rented GPU if you want to run larger models such as Qwen3.5 27B without buying hardware

🖥️

Need more GPU power?

Rent a RTX 4090 on Vast.ai from $0.20/hr. On-demand GPU rentals by the hour, useful for running larger models without buying hardware.

In This Guide

1What Hermes Agent Does and Why LM Studio Fits the Setup
2Install LM Studio and Load a Tool-Capable Model
3Install Hermes Agent on Linux, macOS, or Windows
4Connect Hermes Agent to LM Studio
5Configure SOUL.md and Run Your First Hermes Task
6Troubleshooting
7FAQ

What Hermes Agent Does and Why LM Studio Fits the Setup

Hermes Agent (NousResearch/hermes-agent, MIT license) is an autonomous AI agent that runs as a long-lived background process. Unlike chatbots that answer one question at a time, it executes sequences of tool calls: browsing the web, reading and writing files, running terminal commands, and sending messages through connected gateways.

Hermes maintains four components on disk:

`~/.hermes/SOUL.md` stores your identity and preferences. Loaded at every startup as the primary system context.
`~/.hermes/memories/` holds fact-based memories extracted from sessions and retrieved by relevance on each run.
`~/.hermes/skills/` holds step-by-step procedures the agent writes for itself and reuses across sessions.
The gateway layer handles connections to Telegram, Discord, Slack, WhatsApp, Signal, and email for remote task submission.

Hermes uses any OpenAI-compatible API as its inference backend. LM Studio and Ollama both qualify. The choice between them is mostly practical:

Factor	LM Studio	Ollama
API endpoint	`http://localhost:1234/v1`	`http://localhost:11434/v1`
Windows	Native app, no WSL2	WSL2 required for most Windows setups
Model management	GUI browser, one-click download	CLI via `ollama pull`
Context length	Slider in model settings UI	Environment variable or Modelfile
Apple Silicon	MLX builds for faster inference	GGUF with Metal support
Hermes integration	Named provider option in setup wizard	Generic OpenAI-compatible endpoint

For Windows users, LM Studio removes the WSL2 dependency entirely. For anyone who prefers visual model management over a command line, the LM Studio catalog and one-click download flow is faster to set up than Ollama's pull commands. The inference quality and Hermes capabilities are identical once both backends are configured.

ℹ️

Note:Hermes communicates with LM Studio through standard `/v1/chat/completions` and `/v1/models` requests. From Hermes's perspective, LM Studio and any other OpenAI-compatible server are interchangeable once the base URL is set correctly.

Install LM Studio and Load a Tool-Capable Model

LM Studio 0.3.6 or later is required for tool-calling support. Earlier builds expose the chat completions API but do not process function-call schemas reliably, which causes Hermes to stall on tool-dependent tasks.

Step 1: Download and Install LM Studio

Download the installer for your platform from lmstudio.ai. LM Studio ships native builds for macOS (Intel and Apple Silicon), Windows 10/11 (x64), and Linux (AppImage and deb). Run the installer and launch LM Studio.

Confirm the version meets the requirement:

In LM Studio, go to Help and check that the version shown is 0.3.6 or later. If not, download the latest build from lmstudio.ai/download before proceeding.

Step 2: Choose and Download a Model

Hermes requires a model with native tool-calling (function-calling) support and at least 64K context. These models work reliably in LM Studio with Hermes Agent as of mid-2026:

Model	RAM at Q4	Download Size	Tool Calling	Best For
Gemma 4 E4B Instruct	~6 GB	~5 GB	Reliable	Starting out, 8-12 GB machines
Qwen3 8B Instruct	~8 GB	~5.2 GB	Strong	General agent tasks, balanced quality
GLM 4.7 Flash Instruct	~7 GB	~4 GB	Good	Fast responses on 8 GB machines
Qwen3.5 27B Instruct	~18 GB	~16 GB	Excellent	Complex multi-step workflows, 32+ GB RAM
Llama 3.1 8B Instruct	~8 GB	~4.7 GB	Solid	Strong reasoning on 16 GB machines

In LM Studio's search bar, type the model name (for example "Gemma 4 E4B Instruct") and click Download. The download runs in the background. Wait for the status indicator to show 100%.

ℹ️

Note:Always pick the Instruct or Chat variant of a model, not the base version. Base models output raw continuations without following tool-call schemas. A guide titled "Qwen3 8B (base)" in LM Studio will not work for Hermes's tool-calling workflows.

Step 3: Set Context Length to 65,536

This is the step that causes most failed setups. LM Studio's default context for many models is 2,048 or 4,096 tokens. Hermes rejects any model serving fewer than 64,000 tokens at startup.

In LM Studio, click the model you downloaded from the sidebar or model picker. Next to the model name, click the gear icon to open model settings. Find the field labeled "Context Length" and change the value to `65536`. Click Load Model and wait for the status indicator to confirm the model is ready.

If your machine's RAM cannot sustain 64K context at the model size you selected, try a smaller model or reduce to 32,768 tokens. Hermes will still launch, but very long multi-step tasks may hit the limit mid-session.

Step 4: Enable the Local Server

In LM Studio, open the Developer tab (sometimes labeled Local Server in newer builds). Click Start Server. The status panel should show:

Server running on http://localhost:1234

Leave this window open while you use Hermes. LM Studio must be running with the server active for every Hermes session.

Confirm the server is responding before moving on:

curl http://localhost:1234/v1/models

Expected output (abbreviated):

json

{
  "data": [
    { "id": "gemma-4-e4b-instruct", "object": "model" }
  ]
}

If you see the model listed, the server is working. Note the exact model ID string shown in the response; you will paste it into Hermes during setup.

Install Hermes Agent on Linux, macOS, or Windows

Hermes Agent installs with a one-line command on all three platforms. The installer manages the Python 3.11 dependency via `uv`, so you do not need Python pre-installed.

Linux and macOS

curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash

The script installs Hermes, sets up the `~/.hermes/` configuration directory, and adds the `hermes` command to your shell. Close and reopen your terminal, then verify:

hermes --version
# Expected: hermes v0.16.0 (or newer)

Windows (PowerShell — no WSL2 required)

powershell

irm https://hermes-agent.nousresearch.com/install.ps1 | iex

Run this in a PowerShell terminal opened as your regular user (not as Administrator). After the installer finishes, restart the terminal and confirm:

powershell

hermes --version
# Expected: hermes v0.16.0 (or newer)

ℹ️

Note:If PowerShell blocks the script with an execution policy error, run `Set-ExecutionPolicy RemoteSigned -Scope CurrentUser` once to allow remote scripts for your user account, then retry the installer.

Post-Install Dependencies

After the main install, run the post-install step to set up browser automation and other optional tools:

hermes postinstall

This installs Playwright and its Chromium browser, which Hermes uses for web browsing tasks. It is optional but recommended if you plan to use web research in agent tasks. The step takes 2-3 minutes on a standard connection.

Check for any missing components:

hermes doctor
# Expected output shows each component as OK or lists what to install

Connect Hermes Agent to LM Studio

With LM Studio running on port 1234 and a model loaded, run the Hermes setup wizard to connect the two:

hermes setup

The wizard prompts you for a provider. When you see the provider list, select "LM Studio" if it appears as a named option (available in Hermes v0.14+). If it does not appear in your version, choose "Custom OpenAI-compatible endpoint" instead.

For the endpoint URL, enter:

http://localhost:1234/v1

For the API key prompt, leave the field blank and press Enter, or enter any non-empty string such as `lm-studio`. LM Studio does not validate API keys for local connections. Hermes requires the field to be non-empty in some versions, which is why a placeholder value works.

For the model name, paste the exact model ID you noted from the `curl` command in Step 4. For Gemma 4 E4B this typically looks like `gemma-4-e4b-instruct` or a similar string matching what LM Studio returned from `/v1/models`. Using the wrong name results in a "model not found" error on the first request.

For the context length prompt, leave it blank. Hermes will auto-detect the context from LM Studio's server response. This is the safest option because it reads the value you set in LM Studio rather than setting an independent limit that may not match.

Manual Config (Alternative to the Wizard)

If you prefer editing the config file directly, open `~/.hermes/config.yaml` and add:

yaml

model:
  default: lmstudio

providers:
  lmstudio:
    type: openai
    base_url: "http://127.0.0.1:1234/v1"
    api_key: "lm-studio"
    model: "gemma-4-e4b-instruct"

Replace the `model` value with the exact model ID from LM Studio. Save the file. Hermes reads this on every startup, no restart of Hermes itself is needed.

To verify the connection:

hermes ping
# Expected: Connected to lmstudio via http://127.0.0.1:1234/v1 (gemma-4-e4b-instruct, 65536 tokens)

A successful ping confirms that Hermes can reach LM Studio, that the model is loaded, and that the context length meets the 64K minimum.

Configure SOUL.md and Run Your First Hermes Task

SOUL.md (`~/.hermes/SOUL.md`) is Hermes's identity file. Loaded at every startup as the first block of the system prompt, it defines who the agent is, what it knows about you, and how it should behave. Hermes writes to it automatically over time, but adding a few lines at setup time improves output quality from the first session.

Open the file in any text editor:

# Linux / macOS / Windows (WSL or native)
open ~/.hermes/SOUL.md        # macOS
notepad %USERPROFILE%.hermesSOUL.md   # Windows
nano ~/.hermes/SOUL.md        # Linux

A minimal useful SOUL.md for a new setup:

markdown

# Identity
You are Hermes, a local AI agent running via LM Studio on this machine.

# About the user
Name: [your name]
Timezone: [your timezone, e.g. UTC+1]
Primary language: English

# Behavior
- Be concise but technically accurate.
- Use tools when they significantly improve answer quality.
- Ask a clarifying question when the request is ambiguous rather than guessing.
- Prefer Markdown for multi-step instructions and code.

# Capabilities
You can call tools provided by Hermes Agent: web search, filesystem read/write,
terminal commands (with user approval), and messaging gateways.
All inference runs locally via LM Studio. No data leaves this machine unless
a tool explicitly contacts an external service.

Save the file. Hermes will include this on the next run.

First Task

Start Hermes in terminal mode:

hermes

The agent loads SOUL.md, connects to LM Studio, and shows a prompt. Try a simple tool-calling task to confirm the setup works end to end:

>>> List the 5 largest files in my home directory and tell me their sizes.

Hermes should call the filesystem tool, retrieve the results, and summarize them. If you see the tool call execute and a response come back with actual file names and sizes, the setup is complete.

Gateway Setup (Optional)

To control Hermes via Telegram or Discord, run the gateway setup:

hermes gateway setup

The wizard prompts for a bot token. For Telegram: create a bot via @BotFather on Telegram and paste the token. For Discord: create an application in the Discord developer portal and paste the bot token. Once configured, start the gateway:

hermes gateway run

All messages sent to the bot are routed to Hermes, which uses LM Studio as the inference backend. For 24/7 operation, run the gateway in a background `tmux` or `screen` session, or set it up as a systemd service on Linux.

Also see the Hermes Agent with Ollama guide for a comparison of how the gateway behaves with Ollama instead of LM Studio, and the Ollama vs LM Studio comparison for a full feature breakdown between the two local inference backends.

Troubleshooting

Connection refused or "Failed to reach endpoint" when Hermes starts

Cause: LM Studio server is not running or is not bound to port 1234

Fix: Open LM Studio, go to the Developer tab, and click Start Server. Confirm the status shows "running on http://localhost:1234". Run `curl http://localhost:1234/v1/models` to verify before retrying Hermes. Also check that no firewall or security tool on Windows is blocking local port 1234.

"Model not found" or 404 error on first Hermes request

Cause: The model name in Hermes config does not match LM Studio's internal model ID, or no model is loaded in LM Studio

Fix: Run `curl http://localhost:1234/v1/models` and copy the exact "id" value shown in the response. Update the `model` field in `~/.hermes/config.yaml` to that exact string. Also confirm that a model is actively loaded in LM Studio (the status bar should show the model name).

Hermes reports context is too small at startup or rejects the model

Cause: LM Studio context length is below 64,000 tokens

Fix: In LM Studio, click the gear icon next to the loaded model, set Context Length to 65536, and click Load Model to apply the change. Then restart Hermes. The context length set in LM Studio is authoritative; the value shown in `hermes ping` should now show 65536 tokens.

Tool calls fail silently or Hermes generates malformed JSON for tool arguments

Cause: The loaded model does not support function calling, or the Instruct variant was not selected

Fix: In LM Studio, verify the model name contains "Instruct", "Chat", or "it" (Italian convention for instruct) rather than just a base parameter count. Switch to Gemma 4 E4B Instruct, Qwen3 8B Instruct, or Qwen3.5 27B Instruct. These are confirmed to work with Hermes's tool-calling schema as of mid-2026.

API key error or 401 unauthorized response from LM Studio

Cause: Hermes sent an empty API key but a particular LM Studio build requires any non-empty value

Fix: Edit `~/.hermes/config.yaml` and set `api_key: "lm-studio"` under the lmstudio provider. LM Studio does not validate the key content for local connections; any non-empty string resolves this error. Do not add actual credentials here.

First Hermes response is very slow (30-60 seconds or more)

Cause: LM Studio is loading the model into memory on the first inference call, or is filling the KV cache for 64K context on a CPU-only machine

Fix: The delay is one-time per session. After the first response, subsequent requests are significantly faster. For sustained speed improvement: ensure a GPU with 8+ GB VRAM is active in LM Studio (check the Device panel in model settings), or reduce context length to 32768 if CPU inference is the only option.

Alternatives to Consider

Tool	Type	Price	Best For
Hermes Agent with Ollama	CLI / Self-hosted	Free	Linux and macOS users who prefer CLI-based model management, or anyone already running Ollama for other tools. Ollama's context window is set via environment variable (`OLLAMA_CONTEXT_LENGTH=64000`) rather than a GUI slider. See the full guide at /how-to/hermes-agent-ollama-local-models.
OpenClaw	Self-hosted	Free	Users who want a web dashboard UI and access to a large third-party skill marketplace (molthub). OpenClaw gained 113,000+ GitHub stars in January 2026. The setup is similar in complexity to Hermes but emphasizes a visual UI over terminal-first workflows.
AnythingLLM	Desktop app / Self-hosted	Free	Document-heavy workflows (chat with PDFs, codebases, or knowledge bases). AnythingLLM supports LM Studio as a backend and excels at retrieval-augmented generation. It is not an autonomous agent but is much simpler to configure for document Q&A tasks.
Open-WebUI with LM Studio	Self-hosted	Free	Interactive chat rather than autonomous agent workflows. Open-WebUI connects to LM Studio's API endpoint and gives you a ChatGPT-style interface for any locally loaded model. Use this when you want to control every step rather than delegating multi-step tasks.

Frequently Asked Questions

Is Hermes Agent free to use with LM Studio?

Yes. Hermes Agent is open source under the MIT license with no subscription or per-message charge. LM Studio is a free desktop application for personal use. Inference costs come from your own hardware: electricity, and optionally a rented GPU for models that exceed your local VRAM.

The only time you pay per token with this stack is if you add a cloud provider to Hermes alongside LM Studio. The default setup described in this guide keeps every request local and offline.

Does Hermes Agent work on Windows without WSL2?

Yes. Hermes Agent installs via a native PowerShell one-liner and LM Studio ships a native Windows application, so the entire stack runs without WSL2 or a Linux subsystem.

This is the main practical advantage of the LM Studio setup over using Ollama as the backend. Ollama on Windows typically requires WSL2 for reliable operation, which adds setup complexity and resource overhead. With LM Studio, Windows users get the same Hermes capabilities as Linux and macOS users with no additional setup.

What LM Studio version do I need for Hermes Agent?

LM Studio 0.3.6 or later is required. Earlier builds expose the chat completions endpoint but do not handle function-call schemas reliably, which causes Hermes to stall or return empty tool results on any task that requires tool use.

Check your version in LM Studio under Help. If you are on an older build, download the latest from lmstudio.ai/download. The upgrade does not remove downloaded models.

How much RAM do I need for Hermes Agent with LM Studio?

Hermes Agent itself uses around 300-500 MB for its Node.js and Python processes. The RAM requirement comes almost entirely from the model and the context window you configure in LM Studio.

At a 64K context with Q4 quantization: Gemma 4 E4B Instruct needs about 6 GB, Qwen3 8B needs about 8 GB, and Qwen3.5 27B needs about 18 GB. Add 2-3 GB overhead for the operating system and LM Studio itself. A machine with 16 GB total RAM handles Gemma 4 E4B or Qwen3 8B at full context comfortably. For Qwen3.5 27B, you need 32 GB or more.

Which models work best with Hermes Agent in LM Studio?

The three most reliable choices as of mid-2026 are Gemma 4 E4B Instruct (best starting point for 8-12 GB machines, confirmed in multiple Hermes + LM Studio demos), Qwen3 8B Instruct (strongest tool-calling at the 7B-8B size class), and Qwen3.5 27B Instruct (best overall quality for complex multi-step agent tasks on 32+ GB machines).

GLM 4.7 Flash is a good choice for faster inference on 8 GB machines. Avoid base or non-instruct variants of any model. They output raw continuations without following the function-call schemas that Hermes depends on for every agentic task.

What is the difference between Hermes Agent with LM Studio vs Ollama?

The inference quality and agent capabilities are identical once either backend is configured. The differences are in setup and workflow.

LM Studio makes more sense if you are on Windows without WSL2, prefer a GUI for model management, want Apple Silicon MLX acceleration without CLI flags, or are new to local models and the visual interface reduces friction.

Ollama makes more sense if you are on Linux and comfortable with the command line, already use Ollama for other tools, want headless server operation without a desktop app running, or need the `OLLAMA_CONTEXT_LENGTH` environment variable for scripted context configuration across multiple sessions.

Both use the OpenAI-compatible API format. Switching from one to the other only requires updating `base_url` in `~/.hermes/config.yaml`.

Why does Hermes Agent need 64,000 tokens of context?

Hermes orchestrates multi-step tasks where every tool call and its result stays in the context window so the model can reason across the full sequence. A single agent session might span 10-30 tool calls: web searches, file reads, terminal commands, and intermediate reasoning steps, each consuming hundreds to thousands of tokens.

At a 4,096-token default, the context fills after 3-4 tool calls and the model starts losing earlier steps. The agent repeats work, skips steps, or produces outputs that contradict earlier findings. 64K is the tested minimum for reliable operation. For very complex multi-hour tasks, increasing to 128K in LM Studio (if the model and your RAM support it) performs noticeably better.

What is SOUL.md and what should I put in it?

SOUL.md (`~/.hermes/SOUL.md`) is a plain Markdown file that Hermes loads at every startup as the first block of the system prompt. It defines the agent's identity, what it knows about you, and standing instructions for how it should behave.

You do not need to fill it in thoroughly to start. Hermes updates it automatically as it learns patterns from your sessions. Adding your name, timezone, and a short note about your primary use case at setup time improves output quality from the first session rather than waiting several interactions for the agent to infer your context.

Keep it short and factual. A few lines covering your name, timezone, and preferred communication style are more useful than a long list of rules the agent has to parse on every token.

Can I use Hermes Agent with LM Studio on a Mac with Apple Silicon?

Yes. LM Studio 0.3.x supports Apple Silicon with MLX builds for compatible models. In the LM Studio catalog, look for model variants tagged `-mlx` or with "Apple Silicon" in the description. MLX models run faster on M1, M2, M3, and M4 chips than the standard GGUF builds because they use Apple's MLX framework instead of generic CPU or Metal inference.

Note that MLX variants may not support all model sizes and may not expose the exact same context window options as GGUF builds. If you hit issues with an MLX variant, switch to the standard GGUF version of the same model and confirm the context length setting before restarting Hermes.

How do I set up Telegram or Discord with Hermes Agent?

Run `hermes gateway setup` after completing the LM Studio connection. The wizard asks which platform you want to connect (Telegram, Discord, Slack, WhatsApp, Signal, or email) and prompts for a bot token.

For Telegram: create a bot via the @BotFather account on Telegram, copy the token it gives you, and paste it into the gateway setup prompt.

For Discord: create an application in the Discord developer portal, create a bot user under that application, copy the bot token, and paste it. Add the bot to your server with the permissions it needs for reading and sending messages.

Once configured, start the gateway with `hermes gateway run`. Messages sent to your bot are forwarded to Hermes, which uses LM Studio as the backend for all responses. The gateway choice has no effect on which model runs inference.

Can I switch between models in LM Studio without reconfiguring Hermes?

Yes, with one small step. When you switch the loaded model in LM Studio, the new model ID may be different from what you set in `~/.hermes/config.yaml`. Run `curl http://localhost:1234/v1/models` to get the exact ID of the newly loaded model, then update the `model` field in config.yaml.

If you use the Hermes setup wizard's LM Studio provider (rather than a manual config), Hermes may auto-detect the loaded model from the `/v1/models` endpoint on each startup. Check `hermes ping` after switching models to confirm which model Hermes is using before starting a task.

What is the correct API endpoint URL for LM Studio in Hermes Agent?

The correct base URL is `http://localhost:1234/v1` (or `http://127.0.0.1:1234/v1`, which is equivalent). Include the `/v1` path suffix. Hermes appends `/chat/completions` to this base for inference requests.

A common mistake is using `http://localhost:1234` without the `/v1` suffix, which results in 404 errors because LM Studio only exposes its API under the `/v1` path prefix. Another common mistake is using port `11434`, which is Ollama's default port. LM Studio uses `1234` unless you changed it in LM Studio's server settings.

Related Guides

Intermediate30 min

How to Install Hermes Agent with Ollama Local Models (2026)

Beginner8 min

Ollama vs LM Studio: Which Local LLM Tool Should You Use in 2026?

Intermediate35 min

How to Run OpenClaw with Ollama Local Models (2026 Guide)

Beginner10 min

Best Local LLM Models to Run in 2026 (Benchmarks + Use Cases)

Intermediate25 min

How to Set Up AnythingLLM with Ollama (2026 Guide)