Run OpenCode locally with QVAC: a coding agent with no cloud

What OpenCode is

OpenCode is an open-source coding agent. You give it a goal in plain language, and it plans, reads and writes files, runs commands, and reports back. It lives in your terminal or in a browser interface, and it works the way a teammate would: open the repo, make the change, run it, fix what broke.

The detail that matters for this guide is how OpenCode talks to a model. It speaks to any endpoint that exposes an OpenAI-compatible API. That means it is not tied to a single cloud vendor. You point it at whatever model endpoint you want, including one running on your own machine.

That is the whole idea here. OpenCode for the agent loop, a local model for the intelligence, both on hardware you control.

Why local is the right default for a coding agent

When a model only chats, where it runs is mostly a question of privacy. When a model drives a coding agent that reads your repository and runs commands, it becomes a question of control. The agent acts on your machine. If the model behind it lives in someone else’s datacenter, every file it reads and every command it runs depends on a system you do not own: its uptime, its latency, its terms, and its access policy, any of which can change without notice.

Running the model locally changes the property, not just the policy:

Your code stays on the machine. The files the agent reads, the code it writes, the commands it runs: none of it leaves your disk to reach the model.
It works offline. Once the model is downloaded, you can disconnect entirely and keep working. Turn off the network and the agent still answers.
No per-token bill. A long coding session that would run up a metered API costs nothing locally.
No silent model swaps. The weights are a file on your disk. They do not change under you between sessions.

For a coding agent, that adds up to a simple property: the work, and the intelligence doing it, are both yours.

QVAC, and the tools you already use

QVAC is the piece that makes the local half practical. It is an open-source SDK and CLI from Tether that runs models on your own device across every major GPU backend (NVIDIA, AMD, and Intel through Vulkan, and Apple Silicon through Metal), and it ships an OpenAI-compatible server. That server is the bridge a coding agent plugs into.

It is also not a one-tool trick. QVAC is a first-class local provider for OpenCode and OpenClaw, and because the server speaks the standard OpenAI API, it works with the other coding tools developers already use, including Cline, Aider, Continue, and Roo. For OpenCode specifically there is a dedicated plugin, @qvac/opencode-plugin, that wires the whole thing up from one line of config. You bring your own model, running locally, to the agent you already like. QVAC is Apache 2.0, with no API keys, no rate limits, and no per-token cost.

What it looks like

Running Qwen3.6-35B-A3B locally through QVAC (downloaded once, then with the network off), OpenCode handled three ordinary developer tasks back to back:

Build from a prompt. From one sentence, it wrote a self-contained page animating eight hundred glowing particles that drift and react to the cursor, then opened it in the browser. No libraries, every line generated on the machine.
Explain an unfamiliar repo. Asked what a repository does, it read the code and explained it: a solar-system simulation with planets orbiting on circular paths. Opening the page confirmed it.
Find and fix a bug. Handed a shopping cart whose total rendered as garbled text instead of a number, it located the cause, edited the file, and the total added up correctly.

All of it ran locally, on an open and free stack.

Which model to run

Coding agent work (reading a repo, generating a file, fixing a bug) asks more of a model than a chat does, so it pays to run a capable one. You select a model by id in the plugin config, and that id flows straight through to OpenCode’s model picker.

That id is a QVAC model reference, not a path to a file you supply. On first run, QVAC downloads that model once into its own local store and serves it locally from then on, with no network. Because it pulls QVAC’s own copy, a .gguf you may already have from another tool is not reused.

Use case	Model	RAM	Disk	Config id
Recommended default, runs on most recent laptops	Qwen3.5-4B (4-bit)	~8 GB	~3 GB	`QWEN3_5_4B_MULTIMODAL_Q4_K_M`
Balanced, the plugin’s out-of-box default	Qwen3.5-9B (4-bit)	~12 GB	~6 GB	`QWEN3_5_9B_MULTIMODAL_Q4_K_M`
Strongest coding, used in the demo	Qwen3.6-35B-A3B (4-bit)	~32 GB	~21 GB	`QWEN3_6_35B_A3B_MULTIMODAL_Q4_K_M`

The demo uses Qwen3.6-35B-A3B: 35 billion parameters of knowledge, only about 3 billion active per token, so it stays fast despite its size. For most laptops, Qwen3.5-4B or the 9B default is the better starting point.

A GPU helps but is not required. QVAC uses Apple Metal on Apple Silicon and Vulkan on NVIDIA, AMD, and Intel; on a CPU-only machine it still runs, just slower. To check what your hardware supports before you start, run npx -y @qvac/cli doctor (no install needed).

Set it up

No second terminal and no manual server. OpenCode reads the QVAC plugin from your project config, brings up a managed QVAC server by itself, points itself at it, and shuts it down when you quit. Full steps are in the setup block below.

Requirements. Node.js 22.17 or newer (run node --version, get it from nodejs.org). Disk and RAM depend on the model: the default Qwen3.5-9B needs about 6 GB of disk; the demo’s Qwen3.6-35B-A3B needs about 21 GB of disk and 32 GB of RAM. To check what your hardware supports first, run npx -y @qvac/cli doctor (no install needed).

1. Install OpenCode

bash

npm install -g opencode-ai

2. Add the QVAC plugin to your project

In the folder you want to code in, create an opencode.json. The whole file is one line, using the plugin’s default model (Qwen3.5-9B):

opencode.json

{
  "$schema": "https://opencode.ai/config.json",
  "plugin": ["@qvac/opencode-plugin"]
}

To match the demo with the strongest model instead (heavier: about 21 GB of disk and 32 GB of RAM), name it explicitly:

opencode.json

{
  "$schema": "https://opencode.ai/config.json",
  "plugin": [["@qvac/opencode-plugin", { "model": "QWEN3_6_35B_A3B_MULTIMODAL_Q4_K_M" }]]
}

The model id is a QVAC reference, not a path to a file you supply. QVAC downloads that model once into its own local store, then serves it locally with no network. A .gguf you already have from another tool is not reused.

3. Run OpenCode

bash

opencode web        # browser UI, the most visual
opencode            # terminal UI
opencode run "..."  # one-shot

On first run the plugin installs itself, spawns a managed QVAC server in the background (downloading the model once), wires OpenCode to it as the local qvac provider, and reaps it when you quit. No qvac serve to run, no provider block, no toolsMode. It loads on startup whatever interface you use.

Test that it is really local

Ask the agent whether it is running in the cloud or on your machine. Then turn off your network and ask again. It keeps answering, because nothing was being sent anywhere.

What to try next

From here, the same setup handles the rest of what a coding agent does. Point OpenCode at a project folder and have it read and edit code, generate a small app from a prompt, explain an unfamiliar repository, or track down a bug. The agent loop is identical to the cloud experience. The only thing that changed is that the model behind it is yours, on your hardware, with nothing leaving the machine.

Notes and limits

A model that fits on your machine is not a frontier cloud model. Expect it to need clearer, more focused instructions, and give it single-goal tasks rather than long open-ended ones.
Local responses can take longer than a cloud API today. On a single local worker, a tool-using turn with the 35B model runs in roughly 20 to 30 seconds; a smaller model is snappier. That gap is closing fast as local hardware speeds up and small models improve.
One QVAC worker runs per machine, and multiple OpenCode windows share it. If the OpenCode desktop app is open it can hold locks the terminal needs, so quit it when running opencode from the terminal.
QVAC is Apache 2.0 and free. The model is downloaded once and cached locally. Source and docs: github.com/tetherto/qvac and docs.qvac.tether.io.

Back To Blog