QVAC Blog

QVAC Football Predictor 2026: a local AI that calls matches, and shows its work

We built an app that predicts football matches. The twist: the AI runs entirely on your own machine, it simulates each match 10,000 times, and it reasons out loud before committing to a scoreline. No cloud model, no API key, no per-token bill. You can watch it think. This is a write-up of how it […]
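As a rough illustration of what "simulating a match 10,000 times" can look like, here is a minimal Monte Carlo sketch. It is not the app's actual method: the independent Poisson goal model, the `simulateMatch` function, the expected-goals inputs, and the seeded random generator are all assumptions made for this example.

```javascript
// Hypothetical sketch: Monte Carlo match simulation with independent
// Poisson-distributed goal counts. Not taken from the QVAC app.

// Small seeded PRNG (mulberry32) so that runs are reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Draw one Poisson sample with mean `lambda` (Knuth's multiplication method).
function samplePoisson(lambda, rand) {
  const limit = Math.exp(-lambda);
  let k = 0;
  let p = rand();
  while (p > limit) {
    k += 1;
    p *= rand();
  }
  return k;
}

// Simulate `runs` matches given assumed expected goals for each side.
function simulateMatch(homeRate, awayRate, runs = 10000, seed = 1) {
  const rand = mulberry32(seed);
  const scorelines = new Map();
  let homeWins = 0;
  let draws = 0;
  let awayWins = 0;

  for (let i = 0; i < runs; i++) {
    const home = samplePoisson(homeRate, rand);
    const away = samplePoisson(awayRate, rand);
    if (home > away) homeWins += 1;
    else if (home === away) draws += 1;
    else awayWins += 1;
    const key = `${home}-${away}`;
    scorelines.set(key, (scorelines.get(key) || 0) + 1);
  }

  // Pick the scoreline that occurred most often across all runs.
  let likeliestScore = null;
  let bestCount = -1;
  for (const [key, count] of scorelines) {
    if (count > bestCount) {
      bestCount = count;
      likeliestScore = key;
    }
  }

  return {
    homeWin: homeWins / runs,
    draw: draws / runs,
    awayWin: awayWins / runs,
    likeliestScore,
  };
}
```

Calling `simulateMatch(1.6, 1.1)` returns estimated win, draw, and loss probabilities along with the most frequent scoreline. The two rates are placeholder expected-goals values; a real predictor would derive them from team data.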

TurboQuant in QVAC SDK 0.12.0: KV-cache quantization for production local AI

TurboQuant is a KV-cache quantization algorithm published by Google Research at ICLR 2026 (Zandieh et al.). It compresses the running context memory of a transformer LLM by up to 5x with almost no accuracy loss across long-context benchmarks. QVAC SDK 0.12.0 integrates TurboQuant into qvac-fabric-llm.cpp with a Vulkan backend.

Dynamic Tooling & KV Cache Management: Smaller Toolboxes, Faster Local LLMs

Your local LLM now receives a tailored toolbox for every interaction, with automatic KV cache compaction to maintain high-speed inference. Agentic applications tend to grow tool catalogs quickly. A personal assistant might have weather, calendar, file search, notes, reminders, device actions, workspace search, and app-specific commands. But any single user turn usually needs only a […]

One SDK for all of your AI

The QVAC SDK: a single JavaScript interface for LLM completion and other capabilities. See how it works under the hood.
