10 June 2026 — QVAC MedPsy is a free, open-source medical AI that runs entirely on your phone, with no cloud and nothing leaving your device. Its 4B model matches Google's MedGemma-27B while being nearly seven times smaller. Here is how small, private medical AI works, and what it can and cannot do.
QVAC Blog
2 June 2026 — TurboQuant is a KV-cache quantization algorithm published by Google Research at ICLR 2026 (Zandieh et al.). It compresses the running context memory of a transformer LLM by up to 5x with almost no accuracy loss on long-context benchmarks. QVAC SDK 0.12.0 integrates TurboQuant into qvac-fabric-llm.cpp with a Vulkan backend.
26 May 2026 — Your local LLM now receives a tailored toolbox for every interaction, with automatic KV cache compaction to maintain high-speed inference. Agentic applications tend to grow tool catalogs quickly. A personal assistant might have weather, calendar, file search, notes, reminders, device actions, workspace search, and app-specific commands. But any single user turn usually needs only a […]
12 May 2026 — Translation has been one of the cornerstone use cases of NLP for a long time. That makes it sound easy, but we've found out that... not so much.