Articles by Mikhail Sotnikov

Dynamic Tooling & KV Cache Management: Smaller Toolboxes, Faster Local LLMs

Your local LLM now receives a tailored toolbox for every interaction, with automatic KV cache compaction to maintain high-speed inference. Agentic applications tend to grow tool catalogs quickly. A personal assistant might have weather, calendar, file search, notes, reminders, device actions, workspace search, and app-specific commands. But any single user turn usually needs only a […]
