Fabric LLM

Our edge-first high-performance AI framework transforms any consumer device into a capable inference and fine-tuning node. No central clouds, no massive data centers, no vendor dependency.

From Android and Apple smartphones to high-end workstations and even industry-grade mainframes, our unified system brings LoRA fine-tuning directly into the llama.cpp ecosystem. You can initialize, train, checkpoint, and merge adapters locally for maximum privacy and resilience.

GitHub


# For Apple Silicon
curl -L https://github.com/tetherto/qvac-fabric/releases/download/v1.0/qvac-macos-apple-silicon-v1.0.zip -o qvac-macos.zip
unzip qvac-macos.zip
cd qvac-macos-apple-silicon-v1.0

# Download model
mkdir -p models
wget https://huggingface.co/Qwen/Qwen3-1.7B-GGUF/resolve/main/qwen3-1_7b-q8_0.gguf -O models/qwen3-1.7b-q8_0.gguf

# Download dataset
wget https://raw.githubusercontent.com/tetherto/qvac-fabric/main/datasets/train.jsonl

# Quick test with email style transfer
./bin/llama-finetune-lora -m models/qwen3-1.7b-q8_0.gguf -f train.jsonl \
  -c 512 -b 128 -ub 128 -ngl 999 \
  --lora-rank 16 --lora-alpha 32 --num-epochs 3
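
For readers wondering what the --lora-rank and --lora-alpha values in the quick test control, here is a minimal Python sketch of the conventional LoRA formulation, in which the adapter update is scaled by alpha / rank (32 / 16 = 2.0 for the command above). It is an illustration only: the helper names (matmul, init_lora, merge_lora) are hypothetical, the toy matrices are tiny, and it is an assumption that Fabric follows this standard scaling and zero-initialization convention.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def init_lora(d_out, d_in, rank, seed=0):
    """Conventional LoRA init (assumed): A is small random, B is zero,
    so a freshly initialized adapter leaves the base model unchanged."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    b = [[0.0] * rank for _ in range(d_out)]
    return a, b


def merge_lora(w, a, b, rank, alpha):
    """Fold the adapter into the base weights: W' = W + (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy 2x2 base weight with a rank-1 adapter; alpha / rank = 2.0,
# the same ratio as --lora-alpha 32 with --lora-rank 16.
base = [[1.0, 0.0], [0.0, 1.0]]
a, b = init_lora(d_out=2, d_in=2, rank=1)
untrained = merge_lora(base, a, b, rank=1, alpha=2)

# Pretend training produced these adapter values, then merge them.
trained_a = [[0.5, 0.25]]
trained_b = [[1.0], [0.0]]
merged = merge_lora(base, trained_a, trained_b, rank=1, alpha=2)
```

Because only the two small matrices are trained, a checkpoint needs to store far fewer values than the full weight matrix, and merging is a one-time addition to the base weights.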

    

Cross-platform scalability

Our solution runs across the entire desktop GPU ecosystem, including AMD, Intel, NVIDIA, and Apple architectures. By leveraging Vulkan, training runs locally on whichever GPU you have, so your sensitive datasets never leave your control and you maintain total operational resilience.


Train anywhere

Whether your mobile GPU is Adreno, Mali, or Apple, our novel dynamic tiling algorithm lets you train wherever you are. Fabric is the first to offer fine-tuning on these mobile GPUs, a capability that was previously unsupported.


Only assistant responses

We implemented masked-loss training, in which a mask restricts the loss to assistant tokens. User and system messages still shape the context but contribute nothing to the loss. The same tokenization and masking logic is used during both dataset creation and loss computation, so the two stay consistent.
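
The idea can be sketched in a few lines of Python. This is a generic illustration of masked cross-entropy, not Fabric's actual code: the per-token role labels, the function names (assistant_mask, masked_cross_entropy), and the toy logits are all assumptions made for the example.

```python
import math


def assistant_mask(roles):
    """Return 1 for tokens that belong to an assistant turn, 0 otherwise.
    `roles` is a hypothetical per-token role label list."""
    return [1 if role == "assistant" else 0 for role in roles]


def masked_cross_entropy(logits, targets, mask):
    """Mean negative log-likelihood over positions where mask is 1.
    Masked-out positions are skipped, so they add nothing to the loss."""
    total, count = 0.0, 0
    for row, target, keep in zip(logits, targets, mask):
        if not keep:
            continue
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[target]
        count += 1
    return total / count if count else 0.0


# Five tokens over a toy vocabulary of four entries.
roles = ["system", "user", "user", "assistant", "assistant"]
mask = assistant_mask(roles)
targets = [0, 1, 2, 3, 0]
logits = [[0.0, 0.0, 0.0, 0.0] for _ in roles]
loss = masked_cross_entropy(logits, targets, mask)

# Changing the predictions at a user position leaves the loss untouched.
perturbed = [list(row) for row in logits]
perturbed[1] = [9.0, -9.0, 3.0, 0.5]
loss_perturbed = masked_cross_entropy(perturbed, targets, mask)
```

With uniform logits the loss equals log(4) for each assistant token, and perturbing a user-token prediction does not change it, which is the behavior the mask is meant to guarantee.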


FAQ

1. What hardware can I use for fine-tuning?

Unlike legacy frameworks that require CUDA, our solution supports virtually all modern consumer hardware. By leveraging Vulkan and Metal backends, you can train on Android (Qualcomm Adreno, ARM Mali), iOS and macOS (Apple Silicon), and standard Windows/Linux setups (AMD, Intel, NVIDIA).

2. How is training possible on mobile GPUs without crashing?

3. How does this differ from standard llama.cpp?

4. What is instruction fine-tuning?

5. What is masked-loss training?

6. Does my data leave the device?

7. How does the model quality compare to PyTorch?