Everything Ollama can't do
Run any GGUF model from Hugging Face with full GPU support. No registry lock-in.
kapri pull unsloth/Qwen3.5-0.8B-GGUF
Full Vulkan, CUDA, ROCm & SYCL support. Any GGUF from Hugging Face. Open source.
pip install kapri-ai
$ kapri install
Downloading llama.cpp...
Downloading llama-swap...
Detecting GPU backend...
✓ Vulkan detected (AMD GPU)
$ kapri pull qwen3.5-0.8b
Downloading model...
✓ Model ready
$ kapri serve
✓ Server running at http://localhost:11434
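Once the server is up, any HTTP client can talk to it. A minimal Python sketch, assuming kapri's server (via llama-swap) exposes llama.cpp's OpenAI-compatible /v1/chat/completions endpoint and that the model is addressed by the same tag used with kapri pull; both are assumptions, not confirmed above:

import json
import urllib.request

# Assumed endpoint: llama-swap typically proxies an OpenAI-compatible API.
payload = {
    "model": "qwen3.5-0.8b",  # assumption: same tag as `kapri pull`
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}
req = urllib.request.Request(
    "http://localhost:11434/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)
print(reply["choices"][0]["message"]["content"])

Because the endpoint shape is OpenAI-compatible, existing OpenAI SDK clients should also work by pointing their base URL at http://localhost:11434/v1.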
Native support for all major GPU backends. No workarounds, no compromises.
Run entirely offline. No cloud, no telemetry, no training on your data.