The easiest way to run AI locally

Full Vulkan, CUDA, ROCm, SYCL & Metal support. Any GGUF from HuggingFace. Open source.

pip install kapri-ai

or view on GitHub

Get started →

Everything Ollama can't do

Run any GGUF model from HuggingFace with full GPU support. No registry lock-in.

kapri pull unsloth/Qwen3.5-0.8B-GGUF
See all models →

$ kapri install

Downloading llama.cpp...

Downloading llama-swap...

Detecting GPU backend...

✓ Vulkan detected (AMD GPU)

$ kapri pull qwen3.5-0.8b

Downloading model...

✓ Model ready

$ kapri serve

✓ Server running at http://localhost:11434
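The server listens on the same port Ollama uses, so existing Ollama-compatible clients should point at it unchanged. As a minimal sketch, assuming Kapri exposes an OpenAI-style chat completions endpoint (a common convention for llama.cpp-based servers; the `/v1/chat/completions` path and the model name here are assumptions, so check the docs for the exact route):

```python
import json
import urllib.request

def build_chat_request(prompt, model="qwen3.5-0.8b",
                       base_url="http://localhost:11434"):
    """Build an OpenAI-style chat completion request for the local server.

    NOTE: the /v1/chat/completions path and the model name are assumptions;
    llama.cpp-based servers commonly expose this endpoint, but Kapri's
    actual route may differ.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt):
    """Send the request and return the first choice's reply text."""
    req = build_chat_request(prompt)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server from `kapri serve` running, `print(chat("Hello"))` prints the model's reply; everything stays on localhost.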

Full GPU support

Native support for all major GPU backends. No workarounds, no compromises.

  • Vulkan — AMD GPUs at full speed
  • CUDA — NVIDIA GPUs
  • ROCm — AMD Linux
  • SYCL — Intel GPUs
  • Metal — Apple Silicon

Your data stays yours

Run entirely offline. No cloud, no telemetry, no training on your data.

  • 100% local inference
  • No telemetry or analytics
  • Works entirely offline
  • Open source (MIT)

Get started with Kapri

Download