The easiest way to run AI locally

Full Vulkan, CUDA, ROCm, SYCL & Metal support. Any GGUF from HuggingFace. Open source.

pip install kapri-ai

or view on GitHub

Get started →

Everything Ollama can't do

Run any GGUF model from HuggingFace with full GPU support. No registry lock-in.

kapri pull unsloth/Qwen3.5-0.8B-GGUF
See all models →

$ kapri install

Downloading llama.cpp...

Downloading llama-swap...

Detecting GPU backend...

✓ Vulkan detected (AMD GPU)

$ kapri pull qwen3.5-0.8b

Downloading model...

✓ Model ready

$ kapri serve

✓ Server running at http://localhost:11434
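The server listens on the same port Ollama uses, so existing Ollama-compatible clients should point at it unchanged. As a minimal sketch, assuming Kapri exposes an OpenAI-style chat completions endpoint (a common convention for llama.cpp-based servers; the `/v1/chat/completions` path and the model name here are assumptions, so check the docs for the exact route):

```python
import json
import urllib.request

def build_chat_request(prompt, model="qwen3.5-0.8b",
                       base_url="http://localhost:11434"):
    """Build an OpenAI-style chat completion request for the local server.

    NOTE: the /v1/chat/completions path and the model name are assumptions;
    llama.cpp-based servers commonly expose this endpoint, but Kapri's
    actual route may differ.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt):
    """Send the request and return the first choice's reply text."""
    req = build_chat_request(prompt)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server from `kapri serve` running, `print(chat("Hello"))` prints the model's reply; everything stays on localhost.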

Full GPU support

Native support for all major GPU backends. No workarounds, no compromises.

  • Vulkan — AMD GPUs at full speed
  • CUDA — NVIDIA GPUs
  • ROCm — AMD Linux
  • SYCL — Intel GPUs
  • Metal — Apple Silicon

Your data stays yours

Run entirely offline. No cloud, no telemetry, no training on your data.

  • 100% local inference
  • No telemetry or analytics
  • Works entirely offline
  • Open source (MIT)

Get started with Kapri

Download