A complete, step-by-step guide to running powerful large language models entirely on your own machine. No subscriptions, no data leaving your computer, no compromises on privacy. Your conversations stay yours.
// 01 — Requirements
Local LLMs need memory. More RAM and VRAM mean you can run bigger, smarter models. Here's what works at each tier.
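As a rough sizing rule (an assumption, not an official formula): a quantised model needs about `parameters × bits-per-weight ÷ 8` bytes, plus headroom for the KV cache and runtime. The helper name `estimate_gb` and the 25% overhead factor below are illustrative choices:

```shell
# Rule-of-thumb memory estimate for a quantised model.
# params_b = parameters in billions, bits = bits per weight (4 for Q4 quants).
# Adds 25% overhead as a rough allowance for KV cache and runtime.
estimate_gb() {
  local params_b=$1 bits=$2
  awk -v p="$params_b" -v b="$bits" 'BEGIN { printf "%.1f\n", p * b / 8 * 1.25 }'
}

estimate_gb 7 4    # a 7B model at 4-bit: ~4.4 GB
estimate_gb 70 4   # a 70B model at 4-bit: ~43.8 GB
```

If the estimate exceeds your VRAM, the model can still run partially or fully from system RAM, just slower.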
// 02 — Setup Guide
From zero to your own private ChatGPT in four steps. Works on Windows, macOS, and Linux.
Ollama is a lightweight runtime that downloads and runs LLMs locally. It handles model management, quantisation, and GPU acceleration automatically.
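On Linux, Ollama ships a one-line installer (macOS and Windows users download the app from ollama.com instead). A quick sketch of install-and-verify — the version-check endpoint assumes the default port 11434:

```shell
# Linux: official one-line installer from ollama.com
curl -fsSL https://ollama.com/install.sh | sh

# Confirm the CLI is on your PATH
ollama --version

# The installer usually starts the server as a service; if not, run it yourself.
# The local API listens on http://localhost:11434 by default.
ollama serve &

# Sanity check: ask the running server for its version
curl http://localhost:11434/api/version
```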
Choose a model from the Ollama Library. Start small to test your hardware, then scale up. Models download once and are stored locally.
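A typical first session with the Ollama CLI looks like this (model names are examples from the Ollama Library; pick whatever suits your hardware):

```shell
# Download a small model first to confirm your hardware copes (~2 GB)
ollama pull llama3.2

# Chat with it once — prints a reply straight to the terminal
ollama run llama3.2 "Say hello in one sentence."

# See what's stored locally, and free disk space when you're done testing
ollama list
ollama rm llama3.2
```

`ollama run` also works without a prompt argument, dropping you into an interactive chat.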
Open WebUI runs in a Docker container. If you don't have Docker yet, install Docker Desktop — it takes 2 minutes and gives you a GUI to manage containers.
One Docker command gives you a polished, ChatGPT-style interface that connects to Ollama. Your data is stored in a persistent volume — nothing is lost between restarts.
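The command below follows the quick-start in the Open WebUI README at the time of writing — double-check their docs for the current flags. The `--add-host` flag lets the container reach Ollama running on your host:

```shell
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```

Then open http://localhost:3000 in your browser. The `-v open-webui:/app/backend/data` volume is what keeps your chats and settings across restarts.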
// 03 — Choose Your Model
Pick the right model for your hardware and use case. Smaller models are faster; larger ones are smarter.
// 04 — What You Can Do
Once running, your local AI becomes a Swiss Army knife for productivity — with zero data leaving your machine.
// 05 — Cloud Providers via Open WebUI
Open WebUI isn't limited to local models. You can connect Claude, GPT-4, Gemini, and more — all through a single unified interface you control.
Use Claude Sonnet 4.5, Claude Opus, or Claude Haiku through Open WebUI by adding Anthropic as an OpenAI-compatible connection. Your prompts route through Anthropic's API, but your conversation history stays local.
Connect GPT-4o, GPT-4 Turbo, o1, and other OpenAI models directly. Open WebUI natively supports the OpenAI API format — it's the simplest integration.
Access Gemini 2.0 Flash, Gemini Pro, and other Gemini models through Google's OpenAI-compatible endpoint. Google offers a generous free tier through AI Studio.
DeepSeek offers cutting-edge reasoning and coding models at a fraction of the cost of GPT-4. Their R1 model rivals the top proprietary reasoning models. The API follows the OpenAI format natively.
Run Mistral Large, Mistral Medium, and Codestral through their API. Mistral also offers fine-tuned models for specific tasks.
Groq provides lightning-fast inference for open-source models using custom LPU chips. Free tier available. Connect for near-instant responses from Llama, Mixtral, and Gemma.
Elon Musk's xAI offers Grok models with real-time knowledge and unfiltered responses. The API uses the standard OpenAI-compatible format.
Perplexity models combine LLM intelligence with real-time web search. Perfect for research tasks where you need current information with cited sources.
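Every provider above is added the same way: in Open WebUI, create a new OpenAI-API connection with the provider's base URL and your API key. The base URLs below reflect each provider's published OpenAI-compatible endpoint at the time of writing — verify against their current docs before relying on them:

```shell
# OpenAI-compatible base URLs to paste into Open WebUI's connection settings:
#
#   OpenAI      https://api.openai.com/v1
#   Anthropic   https://api.anthropic.com/v1
#   Google      https://generativelanguage.googleapis.com/v1beta/openai
#   DeepSeek    https://api.deepseek.com
#   Mistral     https://api.mistral.ai/v1
#   Groq        https://api.groq.com/openai/v1
#   xAI         https://api.x.ai/v1
#   Perplexity  https://api.perplexity.ai

# Smoke-test a connection before adding it to the UI (Groq shown as an
# example — swap in your provider's URL and key variable):
curl -s https://api.groq.com/openai/v1/models \
  -H "Authorization: Bearer $GROQ_API_KEY"
```

If the `curl` returns a JSON model list, the same URL and key will work in Open WebUI.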
Run a fast local model (Llama 3.2 or Mistral 7B) for everyday tasks and quick questions. Switch to a cloud model (Claude Opus, GPT-4o) for complex reasoning, long documents, or coding tasks that need maximum intelligence. Open WebUI lets you switch between models with a single dropdown — all your conversations stay in one place.
// 06 — Best Practices
Get the most out of your local LLM setup with these expert recommendations.
// 07 — Troubleshooting
Quick fixes for the most frequent setup problems.