Private • Local • Offline AI

Build Your
Private AI
Ollama + Open WebUI • Zero Cloud Dependency

A complete, step-by-step guide to running powerful large language models entirely on your own machine. No subscriptions, no data leaving your computer, no compromises on privacy. Your conversations stay yours.

💻
Your Machine
CPU / GPU / RAM
🦙
Ollama
Model Runtime
🌐
Open WebUI
Chat Interface
🔒
100% Private
Nothing Leaves

// 01 — Requirements

Hardware You'll Need

Local LLMs need memory: more RAM and VRAM mean bigger, smarter models. Here's what works at each tier.
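Not sure what your machine has? A quick check of total RAM and GPU memory tells you which tier you're in. The commands below are illustrative — `nvidia-smi` exists only with NVIDIA drivers, and the macOS lines apply only there:

```shell
# Linux — total system RAM in GiB (reads /proc/meminfo)
awk '/MemTotal/ {printf "RAM: %.1f GiB\n", $2/1024/1024}' /proc/meminfo

# Linux — NVIDIA GPU name and VRAM (requires NVIDIA drivers)
# nvidia-smi --query-gpu=name,memory.total --format=csv

# macOS — total RAM in bytes / GPU and unified memory details
# sysctl hw.memsize
# system_profiler SPDisplaysDataType
```

As a rule of thumb, a quantised model needs roughly its download size in free RAM or VRAM, plus headroom for context.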

// 02 — Setup Guide

Install in 10 Minutes

From zero to your own private ChatGPT in four steps. Works on Windows, macOS, and Linux.

1

Install Ollama

Ollama is a lightweight runtime that downloads and runs LLMs locally. It handles model management, quantisation, and GPU acceleration automatically.

Terminal — Install Ollama
# macOS & Linux (one-line install)
$ curl -fsSL https://ollama.com/install.sh | sh

# Windows — download installer from:
# https://ollama.com/download

# Verify installation
$ ollama --version
ollama version 0.16.2

# The Ollama service starts automatically.
# It runs on http://localhost:11434
2

Pull Your First Model

Choose a model from the Ollama Library. Start small to test your hardware, then scale up. Models download once and are stored locally.

Terminal — Download Models
# Pull a lightweight model to test (3B params)
$ ollama pull llama3.2
pulling manifest... done
pulling dde5aa3fc5ff... 100%  2.0 GB
success

# Try it out immediately
$ ollama run llama3.2
>>> Hello! How can I help you today?

# Pull a coding-focused model
$ ollama pull qwen2.5-coder:7b

# Pull a reasoning model
$ ollama pull deepseek-r1:8b

# List all downloaded models
$ ollama list
3

Install Docker

Open WebUI runs in a Docker container. If you don't have Docker yet, install Docker Desktop — it takes 2 minutes and gives you a GUI to manage containers.

Terminal — Install Docker
# macOS & Windows:
# Download Docker Desktop from https://docker.com

# Linux (Ubuntu/Debian)
$ sudo apt update && sudo apt install docker.io -y
$ sudo systemctl enable --now docker
$ sudo usermod -aG docker $USER
# Log out and back in for the group change to take effect

# Verify Docker is running
$ docker --version
Docker version 27.4.0
4

Launch Open WebUI

One Docker command gives you a polished, ChatGPT-style interface that connects to Ollama. Your data is stored in a persistent volume — nothing is lost between restarts.

Terminal — Run Open WebUI
# Run Open WebUI (connects to local Ollama)
$ docker run -d -p 3000:8080 \
    --add-host=host.docker.internal:host-gateway \
    -v open-webui:/app/backend/data \
    --name open-webui --restart always \
    ghcr.io/open-webui/open-webui:main

# OR — all-in-one (Ollama + WebUI bundled):
$ docker run -d -p 3000:8080 \
    -v ollama:/root/.ollama \
    -v open-webui:/app/backend/data \
    --name open-webui --restart always \
    ghcr.io/open-webui/open-webui:ollama

# Open your browser: http://localhost:3000
# Create your admin account on first visit.
# Select a model and start chatting!

// 03 — Choose Your Model

Recommended Models

Pick the right model for your hardware and use case. Smaller models are faster; larger ones are smarter.
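As a starting point, these pulls map roughly to the hardware tiers above. The tier pairings are rules of thumb, not official requirements — a quantised model needs about its download size in free RAM or VRAM:

```shell
# ~8 GB RAM — small, fast generalist
ollama pull llama3.2:3b

# ~16 GB RAM — stronger all-rounder and coding help
ollama pull qwen2.5-coder:7b
ollama pull deepseek-r1:8b

# 32 GB+ RAM or a big GPU — larger, smarter models
ollama pull llama3.1:70b
```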

// 04 — What You Can Do

Private AI Use Cases

Once running, your local AI becomes a Swiss Army knife for productivity — with zero data leaving your machine.
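Beyond the chat UI, the local model is scriptable: Ollama exposes a REST API on port 11434, so any tool on your machine can call it. A minimal sketch, assuming Ollama is running on the default port and `llama3.2` is pulled:

```shell
# One-shot generation over Ollama's local REST API
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Summarise the benefits of local LLMs in one sentence.",
  "stream": false
}'
```

The same endpoint powers shell scripts, editor plugins, and cron jobs — all without a network round trip leaving your machine.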

// 05 — Cloud Providers via Open WebUI

Connect Cloud AI Models

Open WebUI isn't limited to local models. You can connect Claude, GPT-4, Gemini, and more — all through a single unified interface you control.

C

Claude (Anthropic)

Use Claude Sonnet 4.5, Opus, or Haiku through Open WebUI by adding Anthropic as an OpenAI-compatible connection. Your prompts route through their API, but your conversation history stays local.

Open WebUI — Add Anthropic
# In Open WebUI: Settings → Connections → Add Connection
# Name: Anthropic Claude
# API Base URL: https://api.anthropic.com/v1
# API Key: (your Anthropic API key from console.anthropic.com)
sk-ant-api03-xxxxxxxxxxxxx

# Available models will auto-populate:
claude-sonnet-4-5-20250929
claude-haiku-4-5-20251001
claude-opus-4-6

# Alternative: use the Anthropic API Pipe
# Install from Open WebUI Community → Functions → "Anthropic Pipe"
O

ChatGPT / OpenAI

Connect GPT-4o, GPT-4 Turbo, o1, and other OpenAI models directly. Open WebUI natively supports the OpenAI API format — it's the simplest integration.

Open WebUI — Add OpenAI
# In Open WebUI: Settings → Connections → OpenAI API
# API Base URL: https://api.openai.com/v1
# API Key: (from platform.openai.com)
sk-proj-xxxxxxxxxxxxx

# Models available:
gpt-4o
gpt-4-turbo
o1-preview
gpt-4o-mini

# Or via Docker environment variable:
$ docker run -d -p 3000:8080 \
    -e OPENAI_API_KEY=sk-proj-xxx \
    -v open-webui:/app/backend/data \
    --name open-webui --restart always \
    ghcr.io/open-webui/open-webui:main
G

Google Gemini

Access Gemini 2.0 Flash and the Gemini 1.5 Pro and Flash models through Google's OpenAI-compatible endpoint. Google offers a generous free tier through AI Studio.

Open WebUI — Add Gemini
# Get a free API key from: aistudio.google.com

# In Open WebUI: Settings → Connections → Add Connection
# API Base URL (OpenAI-compatible):
https://generativelanguage.googleapis.com/v1beta/openai
# API Key:
AIzaSyXXXXXXXXXXXXXXX

# Models:
gemini-2.0-flash
gemini-1.5-pro
gemini-1.5-flash
D

DeepSeek (R1, V3, Coder)

DeepSeek offers cutting-edge reasoning and coding models at a fraction of the cost of GPT-4. Their R1 model rivals top reasoning engines. The API follows the OpenAI format natively.

Open WebUI — Add DeepSeek
# Get API key from: platform.deepseek.com

# API Base URL:
https://api.deepseek.com/v1
# API Key:
sk-xxxxxxxxxxxxxxxx

# Models:
deepseek-reasoner   # R1 — chain-of-thought reasoning
deepseek-chat       # V3 — general conversation

# Pricing: ~$0.14/1M input tokens (extremely cheap)

# Or run DeepSeek locally via Ollama:
$ ollama pull deepseek-r1:8b
M

Mistral AI

Run Mistral Large, Mistral Medium, and Codestral through their API. Mistral also offers fine-tuned models for specific tasks.

Open WebUI — Add Mistral
# Get API key from: console.mistral.ai

# API Base URL:
https://api.mistral.ai/v1

# Models:
mistral-large-latest
mistral-medium-latest
codestral-latest
mistral-small-latest
G

Groq (Ultra-Fast Inference)

Groq provides lightning-fast inference for open-source models using custom LPU chips. Free tier available. Connect for near-instant responses from Llama, Mixtral, and Gemma.

Open WebUI — Add Groq
# Get free API key from: console.groq.com

# API Base URL:
https://api.groq.com/openai/v1

# Models (blazing fast):
llama-3.3-70b-versatile
llama-3.1-8b-instant
mixtral-8x7b-32768
gemma2-9b-it
X

xAI Grok

Elon Musk's xAI offers Grok models with real-time knowledge and unfiltered responses. The API uses the standard OpenAI-compatible format.

Open WebUI — Add xAI
# Get API key from: console.x.ai

# API Base URL:
https://api.x.ai/v1

# Models:
grok-2
grok-2-mini
P

Perplexity (Online Search AI)

Perplexity models combine LLM intelligence with real-time web search. Perfect for research tasks where you need current information with cited sources.

Open WebUI — Add Perplexity
# Get API key from: perplexity.ai/settings/api

# API Base URL:
https://api.perplexity.ai

# Models:
sonar-pro            # Best for research
sonar                # Fast search
sonar-reasoning-pro  # Deep research + reasoning
🔄
Hybrid Setup — Best of Both Worlds

Run a fast local model (Llama 3.2 or Mistral 7B) for everyday tasks and quick questions. Switch to a cloud model (Claude Opus, GPT-4o) for complex reasoning, long documents, or coding tasks that need maximum intelligence. Open WebUI lets you switch between models with a single dropdown — all your conversations stay in one place.
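The hybrid idea also works outside the UI: Ollama serves an OpenAI-compatible endpoint at `localhost:11434/v1`, so the same request shape targets local or cloud — only the base URL, key, and model name change. A sketch, assuming Ollama is running with `llama3.2` pulled and (for the cloud half) an `OPENAI_API_KEY` in your environment:

```shell
# Everyday question → local model, free and private
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2",
       "messages": [{"role": "user", "content": "Draft a short standup update."}]}'

# Heavy reasoning → cloud model, same request shape
curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o",
       "messages": [{"role": "user", "content": "Review this architecture tradeoff..."}]}'
```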

// 06 — Best Practices

Pro Tips

Get the most out of your local LLM setup with these expert recommendations.
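A few of those recommendations boil down to Ollama's environment variables and status commands. These settings are real Ollama options, but the values shown are examples — tune them to your hardware:

```shell
# Keep a model loaded for 1 hour after last use
# (default is 5 minutes; avoids a reload delay on every chat)
export OLLAMA_KEEP_ALIVE=1h

# Let other devices on your LAN reach Ollama (default binds to localhost)
export OLLAMA_HOST=0.0.0.0:11434

# See which models are loaded and how much sits on the GPU vs CPU
ollama ps

# Free a model's memory immediately instead of waiting for the timeout
ollama stop llama3.2
```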

// 07 — Troubleshooting

Common Issues

Quick fixes for the most frequent setup problems.
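Most problems fall into one of three buckets: Ollama isn't running, Open WebUI can't reach it, or a port is taken. These checks (assuming the default ports and the container name `open-webui` from the setup above) narrow it down quickly:

```shell
# 1. Is Ollama up? Should answer "Ollama is running"
curl -s http://localhost:11434

# 2. Open WebUI can't see your models? Check the container logs
docker logs open-webui --tail 50

# ...and restart the container after fixing the connection
docker restart open-webui

# 3. Port 3000 already in use? Re-map to another host port, e.g. 8081
# docker rm -f open-webui
# docker run -d -p 8081:8080 ... ghcr.io/open-webui/open-webui:main
```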