Installation
# Docker (recommended — easiest setup)
docker pull tabbyml/tabby
# CPU-only run
docker run -it \
-v $HOME/.tabby:/data \
-p 8080:8080 \
tabbyml/tabby \
serve --model TabbyML/StarCoder-1B
# NVIDIA GPU (CUDA)
docker run -it --gpus all \
-v $HOME/.tabby:/data \
-p 8080:8080 \
tabbyml/tabby \
serve --model TabbyML/DeepseekCoder-6.7B --device cuda
# Apple Silicon (Metal): containers on macOS cannot use the Metal GPU,
# so install natively with Homebrew and run without Docker
brew install tabbyml/tabby/tabby
tabby serve --model TabbyML/StarCoder-1B --device metal
# Via cargo (Rust toolchain required)
cargo install tabby --features cuda # CUDA build
cargo install tabby # CPU build
# Verify installation
curl http://localhost:8080/v1/health
Configuration
# Default config file location
# Linux/Mac: ~/.tabby/config.toml
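# Docker: the container reads /data, so with -v $HOME/.tabby:/data
# the file above is the same ~/.tabby/config.toml on the host
# Check the running server's model, device and version
curl http://localhost:8080/v1/health | jq .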
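# ~/.tabby/config.toml
# Chat model served by the remote OpenAI API
[model.chat.http]
kind = "openai/chat"
model_name = "gpt-4o"
api_endpoint = "https://api.openai.com/v1"
api_key = "sk-your-key-here"
# Inline code completion ([model.completion.http]) needs a fill-in-the-middle
# (FIM) capable model; chat models such as gpt-4o with kind "openai/chat"
# cannot serve completions, so use a local --model (or a FIM-capable
# completion backend) for that role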
# Repository context configuration
[[repositories]]
name = "my-project"
git_url = "file:///home/user/projects/my-project"
[[repositories]]
name = "shared-lib"
git_url = "https://github.com/org/shared-lib.git"
# docker-compose.yml (recommended for production)
version: "3"
services:
  tabby:
    image: tabbyml/tabby
    command: serve --model TabbyML/DeepseekCoder-6.7B --chat-model TabbyML/Mistral-7B --device cuda
    volumes:
      - tabby_data:/data
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    environment:
      - TABBY_DISABLE_USAGE_COLLECTION=1
volumes:
  tabby_data:
Core Commands / API
| Command | Description |
|---|---|
| tabby serve | Start the Tabby server |
| tabby serve --model MODEL | Serve with a specific completion model |
| tabby serve --chat-model MODEL | Add a chat/answer engine model |
| tabby serve --device cuda | Use NVIDIA GPU |
| tabby serve --device metal | Use Apple Silicon GPU |
| tabby serve --device cpu | Force CPU inference |
| tabby serve --port 8080 | Set port (default: 8080) |
| tabby download --model MODEL | Pre-download a model |
| tabby scheduler --now | Trigger repository indexing now |
| tabby --help | Show all commands |
| REST API Endpoint | Method | Description |
|---|---|---|
| /v1/health | GET | Server health and version |
| /v1/completions | POST | Code completion request |
| /v1/chat/completions | POST | Chat/answer engine request |
| /v1/search | GET | Search indexed code |
| /v1/events | POST | Log user activity events |
| /v1beta/server_setting | GET | Get server settings |
| /v1beta/repositories | GET | List indexed repositories |
Advanced Usage
Model Selection
# Popular completion models (choose by GPU VRAM):
# < 4GB VRAM or CPU only
tabby serve --model TabbyML/StarCoder-1B
# 4-8GB VRAM
tabby serve --model TabbyML/DeepseekCoder-6.7B
tabby serve --model TabbyML/CodeLlama-7B
# 8-16GB VRAM
tabby serve --model TabbyML/CodeLlama-13B
tabby serve --model TabbyML/StarCoder2-15B
# Chat models (for answer engine)
tabby serve \
--model TabbyML/DeepseekCoder-6.7B \
--chat-model TabbyML/Mistral-7B
# Using cloud providers as backend (no local GPU needed)
# Configure in config.toml as shown in Configuration section
# Pre-download models for air-gapped deployment
tabby download --model TabbyML/DeepseekCoder-6.7B
tabby download --model TabbyML/Mistral-7B
# Models stored in: ~/.tabby/models/
ls ~/.tabby/models/
VS Code Extension Setup
# Install from VS Code marketplace:
# Search: "Tabby" by TabbyML
# Extension ID: TabbyML.vscode-tabby
# Or install via CLI
code --install-extension TabbyML.vscode-tabby
// VS Code settings.json
{
"tabby.endpoint": "http://localhost:8080",
"tabby.inlineCompletion.trigger": "auto", // "auto" | "manual"
"tabby.keybindings": "default",
"tabby.usage.anonymousUsageTracking": false
}
# Key bindings in VS Code:
# Tab: Accept completion
# Escape: Dismiss completion
# Alt+]: Next completion suggestion
# Alt+[: Previous completion suggestion
# Alt+\: Manually trigger completion (if trigger is set to manual)
JetBrains IDE Extension
# Install from JetBrains Marketplace:
# Settings → Plugins → Marketplace → Search "Tabby"
# Supports: IntelliJ, PyCharm, GoLand, WebStorm, CLion, etc.
# JetBrains settings:
# Settings → Tools → Tabby
# Server URL: http://localhost:8080
# Completion Trigger: Auto / Manual
# Token: (from Admin UI → Users → Generate Token)
Vim/Neovim Plugin
" Install with vim-plug
Plug 'TabbyML/vim-tabby'
-- Or with lazy.nvim (Lua plugin spec)
{
  "TabbyML/vim-tabby",
  init = function()
    vim.g.tabby_agent_start_command = {"npx", "tabby-agent", "--stdio"}
    vim.g.tabby_inline_completion_trigger = "auto"
  end,
}
# Requires tabby-agent (Node.js)
npm install -g tabby-agent
# Configure server endpoint
# ~/.tabby-client/agent/config.toml
[server]
endpoint = "http://localhost:8080"
token = "your-token-here"
Repository Context (RAG)
# ~/.tabby/config.toml — index local and remote repos
[[repositories]]
name = "frontend"
git_url = "file:///home/user/projects/frontend"
[[repositories]]
name = "backend"
git_url = "file:///home/user/projects/backend"
[[repositories]]
name = "company-lib"
git_url = "https://github.com/company/shared-lib.git"
# Trigger indexing manually
tabby scheduler --now
# Indexing runs automatically every 5 minutes by default
# Check indexing status via admin UI: http://localhost:8080
# Search indexed code via API
curl "http://localhost:8080/v1/search?q=authentication+middleware&limit=5" \
-H "Authorization: Bearer your-token"
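The query string in that curl call is standard form encoding, where spaces become +. A standard-library sketch that builds the same authenticated request without sending it; search_url and search_request are hypothetical names, and the path and parameters are taken from the curl example above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def search_url(base: str, query: str, limit: int = 5) -> str:
    """Build the /v1/search URL used in the curl example above."""
    return f"{base}/v1/search?{urlencode({'q': query, 'limit': limit})}"


def search_request(base: str, query: str, token: str, limit: int = 5) -> Request:
    """Prepare (but do not send) an authenticated GET request."""
    return Request(search_url(base, query, limit),
                   headers={"Authorization": f"Bearer {token}"})


req = search_request("http://localhost:8080", "authentication middleware", "your-token")
print(req.full_url)
```

Once the server is up, urllib.request.urlopen(req, timeout=10) sends it.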
Team Deployment with Authentication
# Enable authentication in admin UI
# http://localhost:8080 → Settings → Security
# Generate user tokens
# Admin UI → Users → New User → Generate Token
# Use token in VS Code settings
# tabby.endpoint: http://your-server:8080
# tabby.serverToken: "your-user-token"
# Secret used to sign the web server's JWT auth tokens (use a long random value)
TABBY_WEBSERVER_JWT_TOKEN_SECRET=your-secret-32chars
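Any long random string works as the signing secret. One way to generate one with Python's secrets module; the 32-character minimum is taken from the placeholder above, and make_jwt_secret is a hypothetical helper name.

```python
import secrets


def make_jwt_secret(num_bytes: int = 32) -> str:
    """Return a cryptographically random hex string (2 hex chars per byte)."""
    return secrets.token_hex(num_bytes)


secret = make_jwt_secret()  # 32 bytes -> 64 hex characters
print(f"export TABBY_WEBSERVER_JWT_TOKEN_SECRET={secret}")
```

Keep the value stable across restarts; changing it would presumably invalidate tokens already signed with the old secret.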
# Multi-user docker-compose with Nginx reverse proxy
version: "3"
services:
  tabby:
    image: tabbyml/tabby
    command: serve --model TabbyML/DeepseekCoder-6.7B --device cuda
    volumes:
      - tabby_data:/data
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  nginx:
    image: nginx:alpine
    ports:
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./certs:/etc/ssl/certs:ro
volumes:
  tabby_data:
REST API Usage
# Code completion request
curl -X POST http://localhost:8080/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-token" \
-d '{
  "language": "python",
  "segments": {
    "prefix": "def fibonacci(n: int) -> int:\n    \"\"\"Return nth Fibonacci number.\"\"\"\n    ",
    "suffix": "\n\nresult = fibonacci(10)"
  },
  "temperature": 0.1,
  "seed": 42
}'
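The prefix and suffix segments are simply the text before and after the cursor. A sketch of how an editor integration might derive them from a buffer and a 0-based (line, column) cursor, producing a body with the same fields as the curl call above; split_at_cursor and completion_body are hypothetical helpers, not part of any Tabby client.

```python
import json


def split_at_cursor(text: str, line: int, column: int) -> dict[str, str]:
    """Split a buffer at a 0-based (line, column) cursor into prefix/suffix."""
    lines = text.splitlines(keepends=True)
    if not 0 <= line < len(lines):
        raise IndexError("cursor line out of range")
    if not 0 <= column <= len(lines[line].rstrip("\r\n")):
        raise IndexError("cursor column out of range")
    offset = sum(len(previous) for previous in lines[:line]) + column
    return {"prefix": text[:offset], "suffix": text[offset:]}


def completion_body(text: str, line: int, column: int, language: str) -> str:
    """JSON body with the same fields as the curl example above."""
    return json.dumps({
        "language": language,
        "segments": split_at_cursor(text, line, column),
    })


source = "def add(a, b):\n    \n\nprint(add(1, 2))\n"
print(completion_body(source, line=1, column=4, language="python"))
```

Concatenating prefix and suffix always reproduces the original buffer, which is a cheap invariant to check in tests.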
# Chat completion
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-token" \
-d '{
"messages": [
{"role": "user", "content": "Explain how async/await works in Python"}
]
}'
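If the chat endpoint streams its reply as OpenAI-style server-sent events (data: lines carrying choices[].delta.content and ending with data: [DONE]), which is an assumption to verify against your server's actual response, the pieces can be joined with a short parser. collect_stream is a hypothetical helper.

```python
import json
from collections.abc import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Concatenate delta.content from OpenAI-style SSE 'data:' lines (assumed format)."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # role-only chunks have no content; treat them as empty text
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)
```

For example, pipe the output of curl -N into a script that prints collect_stream(sys.stdin).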
# Python client for Tabby API (requires: pip install httpx)
import httpx

TABBY_URL = "http://localhost:8080"
TOKEN = "your-token"

def complete_code(prefix: str, suffix: str = "", language: str = "python") -> str:
    """Send a /v1/completions request and return the first choice's text."""
    response = httpx.post(
        f"{TABBY_URL}/v1/completions",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "language": language,
            "segments": {
                "prefix": prefix,
                "suffix": suffix,
            },
        },
        timeout=30,
    )
    response.raise_for_status()  # surface 401 (bad token) or 5xx errors
    data = response.json()
    return data["choices"][0]["text"]

# Example
completion = complete_code("def reverse_string(s: str) -> str:\n    ")
print(completion)
Common Workflows
Local Dev Setup (no GPU)
# Start Tabby with small CPU model
docker run -d \
--name tabby \
-v $HOME/.tabby:/data \
-p 8080:8080 \
--restart unless-stopped \
tabbyml/tabby \
serve --model TabbyML/StarCoder-1B
# Check logs
docker logs -f tabby
# Install VS Code extension and point to localhost:8080
code --install-extension TabbyML.vscode-tabby
# Test endpoint
curl http://localhost:8080/v1/health
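With docker run -d the model may still be loading when that curl runs. A standard-library polling sketch; the attempt count, delay and the rule that any HTTP 200 from /v1/health means ready are assumptions, and the opener parameter exists only so the function can be tested without a server.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(base_url: str = "http://localhost:8080",
                    attempts: int = 30, delay_s: float = 2.0,
                    opener=urllib.request.urlopen) -> bool:
    """Poll /v1/health until it returns HTTP 200 or attempts run out."""
    url = f"{base_url}/v1/health"
    for attempt in range(attempts):
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # not listening yet, or still loading the model
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Calling wait_for_health() in a deployment script before pointing editors at the server avoids confusing first-request failures.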
Team Server Setup (CUDA)
# 1. Provision GPU server (e.g., AWS g4dn.xlarge = T4 16GB)
# 2. Install Docker + NVIDIA container toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
# ... (full nvidia-docker install steps)
# 3. Start Tabby with GPU
docker run -d \
--name tabby \
--gpus all \
-v /opt/tabby:/data \
-p 8080:8080 \
--restart unless-stopped \
tabbyml/tabby \
serve \
--model TabbyML/DeepseekCoder-6.7B \
--chat-model TabbyML/Mistral-7B \
--device cuda
# 4. Configure team members to use http://your-server-ip:8080
Air-Gapped / Private Deployment
# Pre-download all needed models on internet-connected machine
tabby download --model TabbyML/DeepseekCoder-6.7B
tabby download --model TabbyML/Mistral-7B
# Copy model cache to air-gapped machine
rsync -av ~/.tabby/models/ airgapped-host:/opt/tabby/models/
# Run on air-gapped machine (models already cached)
docker run -d \
-v /opt/tabby:/data \
-p 8080:8080 \
tabbyml/tabby \
serve --model TabbyML/DeepseekCoder-6.7B --device cuda
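Before copying the cache it helps to confirm that every model the server will be started with is present. A sketch that assumes the cache layout is models/<org>/<name>/ (for example ~/.tabby/models/TabbyML/DeepseekCoder-6.7B); confirm that layout with ls ~/.tabby/models/ before relying on it. missing_models is a hypothetical helper.

```python
from pathlib import Path


def missing_models(cache_dir: Path, model_ids: list[str]) -> list[str]:
    """Return model IDs whose directory is absent or empty under cache_dir.

    Assumes 'TabbyML/X' is cached at cache_dir/TabbyML/X (layout is an assumption).
    """
    missing = []
    for model_id in model_ids:
        model_dir = cache_dir / model_id
        if not model_dir.is_dir() or not any(model_dir.iterdir()):
            missing.append(model_id)
    return missing


if __name__ == "__main__":
    needed = ["TabbyML/DeepseekCoder-6.7B", "TabbyML/Mistral-7B"]
    gaps = missing_models(Path.home() / ".tabby" / "models", needed)
    print("missing:", gaps or "none")
```

Running it on both the connected machine and the air-gapped host catches an incomplete rsync before the server fails to start.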
Tips and Best Practices
Model Selection Guide
| VRAM | Recommended Completion Model | Notes |
|---|---|---|
| CPU only | TabbyML/StarCoder-1B | Slow but works everywhere |
| 4GB | TabbyML/StarCoder-1B | Fast on GPU |
| 8GB | TabbyML/DeepseekCoder-6.7B | Best quality/performance balance |
| 16GB | TabbyML/StarCoder2-15B | High quality completions |
| 24GB+ | TabbyML/CodeLlama-13B or larger | Production-grade |
# Reduce memory usage: pick a model with fewer parameters
# (Tabby's registry models already ship as quantized GGUF weights)
# Monitor GPU usage while Tabby is running
watch -n1 nvidia-smi
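For scripts or alerts, nvidia-smi also has a machine-readable query mode (--query-gpu=name,memory.total,memory.used --format=csv,noheader,nounits, which reports memory in MiB). A sketch that runs it and parses the rows; the function names are hypothetical, and GPU names containing commas are not handled.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total,memory.used",
         "--format=csv,noheader,nounits"]


def parse_gpu_csv(output: str) -> list[dict]:
    """Parse 'name, total, used' rows (MiB) from nvidia-smi CSV output."""
    gpus = []
    for row in output.strip().splitlines():
        name, total, used = (field.strip() for field in row.split(","))
        gpus.append({"name": name, "total_mib": int(total), "used_mib": int(used)})
    return gpus


def query_gpus() -> list[dict]:
    """Run nvidia-smi once; raises FileNotFoundError if it is not installed."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)
```

For example, print(f"{g['name']}: {g['used_mib']}/{g['total_mib']} MiB") for each g in query_gpus() shows how close Tabby is to the card's limit.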
# Set the completion debounce delay in VS Code settings.json
{
  "tabby.inlineCompletion.debounceDelay": 300 // ms to wait before triggering
}
# For faster response: use smaller model + GPU over larger model + CPU
# StarCoder-1B on GPU > DeepseekCoder-6.7B on CPU
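A debounce delay means a completion request fires only once typing pauses for that long. A toy simulation of that rule on keystroke timestamps (an illustration of the concept, not the extension's actual code) shows why a larger delay sends fewer requests to the server:

```python
def debounce_triggers(keystroke_ms: list[int], delay_ms: int) -> list[int]:
    """Return the times at which a debounced request would fire.

    Assumes sorted timestamps. Each keystroke restarts the timer; a request
    fires delay_ms after a keystroke only if no further keystroke arrives first.
    """
    fires = []
    for i, t in enumerate(keystroke_ms):
        deadline = t + delay_ms
        next_key = keystroke_ms[i + 1] if i + 1 < len(keystroke_ms) else None
        if next_key is None or next_key >= deadline:
            fires.append(deadline)
    return fires


typing = [0, 120, 250, 900, 1000]  # two bursts of keystrokes (ms)
print(debounce_triggers(typing, 300))
```

With 300 ms the two bursts produce two requests; with 100 ms every keystroke in this sample triggers one, which is the load a very small delay puts on a slow CPU model.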
Security Best Practices
| Practice | Detail |
|---|---|
| Enable auth | Turn on authentication in admin UI for team deployments |
| Use HTTPS | Put Nginx/Caddy in front with TLS for remote access |
| Token rotation | Rotate user tokens periodically via admin UI |
| Firewall | Restrict port 8080 to internal network only |
| Usage tracking | Set TABBY_DISABLE_USAGE_COLLECTION=1 for privacy |
| Air-gap option | Deploy fully offline — no external calls needed |
Troubleshooting
# Server won't start — check logs
docker logs tabby
# Out of memory — switch to smaller model
docker run ... tabbyml/tabby serve --model TabbyML/StarCoder-1B
# GPU not detected
nvidia-smi # verify GPU is visible
docker run --gpus all nvidia/cuda:12.0.0-base-ubuntu20.04 nvidia-smi # test nvidia-docker
# Slow completions on CPU
# Normal: CPU inference is slow (1-5s for 1B model)
# Fix: add a GPU, or use cloud backend in config.toml
# VS Code extension not connecting
# Check: tabby.endpoint matches your server URL
# Check: server is running (curl http://localhost:8080/v1/health)
# Check: token is correct (if auth is enabled)
# Re-index repositories
tabby scheduler --now
# Or via API: POST http://localhost:8080/v1beta/repositories/resolve