Tabby

Installation

# Docker (recommended — easiest setup)
docker pull tabbyml/tabby

# CPU-only run
docker run -it \
  -v $HOME/.tabby:/data \
  -p 8080:8080 \
  tabbyml/tabby \
  serve --model TabbyML/StarCoder-1B

# NVIDIA GPU (CUDA)
docker run -it --gpus all \
  -v $HOME/.tabby:/data \
  -p 8080:8080 \
  tabbyml/tabby \
  serve --model TabbyML/DeepseekCoder-6.7B --device cuda

# Apple Silicon (Metal)
# Docker containers on macOS cannot access the Metal GPU, so run Tabby natively.
brew install tabbyml/tabby/tabby
tabby serve --model TabbyML/StarCoder-1B --device metal

# Via cargo (Rust toolchain required)
cargo install tabby --features cuda          # CUDA build
cargo install tabby                          # CPU build

# Verify installation
curl http://localhost:8080/v1/health
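
A freshly started server may take a while to answer while it downloads and loads the model, so a single curl can fail even when the install is fine. The sketch below polls the `/v1/health` endpoint listed in the API table until it responds. The helper names (`fetch_health`, `wait_until_healthy`) and the retry defaults are illustrative assumptions, not part of Tabby.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(base_url: str) -> dict:
    """GET <base_url>/v1/health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/v1/health", timeout=5) as resp:
        return json.load(resp)


def wait_until_healthy(
    base_url: str,
    attempts: int = 30,
    delay: float = 2.0,
    fetch: Callable[[str], dict] = fetch_health,
) -> dict:
    """Poll the health endpoint until it answers or the attempts run out."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch(base_url)
        except OSError as exc:
            # URLError, refused connections and socket timeouts are all OSError subclasses.
            last_error = exc
            time.sleep(delay)
    raise TimeoutError(f"{base_url} not healthy after {attempts} attempts") from last_error
```

Calling `wait_until_healthy("http://localhost:8080")` returns the decoded health payload once the server is up. The `fetch` parameter exists only so the polling loop can be exercised without a running server.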

Configuration

# Default config file location
# Linux/Mac: ~/.tabby/config.toml
# Docker volume: mounted at /data (maps to ~/.tabby on host)

# Check server health and version (this endpoint does not return config.toml)
curl http://localhost:8080/v1/health | jq .

# ~/.tabby/config.toml

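# Caution: "openai/chat" is a chat backend kind. Code completion needs a
# fill-in-the-middle capable backend, so confirm which completion kinds your
# Tabby version supports before relying on this block.
[model.completion.http]
kind = "openai/chat"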
model_name = "gpt-4o"
api_endpoint = "https://api.openai.com/v1"
api_key = "sk-your-key-here"

[model.chat.http]
kind = "openai/chat"
model_name = "gpt-4o"
api_endpoint = "https://api.openai.com/v1"
api_key = "sk-your-key-here"

# Repository context configuration
[[repositories]]
name = "my-project"
git_url = "file:///home/user/projects/my-project"

[[repositories]]
name = "shared-lib"
git_url = "https://github.com/org/shared-lib.git"
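
When many repositories need indexing, the `[[repositories]]` entries can be generated instead of typed by hand. This is a minimal sketch under the assumption that each entry needs only `name` and `git_url`, as in the examples above. `repo_entry` and `render_repositories` are hypothetical helpers, not Tabby tooling.

```python
import json
from pathlib import PurePosixPath


def repo_entry(name: str, location: str) -> str:
    """Render one [[repositories]] table for config.toml.

    Absolute local paths become file:// URLs; anything else (for example an
    https:// clone URL) is passed through unchanged.
    """
    if location.startswith("/"):
        location = PurePosixPath(location).as_uri()
    # json.dumps yields a double-quoted, escaped string, which is also a valid
    # TOML basic string for plain ASCII names and URLs like the ones used here.
    return (
        "[[repositories]]\n"
        f"name = {json.dumps(name)}\n"
        f"git_url = {json.dumps(location)}\n"
    )


def render_repositories(repos: dict[str, str]) -> str:
    """Join the rendered entries with a blank line between tables."""
    return "\n".join(repo_entry(name, loc) for name, loc in repos.items())
```

For example, `render_repositories({"my-project": "/home/user/projects/my-project"})` produces a block with `git_url = "file:///home/user/projects/my-project"` that can be appended to `~/.tabby/config.toml`.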

# docker-compose.yml — recommended for production
version: "3"
services:
  tabby:
    image: tabbyml/tabby
    command: serve --model TabbyML/DeepseekCoder-6.7B --chat-model TabbyML/Mistral-7B --device cuda
    volumes:
      - tabby_data:/data
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    environment:
      - TABBY_DISABLE_USAGE_COLLECTION=1

volumes:
  tabby_data:

Core Commands / API

Command	Description
`tabby serve`	Start the Tabby server
`tabby serve --model MODEL`	Serve with a specific completion model
`tabby serve --chat-model MODEL`	Add a chat/answer engine model
`tabby serve --device cuda`	Use NVIDIA GPU
`tabby serve --device metal`	Use Apple Silicon GPU
`tabby serve --device cpu`	Force CPU inference
`tabby serve --port 8080`	Set port (default: 8080)
`tabby download --model MODEL`	Pre-download a model
`tabby scheduler --now`	Trigger repository indexing now
`tabby --help`	Show all commands

REST API Endpoint	Method	Description
`/v1/health`	GET	Server health and version
`/v1/completions`	POST	Code completion request
`/v1/chat/completions`	POST	Chat/answer engine request
`/v1/search`	GET	Search indexed code
`/v1/events`	POST	Log user activity events
`/v1beta/server_setting`	GET	Get server settings
`/v1beta/repositories`	GET	List indexed repositories

Advanced Usage

Model Selection

# Popular completion models (choose by GPU VRAM):

# < 4GB VRAM or CPU only
tabby serve --model TabbyML/StarCoder-1B

# 4-8GB VRAM
tabby serve --model TabbyML/DeepseekCoder-6.7B
tabby serve --model TabbyML/CodeLlama-7B

# 8-16GB VRAM
tabby serve --model TabbyML/CodeLlama-13B
tabby serve --model TabbyML/StarCoder2-15B

# Chat models (for answer engine)
tabby serve \
  --model TabbyML/DeepseekCoder-6.7B \
  --chat-model TabbyML/Mistral-7B

# Using cloud providers as backend (no local GPU needed)
# Configure in config.toml as shown in Configuration section

# Pre-download models for air-gapped deployment
tabby download --model TabbyML/DeepseekCoder-6.7B
tabby download --model TabbyML/Mistral-7B

# Models stored in: ~/.tabby/models/
ls ~/.tabby/models/

VS Code Extension Setup

# Install from VS Code marketplace:
# Search: "Tabby" by TabbyML
# Extension ID: TabbyML.vscode-tabby

# Or install via CLI
code --install-extension TabbyML.vscode-tabby

// VS Code settings.json
{
  "tabby.endpoint": "http://localhost:8080",
  "tabby.inlineCompletion.trigger": "auto",   // "auto" | "manual"
  "tabby.keybindings": "default",
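  "tabby.usage.anonymousUsageTracking": false  // check the extension's setting description: the flag may act as a "disable tracking" switch, where true turns tracking off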
}

# Key bindings in VS Code:
# Tab           - Accept completion
# Escape        - Dismiss completion
# Alt+]         - Next completion suggestion
# Alt+[         - Previous completion suggestion
# Alt+\         - Manually trigger completion (if set to manual)

JetBrains IDE Extension

# Install from JetBrains Marketplace:
# Settings → Plugins → Marketplace → Search "Tabby"
# Supports: IntelliJ, PyCharm, GoLand, WebStorm, CLion, etc.

# JetBrains settings:
# Settings → Tools → Tabby
# Server URL: http://localhost:8080
# Completion Trigger: Auto / Manual
# Token: (from Admin UI → Users → Generate Token)

Vim/Neovim Plugin

" Install with vim-plug
Plug 'TabbyML/vim-tabby'

" Or with lazy.nvim
{
  "TabbyML/vim-tabby",
  init = function()
    vim.g.tabby_agent_start_command = {"npx", "tabby-agent", "--stdio"}
    vim.g.tabby_inline_completion_trigger = "auto"
  end,
}

# Requires tabby-agent (Node.js)
npm install -g tabby-agent

# Configure server endpoint
# ~/.tabby-client/agent/config.toml
[server]
endpoint = "http://localhost:8080"
token = "your-token-here"

Repository Context (RAG)

# ~/.tabby/config.toml — index local and remote repos
[[repositories]]
name = "frontend"
git_url = "file:///home/user/projects/frontend"

[[repositories]]
name = "backend"
git_url = "file:///home/user/projects/backend"

[[repositories]]
name = "company-lib"
git_url = "https://github.com/company/shared-lib.git"

# Trigger indexing manually
tabby scheduler --now

# Indexing runs automatically every 5 minutes by default
# Check indexing status via admin UI: http://localhost:8080

# Search indexed code via API
curl "http://localhost:8080/v1/search?q=authentication+middleware&limit=5" \
  -H "Authorization: Bearer your-token"
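
The same search request can be built from Python with the query string encoded programmatically. This sketch only constructs the request, using the endpoint path and the `q` and `limit` parameters from the curl example above. `build_search_request` is a hypothetical helper, and the response format is not assumed here.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlencode


def build_search_request(
    base_url: str,
    query: str,
    limit: int = 5,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build a GET request for /v1/search with an encoded query string."""
    params = urlencode({"q": query, "limit": limit})  # spaces are encoded as '+'
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return urllib.request.Request(
        f"{base_url}/v1/search?{params}", headers=headers, method="GET"
    )
```

Passing the result to `urllib.request.urlopen` sends the request. Encoding through `urlencode` keeps queries containing `&`, `#` or non-ASCII characters from corrupting the URL.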

Team Deployment with Authentication

# Enable authentication in admin UI
# http://localhost:8080 → Settings → Security

# Generate user tokens
# Admin UI → Users → New User → Generate Token

# Use token in VS Code settings
# tabby.endpoint: http://your-server:8080
# tabby.serverToken: "your-user-token"

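# Environment variable that fixes the secret used to sign the web server's JWT tokens
# (it does not enable self-service registration; manage registration in the admin UI)
TABBY_WEBSERVER_JWT_TOKEN_SECRET=your-secret-32chars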

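# Multi-user docker-compose with Nginx reverse proxy
version: "3"
services:
  tabby:
    image: tabbyml/tabby
    command: serve --model TabbyML/DeepseekCoder-6.7B --device cuda
    volumes:
      - tabby_data:/data
    ports:
      - "8080:8080"
    deploy:            # GPU reservation, required for --device cuda
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  nginx:
    image: nginx:alpine
    ports:
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./certs:/etc/ssl/certs

volumes:
  tabby_data:          # named volume must be declared at the top level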

REST API Usage

# Code completion request
curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-token" \
  -d '{
    "language": "python",
    "segments": {
      "prefix": "def fibonacci(n: int) -> int:\n    \"\"\"Return nth Fibonacci number.\"\"\"\n    ",
      "suffix": "\n\nresult = fibonacci(10)"
    },
    "temperature": 0.1,
    "seed": 42
  }'

# Chat completion
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-token" \
  -d '{
    "messages": [
      {"role": "user", "content": "Explain how async/await works in Python"}
    ]
  }'

# Python client for Tabby API
import httpx

TABBY_URL = "http://localhost:8080"
TOKEN = "your-token"

def complete_code(prefix: str, suffix: str = "", language: str = "python") -> str:
    response = httpx.post(
        f"{TABBY_URL}/v1/completions",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "language": language,
            "segments": {
                "prefix": prefix,
                "suffix": suffix,
            },
        },
        timeout=30,
    )
    response.raise_for_status()
    data = response.json()
    return data["choices"][0]["text"]

# Example
completion = complete_code("def reverse_string(s: str) -> str:\n    ")
print(completion)
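
In an editor integration, the `prefix` and `suffix` segments are simply the file contents before and after the cursor. The sketch below derives them from a zero-based (line, column) position. `offset_of` and `build_completion_payload` are hypothetical helpers, and the payload uses only the `language` and `segments` fields shown in the requests above.

```python
def offset_of(source: str, line: int, column: int) -> int:
    """Convert a zero-based (line, column) position into a character offset."""
    lines = source.splitlines(keepends=True)
    if not 0 <= line < len(lines):
        raise ValueError(f"line {line} is outside the source")
    return sum(len(text) for text in lines[:line]) + column


def build_completion_payload(source: str, cursor: int, language: str = "python") -> dict:
    """Split the source at the cursor into the prefix/suffix segments."""
    if not 0 <= cursor <= len(source):
        raise ValueError(f"cursor {cursor} is outside the source")
    return {
        "language": language,
        "segments": {"prefix": source[:cursor], "suffix": source[cursor:]},
    }
```

The resulting dictionary can be sent as the JSON body of a `/v1/completions` request. Sending a suffix lets the model complete in the middle of a file rather than only at the end.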

Common Workflows

Local Dev Setup (no GPU)

# Start Tabby with small CPU model
docker run -d \
  --name tabby \
  -v $HOME/.tabby:/data \
  -p 8080:8080 \
  --restart unless-stopped \
  tabbyml/tabby \
  serve --model TabbyML/StarCoder-1B

# Check logs
docker logs -f tabby

# Install VS Code extension and point to localhost:8080
code --install-extension TabbyML.vscode-tabby

# Test endpoint
curl http://localhost:8080/v1/health

Team Server Setup (CUDA)

# 1. Provision GPU server (e.g., AWS g4dn.xlarge = T4 16GB)
# 2. Install Docker + NVIDIA container toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
# ... (full nvidia-docker install steps)

# 3. Start Tabby with GPU
docker run -d \
  --name tabby \
  --gpus all \
  -v /opt/tabby:/data \
  -p 8080:8080 \
  --restart unless-stopped \
  tabbyml/tabby \
  serve \
    --model TabbyML/DeepseekCoder-6.7B \
    --chat-model TabbyML/Mistral-7B \
    --device cuda

# 4. Configure team members to use http://your-server-ip:8080

Air-Gapped / Private Deployment

# Pre-download all needed models on internet-connected machine
tabby download --model TabbyML/DeepseekCoder-6.7B
tabby download --model TabbyML/Mistral-7B

# Copy model cache to air-gapped machine
rsync -av ~/.tabby/models/ airgapped-host:/opt/tabby/models/

# Run on air-gapped machine (models already cached)
docker run -d \
  --gpus all \
  -v /opt/tabby:/data \
  -p 8080:8080 \
  tabbyml/tabby \
  serve --model TabbyML/DeepseekCoder-6.7B --device cuda
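
Before cutting network access, it can help to confirm that every required model actually arrived in the copied cache. This sketch assumes each model lives in a directory named after its ID under the models folder (for example `models/TabbyML/DeepseekCoder-6.7B/`). That layout is an assumption to verify against your own `~/.tabby/models/` listing, and `missing_models` is a hypothetical helper.

```python
from pathlib import Path


def missing_models(models_dir: Path, required: list[str]) -> list[str]:
    """Return the required model IDs that have no non-empty directory in the cache."""
    missing = []
    for model_id in required:
        # Assumed layout: <models_dir>/<organisation>/<model name>/...
        model_path = models_dir.joinpath(*model_id.split("/"))
        if not model_path.is_dir() or not any(model_path.iterdir()):
            missing.append(model_id)
    return missing
```

Running `missing_models(Path("/opt/tabby/models"), ["TabbyML/DeepseekCoder-6.7B", "TabbyML/Mistral-7B"])` on the air-gapped host should return an empty list if the rsync completed.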

Tips and Best Practices

Model Selection Guide

VRAM	Recommended Completion Model	Notes
CPU only	`TabbyML/StarCoder-1B`	Slow but works everywhere
4GB	`TabbyML/StarCoder-1B`	Fast on GPU
8GB	`TabbyML/DeepseekCoder-6.7B`	Best quality/performance balance
16GB	`TabbyML/StarCoder2-15B`	High quality completions
24GB+	`TabbyML/CodeLlama-13B` or larger	Production-grade
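
For provisioning scripts, the table above can be expressed as a simple lookup. The tiers below mirror the table rows as written and are a starting point rather than a benchmark result. `recommend_model` is a hypothetical helper.

```python
# (minimum VRAM in GB, model) pairs copied from the table; 0 stands for CPU only.
TIERS = [
    (0, "TabbyML/StarCoder-1B"),
    (4, "TabbyML/StarCoder-1B"),
    (8, "TabbyML/DeepseekCoder-6.7B"),
    (16, "TabbyML/StarCoder2-15B"),
    (24, "TabbyML/CodeLlama-13B"),
]


def recommend_model(vram_gb: float) -> str:
    """Pick the model from the highest tier whose minimum VRAM fits."""
    if vram_gb < 0:
        raise ValueError("vram_gb must not be negative")
    chosen = TIERS[0][1]
    for min_vram, model in TIERS:
        if vram_gb >= min_vram:
            chosen = model
    return chosen
```

A card that falls between two rows, such as 12GB, gets the model from the lower row, which leaves headroom for a chat model or longer contexts.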

Performance Tuning

# Reduce memory usage — use smaller quantized models
# Model names ending in -Q4 or -GGUF use less VRAM

# Monitor GPU usage while Tabby is running
watch -n1 nvidia-smi

// Set the completion debounce delay in VS Code settings.json
{
  "tabby.inlineCompletion.debounceDelay": 300    // ms to wait after typing before requesting a completion
}

# For faster response: use smaller model + GPU over larger model + CPU
# StarCoder-1B on GPU > DeepseekCoder-6.7B on CPU
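
Whether a smaller model on GPU beats a larger one on CPU for a given machine is easiest to settle by measuring. The sketch below times any zero-argument callable and reports the median, which is less sensitive to a slow first request than the mean. `median_latency` is a hypothetical helper; the callable would typically wrap a completion request such as the `complete_code` function from the REST API section.

```python
import statistics
import time
from typing import Callable


def median_latency(request: Callable[[], object], runs: int = 5) -> float:
    """Call `request` `runs` times and return the median wall-clock time in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        request()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

For example, `median_latency(lambda: complete_code("def add(a, b):\n    "))` can be run once per model or device setting and the results compared directly.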

Security Best Practices

Practice	Detail
Enable auth	Turn on authentication in admin UI for team deployments
Use HTTPS	Put Nginx/Caddy in front with TLS for remote access
Token rotation	Rotate user tokens periodically via admin UI
Firewall	Restrict port 8080 to internal network only
Usage tracking	Set `TABBY_DISABLE_USAGE_COLLECTION=1` for privacy
Air-gap option	Deploy fully offline — no external calls needed

Troubleshooting

# Server won't start — check logs
docker logs tabby

# Out of memory — switch to smaller model
docker run ... tabbyml/tabby serve --model TabbyML/StarCoder-1B

# GPU not detected
nvidia-smi                           # verify GPU is visible
docker run --gpus all nvidia/cuda:12.0.0-base-ubuntu20.04 nvidia-smi  # test nvidia-docker

# Slow completions on CPU
# Normal: CPU inference is slow (1-5s for 1B model)
# Fix: add a GPU, or use cloud backend in config.toml

# VS Code extension not connecting
# Check: tabby.endpoint matches your server URL
# Check: server is running (curl http://localhost:8080/v1/health)
# Check: token is correct (if auth is enabled)

# Re-index repositories
tabby scheduler --now
# Or via API: POST http://localhost:8080/v1beta/repositories/resolve