LiteLLM Cheat Sheet

Overview

LiteLLM is a Python library and proxy server that provides a unified, OpenAI-compatible interface to 100+ LLM providers, including OpenAI, Anthropic, Google, Azure, AWS Bedrock, and Ollama. It translates OpenAI-style API calls into provider-specific formats, so an application can usually switch models by changing only the `model` string. The proxy server adds load balancing, rate limiting, spend tracking, and key management.

LiteLLM is used as middleware between applications and LLM providers, simplifying multi-model architectures. It supports streaming, function calling, vision, embeddings, and image generation across providers. The proxy server can be deployed as a centralized gateway for teams to manage LLM access with budgets and usage limits.

Installation

pip install litellm

# With proxy server dependencies
pip install "litellm[proxy]"

# With additional proxy dependencies (the extra_proxy extra)
pip install "litellm[extra_proxy]"

Core Usage

Python SDK

from litellm import completion

# OpenAI
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is RAG?"}]
)
print(response.choices[0].message.content)

# Anthropic (same interface)
response = completion(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "What is RAG?"}]
)

# Google Gemini
response = completion(
    model="gemini/gemini-1.5-pro",
    messages=[{"role": "user", "content": "What is RAG?"}]
)

# AWS Bedrock
response = completion(
    model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": "What is RAG?"}]
)

# Ollama (local)
response = completion(
    model="ollama/llama3.1",
    messages=[{"role": "user", "content": "What is RAG?"}],
    api_base="http://localhost:11434"
)

Streaming

from litellm import completion

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Async

import asyncio
from litellm import acompletion

async def main():
    response = await acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
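
The async interface is most useful when several requests run concurrently. The sketch below fans out one request per prompt with `asyncio.gather`. `run_many` is a hypothetical helper, and `fake_acompletion` is a stand-in for `acompletion` (same keyword arguments, simplified return value) so the example runs without network access.

```python
import asyncio


async def run_many(prompts, call, model="gpt-4o"):
    """Send one request per prompt concurrently; results keep prompt order."""
    tasks = [
        call(model=model, messages=[{"role": "user", "content": prompt}])
        for prompt in prompts
    ]
    return await asyncio.gather(*tasks)


async def fake_acompletion(model, messages):
    # Hypothetical stand-in for litellm.acompletion; returns a plain string
    await asyncio.sleep(0)
    return f"{model}: {messages[0]['content']}"


results = asyncio.run(run_many(["Hello!", "Bye!"], fake_acompletion))
print(results)
```

In real use, pass `acompletion` as `call` and read `choices[0].message.content` from each result.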

Embeddings

from litellm import embedding

# OpenAI
response = embedding(
    model="text-embedding-3-small",
    input=["What is RAG?", "Vector databases store embeddings"]
)

# Cohere
response = embedding(
    model="cohere/embed-english-v3.0",
    input=["Hello world"]
)

# Bedrock
response = embedding(
    model="bedrock/amazon.titan-embed-text-v1",
    input=["Hello"]
)
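
Embedding vectors are usually compared with cosine similarity. The helper below is a plain-Python sketch, not part of LiteLLM. It assumes the vectors have already been extracted from the response, for example from `response.data[i]["embedding"]` if the response follows the OpenAI-style shape.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


# Toy 2-dimensional vectors; real embeddings have hundreds of dimensions
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```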

Model Naming Convention

Provider	Format	Example
OpenAI	`model_name`	`gpt-4o`
Anthropic	`model_name`	`claude-3-5-sonnet-20241022`
Google	`gemini/model_name`	`gemini/gemini-1.5-pro`
Azure	`azure/deployment_name`	`azure/gpt-4o-deploy`
AWS Bedrock	`bedrock/model_id`	`bedrock/anthropic.claude-3-sonnet-20240229-v1:0`
Ollama	`ollama/model_name`	`ollama/llama3.1`
HuggingFace	`huggingface/model_name`	`huggingface/bigcode/starcoder`
Cohere	`model_name`	`command-r-plus`
Mistral	`mistral/model_name`	`mistral/mistral-large-latest`
Groq	`groq/model_name`	`groq/llama-3.1-70b-versatile`
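
The prefix convention in the table can be illustrated with a small parser. This is a sketch of the naming scheme only: `split_model` is a hypothetical helper, and LiteLLM's own provider resolution is more involved (unprefixed names such as `gpt-4o` are resolved by the library itself).

```python
def split_model(model):
    """Split 'provider/model_name' at the first slash.

    Returns (provider, model_name); provider is None when there is no prefix.
    """
    provider, sep, name = model.partition("/")
    if not sep:
        return None, model
    return provider, name


for example in ["gpt-4o", "gemini/gemini-1.5-pro", "huggingface/bigcode/starcoder"]:
    print(example, "->", split_model(example))
```

Only the first slash separates the provider, so model names that contain slashes themselves stay intact.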

Proxy Server

Start Proxy

# Basic start
litellm --model gpt-4o

# With config file
litellm --config config.yaml

# With port and host
litellm --config config.yaml --port 4000 --host 0.0.0.0

# Debug mode
litellm --config config.yaml --debug

Proxy Configuration

# config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: sk-...

  - model_name: claude-3
    litellm_params:
      model: claude-3-5-sonnet-20241022
      api_key: sk-ant-...

  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-eastus
      api_base: https://eastus.openai.azure.com
      api_key: azure-key-1

  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-westus
      api_base: https://westus.openai.azure.com
      api_key: azure-key-2

  - model_name: local-llama
    litellm_params:
      model: ollama/llama3.1
      api_base: http://localhost:11434

litellm_settings:
  drop_params: true
  set_verbose: false
  num_retries: 3
  request_timeout: 600

router_settings:
  routing_strategy: least-busy
  num_retries: 3
  timeout: 120
  allowed_fails: 3
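
In the config above, three entries share the alias `gpt-4o`. The sketch below groups a trimmed copy of that `model_list` by `model_name` to make the deployment groups visible. It only inspects the structure and is not how the proxy loads its config; `group_deployments` is a hypothetical helper.

```python
from collections import defaultdict

# Trimmed copy of the model_list above (API keys and api_base omitted)
model_list = [
    {"model_name": "gpt-4o", "litellm_params": {"model": "gpt-4o"}},
    {"model_name": "claude-3", "litellm_params": {"model": "claude-3-5-sonnet-20241022"}},
    {"model_name": "gpt-4o", "litellm_params": {"model": "azure/gpt-4o-eastus"}},
    {"model_name": "gpt-4o", "litellm_params": {"model": "azure/gpt-4o-westus"}},
    {"model_name": "local-llama", "litellm_params": {"model": "ollama/llama3.1"}},
]


def group_deployments(entries):
    """Map each public alias to the backend models configured behind it."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry["model_name"]].append(entry["litellm_params"]["model"])
    return dict(groups)


groups = group_deployments(model_list)
for alias, backends in groups.items():
    print(f"{alias}: {backends}")
```

Clients request the alias (`gpt-4o`), and the router chooses among the backends in that group.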

Use Proxy as OpenAI Drop-In

from openai import OpenAI

# Point to LiteLLM proxy
client = OpenAI(
    api_key="sk-anything",  # Proxy key
    base_url="http://localhost:4000"
)

response = client.chat.completions.create(
    model="gpt-4o",  # Routes to configured backend
    messages=[{"role": "user", "content": "Hello!"}]
)

# cURL
curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-anything" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Load Balancing and Routing

# config.yaml - Multiple deployments for same model
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: sk-key-1
    model_info:
      id: openai-1

  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-deploy
      api_base: https://myazure.openai.azure.com
      api_key: azure-key-1
    model_info:
      id: azure-1

router_settings:
  routing_strategy: least-busy  # simple-shuffle, least-busy, usage-based-routing, latency-based-routing, cost-based-routing
  num_retries: 3
  retry_after: 5
  timeout: 120
  fallbacks: [{"gpt-4o": ["claude-3"]}]
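
The idea behind a least-busy strategy can be sketched in a few lines: send each new request to the deployment with the fewest requests currently in flight. This illustrates the concept only and is not LiteLLM's router implementation. `pick_least_busy` and the counter values are hypothetical; the ids match the `model_info` ids above.

```python
def pick_least_busy(in_flight):
    """Return the deployment id with the fewest in-flight requests."""
    if not in_flight:
        raise ValueError("no deployments available")
    # On ties, min() returns the first id in insertion order
    return min(in_flight, key=in_flight.get)


# Hypothetical in-flight request counts per deployment id
in_flight = {"openai-1": 4, "azure-1": 1}

chosen = pick_least_busy(in_flight)
in_flight[chosen] += 1  # the new request is now in flight
print(chosen, in_flight)
```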

Configuration

Key Management

# config.yaml
general_settings:
  master_key: sk-master-key-123
  database_url: postgresql://user:pass@localhost:5432/litellm

# Virtual keys with budgets
# Create via API:
# curl -X POST http://localhost:4000/key/generate \
#   -H "Authorization: Bearer sk-master-key-123" \
#   -H "Content-Type: application/json" \
#   -d '{"max_budget": 100, "user_id": "user@company.com"}'
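
The same key-generation request can be built from Python with the standard library. The sketch below constructs the request without sending it, so it runs without a proxy. `build_key_request` is a hypothetical helper, and the URL, master key, and payload are the placeholder values from the curl example.

```python
import json
import urllib.request


def build_key_request(base_url, master_key, max_budget, user_id):
    """Build (but do not send) a POST request to /key/generate."""
    body = json.dumps({"max_budget": max_budget, "user_id": user_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/key/generate",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
    )


req = build_key_request(
    "http://localhost:4000", "sk-master-key-123", 100, "user@company.com"
)
print(req.get_method(), req.full_url)
# With the proxy running, send it with: urllib.request.urlopen(req)
```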

Spend Tracking

# Get spend report
curl http://localhost:4000/spend/logs \
  -H "Authorization: Bearer sk-master-key-123"

# Get spend by model
curl http://localhost:4000/spend/models \
  -H "Authorization: Bearer sk-master-key-123"

# Get spend by key
curl http://localhost:4000/spend/keys \
  -H "Authorization: Bearer sk-master-key-123"

Environment Variables

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
AZURE_API_KEY=azure-...
AZURE_API_BASE=https://myazure.openai.azure.com
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=secret
AWS_REGION_NAME=us-east-1
LITELLM_MASTER_KEY=sk-master-key
DATABASE_URL=postgresql://user:pass@host:5432/litellm
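
A quick startup check can catch missing variables before the first request fails. This is a minimal sketch: `missing_env` is a hypothetical helper, and which variables are required depends on the providers actually in use.

```python
import os


def missing_env(required, env=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment
sample_env = {"OPENAI_API_KEY": "sk-placeholder", "ANTHROPIC_API_KEY": ""}
missing = missing_env(["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "AZURE_API_KEY"], sample_env)
print("Missing:", missing)
```

Calling `missing_env([...])` without the second argument checks the real process environment.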

Advanced Usage

Fallbacks and Retries

from litellm import completion

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    fallbacks=["claude-3-5-sonnet-20241022", "gemini/gemini-1.5-pro"],
    num_retries=3
)
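
Conceptually, retries and fallbacks form two nested loops: retry the current model a few times, then move on to the next one. The sketch below shows that control flow with an injected `call` function. It is an assumption about the general pattern, not LiteLLM's exact behavior (no backoff, no error classification), and `call_with_fallbacks` is a hypothetical helper.

```python
def call_with_fallbacks(call, models, num_retries=3):
    """Try each model in order; retry each up to num_retries times after a failure."""
    last_error = None
    for model in models:
        for _attempt in range(1 + num_retries):  # first try plus retries
            try:
                return call(model)
            except Exception as exc:  # real code should catch narrower errors
                last_error = exc
    raise RuntimeError("all models failed") from last_error


def flaky_call(model):
    # Hypothetical backend: the primary model always times out
    if model == "gpt-4o":
        raise TimeoutError("simulated timeout")
    return f"ok from {model}"


result = call_with_fallbacks(
    flaky_call, ["gpt-4o", "claude-3-5-sonnet-20241022"], num_retries=1
)
print(result)
```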

Cost Tracking

from litellm import completion, completion_cost

response = completion(model="gpt-4o", messages=[{"role": "user", "content": "Hello"}])

cost = completion_cost(completion_response=response)
print(f"Cost: ${cost:.6f}")
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
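
The reported cost is, in essence, token counts multiplied by per-token prices. The arithmetic can be sketched as follows. The prices here are hypothetical placeholders for illustration, not current rates for any model, and `estimate_cost` is not a LiteLLM function.

```python
def estimate_cost(prompt_tokens, completion_tokens, input_price, output_price):
    """Cost in USD, given per-token prices for input and output tokens."""
    return prompt_tokens * input_price + completion_tokens * output_price


# Hypothetical prices: $2.50 per 1M input tokens, $10.00 per 1M output tokens
INPUT_PRICE = 2.50 / 1_000_000
OUTPUT_PRICE = 10.00 / 1_000_000

cost = estimate_cost(1000, 500, INPUT_PRICE, OUTPUT_PRICE)
print(f"Cost: ${cost:.6f}")
```

Comparing such an estimate against `completion_cost` is one way to notice outdated pricing data.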

Docker Deployment

# docker-compose.yml
version: '3.8'
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"
    environment:
      - LITELLM_MASTER_KEY=sk-master-key
      - DATABASE_URL=postgresql://postgres:password@db:5432/litellm
    volumes:
      - ./config.yaml:/app/config.yaml
    command: --config /app/config.yaml --port 4000
    depends_on:
      - db

  db:
    image: postgres:16
    environment:
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=litellm
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

Troubleshooting

Issue	Solution
API key not found	Set env var for provider (e.g., `OPENAI_API_KEY`)
Model not supported	Check model naming convention (e.g., `gemini/model`)
Timeout errors	Increase `request_timeout` under `litellm_settings` (or `timeout` under `router_settings`)
Rate limit errors	Add multiple deployments, enable load balancing
Proxy won’t start	Check config.yaml syntax, verify port is free
Streaming breaks	Update litellm: `pip install -U litellm`
Cost tracking wrong	Ensure model pricing is up to date
Auth failures	Check that the `Authorization: Bearer` header carries the `master_key` or a valid virtual key

# Send a test request to a running proxy
litellm --test

# List supported models
python -c "import litellm; print(litellm.model_list)"

# Debug mode
litellm --config config.yaml --debug --detailed_debug