LiteLLM Cheat Sheet
Overview
LiteLLM is a Python library and proxy server that provides a unified, OpenAI-compatible interface to 100+ LLM providers, including OpenAI, Anthropic, Google, Azure, AWS Bedrock, and Ollama. It translates OpenAI-style requests into each provider's native format and returns responses in the OpenAI format. Applications can therefore switch models by changing the model string and supplying that provider's credentials. The proxy server adds load balancing, rate limiting, spend tracking, and key management.
LiteLLM is used as middleware between applications and LLM providers, simplifying multi-model architectures. It supports streaming, function calling, vision, embeddings, and image generation across providers. The proxy server can be deployed as a centralized gateway for teams to manage LLM access with budgets and usage limits.
Installation
pip install litellm
# With proxy server dependencies
pip install "litellm[proxy]"
# Optional extra dependencies for advanced proxy features
pip install "litellm[extra_proxy]"
Core Usage
Python SDK
from litellm import completion
# OpenAI
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is RAG?"}]
)
print(response.choices[0].message.content)
# Anthropic (same interface)
response = completion(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "What is RAG?"}]
)
# Google Gemini
response = completion(
    model="gemini/gemini-1.5-pro",
    messages=[{"role": "user", "content": "What is RAG?"}]
)
# AWS Bedrock
response = completion(
    model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": "What is RAG?"}]
)
# Ollama (local)
response = completion(
    model="ollama/llama3.1",
    messages=[{"role": "user", "content": "What is RAG?"}],
    api_base="http://localhost:11434"
)
Streaming
from litellm import completion
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)
for chunk in response:
    # Some chunks (e.g. the final one) carry no text, so skip None
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()
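Each streamed chunk carries only an incremental `delta` fragment. Some chunks have `delta.content` set to `None`, which is why the loop above checks it. The sketch below collects the fragments into the full reply. It uses hypothetical stand-in chunk classes so it runs without an API key. LiteLLM also provides a `stream_chunk_builder` helper for rebuilding a full response, but this sketch does not use it.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


# Hypothetical stand-ins mirroring the shape chunk.choices[0].delta.content
@dataclass
class Delta:
    content: Optional[str]


@dataclass
class Choice:
    delta: Delta


@dataclass
class Chunk:
    choices: List[Choice]


def collect_stream(chunks: Iterable[Chunk]) -> str:
    """Concatenate the non-empty delta fragments of a stream into one string."""
    parts: List[str] = []
    for chunk in chunks:
        piece = chunk.choices[0].delta.content
        if piece:  # skip None/empty deltas
            parts.append(piece)
    return "".join(parts)


fake_stream = [
    Chunk([Choice(Delta("Once "))]),
    Chunk([Choice(Delta("upon a time"))]),
    Chunk([Choice(Delta(None))]),
]
print(collect_stream(fake_stream))  # → Once upon a time
```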
Async
import asyncio
from litellm import acompletion
async def main():
    response = await acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)
asyncio.run(main())
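Because `acompletion` is a coroutine, you can run several requests concurrently with `asyncio.gather`. The sketch below does this and uses a semaphore to cap the number of requests in flight. It relies on a hypothetical `fake_acompletion` stand-in so it runs offline. In real code you would await `acompletion(model=..., messages=...)` instead.

```python
import asyncio
from typing import List


async def fake_acompletion(model: str, prompt: str) -> str:
    """Hypothetical stand-in for litellm.acompletion; simulates network latency."""
    await asyncio.sleep(0.01)
    return f"{model}: {prompt}"


async def ask_all(model: str, prompts: List[str], max_concurrency: int = 2) -> List[str]:
    semaphore = asyncio.Semaphore(max_concurrency)  # limit in-flight requests

    async def one(prompt: str) -> str:
        async with semaphore:
            return await fake_acompletion(model, prompt)

    # gather returns results in the same order as the input prompts
    return await asyncio.gather(*(one(p) for p in prompts))


results = asyncio.run(ask_all("gpt-4o", ["Hello!", "What is RAG?", "Bye"]))
print(results)
```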
Embeddings
from litellm import embedding
# OpenAI
response = embedding(
    model="text-embedding-3-small",
    input=["What is RAG?", "Vector databases store embeddings"]
)
# Cohere
response = embedding(
    model="cohere/embed-english-v3.0",
    input=["Hello world"]
)
# Bedrock
response = embedding(
    model="bedrock/amazon.titan-embed-text-v1",
    input=["Hello"]
)
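Embedding calls return one vector per input string, and a common next step is to compare those vectors with cosine similarity. The sketch below is pure Python. It assumes the vectors have already been extracted from the response (e.g. from `response.data`) into plain lists of floats. The toy 3-dimensional vectors are made up for illustration; real models return hundreds or thousands of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for two embeddings (made up)
query_vec = [1.0, 0.0, 1.0]
doc_vec = [1.0, 1.0, 0.0]
print(round(cosine_similarity(query_vec, doc_vec), 3))  # → 0.5
```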
Model Naming Convention
| Provider | Format | Example |
|---|---|---|
| OpenAI | model_name | gpt-4o |
| Anthropic | model_name | claude-3-5-sonnet-20241022 |
| Google Gemini | gemini/model_name | gemini/gemini-1.5-pro |
| Azure | azure/deployment_name | azure/gpt-4o-deploy |
| AWS Bedrock | bedrock/model_id | bedrock/anthropic.claude-3-sonnet-20240229-v1:0 |
| Ollama | ollama/model_name | ollama/llama3.1 |
| HuggingFace | huggingface/model_name | huggingface/bigcode/starcoder |
| Cohere | model_name | command-r-plus |
| Mistral | mistral/model_name | mistral/mistral-large-latest |
| Groq | groq/model_name | groq/llama-3.1-70b-versatile |
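The convention above is a provider prefix, a slash, and then the provider-specific model name. Some providers use no prefix at all. The simplified sketch below splits such a string. It relies on a hypothetical `KNOWN_PREFIXES` set and treats unprefixed names as OpenAI. It is an illustration only, not LiteLLM's own resolution logic, which also recognizes bare Anthropic and Cohere model names.

```python
from typing import Tuple

# Hypothetical subset of provider prefixes taken from the table above
KNOWN_PREFIXES = {"gemini", "azure", "bedrock", "ollama", "huggingface", "mistral", "groq", "cohere"}


def split_model(model: str, default_provider: str = "openai") -> Tuple[str, str]:
    """Split 'provider/model' on the first slash only; unknown or missing prefixes fall back."""
    prefix, sep, rest = model.partition("/")
    if sep and prefix in KNOWN_PREFIXES:
        return prefix, rest
    return default_provider, model


print(split_model("huggingface/bigcode/starcoder"))  # → ('huggingface', 'bigcode/starcoder')
print(split_model("gpt-4o"))  # → ('openai', 'gpt-4o')
```

Splitting on the first slash only matters for names like the HuggingFace example, where the model name itself contains a slash.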
Proxy Server
Start Proxy
# Basic start
litellm --model gpt-4o
# With config file
litellm --config config.yaml
# With port and host
litellm --config config.yaml --port 4000 --host 0.0.0.0
# Debug mode
litellm --config config.yaml --debug
Proxy Configuration
# config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: sk-...   # or os.environ/OPENAI_API_KEY to read from the environment
  - model_name: claude-3
    litellm_params:
      model: claude-3-5-sonnet-20241022
      api_key: sk-ant-...
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-eastus
      api_base: https://eastus.openai.azure.com
      api_key: azure-key-1
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-westus
      api_base: https://westus.openai.azure.com
      api_key: azure-key-2
  - model_name: local-llama
    litellm_params:
      model: ollama/llama3.1
      api_base: http://localhost:11434
litellm_settings:
  drop_params: true      # drop OpenAI params a provider does not support
  set_verbose: false
  num_retries: 3
  request_timeout: 600   # seconds
router_settings:
  routing_strategy: least-busy
  num_retries: 3
  timeout: 120
  allowed_fails: 3       # failures before a deployment is cooled down
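The three `gpt-4o` entries share one `model_name`, so the proxy treats them as a single model group and spreads requests across them according to `routing_strategy`. The `least-busy` strategy sends each request to the deployment with the fewest requests currently in flight. The simplified sketch below illustrates that idea with hypothetical deployment IDs and an in-memory counter. LiteLLM's router tracks this state itself, so this is not its implementation.

```python
from typing import Dict, List


class LeastBusyPicker:
    """Toy least-busy selector over the deployments of one model group (hypothetical)."""

    def __init__(self, deployment_ids: List[str]) -> None:
        # In-flight request count per deployment, all idle at start
        self.in_flight: Dict[str, int] = {d: 0 for d in deployment_ids}

    def acquire(self) -> str:
        # min() keeps the first deployment with the lowest count, so ties go to list order
        choice = min(self.in_flight, key=self.in_flight.__getitem__)
        self.in_flight[choice] += 1
        return choice

    def release(self, deployment_id: str) -> None:
        # Call when the request to this deployment finishes
        self.in_flight[deployment_id] -= 1


# Hypothetical IDs for the three gpt-4o deployments in the config above
picker = LeastBusyPicker(["openai-gpt-4o", "azure-eastus", "azure-westus"])
first = picker.acquire()
second = picker.acquire()
picker.release(first)
third = picker.acquire()
print(first, second, third)  # → openai-gpt-4o azure-eastus openai-gpt-4o
```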
Use Proxy as OpenAI Drop-In
from openai import OpenAI
# Point to LiteLLM proxy
client = OpenAI(
    api_key="sk-anything",  # Proxy key (any string works if no master_key is set)
    base_url="http://localhost:4000"
)
response = client.chat.completions.create(
    model="gpt-4o",  # model_name from config.yaml; the proxy routes it to a configured backend
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
# cURL
curl -X POST http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-anything" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}'
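Because the proxy speaks the OpenAI wire format, any HTTP client works. The minimal standard-library sketch below builds the same request as the cURL call above without sending it. It assumes the proxy is at `http://localhost:4000` and accepts `sk-anything` as a key. `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, content: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST to /v1/chat/completions."""
    body = json.dumps({"model": model, "messages": [{"role": "user", "content": content}]})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


req = build_chat_request("http://localhost:4000", "sk-anything", "gpt-4o", "Hello!")
# To send it against a running proxy:
# reply = json.load(urllib.request.urlopen(req))
# print(reply["choices"][0]["message"]["content"])
```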
Load Balancing and Routing
# config.yaml - Multiple deployments for same model
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: sk-key-1
    model_info:
      id: openai-1
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-deploy
      api_base: https://myazure.openai.azure.com
      api_key: azure-key-1
    model_info:
      id: azure-1
router_settings:
  routing_strategy: least-busy  # or simple-shuffle, usage-based-routing, latency-based-routing, cost-based-routing
  num_retries: 3
  retry_after: 5   # seconds to wait before retrying a failed request
  timeout: 120
  fallbacks: [{"gpt-4o": ["claude-3"]}]  # claude-3 must also be defined in model_list
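`fallbacks` is a list of single-key mappings. Each maps a model group to the groups tried, in order, when that group keeps failing. The simplified sketch below turns that structure into an attempt order using a hypothetical `attempt_order` helper. Each fallback name must itself be a `model_name` in `model_list` for the proxy to resolve it.

```python
from typing import Dict, List


def attempt_order(model_group: str, fallbacks: List[Dict[str, List[str]]]) -> List[str]:
    """Hypothetical helper: the primary group followed by its fallbacks (first match wins)."""
    for mapping in fallbacks:
        if model_group in mapping:
            return [model_group] + list(mapping[model_group])
    return [model_group]  # no fallback configured


fallbacks = [{"gpt-4o": ["claude-3"]}]
print(attempt_order("gpt-4o", fallbacks))  # → ['gpt-4o', 'claude-3']
print(attempt_order("local-llama", fallbacks))  # → ['local-llama']
```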
Configuration
Key Management
# config.yaml
general_settings:
  master_key: sk-master-key-123   # must start with sk-
  database_url: postgresql://user:pass@localhost:5432/litellm   # needed for virtual keys and spend tracking
# Virtual keys with budgets (max_budget in USD)
# Create via API:
# curl -X POST http://localhost:4000/key/generate \
#   -H "Authorization: Bearer sk-master-key-123" \
#   -H "Content-Type: application/json" \
#   -d '{"max_budget": 100, "user_id": "user@company.com"}'
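Each virtual key carries its own `max_budget`. The proxy records spend against the key in the database so it can reject requests once the budget is used up. The simplified in-memory sketch below shows that bookkeeping with a hypothetical `KeyBudget` class and made-up costs. The real proxy persists this state in Postgres, and its exact enforcement rules may differ.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Hypothetical error raised when a key has no budget left."""


@dataclass
class KeyBudget:
    max_budget: float  # USD, like max_budget in /key/generate
    spend: float = 0.0

    def check(self) -> None:
        # Reject new requests once recorded spend has reached the budget
        if self.spend >= self.max_budget:
            raise BudgetExceeded(f"spent ${self.spend:.2f} of ${self.max_budget:.2f}")

    def record(self, cost: float) -> None:
        # Add the cost of a finished request
        self.spend += cost


key = KeyBudget(max_budget=1.00)
for cost in (0.40, 0.40, 0.40):  # made-up per-request costs
    key.check()
    key.record(cost)
print(f"{key.spend:.2f}")  # → 1.20
```

In this sketch the check runs before the request's cost is known, so the last allowed request can push spend past the budget. Only the next request is rejected.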
Spend Tracking
# Get spend report
curl http://localhost:4000/spend/logs \
-H "Authorization: Bearer sk-master-key-123"
# Get spend by model
curl http://localhost:4000/spend/models \
-H "Authorization: Bearer sk-master-key-123"
# Get spend by key
curl http://localhost:4000/spend/keys \
-H "Authorization: Bearer sk-master-key-123"
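If you pull raw spend logs and want your own breakdown, grouping them is a simple reduction. The minimal sketch below works over hypothetical log records. The `model` and `spend` field names are assumptions, so check them against the JSON your proxy version actually returns.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def spend_by(records: Iterable[Mapping], field: str) -> Dict[str, float]:
    """Sum each record's 'spend' value (assumed field name), grouped by the given field."""
    totals: Dict[str, float] = defaultdict(float)
    for rec in records:
        totals[str(rec.get(field, "unknown"))] += float(rec.get("spend", 0.0))
    return dict(totals)


# Hypothetical records shaped like spend-log entries
logs = [
    {"model": "gpt-4o", "spend": 0.25},
    {"model": "claude-3", "spend": 0.5},
    {"model": "gpt-4o", "spend": 0.25},
]
print(spend_by(logs, "model"))  # → {'gpt-4o': 0.5, 'claude-3': 0.5}
```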
Environment Variables
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
AZURE_API_KEY=azure-...
AZURE_API_BASE=https://myazure.openai.azure.com
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=secret
AWS_REGION_NAME=us-east-1
LITELLM_MASTER_KEY=sk-master-key
DATABASE_URL=postgresql://user:pass@host:5432/litellm
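The SDK and proxy read these variables from the process environment. The small sketch below reports which provider credentials are missing before you start. The `REQUIRED` mapping is a hypothetical selection drawn from the list above, not an exhaustive list of what LiteLLM reads.

```python
import os
from typing import Dict, List, Mapping

# Hypothetical mapping of provider -> variables it needs (from the list above)
REQUIRED: Dict[str, List[str]] = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "azure": ["AZURE_API_KEY", "AZURE_API_BASE"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION_NAME"],
}


def missing_vars(providers: List[str], env: Mapping[str, str] = os.environ) -> Dict[str, List[str]]:
    """Return, per provider, the required variables that are unset or empty."""
    report: Dict[str, List[str]] = {}
    for provider in providers:
        absent = [name for name in REQUIRED[provider] if not env.get(name)]
        if absent:
            report[provider] = absent
    return report


print(missing_vars(["openai", "azure"], env={"OPENAI_API_KEY": "sk-test"}))
# → {'azure': ['AZURE_API_KEY', 'AZURE_API_BASE']}
```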
Advanced Usage
Fallbacks and Retries
from litellm import completion
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    fallbacks=["claude-3-5-sonnet-20241022", "gemini/gemini-1.5-pro"],
    num_retries=3
)
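Roughly, the call above tries `gpt-4o` first, retries it, and only then moves down the `fallbacks` list. The simplified sketch below shows that control flow with a hypothetical `call_model` callable so it runs offline. LiteLLM's real retry and fallback logic distinguishes error types and handles backoff and cooldowns in ways this sketch does not model.

```python
from typing import Callable, List


def complete_with_fallbacks(
    call_model: Callable[[str], str],
    model: str,
    fallbacks: List[str],
    num_retries: int = 3,
) -> str:
    """Try each model up to 1 + num_retries times, in order; raise the last error if all fail."""
    last_error: Exception = RuntimeError("no models to try")
    for candidate in [model, *fallbacks]:
        for _ in range(1 + num_retries):
            try:
                return call_model(candidate)
            except Exception as exc:  # a real client would retry only transient errors
                last_error = exc
    raise last_error


attempts: List[str] = []


def flaky(model: str) -> str:
    # Hypothetical backend: gpt-4o always fails, everything else succeeds
    attempts.append(model)
    if model == "gpt-4o":
        raise TimeoutError(f"{model} timed out")
    return f"answer from {model}"


print(complete_with_fallbacks(flaky, "gpt-4o", ["claude-3-5-sonnet-20241022"], num_retries=2))
# → answer from claude-3-5-sonnet-20241022
```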
Cost Tracking
from litellm import completion, completion_cost
response = completion(model="gpt-4o", messages=[{"role": "user", "content": "Hello"}])
cost = completion_cost(completion_response=response)
print(f"Cost: ${cost:.6f}")
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
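For simple text requests, `completion_cost` essentially multiplies the token counts in `response.usage` by per-token prices from LiteLLM's model price map. The minimal sketch below shows that arithmetic with made-up prices. Look up current rates for your model rather than relying on these.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """USD cost = tokens x price, with prices quoted per one million tokens."""
    return (
        prompt_tokens * input_price_per_million
        + completion_tokens * output_price_per_million
    ) / 1_000_000


# Hypothetical prices: $2.50 per 1M input tokens, $10.00 per 1M output tokens
cost = estimate_cost(1_000, 500, 2.50, 10.00)
print(f"Cost: ${cost:.6f}")  # → Cost: $0.007500
```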
Docker Deployment
# docker-compose.yml
version: '3.8'
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"
    environment:
      - LITELLM_MASTER_KEY=sk-master-key
      - DATABASE_URL=postgresql://postgres:password@db:5432/litellm
    volumes:
      - ./config.yaml:/app/config.yaml
    command: --config /app/config.yaml --port 4000
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=litellm
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
Troubleshooting
| Issue | Solution |
|---|---|
| API key not found | Set env var for provider (e.g., OPENAI_API_KEY) |
| Model not supported | Check model naming convention (e.g., gemini/model) |
| Timeout errors | Increase request_timeout in config |
| Rate limit errors | Add multiple deployments, enable load balancing |
| Proxy won’t start | Check config.yaml syntax, verify port is free |
| Streaming breaks | Update litellm: pip install -U litellm |
| Cost tracking wrong | Ensure model pricing is up to date |
| Auth failures | Check master_key matches request header |
# Test connectivity
litellm --test
# List supported models
python -c "import litellm; print(litellm.model_list)"
# Debug mode
litellm --config config.yaml --debug --detailed_debug