AnythingLLM Cheat Sheet
Overview
AnythingLLM is a full-stack open-source application that lets you build a private, local RAG system with no coding required. It bundles a document ingestion pipeline, vector database, chat interface, and REST API into a single desktop app (Windows, macOS, Linux) or Docker container. Documents are embedded locally or via API, stored in an embedded vector store (LanceDB by default), and queried through configurable AI assistants called workspaces.
The core concept is the workspace — an isolated collection of documents with its own LLM selection, system prompt, chat history, and retrieval settings. Multiple workspaces can share the same underlying LLM while using completely different document sets, making it practical to maintain separate assistants for different projects or teams.
AnythingLLM supports dozens of LLM providers out of the box: local models via Ollama, LM Studio, LocalAI, and KoboldCPP, plus API-based providers including OpenAI, Anthropic, Gemini, Azure OpenAI, Mistral, and Cohere. Agent mode adds tool use (web search, code execution, and file operations) on top of workspace chat; some tools, such as web search, may need a provider or API key selected in the settings before they work.
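The workspace model described above can be pictured as a small data structure. The sketch below is purely conceptual: `Workspace` and `shared_llm` are hypothetical names chosen for illustration and do not reflect AnythingLLM's internal schema.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical model of a workspace: its own documents, prompt, and history."""
    name: str
    llm: str
    system_prompt: str = "You are a helpful assistant."
    documents: set[str] = field(default_factory=set)
    chat_history: list[str] = field(default_factory=list)


shared_llm = "llama3.2"  # one model can back several workspaces
legal = Workspace("legal", shared_llm, documents={"contract.pdf"})
research = Workspace("research", shared_llm, documents={"paper.pdf", "notes.md"})

# Same LLM, but the document sets and chat histories stay separate
print(legal.llm == research.llm)
print(legal.documents.isdisjoint(research.documents))
```

Both workspaces point at the same model name, while each keeps its own documents and history, which is the isolation property the overview relies on.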
Installation
Desktop App
# Download from official site
# https://anythingllm.com/download
# macOS
brew install --cask anythingllm
# Windows — download installer from website
# Runs as a native Electron application
# Data stored in: ~/Library/Application Support/anythingllm-desktop (macOS)
#                 %APPDATA%\anythingllm-desktop (Windows)
#                 ~/.config/anythingllm-desktop (Linux)
Docker (Server Mode)
# Quick start — data persists in ./anythingllm-storage
docker run -d \
  -p 3001:3001 \
  -v "$(pwd)/anythingllm-storage:/app/server/storage" \
  -e STORAGE_DIR=/app/server/storage \
  --name anythingllm \
  mintplexlabs/anythingllm:latest
# With pre-configured LLM and embedder (OpenAI); both read the same OPEN_AI_KEY
docker run -d \
  -p 3001:3001 \
  -v "$(pwd)/storage:/app/server/storage" \
  -e STORAGE_DIR=/app/server/storage \
  -e LLM_PROVIDER=openai \
  -e OPEN_AI_KEY=sk-... \
  -e OPEN_MODEL_PREF=gpt-4o-mini \
  -e EMBEDDING_ENGINE=openai \
  -e EMBEDDING_MODEL_PREF=text-embedding-3-small \
  --name anythingllm \
  mintplexlabs/anythingllm
# Access web interface
open http://localhost:3001
Docker Compose
version: "3.9"
services:
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    ports:
      - "3001:3001"
    volumes:
      - ./storage:/app/server/storage
    environment:
      STORAGE_DIR: /app/server/storage
      LLM_PROVIDER: ollama
      OLLAMA_BASE_PATH: http://host.docker.internal:11434
      OLLAMA_MODEL_PREF: llama3.2
      EMBEDDING_ENGINE: ollama
      EMBEDDING_BASE_PATH: http://host.docker.internal:11434
      EMBEDDING_MODEL_PREF: nomic-embed-text
      VECTOR_DB: lancedb # lancedb | chroma | qdrant | weaviate | pinecone
      JWT_SECRET: your-random-secret-here
    restart: unless-stopped
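The `JWT_SECRET` placeholder above should be replaced with a random value. One way to generate one is sketched below using Python's standard `secrets` module; the helper name `make_jwt_secret` and the 32-byte length are assumptions for illustration, not an AnythingLLM requirement.

```python
import secrets


def make_jwt_secret(num_bytes: int = 32) -> str:
    """Return a random hex string; 32 bytes yields 64 hex characters."""
    return secrets.token_hex(num_bytes)


# Paste the printed line into your .env or compose file
print(f"JWT_SECRET={make_jwt_secret()}")
```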
Build from Source
git clone https://github.com/Mintplex-Labs/anything-llm.git
cd anything-llm
# Install dependencies
yarn install
yarn setup
# Start in development mode
yarn dev:server # API server on :3001
yarn dev:frontend # React frontend on :3000
# Build for production
yarn build && yarn start:server
Configuration
Environment Variables
# LLM Provider (choose one)
LLM_PROVIDER=openai # openai | anthropic | ollama | lmstudio | localai | gemini | azure | mistral | cohere
# OpenAI
OPEN_AI_KEY=sk-...
OPEN_MODEL_PREF=gpt-4o-mini
# Anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL_PREF=claude-3-5-haiku-20241022
# Ollama (local)
OLLAMA_BASE_PATH=http://localhost:11434
OLLAMA_MODEL_PREF=llama3.2
# LM Studio
LM_STUDIO_BASE_PATH=http://localhost:1234/v1
LM_STUDIO_MODEL_PREF=your-model-name
# Embedding (choose one)
EMBEDDING_ENGINE=openai # openai | ollama | native | cohere
OPEN_AI_KEY=sk-...
EMBEDDING_MODEL_PREF=text-embedding-3-small
# Vector Database
VECTOR_DB=lancedb # lancedb | chroma | qdrant | weaviate | pinecone | zilliz
# ChromaDB
CHROMA_ENDPOINT=http://localhost:8000
CHROMA_API_HEADER=Authorization
CHROMA_API_KEY=
# Qdrant
QDRANT_ENDPOINT=http://localhost:6333
QDRANT_API_KEY=
# Security
JWT_SECRET=random-secret-32-chars
AUTH_TOKEN=your-single-user-password # Enables auth on web UI
# Storage
STORAGE_DIR=/app/server/storage
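Before starting the server, it can help to check that the chosen provider has its companion variables set. The sketch below is a hypothetical pre-flight check: `REQUIRED_KEYS`, `parse_env`, and `missing_keys` are illustrative names, and the key lists are taken only from the variables shown above, so they may be incomplete for a real deployment.

```python
# Keys each provider needs, taken only from the variables listed above.
REQUIRED_KEYS = {
    "openai": ["OPEN_AI_KEY", "OPEN_MODEL_PREF"],
    "anthropic": ["ANTHROPIC_API_KEY", "ANTHROPIC_MODEL_PREF"],
    "ollama": ["OLLAMA_BASE_PATH", "OLLAMA_MODEL_PREF"],
    "lmstudio": ["LM_STUDIO_BASE_PATH", "LM_STUDIO_MODEL_PREF"],
}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comment lines, and trailing ' # ...' notes."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.split(" #", 1)[0].strip()
    return env


def missing_keys(env: dict[str, str]) -> list[str]:
    """List required keys that are absent or empty for the selected LLM_PROVIDER."""
    provider = env.get("LLM_PROVIDER", "")
    return [key for key in REQUIRED_KEYS.get(provider, []) if not env.get(key)]


sample = """
LLM_PROVIDER=ollama # local
OLLAMA_BASE_PATH=http://localhost:11434
"""
print(missing_keys(parse_env(sample)))  # ['OLLAMA_MODEL_PREF']
```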
Core Commands/API
| API Endpoint | Method | Description |
|---|---|---|
| /api/auth | POST | Authenticate, get session token |
| /api/v1/workspaces | GET | List all workspaces |
| /api/v1/workspace/new | POST | Create a new workspace |
| /api/v1/workspace/{slug} | GET | Get workspace details |
| /api/v1/workspace/{slug} | DELETE | Delete workspace and all data |
| /api/v1/workspace/{slug}/update | POST | Update workspace settings |
| /api/v1/workspace/{slug}/chat | POST | Send a chat message |
| /api/v1/workspace/{slug}/chats | GET | Get chat history |
| /api/v1/workspace/{slug}/reset-chat | POST | Clear chat history |
| /api/v1/workspace/{slug}/upload | POST | Upload document to workspace |
| /api/v1/workspace/{slug}/update-embeddings | POST | Add/remove embedded documents |
| /api/v1/document/upload | POST | Upload to raw document storage |
| /api/v1/documents | GET | List all uploaded documents |
| /api/v1/document/{docname} | DELETE | Delete a raw document |
| /api/v1/system | GET | Get system info and LLM settings |
| /api/v1/system/update-env | POST | Update environment settings |
| /api/v1/admin/users | GET | List all users (multi-user mode) |
| /api/v1/admin/invite/new | POST | Create user invite |
| /api/v1/openai/chat/completions | POST | OpenAI-compatible chat endpoint |
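Most endpoints in the table share the `/api/v1/workspace/{slug}` prefix, so a small URL builder can reduce typos in client code. This is a minimal sketch; `workspace_url` is a hypothetical helper, not part of any AnythingLLM SDK.

```python
from urllib.parse import quote


def workspace_url(base: str, slug: str, action: str = "") -> str:
    """Build a /api/v1/workspace/{slug}[/action] URL, percent-encoding the slug."""
    url = f"{base.rstrip('/')}/api/v1/workspace/{quote(slug, safe='')}"
    return f"{url}/{action}" if action else url


print(workspace_url("http://localhost:3001/", "my-docs", "chat"))
# http://localhost:3001/api/v1/workspace/my-docs/chat
```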
Advanced Usage
API Authentication and Workspace Creation
from typing import Optional

import requests

BASE = "http://localhost:3001"

# Authenticate (password flow, as listed in the endpoint table)
def get_token(password: str) -> str:
    resp = requests.post(f"{BASE}/api/auth", json={"password": password})
    resp.raise_for_status()
    return resp.json()["token"]

# If your instance issues developer API keys in its settings UI,
# assign that key to TOKEN directly instead of calling get_token().
TOKEN = get_token("your-auth-password")
HEADERS = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}

# Create workspace
def create_workspace(name: str, settings: Optional[dict] = None) -> dict:
    payload = {"name": name}
    if settings:
        payload.update(settings)
    resp = requests.post(f"{BASE}/api/v1/workspace/new", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()["workspace"]

workspace = create_workspace("my-docs", {
    "openAiTemp": 0.2,            # LLM temperature
    "openAiHistory": 10,          # Prior messages included as chat history
    "openAiPrompt": "You are a helpful assistant. Answer only from provided context.",
    "similarityThreshold": 0.25,  # Minimum similarity for retrieval
    "topN": 4,                    # Number of chunks to retrieve
})
slug = workspace["slug"]
print(f"Created workspace: {slug}")
Document Upload and Embedding
import os

def upload_document(file_path: str) -> str:
    """Upload a file to AnythingLLM raw storage. Returns docname."""
    filename = os.path.basename(file_path)
    with open(file_path, "rb") as f:
        # Multipart upload; requests sets the Content-Type header itself
        resp = requests.post(
            f"{BASE}/api/v1/document/upload",
            headers={"Authorization": f"Bearer {TOKEN}"},
            files={"file": (filename, f)},
        )
    resp.raise_for_status()
    doc = resp.json()["documents"][0]
    return doc["location"]  # docname for embedding

def embed_documents(workspace_slug: str, docnames: list[str]) -> dict:
    """Add uploaded documents to workspace embedding."""
    resp = requests.post(
        f"{BASE}/api/v1/workspace/{workspace_slug}/update-embeddings",
        headers=HEADERS,
        json={"adds": docnames, "deletes": []},
    )
    resp.raise_for_status()
    return resp.json()

# Full pipeline
def ingest_files(workspace_slug: str, file_paths: list[str]) -> list[str]:
    docnames = []
    for path in file_paths:
        print(f"Uploading {path}...")
        docname = upload_document(path)
        docnames.append(docname)
        print(f"  Stored as: {docname}")
    print(f"Embedding {len(docnames)} documents...")
    result = embed_documents(workspace_slug, docnames)
    print(f"Embedded: {result}")
    return docnames

# Usage
ingest_files(slug, ["report.pdf", "notes.txt", "data.csv"])
Chat with Workspace
def chat(workspace_slug: str, message: str, mode: str = "chat") -> dict:
    """
    mode: 'chat' (RAG with context) | 'query' (strict RAG only)
          'agent' (tool-use mode)
    """
    resp = requests.post(
        f"{BASE}/api/v1/workspace/{workspace_slug}/chat",
        headers=HEADERS,
        json={"message": message, "mode": mode},
    )
    resp.raise_for_status()
    data = resp.json()
    return {
        "answer": data["textResponse"],
        "sources": data.get("sources", []),
        "close": data.get("close", False),
    }

# Simple Q&A
result = chat(slug, "What are the main findings in the report?")
print(result["answer"])
print(f"Sources: {[s.get('title') for s in result['sources']]}")

# Agent mode (web search, code execution)
result = chat(slug, "Search the web for the latest news on LLMs.", mode="agent")
print(result["answer"])
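When several retrieved chunks come from the same file, the `sources` list repeats its title. The sketch below collapses those repeats; `unique_source_titles` is a hypothetical helper, and the sample response only mimics the `textResponse` and `sources` fields used above rather than a captured server reply.

```python
def unique_source_titles(response: dict) -> list[str]:
    """Return source titles in first-seen order, skipping entries without a title."""
    seen = set()
    titles = []
    for source in response.get("sources", []):
        title = source.get("title")
        if title and title not in seen:
            seen.add(title)
            titles.append(title)
    return titles


# Hypothetical response shaped like the fields read by chat() above
fake_response = {
    "textResponse": "Example answer",
    "sources": [
        {"title": "report.pdf"},
        {"title": "report.pdf"},
        {"title": "notes.txt"},
        {},
    ],
}
print(unique_source_titles(fake_response))  # ['report.pdf', 'notes.txt']
```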
OpenAI-Compatible API
# AnythingLLM exposes an OpenAI-compatible endpoint
# Drop-in replacement for apps using OpenAI SDK
from openai import OpenAI

client = OpenAI(
    api_key=TOKEN,
    base_url=f"{BASE}/api/v1/openai",
)

# Chat with a workspace using standard OpenAI SDK
response = client.chat.completions.create(
    model=slug,  # Workspace slug is the "model"
    messages=[
        {"role": "user", "content": "Summarize the key points."}
    ],
)
print(response.choices[0].message.content)

# Works with LangChain too
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model=slug,
    openai_api_key=TOKEN,
    openai_api_base=f"{BASE}/api/v1/openai",
)
result = llm.invoke("What does the document say about costs?")
Multi-User Mode
# Enable multi-user via environment variable or UI
# MULTI_USER_MODE=true in .env

# Create user invite (admin only)
def create_invite() -> str:
    resp = requests.post(
        f"{BASE}/api/v1/admin/invite/new",
        headers=HEADERS,
    )
    resp.raise_for_status()
    return resp.json()["invite"]["link"]

# List users
def list_users() -> list:
    resp = requests.get(f"{BASE}/api/v1/admin/users", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["users"]

# Update user workspace permissions
def set_workspace_permissions(user_id: int, workspace_id: int) -> None:
    resp = requests.post(
        f"{BASE}/api/v1/admin/workspace/{workspace_id}/update-users",
        headers=HEADERS,
        json={"userIds": [user_id]},
    )
    resp.raise_for_status()  # Surface permission or validation errors
Common Workflows
Batch Ingest a Folder
import os
from typing import Optional

def ingest_folder(workspace_slug: str, folder_path: str,
                  extensions: Optional[list[str]] = None) -> list[str]:
    if extensions is None:
        extensions = [".pdf", ".txt", ".md", ".docx", ".csv"]
    # Walk the tree and keep files whose extension matches (case-insensitive)
    paths = sorted(
        os.path.join(root, name)
        for root, _, filenames in os.walk(folder_path)
        for name in filenames
        if os.path.splitext(name)[1].lower() in extensions
    )
    print(f"Found {len(paths)} files")
    return ingest_files(workspace_slug, paths)

# Ingest all PDFs in a folder
ingest_folder(slug, "./documents/reports", extensions=[".pdf"])
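For large folders, sending every docname in a single update-embeddings request may be slow or hard to retry. One option is to embed in smaller batches. This is an assumption about what works well in practice, not a documented requirement, and `batched` is a hypothetical helper.

```python
from typing import Iterator


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


docnames = [f"doc-{i}.json" for i in range(5)]
for batch in batched(docnames, 2):
    # In a real run, each batch would go to embed_documents(slug, batch)
    print(batch)
```

A failed batch can then be retried on its own without re-sending documents that were already embedded.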
Export and Reset
# Backup AnythingLLM storage directory
tar -czf anythingllm_backup_$(date +%Y%m%d).tar.gz ./storage/
# Restore
tar -xzf anythingllm_backup_20240101.tar.gz
# Reset a workspace's chat history (keep documents)
curl -X POST http://localhost:3001/api/v1/workspace/my-docs/reset-chat \
  -H "Authorization: Bearer $TOKEN"
# Delete entire workspace
curl -X DELETE http://localhost:3001/api/v1/workspace/my-docs \
  -H "Authorization: Bearer $TOKEN"
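The tar backup above can also be scripted in Python, which is convenient on systems without a Unix shell. The sketch below uses only the standard library; `backup_storage` is a hypothetical helper, and it assumes the server is stopped or idle while the archive is written.

```python
import tarfile
from datetime import date
from pathlib import Path


def backup_storage(storage_dir: str, dest_dir: str = ".") -> Path:
    """Create anythingllm_backup_YYYYMMDD.tar.gz containing storage_dir."""
    src = Path(storage_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"Storage directory not found: {src}")
    archive = Path(dest_dir) / f"anythingllm_backup_{date.today():%Y%m%d}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the storage folder name, e.g. storage/...
        tar.add(src, arcname=src.name)
    return archive


# Usage (uncomment once ./storage exists):
# print(backup_storage("./storage"))
```

Write the archive outside the storage directory itself so the backup does not try to include its own output.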
Tips and Best Practices
| Tip | Details |
|---|---|
| Start with Ollama for privacy | Run ollama pull llama3.2 and ollama pull nomic-embed-text for fully local, offline RAG |
| Use separate workspaces per project | Each workspace gets its own document set and system prompt — don’t mix unrelated documents |
| Set a strong AUTH_TOKEN | Without auth, anyone on the network can access your documents and LLM API keys |
| Adjust topN by query complexity | Factual lookups: topN=2-3; synthesis questions: topN=6-8 |
| Lower similarity threshold for recall | Start at 0.2-0.3; raise only if results are too noisy |
| Use query mode for strict RAG | query mode refuses to answer if no relevant context is found; chat mode falls back to LLM knowledge |
| Agent mode requires capable models | Agent tools work best with GPT-4o, Claude 3.5+, or Llama 3.1 70B+ |
| Check storage limits | LanceDB embedding storage grows with each document; monitor ./storage/ directory size |
| Use the OpenAI-compatible API for integrations | Enables drop-in use with any tool that supports a custom OpenAI base URL |
| Update regularly | AnythingLLM releases frequently; run docker pull mintplexlabs/anythingllm:latest, then recreate the container so it uses the new image |
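The interaction between the similarity threshold and topN in the tips above can be illustrated with a toy filter. This is a conceptual sketch with made-up scores, not AnythingLLM's retrieval code; `select_chunks` is a hypothetical name.

```python
def select_chunks(scored, similarity_threshold=0.25, top_n=4):
    """Keep chunks scoring at or above the threshold, then return the top_n best."""
    kept = [(text, score) for text, score in scored if score >= similarity_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_n]


# Made-up (chunk, similarity) pairs for illustration
scored = [("intro", 0.18), ("pricing", 0.62), ("costs", 0.41),
          ("appendix", 0.27), ("legal", 0.30)]

print(select_chunks(scored, similarity_threshold=0.25, top_n=2))
# [('pricing', 0.62), ('costs', 0.41)]
print(len(select_chunks(scored, similarity_threshold=0.5, top_n=4)))
# 1
```

Raising the threshold removes weak matches before topN is applied, so a high threshold can return fewer chunks than topN allows.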