AnythingLLM Cheat Sheet

Overview

AnythingLLM is a full-stack open-source application that lets you build a private, local RAG system with no coding required. It bundles a document ingestion pipeline, vector database, chat interface, and REST API into a single desktop app (Windows, macOS, Linux) or Docker container. Documents are embedded locally or via API, stored in an embedded vector store (LanceDB by default), and queried through configurable AI assistants called workspaces.

The core concept is the workspace — an isolated collection of documents with its own LLM selection, system prompt, chat history, and retrieval settings. Multiple workspaces can share the same underlying LLM while using completely different document sets, making it practical to maintain separate assistants for different projects or teams.

AnythingLLM supports dozens of LLM providers out of the box: local models via Ollama, LM Studio, LocalAI, and KoboldCPP, plus API-based providers including OpenAI, Anthropic, Gemini, Azure OpenAI, Mistral, and Cohere. Agent Mode adds tool use (web search, code execution, and file operations) to a workspace; individual skills such as web search may first need to be enabled and given a provider in the agent settings.

Installation

Desktop App

# Download from official site
# https://anythingllm.com/download

# macOS
brew install --cask anythingllm

# Windows — download installer from website
# Runs as a native Electron application
# Data stored in: ~/Library/Application Support/anythingllm-desktop (macOS)
#                 %APPDATA%\anythingllm-desktop (Windows)
#                 ~/.config/anythingllm-desktop (Linux)

Docker (Server Mode)

# Quick start — data persists in ./anythingllm-storage
docker run -d \
  -p 3001:3001 \
  -v $(pwd)/anythingllm-storage:/app/server/storage \
  -e STORAGE_DIR=/app/server/storage \
  --name anythingllm \
  mintplexlabs/anythingllm:latest

# With pre-configured LLM (OpenAI)
docker run -d \
  -p 3001:3001 \
  -v $(pwd)/storage:/app/server/storage \
  -e STORAGE_DIR=/app/server/storage \
  -e LLM_PROVIDER=openai \
  -e OPEN_AI_KEY=sk-... \
  -e OPEN_MODEL_PREF=gpt-4o-mini \
  -e EMBEDDING_ENGINE=openai \
  -e EMBEDDING_MODEL_PREF=text-embedding-3-small \
  --name anythingllm \
  mintplexlabs/anythingllm

# Access web interface (open is macOS-only; use xdg-open on Linux or browse to the URL)
open http://localhost:3001

Docker Compose

version: "3.9"
services:
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    ports:
      - "3001:3001"
    volumes:
      - ./storage:/app/server/storage
    environment:
      STORAGE_DIR: /app/server/storage
      LLM_PROVIDER: ollama
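      OLLAMA_BASE_PATH: http://host.docker.internal:11434   # on Linux this may need extra_hosts: "host.docker.internal:host-gateway"
      OLLAMA_MODEL_PREF: llama3.2
      EMBEDDING_ENGINE: ollama
      EMBEDDING_BASE_PATH: http://host.docker.internal:11434
      EMBEDDING_MODEL_PREF: nomic-embed-text
      VECTOR_DB: lancedb              # lancedb | chroma | qdrant | weaviate | pinecone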
      JWT_SECRET: your-random-secret-here
    restart: unless-stopped
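
After the container starts, the server can take a little while to accept requests. Below is a minimal sketch of a readiness poll. It assumes the instance answers GET /api/ping with HTTP 200, which is worth checking against your version. The names `http_ok` and `wait_until_up` are illustrative and not part of AnythingLLM.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if a GET to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status
        return False


def wait_until_up(url: str, timeout: float = 60.0, interval: float = 2.0,
                  probe: Callable[[str], bool] = http_ok) -> bool:
    """Poll url until probe() succeeds or roughly `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # /api/ping is an assumed health route; adjust if your build differs
    print(wait_until_up("http://localhost:3001/api/ping"))
```

The `probe` parameter is injectable so the loop can be exercised without a running server.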

Build from Source

git clone https://github.com/Mintplex-Labs/anything-llm.git
cd anything-llm

# Install dependencies
yarn install
yarn setup

# Start in development mode
yarn dev:server    # API server on :3001
yarn dev:frontend  # React frontend on :3000
yarn dev:collector # Document collector (needed to process uploads)

# Build for production
yarn build && yarn start:server

Configuration

Environment Variables

# LLM Provider (choose one)
LLM_PROVIDER=openai          # openai | anthropic | ollama | lmstudio | localai | gemini | azure | mistral | cohere

# OpenAI
OPEN_AI_KEY=sk-...
OPEN_MODEL_PREF=gpt-4o-mini

# Anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL_PREF=claude-3-5-haiku-20241022

# Ollama (local)
OLLAMA_BASE_PATH=http://localhost:11434
OLLAMA_MODEL_PREF=llama3.2

# LM Studio
LMSTUDIO_BASE_PATH=http://localhost:1234/v1
LMSTUDIO_MODEL_PREF=your-model-name

# Embedding (choose one)
EMBEDDING_ENGINE=openai       # openai | ollama | native | cohere
# The openai engine reuses OPEN_AI_KEY set above; no second key is needed
EMBEDDING_MODEL_PREF=text-embedding-3-small

# Vector Database
VECTOR_DB=lancedb             # lancedb | chroma | qdrant | weaviate | pinecone | zilliz

# ChromaDB
CHROMA_ENDPOINT=http://localhost:8000
CHROMA_API_HEADER=Authorization
CHROMA_API_KEY=

# Qdrant
QDRANT_ENDPOINT=http://localhost:6333
QDRANT_API_KEY=

# Security
JWT_SECRET=random-secret-32-chars
AUTH_TOKEN=your-single-user-password   # Enables auth on web UI

# Storage
STORAGE_DIR=/app/server/storage
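
A quick pre-flight check can catch a missing variable before the container starts. The sketch below parses KEY=VALUE text and reports which variables are absent for the chosen provider. `REQUIRED_KEYS` is an illustrative mapping built only from the variables listed above, not an official schema. Treating `STORAGE_DIR` as mandatory is likewise an assumption.

```python
# Illustrative mapping (assumption): provider -> variables it needs
REQUIRED_KEYS = {
    "openai": ["OPEN_AI_KEY", "OPEN_MODEL_PREF"],
    "anthropic": ["ANTHROPIC_API_KEY", "ANTHROPIC_MODEL_PREF"],
    "ollama": ["OLLAMA_BASE_PATH", "OLLAMA_MODEL_PREF"],
}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and '#' comments.

    Simplification: everything after a '#' is dropped, so values that
    contain '#' are not supported by this sketch.
    """
    env = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return required variables that are absent or empty."""
    provider = env.get("LLM_PROVIDER", "")
    required = ["LLM_PROVIDER", "STORAGE_DIR"] + REQUIRED_KEYS.get(provider, [])
    return [key for key in required if not env.get(key)]
```

Typical use would be `missing_keys(parse_env(open(".env").read()))`, with an empty list meaning nothing was flagged.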

Core Commands/API

API Endpoint	Method	Description
`/api/auth`	POST	Authenticate, get session token
`/api/v1/workspaces`	GET	List all workspaces
`/api/v1/workspace/new`	POST	Create a new workspace
`/api/v1/workspace/{slug}`	GET	Get workspace details
`/api/v1/workspace/{slug}`	DELETE	Delete workspace and all data
`/api/v1/workspace/{slug}/update`	POST	Update workspace settings
`/api/v1/workspace/{slug}/chat`	POST	Send a chat message
`/api/v1/workspace/{slug}/chats`	GET	Get chat history
`/api/v1/workspace/{slug}/reset-chat`	POST	Clear chat history
`/api/v1/workspace/{slug}/upload`	POST	Upload document to workspace
`/api/v1/workspace/{slug}/update-embeddings`	POST	Add/remove embedded documents
`/api/v1/document/upload`	POST	Upload to raw document storage
`/api/v1/documents`	GET	List all uploaded documents
`/api/v1/document/{docname}`	DELETE	Delete a raw document
`/api/v1/system`	GET	Get system info and LLM settings
`/api/v1/system/update-env`	POST	Update environment settings
`/api/v1/admin/users`	GET	List all users (multi-user mode)
`/api/v1/admin/invite/new`	POST	Create user invite
`/api/v1/openai/chat/completions`	POST	OpenAI-compatible chat endpoint
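
Most workspace routes in the table share the shape /api/v1/workspace/{slug}/{action}. A small helper keeps that in one place and percent-encodes the slug. This is a sketch: `workspace_url` is a hypothetical helper name, and the default base URL simply mirrors the Docker examples.

```python
from urllib.parse import quote

BASE = "http://localhost:3001"  # same default port as the Docker examples


def workspace_url(slug: str, action: str = "", base: str = BASE) -> str:
    """Build /api/v1/workspace/{slug}[/{action}] with the slug percent-encoded."""
    path = f"/api/v1/workspace/{quote(slug, safe='')}"
    if action:
        path += f"/{action.strip('/')}"
    return base.rstrip("/") + path


if __name__ == "__main__":
    print(workspace_url("my-docs", "chat"))
```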

Advanced Usage

API Authentication and Workspace Creation

import requests

BASE = "http://localhost:3001"

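# Authenticate
# Note: many installs expect an API key generated in the web UI's settings for
# the /api/v1 routes. If yours does, skip get_token() and set TOKEN to that key.
def get_token(password: str) -> str:
    resp = requests.post(f"{BASE}/api/auth", json={"password": password})
    resp.raise_for_status()
    return resp.json()["token"]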

TOKEN = get_token("your-auth-password")
HEADERS = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}

# Create workspace
def create_workspace(name: str, settings: dict = None) -> dict:
    payload = {"name": name}
    if settings:
        payload.update(settings)
    resp = requests.post(f"{BASE}/api/v1/workspace/new", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()["workspace"]

workspace = create_workspace("my-docs", {
    "openAiTemp": 0.2,           # LLM temperature
    "openAiHistory": 10,         # Messages in context window
    "openAiPrompt": "You are a helpful assistant. Answer only from provided context.",
    "similarityThreshold": 0.25, # Minimum similarity for retrieval
    "topN": 4                    # Number of chunks to retrieve
})
slug = workspace["slug"]
print(f"Created workspace: {slug}")
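
Checking the settings dict before sending it gives a clearer error than a failed request. A minimal sketch follows. The accepted ranges (temperature 0 to 2, similarity 0 to 1, positive integers for `topN` and `openAiHistory`) are assumptions rather than limits documented here, and `check_workspace_settings` is a hypothetical helper.

```python
def check_workspace_settings(settings: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []

    temp = settings.get("openAiTemp")
    if temp is not None and not 0 <= temp <= 2:       # assumed range
        problems.append(f"openAiTemp {temp} outside assumed range 0-2")

    threshold = settings.get("similarityThreshold")
    if threshold is not None and not 0 <= threshold <= 1:  # assumed range
        problems.append(f"similarityThreshold {threshold} outside 0-1")

    for key in ("topN", "openAiHistory"):
        value = settings.get(key)
        if value is not None and (not isinstance(value, int) or value < 1):
            problems.append(f"{key} should be a positive integer, got {value!r}")

    return problems
```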

Document Upload and Embedding

import os

def upload_document(file_path: str) -> str:
    """Upload a file to AnythingLLM raw storage. Returns docname."""
    with open(file_path, "rb") as f:
        filename = os.path.basename(file_path)
        resp = requests.post(
            f"{BASE}/api/v1/document/upload",
            headers={"Authorization": f"Bearer {TOKEN}"},
            files={"file": (filename, f)}
        )
    resp.raise_for_status()
    doc = resp.json()["documents"][0]
    return doc["location"]   # docname for embedding

def embed_documents(workspace_slug: str, docnames: list[str]):
    """Add uploaded documents to workspace embedding."""
    resp = requests.post(
        f"{BASE}/api/v1/workspace/{workspace_slug}/update-embeddings",
        headers=HEADERS,
        json={"adds": docnames, "deletes": []}
    )
    resp.raise_for_status()
    return resp.json()

# Full pipeline
def ingest_files(workspace_slug: str, file_paths: list[str]):
    docnames = []
    for path in file_paths:
        print(f"Uploading {path}...")
        docname = upload_document(path)
        docnames.append(docname)
        print(f"  Stored as: {docname}")

    print(f"Embedding {len(docnames)} documents...")
    result = embed_documents(workspace_slug, docnames)
    print(f"Embedded: {result}")
    return docnames

# Usage
ingest_files(slug, ["report.pdf", "notes.txt", "data.csv"])
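
Uploading the same file twice wastes embedding work. One option is to filter the path list by content hash before calling ingest_files. This is a local-only sketch: `file_digest` and `unique_files` are illustrative helpers, and nothing here asks the server what it already holds.

```python
import hashlib


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large PDFs stay out of memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def unique_files(paths: list[str]) -> list[str]:
    """Keep the first path for each distinct file content, preserving order."""
    seen = set()
    unique = []
    for path in paths:
        digest = file_digest(path)
        if digest not in seen:
            seen.add(digest)
            unique.append(path)
    return unique
```

With the functions above, `ingest_files(slug, unique_files(paths))` would skip byte-identical copies.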

Chat with Workspace

def chat(workspace_slug: str, message: str, mode: str = "chat") -> dict:
    """
    mode: 'chat' (RAG with context) | 'query' (strict RAG only)
          'agent' (tool-use mode)
    """
    resp = requests.post(
        f"{BASE}/api/v1/workspace/{workspace_slug}/chat",
        headers=HEADERS,
        json={"message": message, "mode": mode}
    )
    resp.raise_for_status()
    data = resp.json()
    return {
        "answer":   data["textResponse"],
        "sources":  data.get("sources", []),
        "close":    data.get("close", False)
    }

# Simple Q&A
result = chat(slug, "What are the main findings in the report?")
print(result["answer"])
print(f"Sources: {[s['title'] for s in result['sources']]}")

# Agent mode (web search, code execution)
result = chat(slug, "Search the web for the latest news on LLMs.", mode="agent")
print(result["answer"])
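
Several retrieved chunks often come from the same document, so printing every source title repeats itself. Below is a small formatting sketch that works on the dict returned by chat() above. `format_answer` is a hypothetical helper, and it assumes each source carries a "title" key as in the example.

```python
def format_answer(result: dict, max_sources: int = 3) -> str:
    """Render an answer followed by a de-duplicated, numbered source list."""
    lines = [result.get("answer", "").strip()]

    titles = []
    for src in result.get("sources", []):
        title = src.get("title", "untitled")
        if title not in titles:  # several chunks can share one document
            titles.append(title)

    if titles:
        lines.append("")
        lines.append("Sources:")
        lines.extend(f"  [{i}] {t}" for i, t in enumerate(titles[:max_sources], 1))
    return "\n".join(lines)
```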

OpenAI-Compatible API

# AnythingLLM exposes an OpenAI-compatible endpoint
# Drop-in replacement for apps using OpenAI SDK

from openai import OpenAI

client = OpenAI(
    api_key=TOKEN,
    base_url=f"{BASE}/api/v1/openai"
)

# Chat with a workspace using standard OpenAI SDK
response = client.chat.completions.create(
    model=slug,           # Workspace slug is the "model"
    messages=[
        {"role": "user", "content": "Summarize the key points."}
    ]
)
print(response.choices[0].message.content)

# Works with LangChain too
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model=slug,
    openai_api_key=TOKEN,
    openai_api_base=f"{BASE}/api/v1/openai"
)
result = llm.invoke("What does the document say about costs?")
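
With the OpenAI-style endpoint the caller supplies the message list, so multi-turn use means tracking history on the client. Here is a minimal sketch of assembling that list with a cap on prior turns. `build_messages` is a hypothetical helper, and whether the endpoint honors a system message or earlier turns is an assumption to verify against your instance.

```python
from typing import Optional


def build_messages(history: list[dict], question: str,
                   system: Optional[str] = None,
                   max_history: int = 10) -> list[dict]:
    """Return OpenAI-style messages: optional system, recent history, new question."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    if max_history > 0:
        # Slice from the end so only the most recent turns are kept
        messages.extend(history[-max_history:])
    messages.append({"role": "user", "content": question})
    return messages
```

The result can be passed as `messages=` to `client.chat.completions.create(...)`.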

Multi-User Mode

# Enable multi-user via environment variable or UI
# MULTI_USER_MODE=true in .env

# Create user invite (admin only)
def create_invite() -> str:
    resp = requests.post(
        f"{BASE}/api/v1/admin/invite/new",
        headers=HEADERS
    )
    resp.raise_for_status()   # fail on 401/403 instead of a KeyError below
    return resp.json()["invite"]["link"]

# List users
def list_users() -> list:
    resp = requests.get(f"{BASE}/api/v1/admin/users", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["users"]

# Update user workspace permissions
def set_workspace_permissions(user_id: int, workspace_id: int) -> dict:
    resp = requests.post(
        f"{BASE}/api/v1/admin/workspace/{workspace_id}/update-users",
        headers=HEADERS,
        json={"userIds": [user_id]}
    )
    resp.raise_for_status()
    return resp.json()

Common Workflows

Batch Ingest a Folder

import os

def ingest_folder(workspace_slug: str, folder_path: str,
                  extensions: list[str] = None):
    if extensions is None:
        extensions = [".pdf", ".txt", ".md", ".docx", ".csv"]

    # Walk the tree; distinct names avoid shadowing the result list
    matches = [
        os.path.join(root, name)
        for root, _, names in os.walk(folder_path)
        for name in names
        if os.path.splitext(name)[1].lower() in extensions
    ]

    print(f"Found {len(matches)} files")
    return ingest_files(workspace_slug, matches)

# Ingest all PDFs in a folder
ingest_folder(slug, "./documents/reports", extensions=[".pdf"])
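
For large folders, sending every document in a single update-embeddings call can make one slow request the single point of failure. A hedged alternative is to embed in fixed-size batches. `batched` is an illustrative helper, and the batch size of 20 in the comment is an arbitrary assumption.

```python
from typing import Iterator


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Intended use with the helpers defined earlier (not executed here):
#   for group in batched(docnames, 20):
#       embed_documents(slug, group)
```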

Export and Reset

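# Back up the AnythingLLM storage directory: the host path mounted at
# /app/server/storage (./anythingllm-storage if you used the quick-start command)
tar -czf anythingllm_backup_$(date +%Y%m%d).tar.gz ./storage/

# Restore (substitute the date of the archive you want)
tar -xzf anythingllm_backup_20240101.tar.gz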

# Reset a workspace's chat history (keep documents)
curl -X POST http://localhost:3001/api/v1/workspace/my-docs/reset-chat \
  -H "Authorization: Bearer $TOKEN"

# Delete entire workspace
curl -X DELETE http://localhost:3001/api/v1/workspace/my-docs \
  -H "Authorization: Bearer $TOKEN"
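
The tar backup above can also be scripted in Python, which helps on Windows hosts or inside a scheduled job. This is a sketch under the same assumption that everything worth keeping lives in the mounted storage directory. `backup_storage` is a hypothetical helper name.

```python
import tarfile
from datetime import date
from pathlib import Path


def backup_storage(storage_dir: str, dest_dir: str = ".") -> Path:
    """Write anythingllm_backup_YYYYMMDD.tar.gz containing storage_dir."""
    src = Path(storage_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"storage directory not found: {src}")

    archive = Path(dest_dir) / f"anythingllm_backup_{date.today():%Y%m%d}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths relative (storage/...) instead of absolute
        tar.add(src, arcname=src.name)
    return archive
```

Stopping the container before archiving avoids copying database files mid-write.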

Tips and Best Practices

Tip	Details
Start with Ollama for privacy	Run `ollama pull llama3.2` and `ollama pull nomic-embed-text` for fully local, offline RAG
Use separate workspaces per project	Each workspace gets its own document set and system prompt — don’t mix unrelated documents
Set a strong `AUTH_TOKEN`	Without auth, anyone on the network can access your documents and LLM API keys
Adjust `topN` by query complexity	Factual lookups: topN=2-3; synthesis questions: topN=6-8
Lower similarity threshold for recall	Start at 0.2-0.3; raise only if results are too noisy
Use `query` mode for strict RAG	`query` mode refuses to answer if no relevant context is found; `chat` mode falls back to LLM knowledge
Agent mode requires capable models	Agent tools work best with GPT-4o, Claude 3.5+, or Llama 3.1 70B+
Check storage limits	LanceDB embedding storage grows with each document; monitor `./storage/` directory size
Use the OpenAI-compatible API for integrations	Enables drop-in use with any tool that supports a custom OpenAI base URL
Update regularly	AnythingLLM releases frequently; run `docker pull mintplexlabs/anythingllm:latest`, then remove and recreate the container, since a plain restart keeps running the old image
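
The topN and similarity-threshold tips can be captured as presets and merged into the settings dict sent to the workspace update endpoint. The values below are picked from within the ranges in the table. The preset names and the `retrieval_settings` helper are illustrative assumptions, not AnythingLLM features.

```python
# Values chosen from the ranges suggested in the tips table (assumption)
PRESETS = {
    "factual": {"topN": 3, "similarityThreshold": 0.25},
    "synthesis": {"topN": 8, "similarityThreshold": 0.25},
}


def retrieval_settings(kind: str) -> dict:
    """Return a copy of the preset for 'factual' or 'synthesis' queries."""
    try:
        return dict(PRESETS[kind])
    except KeyError:
        raise ValueError(f"unknown preset {kind!r}; use one of {sorted(PRESETS)}") from None
```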