
AnythingLLM Cheat Sheet

Overview

AnythingLLM is a full-stack open-source application that lets you build a private, local RAG system with no coding required. It bundles a document ingestion pipeline, vector database, chat interface, and REST API into a single desktop app (Windows, macOS, Linux) or Docker container. Documents are embedded locally or via API, stored in an embedded vector store (LanceDB by default), and queried through configurable AI assistants called workspaces.

The core concept is the workspace — an isolated collection of documents with its own LLM selection, system prompt, chat history, and retrieval settings. Multiple workspaces can share the same underlying LLM while using completely different document sets, making it practical to maintain separate assistants for different projects or teams.

AnythingLLM supports dozens of LLM providers out of the box: local models via Ollama, LM Studio, LocalAI, and KoboldCPP, plus API-based providers including OpenAI, Anthropic, Gemini, Azure OpenAI, Mistral, and Cohere. Agent Mode adds tool use (web search, code execution, and file operations) on top of a workspace; some tools, such as web search, may need a provider configured before they work.

Installation

Desktop App

# Download from official site
# https://anythingllm.com/download

# macOS
brew install --cask anythingllm

# Windows: download installer from website
# Runs as a native Electron application
# Data stored in: ~/Library/Application Support/anythingllm-desktop/storage (macOS)
#                 %APPDATA%\anythingllm-desktop\storage (Windows)
#                 ~/.config/anythingllm-desktop/storage (Linux)

Docker (Server Mode)

# Quick start — data persists in ./anythingllm-storage
docker run -d \
  -p 3001:3001 \
  -v $(pwd)/anythingllm-storage:/app/server/storage \
  -e STORAGE_DIR=/app/server/storage \
  --name anythingllm \
  mintplexlabs/anythingllm:latest

# With pre-configured LLM and embedder (OpenAI); both use the same OPEN_AI_KEY
docker run -d \
  -p 3001:3001 \
  -v $(pwd)/storage:/app/server/storage \
  -e STORAGE_DIR=/app/server/storage \
  -e LLM_PROVIDER=openai \
  -e OPEN_AI_KEY=sk-... \
  -e OPEN_MODEL_PREF=gpt-4o-mini \
  -e EMBEDDING_ENGINE=openai \
  -e EMBEDDING_MODEL_PREF=text-embedding-3-small \
  --name anythingllm \
  mintplexlabs/anythingllm

# Access web interface (open is macOS-only; use xdg-open on Linux or paste the URL into a browser)
open http://localhost:3001

Docker Compose

version: "3.9"
services:
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    ports:
      - "3001:3001"
    volumes:
      - ./storage:/app/server/storage
    environment:
      STORAGE_DIR: /app/server/storage
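      LLM_PROVIDER: ollama
      # On Linux, host.docker.internal may need an extra_hosts entry (host-gateway)
      OLLAMA_BASE_PATH: http://host.docker.internal:11434
      OLLAMA_MODEL_PREF: llama3.2
      EMBEDDING_ENGINE: ollama
      EMBEDDING_BASE_PATH: http://host.docker.internal:11434
      EMBEDDING_MODEL_PREF: nomic-embed-text
      VECTOR_DB: lancedb              # lancedb | chroma | qdrant | weaviate | pinecone
      JWT_SECRET: your-random-secret-here   # replace with a long random string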
    restart: unless-stopped
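
The JWT_SECRET placeholder in the compose file should be replaced with a random value before the instance is shared with anyone. One way to produce such a value is sketched below with Python's standard secrets module. The helper name make_jwt_secret and the 32-byte length are illustrative choices, not AnythingLLM requirements.

```python
import secrets


def make_jwt_secret(num_bytes: int = 32) -> str:
    """Return a random hex string; 32 bytes gives 64 hex characters."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Paste the printed value into JWT_SECRET in the compose file or .env
    print(make_jwt_secret())
```

Any other source of cryptographically strong randomness works equally well; the point is only to avoid shipping the placeholder string.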

Build from Source

git clone https://github.com/Mintplex-Labs/anything-llm.git
cd anything-llm

# Install dependencies
yarn install
yarn setup

# Start in development mode (run each in its own terminal)
yarn dev:server    # API server on :3001
yarn dev:frontend  # React frontend on :3000
yarn dev:collector # Document collector, needed for uploads

# Build for production
yarn build && yarn start:server

Configuration

Environment Variables

# LLM Provider (choose one)
LLM_PROVIDER=openai          # openai | anthropic | ollama | lmstudio | localai | gemini | azure | mistral | cohere

# OpenAI
OPEN_AI_KEY=sk-...
OPEN_MODEL_PREF=gpt-4o-mini

# Anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL_PREF=claude-3-5-haiku-20241022

# Ollama (local)
OLLAMA_BASE_PATH=http://localhost:11434
OLLAMA_MODEL_PREF=llama3.2

# LM Studio
LMSTUDIO_BASE_PATH=http://localhost:1234/v1
LMSTUDIO_MODEL_PREF=your-model-name

# Embedding (choose one)
EMBEDDING_ENGINE=openai       # openai | ollama | native | cohere
# The OpenAI embedder reuses OPEN_AI_KEY from the OpenAI section above
EMBEDDING_MODEL_PREF=text-embedding-3-small

# Vector Database
VECTOR_DB=lancedb             # lancedb | chroma | qdrant | weaviate | pinecone | zilliz

# ChromaDB
CHROMA_ENDPOINT=http://localhost:8000
CHROMA_API_HEADER=Authorization
CHROMA_API_KEY=

# Qdrant
QDRANT_ENDPOINT=http://localhost:6333
QDRANT_API_KEY=

# Security
JWT_SECRET=random-secret-32-chars
AUTH_TOKEN=your-single-user-password   # Enables auth on web UI

# Storage
STORAGE_DIR=/app/server/storage
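
Because each provider needs its own pair of variables, a missing key is an easy mistake to make. The sketch below checks a .env-style text against the keys listed above for three of the providers. The function names parse_env and missing_keys and the REQUIRED_KEYS mapping are hypothetical helpers for illustration, not part of AnythingLLM.

```python
# Keys each provider needs, taken from the variables listed in this section.
REQUIRED_KEYS = {
    "openai": ["OPEN_AI_KEY", "OPEN_MODEL_PREF"],
    "anthropic": ["ANTHROPIC_API_KEY", "ANTHROPIC_MODEL_PREF"],
    "ollama": ["OLLAMA_BASE_PATH", "OLLAMA_MODEL_PREF"],
}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comment lines, and trailing ' #' comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.split(" #", 1)[0].strip()
    return env


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return required keys that are absent or empty for the selected LLM_PROVIDER."""
    provider = env.get("LLM_PROVIDER", "")
    return [key for key in REQUIRED_KEYS.get(provider, []) if not env.get(key)]


sample = """
LLM_PROVIDER=anthropic          # trailing comments are stripped
ANTHROPIC_API_KEY=sk-ant-...
"""
print(missing_keys(parse_env(sample)))  # ['ANTHROPIC_MODEL_PREF']
```

Providers not present in the mapping are simply not checked, so the result is a hint rather than a guarantee that the configuration is complete.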

Core Commands/API

| API Endpoint | Method | Description |
| --- | --- | --- |
| /api/auth | POST | Authenticate, get session token |
| /api/v1/workspaces | GET | List all workspaces |
| /api/v1/workspace/new | POST | Create a new workspace |
| /api/v1/workspace/{slug} | GET | Get workspace details |
| /api/v1/workspace/{slug} | DELETE | Delete workspace and all data |
| /api/v1/workspace/{slug}/update | POST | Update workspace settings |
| /api/v1/workspace/{slug}/chat | POST | Send a chat message |
| /api/v1/workspace/{slug}/chats | GET | Get chat history |
| /api/v1/workspace/{slug}/reset-chat | POST | Clear chat history |
| /api/v1/workspace/{slug}/upload | POST | Upload document to workspace |
| /api/v1/workspace/{slug}/update-embeddings | POST | Add/remove embedded documents |
| /api/v1/document/upload | POST | Upload to raw document storage |
| /api/v1/documents | GET | List all uploaded documents |
| /api/v1/document/{docname} | DELETE | Delete a raw document |
| /api/v1/system | GET | Get system info and LLM settings |
| /api/v1/system/update-env | POST | Update environment settings |
| /api/v1/admin/users | GET | List all users (multi-user mode) |
| /api/v1/admin/invite/new | POST | Create user invite |
| /api/v1/openai/chat/completions | POST | OpenAI-compatible chat endpoint |
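
Most of the workspace endpoints share the /api/v1/workspace/{slug} prefix, so a small URL builder can keep client code tidy. The sketch below is a hypothetical helper (workspace_url is not an AnythingLLM function) and assumes the default localhost:3001 address used in the Docker examples.

```python
from urllib.parse import quote

BASE = "http://localhost:3001"  # same default port as the Docker examples


def workspace_url(slug: str, action: str = "") -> str:
    """Build /api/v1/workspace/{slug}[/action], percent-encoding the slug."""
    path = f"/api/v1/workspace/{quote(slug, safe='')}"
    if action:
        path += f"/{action.strip('/')}"
    return BASE + path


print(workspace_url("my-docs", "chat"))
# http://localhost:3001/api/v1/workspace/my-docs/chat
```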

Advanced Usage

API Authentication and Workspace Creation

import requests

BASE = "http://localhost:3001"

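# Authenticate with the instance password to get a session token.
# Note: depending on the version, /api/v1 routes may instead expect an API key
# generated in the instance settings; in that case set TOKEN to that key directly.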
def get_token(password: str) -> str:
    resp = requests.post(f"{BASE}/api/auth", json={"password": password})
    resp.raise_for_status()
    return resp.json()["token"]

TOKEN = get_token("your-auth-password")
HEADERS = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}

# Create workspace
def create_workspace(name: str, settings: dict = None) -> dict:
    payload = {"name": name}
    if settings:
        payload.update(settings)
    resp = requests.post(f"{BASE}/api/v1/workspace/new", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()["workspace"]

workspace = create_workspace("my-docs", {
    "openAiTemp": 0.2,           # LLM temperature
    "openAiHistory": 10,         # Messages in context window
    "openAiPrompt": "You are a helpful assistant. Answer only from provided context.",
    "similarityThreshold": 0.25, # Minimum similarity for retrieval
    "topN": 4                    # Number of chunks to retrieve
})
slug = workspace["slug"]
print(f"Created workspace: {slug}")
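
The similarityThreshold and topN settings are easier to reason about with a toy model of retrieval. The sketch below illustrates the general threshold-then-top-N idea only; it is not AnythingLLM's retrieval code, the chunk names and scores are made up, and treating the threshold as inclusive is an assumption.

```python
def select_chunks(scored, similarity_threshold=0.25, top_n=4):
    """Keep chunks scoring at or above the threshold, then return the top_n best."""
    kept = [(text, score) for text, score in scored if score >= similarity_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_n]


# Made-up (chunk, similarity) pairs for illustration
scored = [
    ("pricing table", 0.81), ("intro", 0.22), ("cost summary", 0.64),
    ("appendix", 0.30), ("footer", 0.05), ("budget notes", 0.47),
]

for text, score in select_chunks(scored, similarity_threshold=0.25, top_n=3):
    print(f"{score:.2f}  {text}")
```

Raising the threshold shrinks the candidate pool before topN is applied, which is why a very high threshold can return fewer chunks than topN asks for.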

Document Upload and Embedding

import os

def upload_document(file_path: str) -> str:
    """Upload a file to AnythingLLM raw storage. Returns docname."""
    with open(file_path, "rb") as f:
        filename = os.path.basename(file_path)
        resp = requests.post(
            f"{BASE}/api/v1/document/upload",
            headers={"Authorization": f"Bearer {TOKEN}"},
            files={"file": (filename, f)}
        )
    resp.raise_for_status()
    doc = resp.json()["documents"][0]
    return doc["location"]   # docname for embedding

def embed_documents(workspace_slug: str, docnames: list[str]):
    """Add uploaded documents to workspace embedding."""
    resp = requests.post(
        f"{BASE}/api/v1/workspace/{workspace_slug}/update-embeddings",
        headers=HEADERS,
        json={"adds": docnames, "deletes": []}
    )
    resp.raise_for_status()
    return resp.json()

# Full pipeline
def ingest_files(workspace_slug: str, file_paths: list[str]):
    docnames = []
    for path in file_paths:
        print(f"Uploading {path}...")
        docname = upload_document(path)
        docnames.append(docname)
        print(f"  Stored as: {docname}")

    print(f"Embedding {len(docnames)} documents...")
    result = embed_documents(workspace_slug, docnames)
    print(f"Embedded: {result}")
    return docnames

# Usage
ingest_files(slug, ["report.pdf", "notes.txt", "data.csv"])
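
For large ingests, sending every docname in a single update-embeddings request may take a long time to return. One possible approach is to split the list into smaller payloads and post them one at a time. The sketch below only builds the payloads; the helper name embedding_batches and the batch size of 10 are arbitrary illustrative choices, and the docnames are placeholders.

```python
from typing import Iterator


def embedding_batches(docnames: list[str], batch_size: int = 10) -> Iterator[dict]:
    """Yield update-embeddings payloads with at most batch_size documents each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(docnames), batch_size):
        yield {"adds": docnames[start:start + batch_size], "deletes": []}


# Placeholder docnames; real ones come from upload_document()
payloads = list(embedding_batches([f"doc-{i}.json" for i in range(25)], batch_size=10))
print([len(p["adds"]) for p in payloads])  # [10, 10, 5]
```

Each payload has the same shape that embed_documents() posts, so it can be sent in a loop with a progress message per batch.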

Chat with Workspace

def chat(workspace_slug: str, message: str, mode: str = "chat") -> dict:
    """
    mode: 'chat' (RAG with context) | 'query' (strict RAG only)
          'agent' (tool-use mode)
    """
    resp = requests.post(
        f"{BASE}/api/v1/workspace/{workspace_slug}/chat",
        headers=HEADERS,
        json={"message": message, "mode": mode}
    )
    resp.raise_for_status()
    data = resp.json()
    return {
        "answer":   data["textResponse"],
        "sources":  data.get("sources", []),
        "close":    data.get("close", False)
    }

# Simple Q&A
result = chat(slug, "What are the main findings in the report?")
print(result["answer"])
print(f"Sources: {[s['title'] for s in result['sources']]}")

# Agent mode (web search, code execution)
result = chat(slug, "Search the web for the latest news on LLMs.", mode="agent")
print(result["answer"])
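
When several retrieved chunks come from the same file, the source list repeats its title. The sketch below shows one way to condense a response into an answer plus unique titles. The sample payload is hypothetical and only mirrors the fields that chat() reads (textResponse and sources); real responses may carry more fields.

```python
def summarize_response(data: dict) -> dict:
    """Extract the answer and a de-duplicated, order-preserving list of source titles."""
    titles = []
    for source in data.get("sources") or []:
        title = source.get("title", "(untitled)")
        if title not in titles:
            titles.append(title)
    return {"answer": data.get("textResponse") or "", "titles": titles}


# Hypothetical payload shaped like the fields read by chat()
sample = {
    "textResponse": "Example answer text.",
    "sources": [{"title": "report.pdf"}, {"title": "report.pdf"}, {"title": "notes.txt"}],
}
print(summarize_response(sample))
```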

OpenAI-Compatible API

# AnythingLLM exposes an OpenAI-compatible endpoint
# Drop-in replacement for apps using OpenAI SDK

from openai import OpenAI

client = OpenAI(
    api_key=TOKEN,
    base_url=f"{BASE}/api/v1/openai"
)

# Chat with a workspace using standard OpenAI SDK
response = client.chat.completions.create(
    model=slug,           # Workspace slug is the "model"
    messages=[
        {"role": "user", "content": "Summarize the key points."}
    ]
)
print(response.choices[0].message.content)

# Works with LangChain too
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model=slug,
    openai_api_key=TOKEN,
    openai_api_base=f"{BASE}/api/v1/openai"
)
result = llm.invoke("What does the document say about costs?")

Multi-User Mode

# Enable multi-user via environment variable or UI
# MULTI_USER_MODE=true in .env

# Create user invite (admin only)
def create_invite() -> str:
    resp = requests.post(
        f"{BASE}/api/v1/admin/invite/new",
        headers=HEADERS
    )
    resp.raise_for_status()  # fail loudly on 401/403 instead of a KeyError below
    return resp.json()["invite"]["link"]

# List users
def list_users() -> list:
    resp = requests.get(f"{BASE}/api/v1/admin/users", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["users"]

# Update user workspace permissions
def set_workspace_permissions(user_id: int, workspace_id: int) -> dict:
    resp = requests.post(
        f"{BASE}/api/v1/admin/workspace/{workspace_id}/update-users",
        headers=HEADERS,
        json={"userIds": [user_id]}
    )
    resp.raise_for_status()
    return resp.json()

Common Workflows

Batch Ingest a Folder

import os

def ingest_folder(workspace_slug: str, folder_path: str,
                  extensions: list[str] = None):
    if extensions is None:
        extensions = [".pdf", ".txt", ".md", ".docx", ".csv"]

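    # Walk the tree and keep files whose extension is in the allow-list.
    # Distinct names avoid shadowing the result with os.walk's file list.
    matched = [
        os.path.join(root, name)
        for root, _, names in os.walk(folder_path)
        for name in names
        if os.path.splitext(name)[1].lower() in extensions
    ]

    print(f"Found {len(matched)} files")
    return ingest_files(workspace_slug, matched)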

# Ingest all PDFs in a folder
ingest_folder(slug, "./documents/reports", extensions=[".pdf"])
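
Before uploading a whole folder, it can help to preview which files would be picked up. The sketch below is a dry-run variant using pathlib; find_documents is a hypothetical helper, and the demo runs against a temporary directory so it touches no real documents.

```python
import tempfile
from pathlib import Path


def find_documents(folder: str, extensions=(".pdf", ".txt", ".md", ".docx", ".csv")) -> list[Path]:
    """Recursively list files whose suffix (case-insensitive) is in extensions, sorted."""
    allowed = {ext.lower() for ext in extensions}
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in allowed
    )


# Demo on a throwaway directory so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.pdf", "b.TXT", "skip.png", "sub/c.md"]:
        path = Path(tmp, name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("demo")
    found = [p.relative_to(tmp).as_posix() for p in find_documents(tmp)]
    print(found)  # ['a.pdf', 'b.TXT', 'sub/c.md']
```

The returned paths can be converted with str() and handed to ingest_files() once the list looks right.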

Export and Reset

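# Backup AnythingLLM storage directory (use the path you mounted, e.g. ./storage or ./anythingllm-storage)
tar -czf anythingllm_backup_$(date +%Y%m%d).tar.gz ./storage/

# Restore (extracts ./storage/ into the current directory)
tar -xzf anythingllm_backup_20240101.tar.gz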

# Reset a workspace's chat history (keep documents)
curl -X POST http://localhost:3001/api/v1/workspace/my-docs/reset-chat \
  -H "Authorization: Bearer $TOKEN"

# Delete entire workspace
curl -X DELETE http://localhost:3001/api/v1/workspace/my-docs \
  -H "Authorization: Bearer $TOKEN"
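
On systems without tar, or when the backup needs to run from the same Python script as the ingest code, the standard tarfile module can produce an equivalent archive. This is a minimal sketch: backup_storage is a hypothetical helper, and the demo uses a temporary stand-in for the storage directory rather than real data.

```python
import tarfile
import tempfile
from datetime import date
from pathlib import Path


def backup_storage(storage_dir: str, dest_dir: str = ".") -> Path:
    """Write anythingllm_backup_YYYYMMDD.tar.gz containing storage_dir; return its path."""
    src = Path(storage_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"storage directory not found: {src}")
    archive = Path(dest_dir) / f"anythingllm_backup_{date.today():%Y%m%d}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname=src.name)  # store under the folder name, not the full path
    return archive


# Demo against a temporary stand-in for ./storage
with tempfile.TemporaryDirectory() as tmp:
    storage = Path(tmp, "storage")
    storage.mkdir()
    (storage / "example.db").write_text("demo")
    archive = backup_storage(str(storage), dest_dir=tmp)
    with tarfile.open(archive) as tar:
        members = sorted(tar.getnames())
    print(archive.name, members)
```

Stopping the container before archiving is a sensible precaution so that database files are not captured mid-write.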

Tips and Best Practices

| Tip | Details |
| --- | --- |
| Start with Ollama for privacy | Run ollama pull llama3.2 and ollama pull nomic-embed-text for fully local, offline RAG |
| Use separate workspaces per project | Each workspace gets its own document set and system prompt; don’t mix unrelated documents |
| Set a strong AUTH_TOKEN | Without auth, anyone on the network can access your documents and LLM API keys |
| Adjust topN by query complexity | Factual lookups: topN=2-3; synthesis questions: topN=6-8 |
| Lower similarity threshold for recall | Start at 0.2-0.3; raise only if results are too noisy |
| Use query mode for strict RAG | query mode refuses to answer if no relevant context is found; chat mode falls back to LLM knowledge |
| Agent mode requires capable models | Agent tools work best with GPT-4o, Claude 3.5+, or Llama 3.1 70B+ |
| Check storage limits | LanceDB embedding storage grows with each document; monitor ./storage/ directory size |
| Use the OpenAI-compatible API for integrations | Enables drop-in use with any tool that supports a custom OpenAI base URL |
| Update regularly | AnythingLLM releases frequently; docker pull mintplexlabs/anythingllm:latest before each restart |