RAGFlow Cheat Sheet

Overview

RAGFlow is an open-source retrieval-augmented generation engine developed by InfiniFlow, focused on deep document understanding rather than simple text extraction. It processes PDFs, Word documents, Excel sheets, images, and presentations by analyzing document layout — identifying titles, paragraphs, tables, figures, and headers — before chunking. This layout-aware parsing substantially improves retrieval quality for complex documents that would be mangled by naive text splitting.

The system is organized around knowledge bases (document collections with associated chunk strategies and embedding models) and chat assistants (LLM configurations, system prompts, and retrieval parameters). Users can create multiple knowledge bases with different chunking strategies appropriate to their document types, then configure assistants that query those bases.

RAGFlow ships with a built-in web interface for document management, chat, and analytics, plus a comprehensive REST API for programmatic integration. It supports dozens of LLM providers (OpenAI, Anthropic, Ollama, Gemini, Azure OpenAI, and more) and embedding models, configurable per knowledge base. The entire stack runs via Docker Compose.

Installation

Docker Compose (Standard)

# Clone repository
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker

# Start on CPU; the image variant (slim or full) is chosen by RAGFLOW_IMAGE in .env
docker compose -f docker-compose.yml up -d

# Start with GPU acceleration instead (requires NVIDIA Container Toolkit)
docker compose -f docker-compose-gpu.yml up -d

# Verify services are running
docker compose ps
# Services: ragflow, elasticsearch, minio, redis, mysql

# Access web interface (`open` is macOS; on Linux use xdg-open or browse to the URL)
open http://localhost

# View logs
docker compose logs -f ragflow

Docker Compose Configuration

# Review and edit the environment before first start (run from ragflow/docker)
cat .env

# Key settings in docker/.env
RAGFLOW_IMAGE=infiniflow/ragflow:latest
ES_PORT=1200           # Elasticsearch port
MYSQL_PASSWORD=        # Set a strong password
MINIO_ROOT_PASSWORD=   # Set a strong password

# For GPU support (Nvidia)
# Requires NVIDIA Container Toolkit
docker compose -f docker-compose-gpu.yml up -d

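# Scale workers for high throughput
# (Compose cannot scale a service that sets container_name or fixed host ports; remove those first)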
docker compose up -d --scale ragflow=2
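
The password placeholders in the settings above are easy to leave blank by accident. As a minimal sketch, the hypothetical helper `unset_settings` below scans the text of an `.env` file and reports which required names are missing or empty. It handles only plain `KEY=value` lines with optional inline comments and does not implement quoting rules.

```python
import re


def unset_settings(env_text: str, required: list[str]) -> list[str]:
    """Return the names in `required` that are missing or empty in env_text."""
    values: dict[str, str] = {}
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip an inline comment, which needs whitespace before the '#'.
        value = re.split(r"\s+#", value, maxsplit=1)[0].strip()
        values[key.strip()] = value
    return [name for name in required if not values.get(name)]
```

Calling it with the contents of `.env` and the two password names from the listing returns whichever of them still has no value.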

Resource Requirements

Minimum:    4 CPU cores, 16 GB RAM, 50 GB disk
Recommended: 8 CPU cores, 32 GB RAM, 200 GB disk (for document storage)
GPU:        Optional — accelerates local OCR and embedding models

REST API Client Setup

pip install requests         # Standard HTTP
pip install ragflow-sdk      # Official Python SDK

import requests

RAGFLOW_BASE = "http://localhost"
API_KEY      = "ragflow-xxxxxxxxxxxx"   # Created in web UI under API settings

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}
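
Hard-coding the key is fine for a quick test, but a shared script can read it from the environment instead. The sketch below assumes two hypothetical variable names, `RAGFLOW_BASE_URL` and `RAGFLOW_API_KEY`; RAGFlow itself does not define them.

```python
import os


def build_headers(api_key: str) -> dict[str, str]:
    """Return the bearer-token headers used by the requests in this sheet."""
    if not api_key:
        raise ValueError("API key is empty; create one in the web UI first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def load_client_config(env=os.environ) -> tuple[str, dict[str, str]]:
    """Read the base URL and API key from a mapping such as os.environ.

    RAGFLOW_BASE_URL and RAGFLOW_API_KEY are hypothetical names chosen
    for this example, not variables that RAGFlow defines.
    """
    base = env.get("RAGFLOW_BASE_URL", "http://localhost").rstrip("/")
    return base, build_headers(env.get("RAGFLOW_API_KEY", ""))
```

Passing the mapping as an argument keeps the function easy to exercise with a plain dict.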

Configuration

LLM Provider Setup (Web UI or API)

# Configure LLM via API
import requests

# Add OpenAI API key
resp = requests.post(
    f"{RAGFLOW_BASE}/api/llm",
    headers=headers,
    json={
        "llm_factory": "OpenAI",
        "model_type":  "chat",
        "model_name":  "gpt-4o-mini",
        "api_key":     "sk-...",
        "api_base":    "https://api.openai.com/v1"
    }
)

# Add Ollama (local)
resp = requests.post(
    f"{RAGFLOW_BASE}/api/llm",
    headers=headers,
    json={
        "llm_factory": "Ollama",
        "model_type":  "chat",
        "model_name":  "llama3.2",
        "api_base":    "http://host.docker.internal:11434"
    }
)

# Add embedding model
resp = requests.post(
    f"{RAGFLOW_BASE}/api/llm",
    headers=headers,
    json={
        "llm_factory": "OpenAI",
        "model_type":  "embedding",
        "model_name":  "text-embedding-3-small",
        "api_key":     "sk-..."
    }
)
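
The three requests above differ only in their JSON body. A small helper, sketched here under the hypothetical name `llm_payload`, can build that body and omit keys a provider does not need. The field names are the ones used above and have not been checked against a particular RAGFlow release.

```python
from typing import Optional


def llm_payload(
    factory: str,
    model_type: str,
    model_name: str,
    api_key: Optional[str] = None,
    api_base: Optional[str] = None,
) -> dict[str, str]:
    """Build the JSON body for POST /api/llm as used in the examples above."""
    body = {
        "llm_factory": factory,
        "model_type": model_type,
        "model_name": model_name,
    }
    # Only include optional fields that were actually provided.
    if api_key:
        body["api_key"] = api_key
    if api_base:
        body["api_base"] = api_base
    return body


# Same body as the Ollama request above: no api_key is sent.
ollama = llm_payload("Ollama", "chat", "llama3.2",
                     api_base="http://host.docker.internal:11434")
```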

Core Commands/API

Endpoint	Method	Description
`/api/new_conversation`	GET	Start a new chat conversation
`/api/completion`	POST	Send message, get streamed response
`/api/dataset`	POST	Create a knowledge base (dataset)
`/api/dataset`	GET	List all knowledge bases
`/api/dataset/{id}`	PUT	Update knowledge base settings
`/api/dataset/{id}`	DELETE	Delete a knowledge base
`/api/dataset/{id}/document`	POST	Upload documents to knowledge base
`/api/dataset/{id}/document`	GET	List documents in knowledge base
`/api/dataset/{id}/document/{doc_id}`	DELETE	Delete a document
`/api/dataset/{id}/chunk`	POST	Manually trigger chunking
`/api/dataset/{id}/chunk`	GET	List chunks in knowledge base
`/api/retrieval`	POST	Test retrieval against a knowledge base
`/api/assistant`	POST	Create a chat assistant
`/api/assistant`	GET	List all assistants
`/api/assistant/{id}`	PUT	Update assistant settings
`/api/llm`	GET	List configured LLM providers
`/api/llm`	POST	Add a new LLM provider
`/api/user/info`	GET	Get current user info and API key
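
Every path in the table shares the `/api` prefix, so URLs can be assembled from segments instead of formatted by hand. The helper below is a hypothetical convenience, not part of RAGFlow or its SDK; it percent-encodes each segment so an ID with unusual characters cannot alter the path.

```python
from urllib.parse import quote


def endpoint(base: str, *parts: str) -> str:
    """Join path segments under /api, percent-encoding each segment."""
    path = "/".join(quote(str(p), safe="") for p in parts)
    return f"{base.rstrip('/')}/api/{path}"
```

For instance, `endpoint("http://localhost", "dataset", dataset_id, "document")` yields the document endpoint for that knowledge base.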

Advanced Usage

Knowledge Base Management

import requests

BASE    = "http://localhost"
HEADERS = {"Authorization": "Bearer ragflow-xxxx", "Content-Type": "application/json"}

# Create knowledge base
def create_knowledge_base(name: str, description: str = "", chunk_method: str = "naive"):
    """
    chunk_method options:
      naive       — Simple fixed-length chunking
      book        — Book/chapter-aware chunking
      paper       — Academic paper structure
      laws        — Legal document parsing
      presentation — Slide-deck parsing
      table       — Spreadsheet/table extraction
      one         — Each page/section as one chunk
      qa          — Q&A pair extraction
      knowledge_graph — Extract entity relationships
    """
    resp = requests.post(f"{BASE}/api/dataset", headers=HEADERS, json={
        "name":           name,
        "description":    description,
        "chunk_method":   chunk_method,
        "embedding_model": "text-embedding-3-small@OpenAI",
        "permission":     "me",
        "language":       "English"
    })
    return resp.json()["data"]

# Upload a document
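def upload_document(dataset_id: str, file_path: str):
    import os  # local import keeps this snippet self-contained

    with open(file_path, "rb") as f:
        resp = requests.post(
            f"{BASE}/api/dataset/{dataset_id}/document",
            # Send only the auth header so requests can set the multipart Content-Type itself
            headers={"Authorization": HEADERS["Authorization"]},
            files={"file": (os.path.basename(file_path), f)}
        )
    return resp.json()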

# Trigger parsing and chunking
def parse_documents(dataset_id: str, document_ids: list[str]):
    resp = requests.post(
        f"{BASE}/api/dataset/{dataset_id}/chunk",
        headers=HEADERS,
        json={"document_ids": document_ids}
    )
    return resp.json()

# Check chunking status
def get_chunks(dataset_id: str, document_id: str = None):
    params = {"document_id": document_id} if document_id else {}
    resp = requests.get(
        f"{BASE}/api/dataset/{dataset_id}/chunk",
        headers=HEADERS,
        params=params
    )
    return resp.json()["data"]
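
The functions above index `["data"]` directly, which fails with an unhelpful `KeyError` when the server returns an error. The sketch below shows one way to fail more clearly. It assumes, without verification against a specific release, that error responses carry a non-zero `code` and a `message` field; `RagflowError` and `unwrap` are hypothetical names.

```python
class RagflowError(RuntimeError):
    """Raised when a response envelope does not carry usable data."""


def unwrap(payload: dict) -> object:
    """Return payload["data"], raising RagflowError on failure.

    Assumes (unverified) that error responses carry a non-zero "code"
    and a human-readable "message" next to, or instead of, "data".
    """
    code = payload.get("code", 0)
    if code != 0:
        message = payload.get("message", "no message")
        raise RagflowError(f"API error {code}: {message}")
    if "data" not in payload:
        raise RagflowError("response has no 'data' field")
    return payload["data"]
```

With this in place, `return unwrap(resp.json())` could stand in for `return resp.json()["data"]`.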

Chat Assistant Configuration

# Create an assistant
def create_assistant(name: str, dataset_ids: list[str], llm_model: str = "gpt-4o-mini@OpenAI"):
    resp = requests.post(f"{BASE}/api/assistant", headers=HEADERS, json={
        "name":         name,
        "description":  "RAG assistant",
        "dataset_ids":  dataset_ids,
        "llm_id":       llm_model,
        "prompt_config": {
            "system":    "You are a helpful assistant. Answer questions based on provided context only.",
            "similarity_threshold": 0.2,
            "keywords_similarity_weight": 0.3,  # Weight for keyword vs vector similarity
            "top_n":     6,       # Number of chunks to retrieve
            "rerank_id": None     # Set to a reranker model ID to enable
        },
        "prompt_type": "simple"   # simple | advanced
    })
    return resp.json()["data"]

# Chat with assistant
def chat(assistant_id: str, question: str, session_id: str = None):
    # Reuse the caller's session, or open a new conversation for this assistant
    conversation_id = session_id or new_conversation(assistant_id)
    resp = requests.post(f"{BASE}/api/completion", headers=HEADERS, json={
        "conversation_id": conversation_id,
        "messages": [{"role": "user", "content": question}],
        "quote":    True,     # Return source chunks
        "stream":   False
    })
    result = resp.json()
    return {
        "answer":  result["data"]["answer"],
        "sources": result["data"].get("reference", {}).get("chunks", [])
    }

# Start a new conversation session
def new_conversation(assistant_id: str) -> str:
    resp = requests.get(
        f"{BASE}/api/new_conversation",
        headers=HEADERS,
        params={"user_id": assistant_id}
    )
    return resp.json()["data"]["id"]
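
Because `chat` returns the answer together with its source chunks, the two can be rendered as a cited reply. This is a minimal sketch: `format_answer` is a hypothetical helper, and it assumes each source chunk exposes `document_name` and `content` keys, the same keys the retrieval example in this sheet reads.

```python
def format_answer(result: dict, max_sources: int = 3, snippet_len: int = 80) -> str:
    """Render an answer followed by a numbered list of source snippets."""
    lines = [result["answer"].strip()]
    sources = result.get("sources", [])[:max_sources]
    if sources:
        lines.append("")
        lines.append("Sources:")
    for i, chunk in enumerate(sources, start=1):
        # Key names are assumptions; fall back gracefully if they are absent.
        name = chunk.get("document_name", "unknown document")
        # Collapse whitespace so multi-line chunks fit on one line.
        snippet = " ".join(chunk.get("content", "").split())[:snippet_len]
        lines.append(f"[{i}] {name}: {snippet}")
    return "\n".join(lines)
```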

Retrieval Testing

# Test retrieval quality before building an assistant
def test_retrieval(dataset_ids: list[str], query: str, top_k: int = 5):
    resp = requests.post(f"{BASE}/api/retrieval", headers=HEADERS, json={
        "question":    query,
        "datasets":    dataset_ids,
        "top_k":       top_k,
        "similarity_threshold": 0.1,
        "rerank_id":   None
    })
    chunks = resp.json()["data"]["chunks"]
    for chunk in chunks:
        print(f"[{chunk['similarity']:.3f}] {chunk['content'][:120]}")
        print(f"  Source: {chunk['document_name']} p.{(chunk.get('positions') or ['?'])[0]}")
        print()
    return chunks
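
Printing chunks is useful for spot checks, but a small set of labelled queries makes a threshold change measurable. The sketch below computes a hit rate at k: the fraction of queries whose expected document appears among the first k retrieved documents. The function name and input shapes are assumptions for illustration; the document names would come from `chunk['document_name']` in the retrieval results.

```python
def hit_rate_at_k(
    retrieved: dict[str, list[str]],
    expected: dict[str, str],
    k: int = 5,
) -> float:
    """Fraction of queries whose expected document is in the top-k results.

    retrieved maps each query to its ranked list of document names;
    expected maps each query to the one document that should be found.
    """
    if not expected:
        raise ValueError("expected must contain at least one query")
    hits = 0
    for query, doc in expected.items():
        if doc in retrieved.get(query, [])[:k]:
            hits += 1
    return hits / len(expected)
```

Running it before and after changing `similarity_threshold` or the chunk method gives a rough signal of whether retrieval improved.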

Chunking Strategies by Document Type

CHUNK_STRATEGIES = {
    # General documents — balanced approach
    "general":      "naive",
    # Long-form books with chapters
    "books":        "book",
    # Scientific papers with abstract/methods/results structure
    "papers":       "paper",
    # Legal documents with numbered clauses
    "legal":        "laws",
    # PowerPoint presentations slide-by-slide
    "slides":       "presentation",
    # FAQ documents — extracts Q&A pairs
    "faq":          "qa",
    # One chunk per page (preserves page context)
    "scanned_docs": "one",
    # Extract table data row by row
    "spreadsheets": "table"
}

def ingest_by_type(files_by_type: dict[str, list[str]]):
    results = {}
    for file_type, file_paths in files_by_type.items():
        method = CHUNK_STRATEGIES.get(file_type, "naive")
        kb = create_knowledge_base(f"{file_type}_kb", chunk_method=method)
        kb_id = kb["id"]

        doc_ids = []
        for path in file_paths:
            doc = upload_document(kb_id, path)
            doc_ids.append(doc["data"]["id"])  # upload_document returns the full response envelope

        parse_documents(kb_id, doc_ids)
        results[file_type] = kb_id

    return results
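
`ingest_by_type` expects files already grouped under the keys of `CHUNK_STRATEGIES`. One way to produce that grouping is by file extension, as sketched below. The extension mapping is hypothetical and deliberately small: an extension alone cannot tell a book from a paper, so unmapped files fall back to `general`.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical extension-to-type mapping; adjust it to your own corpus.
EXTENSION_TYPES = {
    ".pptx": "slides",
    ".ppt": "slides",
    ".xlsx": "spreadsheets",
    ".csv": "spreadsheets",
}


def group_by_type(paths: list[str], default: str = "general") -> dict[str, list[str]]:
    """Group file paths under CHUNK_STRATEGIES-style keys by extension."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        suffix = PurePath(path).suffix.lower()
        grouped[EXTENSION_TYPES.get(suffix, default)].append(path)
    return dict(grouped)
```

The result can be passed straight to `ingest_by_type`.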

Common Workflows

Full Pipeline: Upload → Parse → Chat

def build_rag_from_documents(
    name: str,
    file_paths: list[str],
    chunk_method: str = "naive",
    llm: str = "gpt-4o-mini@OpenAI"
) -> dict:
    """End-to-end: create KB, upload docs, parse, create assistant."""
    import time

    # 1. Create knowledge base
    kb = create_knowledge_base(name, chunk_method=chunk_method)
    kb_id = kb["id"]
    print(f"Created KB: {kb_id}")

    # 2. Upload documents
    doc_ids = []
    for path in file_paths:
        doc = upload_document(kb_id, path)
        doc_ids.append(doc["data"]["id"])
        print(f"Uploaded: {path}")

    # 3. Start parsing
    parse_documents(kb_id, doc_ids)
    print("Parsing started...")

    # 4. Wait for parsing to complete (up to 30 polls, 10 s apart)
    for _ in range(30):
        time.sleep(10)
        chunks = get_chunks(kb_id)
        docs  = chunks.get("docs", [])
        total = sum(d.get("chunk_num", 0) for d in docs)
        done  = sum(1 for d in docs if d.get("progress", 0) >= 1.0)
        print(f"Progress: {done}/{len(doc_ids)} docs, {total} chunks")
        if done == len(doc_ids):
            break
    else:
        # Loop ended without break: do not build an assistant on a half-parsed KB
        raise TimeoutError("Parsing did not finish within 5 minutes")

    # 5. Create assistant
    assistant = create_assistant(name + "_assistant", [kb_id], llm_model=llm)
    print(f"Created assistant: {assistant['id']}")

    return {"kb_id": kb_id, "assistant_id": assistant["id"]}

# Usage
stack = build_rag_from_documents(
    name="annual_reports",
    file_paths=["report_2023.pdf", "report_2024.pdf"],
    chunk_method="naive"   # annual reports are general documents, not academic papers
)

# Chat
session = new_conversation(stack["assistant_id"])
result  = chat(stack["assistant_id"], "What was the revenue growth from 2023 to 2024?", session)
print(result["answer"])
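
The wait loop in step 4 can be factored into a reusable polling helper with an explicit deadline. The sketch below is generic Python rather than anything RAGFlow provides. The `sleep` and `clock` parameters exist only so the helper can be tested without real delays.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float = 300.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Call check() every `interval` seconds until it returns True.

    Raises TimeoutError if `timeout` seconds pass without success.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(interval)
```

Here `check` would be a function that fetches document progress and returns True once every document is fully parsed.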

Monitor Document Status

# Check parsing progress via API (set DATASET_ID first; curl treats literal {braces} as URL globs)
DATASET_ID="your-dataset-id"
curl -s -H "Authorization: Bearer ragflow-xxxx" \
  "http://localhost/api/dataset/${DATASET_ID}/document?page=1&page_size=20" | jq '.data.docs[].progress'

# View container resource usage
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
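
The same progress check can be done in Python when jq is not available. This sketch assumes the response shape implied by the jq filter above (`data.docs[].progress`, with 1.0 meaning finished); `summarize_progress` is a hypothetical helper name.

```python
def summarize_progress(payload: dict) -> dict[str, int]:
    """Count finished and unfinished documents in a document-list response.

    Assumes payload["data"]["docs"] is a list of objects with a numeric
    "progress" field between 0 and 1, as the jq filter above implies.
    """
    docs = (payload.get("data") or {}).get("docs") or []
    done = sum(1 for d in docs if (d.get("progress") or 0) >= 1.0)
    return {"total": len(docs), "done": done, "pending": len(docs) - done}
```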

Tips and Best Practices

Tip	Details
Choose chunk method by document type	`paper` for academic papers, `qa` for FAQ docs, `table` for spreadsheets; `naive` is the fallback
Allow parsing time	Complex PDFs with OCR can take 30-120 seconds per page; poll the document `progress` field instead of assuming instant availability
Test retrieval before building assistant	Use the `/api/retrieval` endpoint to verify chunk quality before wiring up an LLM
Set `similarity_threshold` conservatively	Start at 0.1-0.2; too high filters relevant chunks, too low adds noise
Use `quote: true` in chat requests	Returns source chunks with answers, enabling attribution and verification
Keep API keys out of code	Load the RAGFlow API key and provider keys from environment variables or an untracked `.env` file instead of hard-coding them as the examples here do
Monitor Elasticsearch heap	ES is the primary bottleneck; keep its JVM heap at no more than 50% of the memory available to ES so the rest can serve the filesystem cache
Use `top_n: 6-10` for complex questions	More chunks give the LLM broader context; reduce for factual lookup questions
Back up MinIO and MySQL regularly	Documents (MinIO) and metadata (MySQL) are the critical persistence layers
Use `knowledge_graph` method for connected data	Entity-relationship extraction improves queries that span multiple concepts
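
To make the interaction between `keywords_similarity_weight`, `similarity_threshold`, and `top_n` concrete, the sketch below combines a keyword score and a vector score with a plain weighted sum, drops candidates under the threshold, and keeps the best `top_n`. It is an illustrative model only, not RAGFlow's actual scoring code, and the input tuples are hypothetical.

```python
def hybrid_rank(
    candidates: list[tuple[str, float, float]],
    keyword_weight: float = 0.3,
    similarity_threshold: float = 0.2,
    top_n: int = 6,
) -> list[tuple[str, float]]:
    """Rank (chunk_id, keyword_score, vector_score) tuples by a weighted sum.

    Illustrative only: the real engine may normalise or combine scores
    differently.
    """
    scored = []
    for chunk_id, keyword_score, vector_score in candidates:
        score = keyword_weight * keyword_score + (1.0 - keyword_weight) * vector_score
        # A higher threshold discards more candidates before ranking.
        if score >= similarity_threshold:
            scored.append((chunk_id, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]
```

Raising the threshold shrinks the candidate pool before `top_n` is applied, which is why a high threshold can return fewer chunks than `top_n` asks for.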