RAGFlow Cheat Sheet
Overview
RAGFlow is an open-source retrieval-augmented generation engine developed by InfiniFlow, focused on deep document understanding rather than simple text extraction. It processes PDFs, Word documents, Excel sheets, images, and presentations by analyzing document layout — identifying titles, paragraphs, tables, figures, and headers — before chunking. This layout-aware parsing substantially improves retrieval quality for complex documents that would be mangled by naive text splitting.
The system is organized around knowledge bases (document collections with associated chunk strategies and embedding models) and chat assistants (LLM configurations, system prompts, and retrieval parameters). Users can create multiple knowledge bases with different chunking strategies appropriate to their document types, then configure assistants that query those bases.
RAGFlow ships with a built-in web interface for document management, chat, and analytics, plus a comprehensive REST API for programmatic integration. It supports dozens of LLM providers (OpenAI, Anthropic, Ollama, Gemini, Azure OpenAI, and more) and embedding models, configurable per knowledge base. The entire stack runs via Docker Compose.
Installation
Docker Compose (Standard)
# Clone repository
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker
# Start the stack (the image variant, slim or full, is selected by RAGFLOW_IMAGE in .env)
docker compose -f docker-compose.yml up -d
# Alternatively, start with GPU acceleration (requires NVIDIA Container Toolkit)
docker compose -f docker-compose-gpu.yml up -d
# Verify services are running
docker compose ps
# Services: ragflow, elasticsearch, minio, redis, mysql
# Access web interface (open is macOS-only; use xdg-open on Linux or paste the URL into a browser)
open http://localhost
# View logs
docker compose logs -f ragflow
Docker Compose Configuration
# Edit environment before first start
cat docker/.env
# Key settings in docker/.env
RAGFLOW_IMAGE=infiniflow/ragflow:latest
ES_PORT=1200 # Elasticsearch port
MYSQL_PASSWORD= # Set a strong password
MINIO_ROOT_PASSWORD= # Set a strong password
# For GPU support (Nvidia)
# Requires NVIDIA Container Toolkit
docker compose -f docker-compose-gpu.yml up -d
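# Scale workers for high throughput
# Note: Compose cannot run replicas of a service that sets container_name or publishes fixed host ports; adjust the service definition first
docker compose up -d --scale ragflow=2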
Resource Requirements
Minimum: 4 CPU cores, 16 GB RAM, 50 GB disk
Recommended: 8 CPU cores, 32 GB RAM, 200 GB disk (for document storage)
GPU: Optional — accelerates local OCR and embedding models
REST API Client Setup
pip install requests # Standard HTTP
pip install ragflow-sdk # Official Python SDK (if available)
import requests
RAGFLOW_BASE = "http://localhost"
API_KEY = "ragflow-xxxxxxxxxxxx"  # Created in web UI under API settings
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}
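The key above is hard-coded for brevity. A minimal sketch of reading it from the environment instead, assuming a variable named `RAGFLOW_API_KEY` (a hypothetical name chosen for this example, not one RAGFlow defines):

```python
import os
from typing import Optional


def build_headers(api_key: Optional[str] = None) -> dict:
    """Build request headers, falling back to the RAGFLOW_API_KEY env var.

    RAGFLOW_API_KEY is a hypothetical variable name used only in this sketch.
    """
    key = api_key or os.environ.get("RAGFLOW_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass api_key or set RAGFLOW_API_KEY")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Calling `build_headers()` with no key and no environment variable fails fast rather than sending an unauthenticated request.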
Configuration
LLM Provider Setup (Web UI or API)
# Configure LLM via API
import requests
# Add OpenAI API key
resp = requests.post(
    f"{RAGFLOW_BASE}/api/llm",
    headers=headers,
    json={
        "llm_factory": "OpenAI",
        "model_type": "chat",
        "model_name": "gpt-4o-mini",
        "api_key": "sk-...",
        "api_base": "https://api.openai.com/v1"
    }
)
# Add Ollama (local)
resp = requests.post(
    f"{RAGFLOW_BASE}/api/llm",
    headers=headers,
    json={
        "llm_factory": "Ollama",
        "model_type": "chat",
        "model_name": "llama3.2",
        "api_base": "http://host.docker.internal:11434"
    }
)
# Add embedding model
resp = requests.post(
    f"{RAGFLOW_BASE}/api/llm",
    headers=headers,
    json={
        "llm_factory": "OpenAI",
        "model_type": "embedding",
        "model_name": "text-embedding-3-small",
        "api_key": "sk-..."
    }
)
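None of the calls above check whether the request succeeded. A small sketch of a response check, assuming the JSON envelope looks like `{"code": 0, "message": "...", "data": ...}`. The `data` key is used throughout this sheet; the `code` and `message` fields, and the `unwrap` and `RagflowError` names, are assumptions made for illustration:

```python
from typing import Any


class RagflowError(RuntimeError):
    """Raised when a response envelope reports a failure (hypothetical helper)."""


def unwrap(payload: dict) -> Any:
    """Return payload["data"], or raise if the envelope signals an error.

    Assumes a zero "code" means success; this field is an assumption.
    """
    code = payload.get("code", 0)
    if code != 0:
        message = payload.get("message", "unknown error")
        raise RagflowError(f"API error {code}: {message}")
    if "data" not in payload:
        raise RagflowError("Response has no 'data' field")
    return payload["data"]
```

A typical call site would run `resp.raise_for_status()` first to catch HTTP-level failures, then `unwrap(resp.json())`.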
Core Commands/API
| Endpoint | Method | Description |
|---|---|---|
| /api/new_conversation | GET | Start a new chat conversation |
| /api/completion | POST | Send message, get streamed response |
| /api/dataset | POST | Create a knowledge base (dataset) |
| /api/dataset | GET | List all knowledge bases |
| /api/dataset/{id} | PUT | Update knowledge base settings |
| /api/dataset/{id} | DELETE | Delete a knowledge base |
| /api/dataset/{id}/document | POST | Upload documents to knowledge base |
| /api/dataset/{id}/document | GET | List documents in knowledge base |
| /api/dataset/{id}/document/{doc_id} | DELETE | Delete a document |
| /api/dataset/{id}/chunk | POST | Manually trigger chunking |
| /api/dataset/{id}/chunk | GET | List chunks in knowledge base |
| /api/retrieval | POST | Test retrieval against a knowledge base |
| /api/assistant | POST | Create a chat assistant |
| /api/assistant | GET | List all assistants |
| /api/assistant/{id} | PUT | Update assistant settings |
| /api/llm | GET | List configured LLM providers |
| /api/llm | POST | Add a new LLM provider |
| /api/user/info | GET | Get current user info and API key |
Advanced Usage
Knowledge Base Management
import requests
BASE = "http://localhost"
HEADERS = {"Authorization": "Bearer ragflow-xxxx", "Content-Type": "application/json"}
# Create knowledge base
def create_knowledge_base(name: str, description: str = "", chunk_method: str = "naive"):
    """
    chunk_method options:
        naive: Simple fixed-length chunking
        book: Book/chapter-aware chunking
        paper: Academic paper structure
        laws: Legal document parsing
        presentation: Slide-deck parsing
        table: Spreadsheet/table extraction
        one: Entire document as a single chunk
        qa: Q&A pair extraction
        knowledge_graph: Extract entity relationships
    """
    resp = requests.post(f"{BASE}/api/dataset", headers=HEADERS, json={
        "name": name,
        "description": description,
        "chunk_method": chunk_method,
        "embedding_model": "text-embedding-3-small@OpenAI",
        "permission": "me",
        "language": "English"
    })
    return resp.json()["data"]
# Upload a document
def upload_document(dataset_id: str, file_path: str):
    with open(file_path, "rb") as f:
        resp = requests.post(
            f"{BASE}/api/dataset/{dataset_id}/document",
            # Send only the auth header: requests sets the multipart Content-Type itself
            headers={"Authorization": HEADERS["Authorization"]},
            files={"file": (file_path.split("/")[-1], f)}
        )
    return resp.json()
# Trigger parsing and chunking
def parse_documents(dataset_id: str, document_ids: list[str]):
    resp = requests.post(
        f"{BASE}/api/dataset/{dataset_id}/chunk",
        headers=HEADERS,
        json={"document_ids": document_ids}
    )
    return resp.json()
# Check chunking status
def get_chunks(dataset_id: str, document_id: str = None):
    params = {"document_id": document_id} if document_id else {}
    resp = requests.get(
        f"{BASE}/api/dataset/{dataset_id}/chunk",
        headers=HEADERS,
        params=params
    )
    return resp.json()["data"]
Chat Assistant Configuration
# Create an assistant
def create_assistant(name: str, dataset_ids: list[str], llm_model: str = "gpt-4o-mini@OpenAI"):
    resp = requests.post(f"{BASE}/api/assistant", headers=HEADERS, json={
        "name": name,
        "description": "RAG assistant",
        "dataset_ids": dataset_ids,
        "llm_id": llm_model,
        "prompt_config": {
            "system": "You are a helpful assistant. Answer questions based on provided context only.",
            "similarity_threshold": 0.2,
            "keywords_similarity_weight": 0.3,  # Weight for keyword vs vector similarity
            "top_n": 6,  # Number of chunks to retrieve
            "rerank_id": None  # Set to a reranker model ID to enable
        },
        "prompt_type": "simple"  # simple | advanced
    })
    return resp.json()["data"]
# Chat with assistant
def chat(assistant_id: str, question: str, session_id: str = None):
    resp = requests.post(f"{BASE}/api/completion", headers=HEADERS, json={
        "conversation_id": session_id,
        "messages": [{"role": "user", "content": question}],
        "quote": True,  # Return source chunks
        "stream": False
    })
    result = resp.json()
    return {
        "answer": result["data"]["answer"],
        "sources": result["data"].get("reference", {}).get("chunks", [])
    }
# Start a new conversation session
def new_conversation(assistant_id: str) -> str:
    resp = requests.get(
        f"{BASE}/api/new_conversation",
        headers=HEADERS,
        params={"user_id": assistant_id}
    )
    return resp.json()["data"]["id"]
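The `similarity_threshold`, `keywords_similarity_weight`, and `top_n` values in `prompt_config` interact. The sketch below shows one plausible way such settings could combine: a linear blend of keyword and vector similarity, a threshold filter, then a top-N cut. RAGFlow's actual scoring is not described in this sheet, so treat the formula and the function names as assumptions meant only to build intuition:

```python
def hybrid_score(keyword_sim: float, vector_sim: float, keywords_weight: float = 0.3) -> float:
    """Blend keyword and vector similarity with a linear weight (assumed formula)."""
    if not 0.0 <= keywords_weight <= 1.0:
        raise ValueError("keywords_weight must be in [0, 1]")
    return keywords_weight * keyword_sim + (1.0 - keywords_weight) * vector_sim


def select_chunks(candidates, threshold=0.2, top_n=6, keywords_weight=0.3):
    """Score candidates, drop those below threshold, return the top_n best.

    candidates: list of (chunk_id, keyword_sim, vector_sim) tuples.
    Returns a list of (chunk_id, score) sorted by descending score.
    """
    scored = [
        (chunk_id, hybrid_score(kw, vec, keywords_weight))
        for chunk_id, kw, vec in candidates
    ]
    kept = [item for item in scored if item[1] >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_n]
```

Under this model, raising the threshold shrinks the candidate pool before `top_n` applies, which is why a high threshold can leave the LLM with fewer chunks than `top_n` requests.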
Retrieval Testing
# Test retrieval quality before building an assistant
def test_retrieval(dataset_ids: list[str], query: str, top_k: int = 5):
    resp = requests.post(f"{BASE}/api/retrieval", headers=HEADERS, json={
        "question": query,
        "datasets": dataset_ids,
        "top_k": top_k,
        "similarity_threshold": 0.1,
        "rerank_id": None
    })
    chunks = resp.json()["data"]["chunks"]
    for chunk in chunks:
        print(f"[{chunk['similarity']:.3f}] {chunk['content'][:120]}")
        # Fall back to '?' when positions is missing or empty
        print(f"  Source: {chunk['document_name']} p.{(chunk.get('positions') or ['?'])[0]}")
        print()
    return chunks
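Printing chunks is useful for spot checks, but comparing settings needs a number. A minimal sketch of a hit-rate metric over a small set of labelled queries, assuming each returned chunk carries the `document_name` key used in the printing code above. The `hit_rate` function and its input layout are hypothetical:

```python
def hit_rate(results: dict, expected: dict) -> float:
    """Fraction of queries whose expected document appears among returned chunks.

    results:  query -> list of chunk dicts (each with a "document_name" key)
    expected: query -> name of the document that should be retrieved
    """
    if not expected:
        return 0.0
    hits = 0
    for query, doc_name in expected.items():
        names = {chunk.get("document_name") for chunk in results.get(query, [])}
        if doc_name in names:
            hits += 1
    return hits / len(expected)
```

Collecting the return value of the retrieval call for each labelled query into `results` lets the same score be recomputed after changing `similarity_threshold`, `top_k`, or the chunk method.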
Chunking Strategies by Document Type
CHUNK_STRATEGIES = {
    # General documents: balanced approach
    "general": "naive",
    # Long-form books with chapters
    "books": "book",
    # Scientific papers with abstract/methods/results structure
    "papers": "paper",
    # Legal documents with numbered clauses
    "legal": "laws",
    # PowerPoint presentations slide-by-slide
    "slides": "presentation",
    # FAQ documents: extracts Q&A pairs
    "faq": "qa",
    # Whole document as a single chunk (keeps the full context together)
    "scanned_docs": "one",
    # Extract table data row by row
    "spreadsheets": "table"
}
def ingest_by_type(files_by_type: dict[str, list[str]]):
    results = {}
    for file_type, file_paths in files_by_type.items():
        method = CHUNK_STRATEGIES.get(file_type, "naive")
        kb = create_knowledge_base(f"{file_type}_kb", chunk_method=method)
        kb_id = kb["id"]
        doc_ids = []
        for path in file_paths:
            doc = upload_document(kb_id, path)
            # upload_document returns the full response; the ID sits under "data"
            doc_ids.append(doc["data"]["id"])
        parse_documents(kb_id, doc_ids)
        results[file_type] = kb_id
    return results
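`ingest_by_type` expects files already grouped under the keys of `CHUNK_STRATEGIES`. A rough sketch of building that grouping from file extensions follows. The extension mapping is purely illustrative: a PDF could be a paper, a book, or a contract, so extension alone is only a hint, and anything unmapped falls back to `"general"`:

```python
from collections import defaultdict
from pathlib import Path

# Illustrative mapping from file extension to a CHUNK_STRATEGIES key.
# These pairings are assumptions; adjust them to your own corpus.
EXTENSION_TO_TYPE = {
    ".pptx": "slides",
    ".ppt": "slides",
    ".xlsx": "spreadsheets",
    ".csv": "spreadsheets",
}


def group_files_by_type(paths, default="general"):
    """Group file paths by document type, using the extension as a rough hint."""
    grouped = defaultdict(list)
    for path in paths:
        ext = Path(path).suffix.lower()
        grouped[EXTENSION_TO_TYPE.get(ext, default)].append(path)
    return dict(grouped)
```

The result can be passed straight to `ingest_by_type`; files whose type matters (for example legal PDFs) are better assigned by hand.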
Common Workflows
Full Pipeline: Upload → Parse → Chat
def build_rag_from_documents(
    name: str,
    file_paths: list[str],
    chunk_method: str = "naive",
    llm: str = "gpt-4o-mini@OpenAI"
) -> dict:
    """End-to-end: create KB, upload docs, parse, create assistant."""
    import time
    # 1. Create knowledge base
    kb = create_knowledge_base(name, chunk_method=chunk_method)
    kb_id = kb["id"]
    print(f"Created KB: {kb_id}")
    # 2. Upload documents
    doc_ids = []
    for path in file_paths:
        doc = upload_document(kb_id, path)
        doc_ids.append(doc["data"]["id"])
        print(f"Uploaded: {path}")
    # 3. Start parsing
    parse_documents(kb_id, doc_ids)
    print("Parsing started...")
    # 4. Wait for parsing to complete (up to 30 polls, 10 seconds apart)
    for _ in range(30):
        time.sleep(10)
        chunks = get_chunks(kb_id)
        docs = chunks.get("docs", [])
        total = sum(d.get("chunk_num", 0) for d in docs)
        done = sum(1 for d in docs if d.get("progress", 0) >= 1.0)
        print(f"Progress: {done}/{len(doc_ids)} docs, {total} chunks")
        if done == len(doc_ids):
            break
    else:
        # Loop ended without break: parsing is still running
        print("Warning: parsing did not finish within 5 minutes")
    # 5. Create assistant
    assistant = create_assistant(name + "_assistant", [kb_id], llm_model=llm)
    print(f"Created assistant: {assistant['id']}")
    return {"kb_id": kb_id, "assistant_id": assistant["id"]}
# Usage
stack = build_rag_from_documents(
    name="annual_reports",
    file_paths=["report_2023.pdf", "report_2024.pdf"],
    chunk_method="paper"
)
# Chat
session = new_conversation(stack["assistant_id"])
result = chat(stack["assistant_id"], "What was the revenue growth from 2023 to 2024?", session)
print(result["answer"])
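The wait step in the pipeline polls a fixed number of times. A reusable alternative is sketched below: a generic poll-until-true helper with an explicit timeout. The `wait_until` name and its parameters are hypothetical, and the `sleep` and `clock` arguments exist only so the helper can be exercised without real waiting:

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float = 300.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Call check() every `interval` seconds until it returns True.

    Raises TimeoutError if `timeout` seconds pass first.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return
        if clock() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        sleep(interval)
```

In the pipeline, `check` would wrap the progress query and return `True` once every document reports completion. Raising on timeout keeps the caller from creating an assistant over a half-parsed knowledge base.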
Monitor Document Status
# Check parsing progress via API (set DATASET_ID to your knowledge base ID first)
curl -s -H "Authorization: Bearer ragflow-xxxx" \
  "http://localhost/api/dataset/${DATASET_ID}/document?page=1&page_size=20" | jq '.data.docs[].progress'
# View container resource usage
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
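The `jq` filter above prints one raw progress value per document. A short Python sketch that condenses the same list into counts, assuming each document record carries a `progress` value between 0 and 1 as the pipeline code does (the `summarize_progress` name is hypothetical):

```python
def summarize_progress(docs):
    """Summarise parsing progress for a list of document records.

    Each record is expected to carry a "progress" value in [0, 1];
    missing values are treated as 0.
    """
    total = len(docs)
    done = sum(1 for d in docs if d.get("progress", 0) >= 1.0)
    average = sum(d.get("progress", 0) for d in docs) / total if total else 0.0
    return {"total": total, "done": done, "pending": total - done, "average": average}
```

Feeding it the `docs` list from the document-listing response gives a single line suitable for logging or a dashboard.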
Tips and Best Practices
| Tip | Details |
|---|---|
| Choose chunk method by document type | paper for academic papers, qa for FAQ docs, table for spreadsheets; naive is the fallback |
| Allow parsing time | Complex PDFs with OCR can take 30-120 seconds per page; don’t assume instant availability |
| Test retrieval before building assistant | Use the /api/retrieval endpoint to verify chunk quality before wiring up an LLM |
| Set similarity_threshold conservatively | Start at 0.1-0.2; too high filters relevant chunks, too low adds noise |
| Use quote: true in chat requests | Returns source chunks with answers, enabling attribution and verification |
| Store API keys in .env not code | RAGFlow reads from its own .env file; add model keys there rather than in UI for reproducibility |
| Monitor Elasticsearch heap | ES is the primary bottleneck; give its JVM heap enough memory, but no more than about 50% of available RAM so the OS file cache keeps the rest |
| Use top_n: 6-10 for complex questions | More chunks give the LLM broader context; reduce for factual lookup questions |
| Back up MinIO and MySQL regularly | Documents (MinIO) and metadata (MySQL) are the critical persistence layers |
| Use knowledge_graph method for connected data | Entity-relationship extraction improves queries that span multiple concepts |