R2R Cheat Sheet
Overview
R2R (Reason to Retrieve) is a production-ready RAG engine that provides a complete pipeline for document ingestion, chunking, embedding, hybrid search, and answer generation. It includes built-in user management, document-level permissions, knowledge graph construction, conversation memory, and observability dashboards. R2R is designed to go from prototype to production without rewriting infrastructure.
The engine exposes RESTful APIs for all operations and includes a Python SDK, JavaScript SDK, and CLI. It supports multimodal ingestion (PDFs, DOCX, HTML, images, audio), multiple vector stores (Postgres/pgvector, Qdrant), and various LLM/embedding providers. R2R also provides analytics, logging, and evaluation tools for monitoring RAG performance.
Installation
pip Install
pip install r2r
# Start with Docker (includes Postgres, Hatchet, Unstructured)
r2r serve --docker
# API at http://localhost:7272
# Dashboard at http://localhost:7273
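The server can take a while to come up after `r2r serve --docker`. A minimal sketch of a readiness poll follows, assuming the `/v2/health` endpoint shown under Troubleshooting answers with HTTP 200 once the API is ready. The helper name `wait_for_r2r` and its parameters are illustrative, not part of R2R.

```python
import time
import urllib.error
import urllib.request


def wait_for_r2r(base_url="http://localhost:7272", timeout_s=60.0, interval_s=2.0,
                 fetch=None, sleep=time.sleep):
    """Poll the health endpoint until it answers 200 or the timeout elapses."""
    url = base_url.rstrip("/") + "/v2/health"

    def default_fetch(target):
        # Return the HTTP status code; any network failure counts as "not ready".
        try:
            with urllib.request.urlopen(target, timeout=5) as resp:
                return resp.status
        except (urllib.error.URLError, OSError):
            return None

    fetch = fetch or default_fetch
    waited = 0.0
    while waited <= timeout_s:
        if fetch(url) == 200:
            return True
        sleep(interval_s)
        waited += interval_s
    return False


# Usage (hypothetical): block a setup script until the API responds.
# if not wait_for_r2r(timeout_s=120):
#     raise SystemExit("R2R did not become healthy in time")
```

The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running server.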
Docker Compose
# Clone and start
git clone https://github.com/SciPhi-AI/R2R.git
cd R2R
docker compose up -d
# Or use the CLI
r2r serve --docker --config-name=default
From Source
git clone https://github.com/SciPhi-AI/R2R.git
cd R2R
pip install -e ".[all]"
r2r serve
Core Operations
CLI Usage
# Ingest documents
r2r ingest-files --file-paths /path/to/doc1.pdf /path/to/doc2.txt
# Ingest from URL
r2r ingest-files --file-paths https://example.com/report.pdf
# Search
r2r search --query "What is retrieval augmented generation?"
# RAG (search + generate)
r2r rag --query "Explain the architecture" --use-hybrid-search
# List documents
r2r documents-overview
# Delete document
r2r delete --document-id doc-uuid-here
# Health check
r2r health
Python SDK
from r2r import R2RClient
client = R2RClient("http://localhost:7272")
# Ingest files
response = client.ingest_files(
    file_paths=["report.pdf", "manual.docx"],
    metadatas=[
        {"title": "Annual Report", "category": "finance"},
        {"title": "User Manual", "category": "docs"},
    ],
)
print(f"Ingested: {response}")
# Search
results = client.search(
    query="revenue growth Q4",
    search_settings={
        "use_hybrid_search": True,
        "search_limit": 10,
        "filters": {"category": {"$eq": "finance"}},
    },
)
for r in results["results"]:
    print(f"Score: {r['score']:.3f} | {r['text'][:100]}")
# RAG query
response = client.rag(
    query="What was the revenue growth in Q4?",
    rag_generation_config={
        "model": "gpt-4o",
        "temperature": 0.1,
    },
)
print(response["results"]["completion"]["choices"][0]["message"]["content"])
# Streaming RAG
for chunk in client.rag(
    query="Summarize the main findings",
    rag_generation_config={"stream": True},
):
    print(chunk, end="")
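The non-streaming RAG response is deeply nested, so one missing key raises an exception. A small sketch of a defensive accessor follows, assuming the response shape used in the `print` call above. The function name `completion_text` is hypothetical.

```python
def completion_text(response, default=""):
    """Return the generated answer from a non-streaming RAG response dict.

    Assumes the shape shown in this cheat sheet:
    response["results"]["completion"]["choices"][0]["message"]["content"]
    """
    try:
        return response["results"]["completion"]["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        # Shape differs (error payload, empty choices, streaming generator, ...).
        return default


# Example with a hand-written dict standing in for a real server response.
sample = {"results": {"completion": {"choices": [{"message": {"content": "12% growth"}}]}}}
print(completion_text(sample))
print(completion_text({"results": {}}, default="<no answer>"))
```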
REST API
# Ingest file
curl -X POST http://localhost:7272/v2/ingest_files \
  -F "files=@document.pdf" \
  -F 'metadatas=[{"title": "My Doc"}]'
# Search
curl -X POST http://localhost:7272/v2/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "machine learning",
    "search_settings": {
      "use_hybrid_search": true,
      "search_limit": 10
    }
  }'
# RAG
curl -X POST http://localhost:7272/v2/rag \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the key findings?",
    "rag_generation_config": {
      "model": "gpt-4o",
      "temperature": 0.1
    }
  }'
# List documents
curl http://localhost:7272/v2/documents_overview
# Get document chunks
curl "http://localhost:7272/v2/document_chunks?document_id=DOC_UUID"
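The same search call can be made from Python without the SDK. This is a sketch using only the standard library. It mirrors the `/v2/search` curl request above; the helper names `build_search_request` and `send` are illustrative.

```python
import json
import urllib.request


def build_search_request(query, base_url="http://localhost:7272", limit=10, hybrid=True):
    """Build (but do not send) the POST request equivalent to the curl search example."""
    payload = {
        "query": query,
        "search_settings": {"use_hybrid_search": hybrid, "search_limit": limit},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v2/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout_s=30):
    """Send a prepared request and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (needs a running server):
# results = send(build_search_request("machine learning"))
```

Splitting request construction from sending makes the payload easy to inspect or log before anything goes over the wire.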
Configuration
r2r.toml
[completion]
provider = "openai"
model = "gpt-4o"
temperature = 0.1
max_tokens = 2048
[embedding]
provider = "openai"
model = "text-embedding-3-small"
dimension = 1536
batch_size = 128
[database]
provider = "postgres"
[ingestion]
excluded_parsers = ["mp4"]
[chunking]
provider = "unstructured"
strategy = "auto"
chunk_size = 1024
chunk_overlap = 200
[kg] # Knowledge Graph
provider = "neo4j"
batch_size = 256
[auth]
provider = "r2r"
access_token_lifetime_in_minutes = 60
refresh_token_lifetime_in_days = 7
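A quick sanity check can catch obvious config mistakes before the server starts. The sketch below is not R2R's own validation. The rules are assumptions chosen for illustration: overlap smaller than chunk size, non-negative temperature, and a provider set for each model section. `load_config` relies on `tomllib`, which needs Python 3.11 or later.

```python
def check_config(cfg):
    """Return a list of human-readable problems found in a parsed r2r.toml dict."""
    problems = []

    chunking = cfg.get("chunking", {})
    size = chunking.get("chunk_size")
    overlap = chunking.get("chunk_overlap")
    if size is not None and overlap is not None and overlap >= size:
        problems.append("chunking.chunk_overlap must be smaller than chunking.chunk_size")

    temperature = cfg.get("completion", {}).get("temperature")
    if temperature is not None and temperature < 0:
        problems.append("completion.temperature must not be negative")

    for section in ("completion", "embedding"):
        if section in cfg and "provider" not in cfg[section]:
            problems.append(f"{section}.provider is missing")

    return problems


def load_config(path="r2r.toml"):
    """Parse the TOML file into a dict (Python 3.11+)."""
    import tomllib  # imported lazily so check_config works on older interpreters

    with open(path, "rb") as handle:
        return tomllib.load(handle)


# Usage:
# for problem in check_config(load_config("r2r.toml")):
#     print("config problem:", problem)
```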
Environment Variables
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
R2R_POSTGRES_HOST=localhost
R2R_POSTGRES_PORT=5432
R2R_POSTGRES_USER=r2r
R2R_POSTGRES_PASSWORD=password
R2R_POSTGRES_DBNAME=r2r
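When debugging Postgres connectivity, it helps to assemble these variables into one connection URL and test it with `psql` or another client. This is a sketch; the helper names are hypothetical, and R2R may build its connection string differently.

```python
import os
from urllib.parse import quote

REQUIRED_VARS = (
    "R2R_POSTGRES_HOST",
    "R2R_POSTGRES_PORT",
    "R2R_POSTGRES_USER",
    "R2R_POSTGRES_PASSWORD",
    "R2R_POSTGRES_DBNAME",
)


def missing_vars(env=None):
    """List the R2R_POSTGRES_* variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def postgres_url(env=None):
    """Build a postgresql:// URL, percent-encoding user, password, and database name."""
    env = os.environ if env is None else env
    user = quote(env.get("R2R_POSTGRES_USER", ""), safe="")
    password = quote(env.get("R2R_POSTGRES_PASSWORD", ""), safe="")
    host = env.get("R2R_POSTGRES_HOST", "localhost")
    port = env.get("R2R_POSTGRES_PORT", "5432")
    dbname = quote(env.get("R2R_POSTGRES_DBNAME", ""), safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{dbname}"


# Usage:
# print(missing_vars())   # [] when everything is set
# print(postgres_url())   # pass the result to: psql "<url>"
```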
User Management
from r2r import R2RClient
client = R2RClient("http://localhost:7272")
# Register user
client.register(email="user@example.com", password="securepass")
# Login
tokens = client.login(email="user@example.com", password="securepass")
# User-scoped operations (documents belong to users)
client.ingest_files(file_paths=["private_doc.pdf"])
# Admin: list users
client.users_overview()
# Admin: get user's documents
client.documents_overview(user_ids=["user-uuid"])
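Long-running scripts will eventually hit an expired access token. One way to avoid that is to track when the last login happened and log in again shortly before the configured lifetime runs out. The sketch below assumes the 60-minute `access_token_lifetime_in_minutes` from the sample config. The `TokenClock` class is hypothetical and does not inspect the token itself.

```python
import time


class TokenClock:
    """Track time since login and report when a fresh login is due."""

    def __init__(self, lifetime_minutes=60, margin_seconds=30, now=time.monotonic):
        self._lifetime_s = lifetime_minutes * 60
        self._margin_s = margin_seconds  # re-login slightly early to avoid races
        self._now = now
        self._issued_at = None

    def mark_login(self):
        """Call right after a successful login."""
        self._issued_at = self._now()

    def needs_login(self):
        """True if never logged in, or the token is within the safety margin of expiry."""
        if self._issued_at is None:
            return True
        return self._now() - self._issued_at >= self._lifetime_s - self._margin_s


# Usage with the client from this section:
# clock = TokenClock(lifetime_minutes=60)
# if clock.needs_login():
#     client.login(email="user@example.com", password="securepass")
#     clock.mark_login()
```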
Knowledge Graph
# Enable KG construction
client.create_graph(
    document_ids=["doc-uuid-1", "doc-uuid-2"],
    kg_creation_settings={
        "kg_triples_extraction_prompt": "default",
    },
)
# Search with KG
results = client.search(
    query="How are entities X and Y related?",
    search_settings={
        "use_kg_search": True,
        "kg_search_type": "local",
    },
)
Advanced Usage
Hybrid Search Configuration
results = client.search(
    query="deployment architecture",
    search_settings={
        "use_hybrid_search": True,
        "hybrid_search_settings": {
            "full_text_weight": 1.0,
            "semantic_weight": 5.0,
            "full_text_limit": 200,
            "rrf_k": 50,
        },
        "search_limit": 10,
        "filters": {
            "$and": [
                {"category": {"$eq": "engineering"}},
                {"year": {"$gte": 2024}},
            ]
        },
    },
)
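To build intuition for how `semantic_weight`, `full_text_weight`, and `rrf_k` interact, here is a sketch of weighted reciprocal rank fusion (RRF). It uses the textbook form `weight / (rrf_k + rank)` summed across the two rankings. R2R's exact fusion formula may differ, so treat this as an illustration of the idea only.

```python
def weighted_rrf(semantic_ids, full_text_ids,
                 semantic_weight=5.0, full_text_weight=1.0, rrf_k=50):
    """Fuse two ranked ID lists; return (id, score) pairs, best first."""
    scores = {}
    weighted_rankings = (
        (semantic_weight, semantic_ids),
        (full_text_weight, full_text_ids),
    )
    for weight, ranking in weighted_rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A larger rrf_k flattens the gap between top and lower ranks.
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (rrf_k + rank)
    # Sort by score descending, then by ID for a stable order on ties.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


semantic = ["a", "b", "c"]   # best semantic match first
full_text = ["c", "a", "d"]  # best keyword match first
for doc_id, score in weighted_rrf(semantic, full_text):
    print(f"{doc_id}: {score:.4f}")
```

With the 5:1 weighting from the example settings, a document that ranks well semantically outranks one that only ranks well on keywords.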
Conversations
# Create conversation
conv = client.create_conversation()
# Multi-turn chat
response1 = client.rag(
    query="What is RAG?",
    rag_generation_config={"model": "gpt-4o"},
    conversation_id=conv["results"]["id"],
)
response2 = client.rag(
    query="How does it compare to fine-tuning?",
    rag_generation_config={"model": "gpt-4o"},
    conversation_id=conv["results"]["id"],
)
# Get conversation history
history = client.get_conversation(conv["results"]["id"])
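The multi-turn pattern above generalizes to any list of questions. The sketch below wraps it in a helper that uses only the `create_conversation` and `rag` calls shown in this section. The function name `ask_in_conversation` is hypothetical, and `client` is any object exposing those two methods.

```python
def ask_in_conversation(client, questions, model="gpt-4o"):
    """Create one conversation and send each question through it in order.

    Returns (conversation_id, list_of_raw_responses).
    """
    conversation = client.create_conversation()
    conversation_id = conversation["results"]["id"]
    answers = []
    for question in questions:
        response = client.rag(
            query=question,
            rag_generation_config={"model": model},
            conversation_id=conversation_id,  # same ID keeps the turns linked
        )
        answers.append(response)
    return conversation_id, answers


# Usage with a real client:
# conv_id, answers = ask_in_conversation(
#     client, ["What is RAG?", "How does it compare to fine-tuning?"]
# )
# history = client.get_conversation(conv_id)
```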
Analytics
# Get usage analytics
analytics = client.analytics(
    filter_criteria={"search_latencies": "search_latency"},
    analysis_types={"search_latencies": ["basic_statistics", "search_latency"]},
)
print(analytics)
# Get logs
logs = client.logs()
Custom Ingestion Pipeline
from r2r import R2RClient
client = R2RClient("http://localhost:7272")
# Ingest with custom chunking
response = client.ingest_files(
    file_paths=["large_document.pdf"],
    chunking_config={
        "provider": "unstructured",
        "strategy": "hi_res",
        "chunk_size": 512,
        "chunk_overlap": 100,
    },
)
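Smaller chunks and larger overlap both increase the number of chunks to embed. A rough estimate follows, assuming a plain sliding window where each new chunk advances by `chunk_size - chunk_overlap` units. Element-aware chunkers such as Unstructured split on document structure, so real counts will differ. Whether the unit is characters or tokens is also an assumption left to the reader.

```python
import math


def estimate_chunks(total_units, chunk_size=1024, chunk_overlap=200):
    """Estimate sliding-window chunk count for a document of total_units length."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if total_units <= 0:
        return 0
    if total_units <= chunk_size:
        return 1
    stride = chunk_size - chunk_overlap  # how far each new chunk advances
    return 1 + math.ceil((total_units - chunk_size) / stride)


# Compare the default config with the custom settings in this section.
print(estimate_chunks(10_000, chunk_size=1024, chunk_overlap=200))
print(estimate_chunks(10_000, chunk_size=512, chunk_overlap=100))
```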
Troubleshooting
| Issue | Solution |
|---|---|
| Docker won’t start | Check that ports 7272, 5432, and 6379 are free |
| Ingestion fails | Check that the file format is supported and that API keys are set |
| Search returns empty | Confirm the documents were ingested, then check the filters |
| Slow ingestion | Reduce chunk_overlap or set the chunking strategy to "fast" |
| Auth token expired | Log in again or increase the token lifetime in the config |
| KG creation slow | Reduce batch_size or use a faster LLM for extraction |
| Postgres connection error | Check the R2R_POSTGRES_* environment variables |
| Out of memory | Increase Docker memory limits or reduce batch sizes |
# Health check
r2r health
curl http://localhost:7272/v2/health
# View server logs
r2r logs --tail 100
# Docker logs
docker compose logs -f r2r
# Reset database
r2r serve --docker --full-reset
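For the "Docker won’t start" case, a quick way to see which of the listed ports are already taken is to try connecting to each one. This is a sketch using the standard library. The port list comes from the table above; the function names are illustrative.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection.
        return probe.connect_ex((host, port)) != 0


def busy_ports(ports=(7272, 5432, 6379), host="127.0.0.1"):
    """Return the subset of ports that already have a listener."""
    return [port for port in ports if not port_is_free(port, host)]


if __name__ == "__main__":
    taken = busy_ports()
    print("Ports in use:", taken if taken else "none")
```

A port reported as busy while R2R is running is expected. Run the check before starting the stack.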