R2R Cheat Sheet

Overview

R2R (Reason to Retrieve) is a production-ready RAG engine that provides a complete pipeline for document ingestion, chunking, embedding, hybrid search, and answer generation. It includes built-in user management, document-level permissions, knowledge graph construction, conversation memory, and observability dashboards. R2R is designed to go from prototype to production without rewriting infrastructure.

The engine exposes RESTful APIs for all operations and includes a Python SDK, a JavaScript SDK, and a CLI. It supports multimodal ingestion (PDFs, DOCX, HTML, images, audio), multiple vector stores (Postgres/pgvector, Qdrant), and multiple LLM and embedding providers (the environment variables listed under Configuration cover OpenAI and Anthropic keys). R2R also provides analytics, logging, and evaluation tools for monitoring RAG performance.

Installation

pip Install

pip install r2r

# Start with Docker (includes Postgres, Hatchet, Unstructured)
r2r serve --docker
# API at http://localhost:7272
# Dashboard at http://localhost:7273

Docker Compose

# Clone and start
git clone https://github.com/SciPhi-AI/R2R.git
cd R2R
docker compose up -d

# Or use the CLI
r2r serve --docker --config-name=default

From Source

git clone https://github.com/SciPhi-AI/R2R.git
cd R2R
pip install -e ".[all]"
r2r serve

Core Operations

CLI Usage

# Ingest documents
r2r ingest-files --file-paths /path/to/doc1.pdf /path/to/doc2.txt

# Ingest from URL
r2r ingest-files --file-paths https://example.com/report.pdf

# Search
r2r search --query "What is retrieval augmented generation?"

# RAG (search + generate)
r2r rag --query "Explain the architecture" --use-hybrid-search

# List documents
r2r documents-overview

# Delete document
r2r delete --document-id doc-uuid-here

# Health check
r2r health

Python SDK

from r2r import R2RClient

client = R2RClient("http://localhost:7272")

# Ingest files
response = client.ingest_files(
    file_paths=["report.pdf", "manual.docx"],
    metadatas=[
        {"title": "Annual Report", "category": "finance"},
        {"title": "User Manual", "category": "docs"}
    ]
)
print(f"Ingested: {response}")

# Search
results = client.search(
    query="revenue growth Q4",
    search_settings={
        "use_hybrid_search": True,
        "search_limit": 10,
        "filters": {"category": {"$eq": "finance"}}
    }
)
# Vector hits are nested under "vector_search_results" in the v2 response
for r in results["results"]["vector_search_results"]:
    print(f"Score: {r['score']:.3f} | {r['text'][:100]}")

# RAG query
response = client.rag(
    query="What was the revenue growth in Q4?",
    rag_generation_config={
        "model": "gpt-4o",
        "temperature": 0.1
    }
)
print(response["results"]["completion"]["choices"][0]["message"]["content"])

# Streaming RAG
for chunk in client.rag(
    query="Summarize the main findings",
    rag_generation_config={"stream": True}
):
    print(chunk, end="")
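
The `filters` argument in the search call above uses MongoDB-style operators such as `$eq`. As a rough illustration of the semantics only, the sketch below evaluates such a filter against a metadata dict in plain Python. The `matches` function and the operator subset it supports are assumptions made for this example; R2R evaluates filters server-side and may support a different set of operators.

```python
from typing import Any

# Comparison operators handled by this sketch (an assumed subset).
_OPS = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$gt": lambda a, b: a is not None and a > b,
    "$gte": lambda a, b: a is not None and a >= b,
    "$lt": lambda a, b: a is not None and a < b,
    "$lte": lambda a, b: a is not None and a <= b,
}


def matches(metadata: dict[str, Any], filters: dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies every condition in `filters`."""
    for key, condition in filters.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        else:
            if not isinstance(condition, dict):
                condition = {"$eq": condition}  # bare value: treat as equality (assumption)
            value = metadata.get(key)  # a missing key behaves like None
            for op, expected in condition.items():
                if not _OPS[op](value, expected):
                    return False
    return True
```

For instance, `matches({"category": "finance"}, {"category": {"$eq": "finance"}})` is true, while the same filter applied to `{"category": "docs"}` is false.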

REST API

# Ingest file
curl -X POST http://localhost:7272/v2/ingest_files \
  -F "files=@document.pdf" \
  -F 'metadatas=[{"title": "My Doc"}]'

# Search
curl -X POST http://localhost:7272/v2/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "machine learning",
    "search_settings": {
      "use_hybrid_search": true,
      "search_limit": 10
    }
  }'

# RAG
curl -X POST http://localhost:7272/v2/rag \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the key findings?",
    "rag_generation_config": {
      "model": "gpt-4o",
      "temperature": 0.1
    }
  }'

# List documents
curl http://localhost:7272/v2/documents_overview

# Get document chunks
curl "http://localhost:7272/v2/document_chunks?document_id=DOC_UUID"
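
Where neither curl nor the SDK is available, the search request above can be reproduced with the Python standard library. This is a minimal sketch: the helper names `build_search_request` and `send` are made up for this example, and the `/v2/search` path and payload shape are copied from the curl command above rather than verified against a running server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7272"  # default API address used in this cheat sheet


def build_search_request(query: str, limit: int = 10, hybrid: bool = True,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl search example."""
    payload = {
        "query": query,
        "search_settings": {"use_hybrid_search": hybrid, "search_limit": limit},
    }
    return urllib.request.Request(
        url=f"{base_url}/v2/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body; raises URLError/HTTPError on failure."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before anything touches the network: `send(build_search_request("machine learning"))`.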

Configuration

r2r.toml

[completion]
provider = "openai"
model = "gpt-4o"
temperature = 0.1
max_tokens = 2048

[embedding]
provider = "openai"
model = "text-embedding-3-small"
dimension = 1536
batch_size = 128

[database]
provider = "postgres"

[ingestion]
excluded_parsers = ["mp4"]

[chunking]
provider = "unstructured"
strategy = "auto"
chunk_size = 1024
chunk_overlap = 200

[kg]  # Knowledge Graph
provider = "neo4j"
batch_size = 256

[auth]
provider = "r2r"
access_token_lifetime_in_minutes = 60
refresh_token_lifetime_in_days = 7

Environment Variables

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
R2R_POSTGRES_HOST=localhost
R2R_POSTGRES_PORT=5432
R2R_POSTGRES_USER=r2r
R2R_POSTGRES_PASSWORD=password
R2R_POSTGRES_DBNAME=r2r
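
When debugging connection problems, it can be useful to assemble these variables into a single connection URL and try it with a Postgres client such as `psql`. The helper below is a sketch with a made-up name (`postgres_dsn`); its fallback defaults are simply the example values above, not documented R2R defaults, and R2R builds its own connection internally.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def postgres_dsn(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a postgresql:// URL from the R2R_POSTGRES_* variables."""
    env = os.environ if env is None else env
    user = quote(env.get("R2R_POSTGRES_USER", "r2r"), safe="")
    # Percent-encode the password so characters like '@' or '/' do not break the URL.
    password = quote(env.get("R2R_POSTGRES_PASSWORD", ""), safe="")
    host = env.get("R2R_POSTGRES_HOST", "localhost")
    port = env.get("R2R_POSTGRES_PORT", "5432")
    dbname = quote(env.get("R2R_POSTGRES_DBNAME", "r2r"), safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{dbname}"
```

Passing the mapping explicitly (instead of always reading `os.environ`) keeps the function easy to test against values loaded from a `.env` file.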

User Management

from r2r import R2RClient

client = R2RClient("http://localhost:7272")

# Register user
client.register(email="user@example.com", password="securepass")

# Login
tokens = client.login(email="user@example.com", password="securepass")

# After login, calls run as that user, so ingested documents belong to them
client.ingest_files(file_paths=["private_doc.pdf"])

# Admin: list users
client.users_overview()

# Admin: get user's documents
client.documents_overview(user_ids=["user-uuid"])

Knowledge Graph

# Build a knowledge graph from already-ingested documents
client.create_graph(
    document_ids=["doc-uuid-1", "doc-uuid-2"],
    kg_creation_settings={
        "kg_triples_extraction_prompt": "default"
    }
)

# Search with KG
results = client.search(
    query="How are entities X and Y related?",
    search_settings={
        "use_kg_search": True,
        "kg_search_type": "local"
    }
)

Advanced Usage

Hybrid Search Configuration

results = client.search(
    query="deployment architecture",
    search_settings={
        "use_hybrid_search": True,
        "hybrid_search_settings": {
            "full_text_weight": 1.0,
            "semantic_weight": 5.0,
            "full_text_limit": 200,
            "rrf_k": 50
        },
        "search_limit": 10,
        "filters": {
            "$and": [
                {"category": {"$eq": "engineering"}},
                {"year": {"$gte": 2024}}
            ]
        }
    }
)
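
The settings above suggest weighted reciprocal rank fusion (RRF), in which each result list contributes `weight / (rrf_k + rank)` to a document's combined score. The sketch below implements that standard formula in plain Python to show how the two weights and `rrf_k` interact. It is an illustration, not R2R's implementation: the function name `weighted_rrf` is made up here, and R2R's exact scoring may differ.

```python
from collections import defaultdict


def weighted_rrf(
    semantic_ids: list[str],
    full_text_ids: list[str],
    semantic_weight: float = 5.0,
    full_text_weight: float = 1.0,
    rrf_k: int = 50,
    limit: int = 10,
) -> list[tuple[str, float]]:
    """Fuse two ranked ID lists (best first) into one ranking by weighted RRF."""
    scores: dict[str, float] = defaultdict(float)
    for weight, ranking in ((semantic_weight, semantic_ids), (full_text_weight, full_text_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rrf_k + rank)
    # Sort by score descending, breaking ties by ID for a deterministic order.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:limit]
```

Under this formula, a larger `rrf_k` flattens the difference between adjacent ranks, while the ratio of `semantic_weight` to `full_text_weight` decides which list dominates when the two disagree.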

Conversations

# Create conversation
conv = client.create_conversation()

# Multi-turn chat
response1 = client.rag(
    query="What is RAG?",
    rag_generation_config={"model": "gpt-4o"},
    conversation_id=conv["results"]["id"]
)

response2 = client.rag(
    query="How does it compare to fine-tuning?",
    rag_generation_config={"model": "gpt-4o"},
    conversation_id=conv["results"]["id"]
)

# Get conversation history
history = client.get_conversation(conv["results"]["id"])

Analytics

# Get usage analytics
analytics = client.analytics(
    filter_criteria={"search_latencies": "search_latency"},
    analysis_types={"search_latencies": ["basic_statistics", "search_latency"]}
)
print(analytics)

# Get logs
logs = client.logs()

Custom Ingestion Pipeline

from r2r import R2RClient

client = R2RClient("http://localhost:7272")

# Ingest with custom chunking
response = client.ingest_files(
    file_paths=["large_document.pdf"],
    chunking_config={
        "provider": "unstructured",
        "strategy": "hi_res",
        "chunk_size": 512,
        "chunk_overlap": 100
    }
)
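
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window chunker. It splits on raw characters, which is an assumption for illustration only: the `unstructured` provider works on parsed document elements, and its size units may not be characters. `sliding_chunks` is a hypothetical name.

```python
def sliding_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 100) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

In this model each window advances by `chunk_size - chunk_overlap`, so a smaller overlap yields fewer chunks to embed, which is consistent with the troubleshooting advice to reduce `chunk_overlap` when ingestion is slow.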

Troubleshooting

Issue	Solution
Docker won’t start	Check that ports 7272, 5432, and 6379 are free
Ingestion fails	Check file format support, verify API keys
Search returns empty	Ensure documents are ingested, check filters
Slow ingestion	Reduce chunk_overlap or use the `fast` strategy
Auth token expired	Re-login or increase token lifetime in config
KG creation slow	Reduce batch_size, use faster LLM for extraction
Postgres connection error	Check R2R_POSTGRES_* environment variables
Out of memory	Increase Docker memory limits, reduce batch sizes

# Health check
r2r health
curl http://localhost:7272/v2/health

# View server logs
r2r logs --tail 100

# Docker logs
docker compose logs -f r2r

# Reset database
r2r serve --docker --full-reset
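
If Docker fails to start, one quick way to act on the first row of the troubleshooting table is to check whether something is already listening on the ports it lists. The sketch below uses only the standard library; `port_in_use` is a made-up helper name, and the port list is copied from the table.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds (something is listening)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (7272, 5432, 6379):  # ports named in the troubleshooting table
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```

A port reported as "in use" before R2R is started points to another process that needs to be stopped or remapped.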