AutoRAG Cheat Sheet
Overview
AutoRAG is an AutoML-inspired tool for automatically optimizing retrieval-augmented generation pipelines. Rather than manually tuning RAG components, AutoRAG systematically evaluates different combinations of retrievers, rerankers, prompt strategies, and generators to find the best pipeline configuration for your specific dataset and use case. It generates evaluation reports with metrics like F1, recall, MRR, and latency.
The framework supports a YAML-based configuration system where you define the search space of components to evaluate. AutoRAG then runs experiments, benchmarks each combination against your evaluation dataset, and outputs the optimal pipeline configuration that can be deployed directly as an API server.
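The retrieval metrics named above can be illustrated for a single query. The sketch below is a generic, textbook computation, not AutoRAG's own implementation; the function name `retrieval_scores` and the sample doc IDs are hypothetical. MRR is the mean of the per-query reciprocal rank shown here.

```python
def retrieval_scores(retrieved: list[str], relevant: set[str]) -> dict[str, float]:
    """Score one ranked result list against the set of relevant doc IDs."""
    hits = [doc_id for doc_id in retrieved if doc_id in relevant]
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(set(hits)) / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # Reciprocal rank: 1 / position of the first relevant document (0 if none).
    reciprocal_rank = 0.0
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            reciprocal_rank = 1.0 / rank
            break

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "reciprocal_rank": reciprocal_rank,
    }


# Hypothetical ranking for one query with two relevant documents.
scores = retrieval_scores(["doc_9", "doc_1", "doc_2", "doc_7"], {"doc_1", "doc_2"})
print(scores)
```

Averaging each value over all queries in the evaluation set gives the dataset-level score.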
Installation
pip install autorag
# With all optional dependencies
pip install "autorag[all]"
# With specific vector store support
pip install "autorag[pinecone]"
pip install "autorag[chroma]"
pip install "autorag[milvus]"
# Verify installation
autorag --version
Core Concepts
Pipeline Stages
| Stage | Components | Purpose |
|---|---|---|
| Retrieval | BM25, VectorDB, Hybrid | Find relevant documents |
| Passage Reranking | Cross-encoder, Cohere, UPR | Re-score passages |
| Passage Compressor | Refine, TreeSummarize | Compress context |
| Prompt Maker | Default, Window, Long Context | Format LLM input |
| Generator | OpenAI, vLLM, Ollama | Generate answers |
Prepare Evaluation Data
import pandas as pd

# QA dataset format
qa_data = pd.DataFrame({
    "qid": ["q1", "q2"],  # unique ID for each query
    "query": [
        "What is retrieval-augmented generation?",
        "How does vector search work?",
    ],
    # One list of acceptable answers per query
    "generation_gt": [
        ["RAG combines retrieval with generation to ground LLM outputs."],
        ["Vector search finds similar items using embedding distances."],
    ],
    # One list of doc ID groups (a 2D list) per query
    "retrieval_gt": [
        [["doc_id_1", "doc_id_2"]],
        [["doc_id_3"]],
    ],
})
qa_data.to_parquet("qa.parquet")

# Corpus format
corpus = pd.DataFrame({
    "doc_id": ["doc_id_1", "doc_id_2", "doc_id_3"],
    "contents": [
        "RAG is a technique that combines...",
        "Retrieval augmented generation uses...",
        "Vector search operates by computing...",
    ],
    "metadata": [
        {"source": "wiki"},
        {"source": "paper"},
        {"source": "docs"},
    ],
})
corpus.to_parquet("corpus.parquet")
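Before running an evaluation it can help to confirm that every ID in `retrieval_gt` exists in the corpus. The following is a minimal sketch in plain Python; the helper name `missing_retrieval_ids` is hypothetical and not part of AutoRAG.

```python
from collections.abc import Iterable


def missing_retrieval_ids(
    retrieval_gt: Iterable[Iterable[Iterable[str]]],
    doc_ids: Iterable[str],
) -> set[str]:
    """Return doc IDs referenced in retrieval_gt that are absent from the corpus."""
    known = set(doc_ids)
    referenced = {
        doc_id
        for groups in retrieval_gt  # one entry per query
        for group in groups         # each entry holds one or more ID groups
        for doc_id in group
    }
    return referenced - known


# Sample data in the same nested shape as the QA dataset; doc_id_4 is a deliberate gap.
retrieval_gt = [[["doc_id_1", "doc_id_2"]], [["doc_id_4"]]]
doc_ids = ["doc_id_1", "doc_id_2", "doc_id_3"]

missing = missing_retrieval_ids(retrieval_gt, doc_ids)
print(missing)  # {'doc_id_4'}
```

The function only iterates over its arguments, so the DataFrame columns can be passed directly, for example `missing_retrieval_ids(qa_data["retrieval_gt"], corpus["doc_id"])`.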
CLI Usage
# Run optimization
autorag evaluate \
  --config config.yaml \
  --qa_data_path qa.parquet \
  --corpus_data_path corpus.parquet \
  --project_dir ./autorag_results
# Deploy best pipeline as API
autorag run_api \
  --trial_dir ./autorag_results/0 \
  --host 0.0.0.0 \
  --port 8000
# Run a single query against deployed pipeline
curl -X POST http://localhost:8000/v1/run \
  -H "Content-Type: application/json" \
  -d '{"query": "What is RAG?"}'
# Generate evaluation report
autorag report \
  --project_dir ./autorag_results \
  --output report.html
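The same query can be sent from Python. This sketch uses only the standard library and mirrors the curl call above; the helper name `build_run_request` is hypothetical, and the endpoint and payload are taken from the curl example rather than verified against a running server.

```python
import json
import urllib.request


def build_run_request(
    query: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build (but do not send) the POST request used in the curl example."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/run",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request("What is RAG?")
print(request.get_method(), request.full_url)

# Sending requires a running server, so it is left commented out here:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.loads(response.read()))
```

Separating request construction from sending keeps the payload easy to inspect before the server is up.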
Configuration
Basic Config
# config.yaml
node_lines:
  - node_line_name: retrieve_node_line
    nodes:
      - node_type: retrieval
        strategy:
          metrics: [retrieval_f1, retrieval_recall]
          speed_threshold: 5
        modules:
          - module_type: bm25
          - module_type: vectordb
            embedding_model: openai
          - module_type: hybrid_rrf
            weight_range: [0.3, 0.5, 0.7]
      - node_type: passage_reranker
        strategy:
          metrics: [retrieval_f1, retrieval_recall]
        modules:
          - module_type: pass_reranker
          - module_type: cohere_reranker
            model: rerank-english-v3.0
          - module_type: cross_encoder
            model: cross-encoder/ms-marco-MiniLM-L-12-v2
  - node_line_name: generate_node_line
    nodes:
      - node_type: prompt_maker
        strategy:
          metrics: [bleu, rouge, meteor]
        modules:
          - module_type: fstring
            prompt: "Context: {retrieved_contents}\nQuestion: {query}\nAnswer:"
      - node_type: generator
        strategy:
          metrics: [bleu, rouge, meteor, sem_score]
        modules:
          - module_type: openai_llm
            llm: gpt-4o
            temperature: [0.0, 0.3, 0.7]
          - module_type: openai_llm
            llm: gpt-4o-mini
            temperature: [0.0, 0.5]
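List-valued parameters in the config widen the search space. The sketch below shows how the generator modules above would expand, assuming each list-valued parameter is treated as a set of alternatives and combined as a cross product. The helper `expand_module` is hypothetical and only illustrates the counting; it is not AutoRAG's internal logic.

```python
from itertools import product


def expand_module(module: dict) -> list[dict]:
    """Expand list-valued parameters into one dict per concrete setting."""
    keys = list(module)
    # Wrap scalars so every parameter becomes a list of options.
    options = [v if isinstance(v, list) else [v] for v in module.values()]
    return [dict(zip(keys, combo)) for combo in product(*options)]


# Generator modules copied from the basic config above.
generator_modules = [
    {"module_type": "openai_llm", "llm": "gpt-4o", "temperature": [0.0, 0.3, 0.7]},
    {"module_type": "openai_llm", "llm": "gpt-4o-mini", "temperature": [0.0, 0.5]},
]

candidates = [c for module in generator_modules for c in expand_module(module)]
print(len(candidates))  # 5
for candidate in candidates:
    print(candidate)
```

Counting candidates this way before a run gives a rough sense of how many LLM-backed evaluations a config will trigger.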
Advanced Config with All Stages
# advanced_config.yaml
node_lines:
  - node_line_name: full_pipeline
    nodes:
      - node_type: retrieval
        strategy:
          metrics: [retrieval_f1, retrieval_recall, retrieval_ndcg]
          speed_threshold: 10
        top_k: 10
        modules:
          - module_type: bm25
            top_k: [5, 10, 20]
          - module_type: vectordb
            embedding_model: openai
            top_k: [5, 10, 20]
          - module_type: hybrid_rrf
            weight_range: [0.3, 0.5, 0.7]
            top_k: [5, 10, 20]
          - module_type: hybrid_cc
            normalize_method: [mm, tmm, z, dbsf]
            weight_range: [0.3, 0.5, 0.7]
      - node_type: passage_reranker
        strategy:
          metrics: [retrieval_f1, retrieval_recall]
        top_k: 5
        modules:
          - module_type: pass_reranker
          - module_type: cohere_reranker
            model: rerank-english-v3.0
          - module_type: cross_encoder
            model: cross-encoder/ms-marco-MiniLM-L-12-v2
          - module_type: sentence_transformer_reranker
            model: BAAI/bge-reranker-v2-m3
      - node_type: passage_compressor
        strategy:
          metrics: [retrieval_token_recall]
        modules:
          - module_type: pass_compressor
          - module_type: tree_summarize
            llm: gpt-4o-mini
          - module_type: refine
            llm: gpt-4o-mini
      - node_type: prompt_maker
        strategy:
          metrics: [bleu, rouge, meteor]
        modules:
          - module_type: fstring
            prompt: |
              Use the following context to answer the question.
              Context: {retrieved_contents}
              Question: {query}
              Answer:
          - module_type: long_context_reorder
      - node_type: generator
        strategy:
          metrics: [bleu, rouge, meteor, sem_score]
        modules:
          - module_type: openai_llm
            llm: gpt-4o
            temperature: [0.0, 0.5]
          - module_type: vllm
            model: meta-llama/Llama-3.1-8B-Instruct
            temperature: [0.0, 0.3]
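The `hybrid_cc` entry fuses semantic and lexical retrieval scores. A generic sketch of convex-combination fusion with min-max (`mm`) normalization follows. It shows the general technique rather than AutoRAG's code: the function names, the sample scores, and the choice to apply the weight to the semantic side are all assumptions.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a constant score list maps to all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def convex_combination(
    semantic: dict[str, float], lexical: dict[str, float], weight: float
) -> list[tuple[str, float]]:
    """Fuse two score dicts as weight * semantic + (1 - weight) * lexical."""
    sem, lex = min_max(semantic), min_max(lexical)
    fused = {
        # A document missing from one retriever contributes 0 on that side.
        doc_id: weight * sem.get(doc_id, 0.0) + (1 - weight) * lex.get(doc_id, 0.0)
        for doc_id in sem.keys() | lex.keys()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores: cosine similarity vs. BM25, on different scales.
semantic_scores = {"doc_id_1": 0.82, "doc_id_2": 0.75, "doc_id_3": 0.40}
lexical_scores = {"doc_id_2": 12.0, "doc_id_3": 9.0, "doc_id_4": 3.0}

rankings = {}
for weight in (0.3, 0.5, 0.7):
    ranking = convex_combination(semantic_scores, lexical_scores, weight)
    rankings[weight] = [doc_id for doc_id, _ in ranking]
    print(weight, rankings[weight])
```

Normalizing first matters because BM25 and embedding similarities live on different scales; without it the weight would have little practical meaning.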
Python API
from autorag.evaluator import Evaluator
from autorag.deploy import Runner
# Run evaluation
evaluator = Evaluator(
    qa_data_path="qa.parquet",
    corpus_data_path="corpus.parquet",
    project_dir="./autorag_results",
)
evaluator.start_trial("config.yaml")
# Deploy best pipeline
runner = Runner.from_trial_folder("./autorag_results/0")
answer = runner.run("What is retrieval-augmented generation?")
print(answer)
# Get detailed results
result = runner.run_with_details("What is RAG?")
print(f"Answer: {result['answer']}")
print(f"Retrieved docs: {result['retrieved_contents']}")
print(f"Scores: {result['retrieval_scores']}")
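The trial folder path above is hard-coded to `./autorag_results/0`. Assuming trials are stored as numerically named subfolders of the project directory, as that path suggests, the most recent one could be located with a small helper. `latest_trial_dir` is a hypothetical name, and the demo runs against a temporary directory rather than real results.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_trial_dir(project_dir: str) -> Optional[Path]:
    """Return the numerically highest trial subfolder, or None if there is none."""
    trials = [
        p for p in Path(project_dir).iterdir()
        if p.is_dir() and p.name.isdecimal()
    ]
    # Compare as integers so that "10" sorts after "9".
    return max(trials, key=lambda p: int(p.name), default=None)


# Demo with a throwaway directory standing in for a project folder.
with tempfile.TemporaryDirectory() as project_dir:
    for name in ("0", "1", "10", "resources"):
        (Path(project_dir) / name).mkdir()
    latest = latest_trial_dir(project_dir)
    print(latest.name)  # 10
```

The returned path can then be passed as a string to `Runner.from_trial_folder` in place of the hard-coded folder.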
Advanced Usage
Custom Evaluation Metrics
from autorag.evaluation import register_metric


@register_metric("custom_accuracy")
def custom_accuracy(
    generated: list[str],
    gt: list[list[str]],
    **kwargs
) -> list[float]:
    scores = []
    for gen, refs in zip(generated, gt):
        score = max(
            1.0 if ref.lower() in gen.lower() else 0.0
            for ref in refs
        )
        scores.append(score)
    return scores
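The matching logic can be checked in isolation before it is wired into an evaluation. The standalone sketch below needs no AutoRAG import; `substring_accuracy` is a hypothetical name for the same substring rule, and the sample answers are made up.

```python
def substring_accuracy(generated: list[str], gt: list[list[str]]) -> list[float]:
    """Score 1.0 when any reference answer appears in the output (case-insensitive)."""
    return [
        1.0 if any(ref.lower() in gen.lower() for ref in refs) else 0.0
        for gen, refs in zip(generated, gt)
    ]


generated = [
    "Vector search finds similar items using embedding distances.",
    "It is a database feature.",
]
gt = [
    ["finds similar items", "nearest-neighbour lookup"],
    ["Vector search finds similar items using embedding distances."],
]

scores = substring_accuracy(generated, gt)
print(scores)  # [1.0, 0.0]
```

Using `any(...)` instead of `max(...)` means a query with an empty reference list scores 0.0 rather than raising `ValueError`.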
Chunk Optimization
# Optimize chunking strategy before RAG pipeline
autorag chunk \
  --config chunk_config.yaml \
  --raw_data raw_documents/ \
  --corpus_output corpus.parquet \
  --qa_output qa.parquet
# chunk_config.yaml
chunking:
  modules:
    - module_type: token
      chunk_size: [128, 256, 512, 1024]
      chunk_overlap: [0, 32, 64]
    - module_type: sentence
      chunk_size: [3, 5, 10]
      chunk_overlap: [0, 1, 2]
    - module_type: recursive
      chunk_size: [256, 512, 1024]
      chunk_overlap: [32, 64, 128]
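To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sliding-window sketch over a token list. It shows the general idea only; whitespace splitting stands in for a real tokenizer, and `chunk_tokens` is a hypothetical helper rather than AutoRAG's chunker.

```python
def chunk_tokens(
    tokens: list[str], chunk_size: int, chunk_overlap: int
) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


tokens = "a b c d e f g h i j".split()
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each window starts `chunk_size - chunk_overlap` tokens after the previous one, so a larger overlap produces more chunks for the same document.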
Environment Variables
export OPENAI_API_KEY=sk-...
export COHERE_API_KEY=...
export PINECONE_API_KEY=...
export AUTORAG_LOG_LEVEL=DEBUG
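A quick pre-flight check can catch a missing key before a long evaluation starts. This sketch uses only the standard library; `missing_env_vars` is a hypothetical helper, and the list of required names should be trimmed to the services the config actually uses.

```python
import os
from collections.abc import Mapping


def missing_env_vars(
    required: list[str], env: Mapping[str, str] = os.environ
) -> list[str]:
    """Return the names in `required` that are unset or empty."""
    return [name for name in required if not env.get(name)]


required = ["OPENAI_API_KEY", "COHERE_API_KEY", "PINECONE_API_KEY"]
missing = missing_env_vars(required)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```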
Troubleshooting
| Issue | Solution |
|---|---|
| Evaluation runs out of memory | Reduce search space, use fewer module combinations |
| API key rate limiting | Add delay parameter to generator config |
| Parquet schema mismatch | Ensure each generation_gt entry is a list of strings and each retrieval_gt entry is a list of lists of doc IDs |
| VectorDB index missing | Run autorag index before autorag evaluate |
| Metric computation fails | Check ground truth format matches metric requirements |
| Slow evaluation | Use speed_threshold to skip slow configurations |
| Deploy fails to load | Check project_dir contains valid trial results |
| CUDA out of memory | Set batch_size in embedding/reranker modules |
# View trial results
autorag dashboard --trial_dir ./autorag_results/0
# Export best config
autorag extract_best_config \
  --trial_path ./autorag_results/0 \
  --output_path best_config.yaml
# Compare trials
autorag compare \
  --project_dir ./autorag_results \
  --trials 0 1 2