AutoRAG Cheat Sheet

Overview

AutoRAG is an AutoML-inspired tool for automatically optimizing retrieval-augmented generation pipelines. Rather than manually tuning RAG components, AutoRAG systematically evaluates different combinations of retrievers, rerankers, prompt strategies, and generators to find the best pipeline configuration for your specific dataset and use case. It generates evaluation reports with metrics like F1, recall, MRR, and latency.
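
To make the retrieval metrics named above concrete, here is a minimal sketch of how precision, recall, F1, and reciprocal rank can be computed for one query. It is a simplified illustration, not AutoRAG's own implementation: the function name `retrieval_scores` is hypothetical, and the ground truth is assumed to be a flat set of relevant doc IDs. MRR is the mean of the reciprocal rank over all queries.

```python
def retrieval_scores(retrieved: list[str], relevant: set[str]) -> dict[str, float]:
    """Score one ranked result list against a flat set of relevant doc IDs."""
    hits = {doc_id for doc_id in retrieved if doc_id in relevant}
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0

    # Reciprocal rank: 1 / position of the first relevant document (0 if none).
    reciprocal_rank = 0.0
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            reciprocal_rank = 1.0 / rank
            break

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "reciprocal_rank": reciprocal_rank,
    }


scores = retrieval_scores(
    retrieved=["doc_id_9", "doc_id_1", "doc_id_2", "doc_id_7"],
    relevant={"doc_id_1", "doc_id_2"},
)
print(scores)
```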

You define the search space of components to evaluate in a YAML config file. AutoRAG then runs the experiments, benchmarks each candidate against your evaluation dataset, and writes out the best-scoring pipeline configuration, which can be served directly as an API.
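
The selection step described above can be pictured with a small sketch: drop candidates that are too slow, then keep the one with the highest mean metric score. This is an assumption about one reasonable selection rule, not a description of AutoRAG's internals. The `Candidate` class, the `pick_best` function, and the numbers are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Candidate:
    name: str
    metrics: dict[str, float]      # metric name -> average score on the QA set
    seconds_per_query: float       # measured latency


def pick_best(candidates: list[Candidate], speed_threshold: float) -> Candidate:
    """Filter out slow candidates, then return the one with the best mean metric."""
    fast_enough = [c for c in candidates if c.seconds_per_query <= speed_threshold]
    if not fast_enough:
        raise ValueError("no candidate meets the speed threshold")
    return max(fast_enough, key=lambda c: mean(list(c.metrics.values())))


# Illustrative numbers only, not real benchmark results.
candidates = [
    Candidate("bm25", {"retrieval_f1": 0.55, "retrieval_recall": 0.60}, 0.1),
    Candidate("vectordb", {"retrieval_f1": 0.62, "retrieval_recall": 0.70}, 0.8),
    Candidate("hybrid", {"retrieval_f1": 0.70, "retrieval_recall": 0.78}, 6.5),
]
best = pick_best(candidates, speed_threshold=5)
print(best.name)
```

In this made-up example the hybrid candidate scores highest but exceeds the threshold, so it is excluded before ranking.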

Installation

pip install autorag

# With all optional dependencies
pip install "autorag[all]"

# With specific vector store support
pip install "autorag[pinecone]"
pip install "autorag[chroma]"
pip install "autorag[milvus]"

# Verify installation
autorag --version

Core Concepts

Pipeline Stages

Stage	Components	Purpose
Retrieval	BM25, VectorDB, Hybrid	Find relevant documents
Passage Reranking	Cross-encoder, Cohere, UPR	Re-score passages
Passage Compressor	Refine, TreeSummarize	Compress context
Prompt Maker	Default, Window, Long Context	Format LLM input
Generator	OpenAI, vLLM, Ollama	Generate answers

Prepare Evaluation Data

import pandas as pd

# QA dataset format (one row per question; qid must be unique)
qa_data = pd.DataFrame({
    "qid": ["q1", "q2"],
    "query": [
        "What is retrieval-augmented generation?",
        "How does vector search work?",
    ],
    "generation_gt": [
        ["RAG combines retrieval with generation to ground LLM outputs."],
        ["Vector search finds similar items using embedding distances."],
    ],
    "retrieval_gt": [
        [["doc_id_1", "doc_id_2"]],
        [["doc_id_3"]],
    ]
})
qa_data.to_parquet("qa.parquet")

# Corpus format
corpus = pd.DataFrame({
    "doc_id": ["doc_id_1", "doc_id_2", "doc_id_3"],
    "contents": [
        "RAG is a technique that combines...",
        "Retrieval augmented generation uses...",
        "Vector search operates by computing...",
    ],
    "metadata": [
        {"source": "wiki"},
        {"source": "paper"},
        {"source": "docs"},
    ]
})
corpus.to_parquet("corpus.parquet")
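
Before starting a long evaluation run, it can help to confirm that every doc ID referenced in `retrieval_gt` exists in the corpus. The sketch below works on plain Python rows so it has no dependencies; the function name `find_unknown_doc_ids` is hypothetical and is not part of AutoRAG.

```python
def find_unknown_doc_ids(
    qa_rows: list[dict], corpus_doc_ids: set[str]
) -> dict[str, list[str]]:
    """Map each query to the retrieval_gt doc IDs that are missing from the corpus."""
    problems: dict[str, list[str]] = {}
    for row in qa_rows:
        # retrieval_gt is a list of lists, so flatten it first.
        referenced = [doc_id for group in row["retrieval_gt"] for doc_id in group]
        missing = [doc_id for doc_id in referenced if doc_id not in corpus_doc_ids]
        if missing:
            problems[row["query"]] = missing
    return problems


qa_rows = [
    {"query": "What is retrieval-augmented generation?",
     "retrieval_gt": [["doc_id_1", "doc_id_2"]]},
    {"query": "How does vector search work?",
     "retrieval_gt": [["doc_id_4"]]},  # deliberately wrong ID
]
problems = find_unknown_doc_ids(qa_rows, {"doc_id_1", "doc_id_2", "doc_id_3"})
print(problems)
```

With the DataFrames above, the inputs could be built as `qa_data.to_dict("records")` and `set(corpus["doc_id"])`.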

CLI Usage

# Run optimization
autorag evaluate \
  --config config.yaml \
  --qa_data_path qa.parquet \
  --corpus_data_path corpus.parquet \
  --project_dir ./autorag_results

# Serve the best pipeline from trial 0 as an API
autorag run_api \
  --trial_dir ./autorag_results/0 \
  --host 0.0.0.0 \
  --port 8000

# Run a single query against deployed pipeline
curl -X POST http://localhost:8000/v1/run \
  -H "Content-Type: application/json" \
  -d '{"query": "What is RAG?"}'

# Generate evaluation report
autorag report \
  --project_dir ./autorag_results \
  --output report.html
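
The curl call above can also be made from Python using only the standard library. This is a minimal sketch that mirrors the same URL, header, and JSON body. The helper names `build_run_request` and `run_query` are hypothetical, and nothing is assumed about the response beyond it being JSON.

```python
import json
import urllib.request


def build_run_request(
    query: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(
    query: str, base_url: str = "http://localhost:8000", timeout: float = 30.0
) -> dict:
    """Send the query to a running server and return the parsed JSON response."""
    request = build_run_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_query("What is RAG?")` requires the API server to be running on the given host and port.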

Configuration

Basic Config

# config.yaml
node_lines:
  - node_line_name: retrieve_node_line
    nodes:
      - node_type: retrieval
        strategy:
          metrics: [retrieval_f1, retrieval_recall]
          speed_threshold: 5
        modules:
          - module_type: bm25
          - module_type: vectordb
            embedding_model: openai
          - module_type: hybrid_rrf
            weight_range: (4, 80)  # (min, max) range to search, not a list of values

      - node_type: passage_reranker
        strategy:
          metrics: [retrieval_f1, retrieval_recall]
        modules:
          - module_type: pass_reranker
          - module_type: cohere_reranker
            model: rerank-english-v3.0
          - module_type: cross_encoder
            model: cross-encoder/ms-marco-MiniLM-L-12-v2

  - node_line_name: generate_node_line
    nodes:
      - node_type: prompt_maker
        strategy:
          metrics: [bleu, rouge, meteor]
        modules:
          - module_type: fstring
            prompt: "Context: {retrieved_contents}\nQuestion: {query}\nAnswer:"

      - node_type: generator
        strategy:
          metrics: [bleu, rouge, meteor, sem_score]
        modules:
          - module_type: openai_llm
            llm: gpt-4o
            temperature: [0.0, 0.3, 0.7]
          - module_type: openai_llm
            llm: gpt-4o-mini
            temperature: [0.0, 0.5]
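
The `hybrid_rrf` module in the config above refers to reciprocal rank fusion (RRF), which merges several ranked lists by summing `1 / (k + rank)` for each document. The sketch below shows the general technique, not AutoRAG's implementation. The function name is hypothetical and `k = 60` is a commonly used constant, chosen here as an assumption.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked doc-ID lists; a higher fused score means a better final rank."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by doc ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["a", "b", "c"]
vector_ranking = ["b", "d", "a"]
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)  # ['b', 'a', 'd', 'c']
```

Documents that appear near the top of both lists ("b" and "a") rise above documents that appear in only one.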

Advanced Config with All Stages

# advanced_config.yaml
node_lines:
  - node_line_name: full_pipeline
    nodes:
      - node_type: retrieval
        strategy:
          metrics: [retrieval_f1, retrieval_recall, retrieval_ndcg]
          speed_threshold: 10
        top_k: 10  # node-level parameter, a sibling of strategy
        modules:
          - module_type: bm25
            top_k: [5, 10, 20]
          - module_type: vectordb
            embedding_model: openai
            top_k: [5, 10, 20]
          - module_type: hybrid_rrf
            weight_range: (4, 80)  # (min, max) range to search
            top_k: [5, 10, 20]
          - module_type: hybrid_cc
            normalize_method: [mm, tmm, z, dbsf]
            weight_range: (0.0, 1.0)  # (min, max) range to search

      - node_type: passage_reranker
        strategy:
          metrics: [retrieval_f1, retrieval_recall]
        top_k: 5  # node-level parameter, a sibling of strategy
        modules:
          - module_type: pass_reranker
          - module_type: cohere_reranker
            model: rerank-english-v3.0
          - module_type: cross_encoder
            model: cross-encoder/ms-marco-MiniLM-L-12-v2
          - module_type: sentence_transformer_reranker
            model: BAAI/bge-reranker-v2-m3

      - node_type: passage_compressor
        strategy:
          metrics: [retrieval_token_recall]
        modules:
          - module_type: pass_compressor
          - module_type: tree_summarize
            llm: gpt-4o-mini
          - module_type: refine
            llm: gpt-4o-mini

      - node_type: prompt_maker
        strategy:
          metrics: [bleu, rouge, meteor]
        modules:
          - module_type: fstring
            prompt: |
              Use the following context to answer the question.
              Context: {retrieved_contents}
              Question: {query}
              Answer:
          - module_type: long_context_reorder

      - node_type: generator
        strategy:
          metrics: [bleu, rouge, meteor, sem_score]
        modules:
          - module_type: openai_llm
            llm: gpt-4o
            temperature: [0.0, 0.5]
          - module_type: vllm
            llm: meta-llama/Llama-3.1-8B-Instruct
            temperature: [0.0, 0.3]
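
List-valued parameters such as `top_k: [5, 10, 20]` widen the search space quickly. The sketch below estimates how many concrete candidates a module entry produces, assuming each list is expanded into one candidate per value and multiple lists combine as a Cartesian product. That expansion rule is an assumption for illustration, and `expand_module` is a hypothetical helper, not an AutoRAG function.

```python
from itertools import product


def expand_module(module: dict) -> list[dict]:
    """Expand list-valued parameters into one concrete config per combination."""
    keys = list(module)
    value_lists = [v if isinstance(v, list) else [v] for v in module.values()]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


vectordb = {"module_type": "vectordb", "embedding_model": "openai", "top_k": [5, 10, 20]}
generator = {"module_type": "openai_llm", "llm": "gpt-4o", "temperature": [0.0, 0.5]}

for candidate in expand_module(vectordb):
    print(candidate)
print(len(expand_module(generator)))  # 2
```

Counting candidates this way before a run gives a rough sense of how much evaluation time and API spend a config will need.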

Python API

from autorag.evaluator import Evaluator
from autorag.deploy import Runner

# Run evaluation
evaluator = Evaluator(
    qa_data_path="qa.parquet",
    corpus_data_path="corpus.parquet",
    project_dir="./autorag_results"
)
evaluator.start_trial("config.yaml")

# Deploy best pipeline
runner = Runner.from_trial_folder("./autorag_results/0")
answer = runner.run("What is retrieval-augmented generation?")
print(answer)

# Get detailed results
result = runner.run_with_details("What is RAG?")
print(f"Answer: {result['answer']}")
print(f"Retrieved docs: {result['retrieved_contents']}")
print(f"Scores: {result['retrieval_scores']}")

Advanced Usage

Custom Evaluation Metrics

from autorag.evaluation import register_metric

@register_metric("custom_accuracy")
def custom_accuracy(
    generated: list[str],
    gt: list[list[str]],
    **kwargs
) -> list[float]:
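    """Score 1.0 when any reference answer appears verbatim in the generated text."""
    scores = []
    for gen, refs in zip(generated, gt):
        # default=0.0 keeps max() from raising ValueError on an empty reference list
        score = max(
            (1.0 if ref.lower() in gen.lower() else 0.0 for ref in refs),
            default=0.0,
        )
        scores.append(score)
    return scores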
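
Substring matching is strict: a correct answer that is phrased differently scores zero. A softer alternative with the same input and output shape is token-overlap F1, sketched below as a plain function. It is shown unregistered and is an illustration rather than a built-in AutoRAG metric; the names `token_f1` and `best_token_f1` are hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """F1 over whitespace tokens shared by the prediction and the reference."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Counter intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def best_token_f1(generated: list[str], gt: list[list[str]]) -> list[float]:
    """For each generated answer, keep the best score across its references."""
    return [
        max((token_f1(gen, ref) for ref in refs), default=0.0)
        for gen, refs in zip(generated, gt)
    ]


print(best_token_f1(
    ["vector search uses embeddings"],
    [["vector search uses embedding distances"]],
))
```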

Chunk Optimization

# Optimize chunking strategy before RAG pipeline
autorag chunk \
  --config chunk_config.yaml \
  --raw_data raw_documents/ \
  --corpus_output corpus.parquet \
  --qa_output qa.parquet

# chunk_config.yaml
chunking:
  modules:
    - module_type: token
      chunk_size: [128, 256, 512, 1024]
      chunk_overlap: [0, 32, 64]
    - module_type: sentence
      chunk_size: [3, 5, 10]
      chunk_overlap: [0, 1, 2]
    - module_type: recursive
      chunk_size: [256, 512, 1024]
      chunk_overlap: [32, 64, 128]
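
To see how `chunk_size` and `chunk_overlap` interact, here is a minimal sliding-window sketch over a token list. It illustrates the general idea only: the function name `chunk_tokens` is hypothetical, and real chunkers typically use a model tokenizer rather than pre-split tokens.

```python
def chunk_tokens(
    tokens: list[str], chunk_size: int, chunk_overlap: int
) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end
    return chunks


tokens = [str(i) for i in range(10)]
for chunk in chunk_tokens(tokens, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each window starts `chunk_size - chunk_overlap` tokens after the previous one, so a larger overlap produces more chunks for the same text.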

Environment Variables

export OPENAI_API_KEY=sk-...
export COHERE_API_KEY=...
export PINECONE_API_KEY=...
export AUTORAG_LOG_LEVEL=DEBUG
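
A quick pre-flight check can catch a missing key before a long run starts. The sketch below is a hypothetical helper, not part of AutoRAG; which variables are actually required depends on the modules in your config.

```python
import os
from collections.abc import Mapping


def missing_env_vars(
    required: list[str], env: Mapping[str, str] = os.environ
) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(["OPENAI_API_KEY", "COHERE_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```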

Troubleshooting

Issue	Solution
Evaluation runs out of memory	Reduce search space, use fewer module combinations
API key rate limiting	Add `delay` parameter to generator config
Parquet schema mismatch	Ensure each `generation_gt` row is a list of strings and each `retrieval_gt` row is a list of lists of doc IDs
VectorDB index missing	Run `autorag index` before `autorag evaluate`
Metric computation fails	Check ground truth format matches metric requirements
Slow evaluation	Use `speed_threshold` to skip slow configurations
Deploy fails to load	Check `project_dir` contains valid trial results
CUDA out of memory	Set `batch_size` in embedding/reranker modules

# View trial results
autorag dashboard --trial_dir ./autorag_results/0

# Export best config
autorag extract_best_config \
  --trial_path ./autorag_results/0 \
  --output_path best_config.yaml

# Compare trials
autorag compare \
  --project_dir ./autorag_results \
  --trials 0 1 2