
AutoRAG Cheat Sheet

Overview

AutoRAG is an AutoML-inspired tool that automatically optimizes retrieval-augmented generation (RAG) pipelines. Instead of leaving you to tune each RAG component by hand, it systematically evaluates combinations of retrievers, rerankers, prompt strategies, and generators to find the best pipeline configuration for your dataset and use case. It produces evaluation reports with metrics such as F1, recall, MRR, and latency.

The framework uses a YAML configuration file in which you define the search space of components to evaluate. AutoRAG then runs the experiments, benchmarks each candidate against your evaluation dataset, and writes out the best-performing pipeline configuration, which can be served directly as an API.
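To make the retrieval metrics named above concrete, here is a minimal per-query sketch of recall, precision, F1, and reciprocal rank (MRR is the mean of the reciprocal rank over all queries). The function name `retrieval_scores` is hypothetical, and the sketch treats the ground truth as a flat set of document IDs, which is simpler than the nested `retrieval_gt` format AutoRAG expects.

```python
def retrieval_scores(retrieved: list[str], relevant: set[str]) -> dict[str, float]:
    """Score one query's ranked result list against a flat set of relevant IDs.

    Assumption: `retrieved` contains no duplicate IDs.
    """
    hits = [doc_id for doc_id in retrieved if doc_id in relevant]
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0

    # Reciprocal rank: 1 / (1-indexed position of the first relevant document).
    reciprocal_rank = 0.0
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            reciprocal_rank = 1.0 / rank
            break

    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "reciprocal_rank": reciprocal_rank,
    }


scores = retrieval_scores(
    retrieved=["doc_id_9", "doc_id_1", "doc_id_2"],
    relevant={"doc_id_1", "doc_id_2"},
)
print(scores)
```

Here both relevant documents are found (recall 1.0), but the first hit sits at rank 2, so the reciprocal rank is 0.5.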

Installation

pip install autorag

# With all optional dependencies
pip install "autorag[all]"

# With specific vector store support
pip install "autorag[pinecone]"
pip install "autorag[chroma]"
pip install "autorag[milvus]"

# Verify installation
autorag --version

Core Concepts

Pipeline Stages

| Stage | Components | Purpose |
| --- | --- | --- |
| Retrieval | BM25, VectorDB, Hybrid | Find relevant documents |
| Passage Reranking | Cross-encoder, Cohere, UPR | Re-score passages |
| Passage Compressor | Refine, TreeSummarize | Compress context |
| Prompt Maker | Default, Window, Long Context | Format LLM input |
| Generator | OpenAI, vLLM, Ollama | Generate answers |
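Hybrid retrieval merges the rankings produced by a lexical retriever (BM25) and a vector retriever. One common way to do that is reciprocal rank fusion (RRF), sketched below. The function name and the constant `k = 60` are assumptions taken from the usual RRF formulation, not from AutoRAG's own implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of doc IDs into one scored ranking.

    Each document receives 1 / (k + rank) from every list it appears in.
    Assumption: k = 60 is the conventional default, not an AutoRAG setting.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_ranking = ["doc_id_1", "doc_id_3", "doc_id_2"]
vector_ranking = ["doc_id_3", "doc_id_2", "doc_id_1"]
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

A document that ranks well in both lists (here `doc_id_3`) ends up ahead of one that ranks first in only one list.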

Prepare Evaluation Data

import pandas as pd

# QA dataset format
qa_data = pd.DataFrame({
    "qid": ["q1", "q2"],  # unique ID for each question
    "query": [
        "What is retrieval-augmented generation?",
        "How does vector search work?",
    ],
    "generation_gt": [
        ["RAG combines retrieval with generation to ground LLM outputs."],
        ["Vector search finds similar items using embedding distances."],
    ],
    "retrieval_gt": [
        [["doc_id_1", "doc_id_2"]],
        [["doc_id_3"]],
    ]
})
qa_data.to_parquet("qa.parquet")

# Corpus format
corpus = pd.DataFrame({
    "doc_id": ["doc_id_1", "doc_id_2", "doc_id_3"],
    "contents": [
        "RAG is a technique that combines...",
        "Retrieval augmented generation uses...",
        "Vector search operates by computing...",
    ],
    "metadata": [
        {"source": "wiki"},
        {"source": "paper"},
        {"source": "docs"},
    ]
})
corpus.to_parquet("corpus.parquet")
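Before running an evaluation, it can help to confirm that every ID referenced in `retrieval_gt` actually exists in the corpus. The sketch below does this with plain Python lists so it runs without pandas; `find_missing_doc_ids` is a hypothetical helper, not part of AutoRAG.

```python
def find_missing_doc_ids(
    retrieval_gt: list[list[list[str]]], corpus_doc_ids: list[str]
) -> dict[int, list[str]]:
    """Map each QA row index to the ground-truth doc IDs absent from the corpus."""
    known = set(corpus_doc_ids)
    missing: dict[int, list[str]] = {}
    for row_index, groups in enumerate(retrieval_gt):
        # Each row is a list of groups, and each group is a list of doc IDs.
        unknown = sorted(
            {doc_id for group in groups for doc_id in group if doc_id not in known}
        )
        if unknown:
            missing[row_index] = unknown
    return missing


retrieval_gt = [
    [["doc_id_1", "doc_id_2"]],
    [["doc_id_3"]],
    [["doc_id_7"]],  # deliberately not in the corpus
]
corpus_doc_ids = ["doc_id_1", "doc_id_2", "doc_id_3"]

missing = find_missing_doc_ids(retrieval_gt, corpus_doc_ids)
print(missing)
```

With DataFrames, the same check could be fed from `qa_data["retrieval_gt"].tolist()` and `corpus["doc_id"].tolist()`.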

CLI Usage

# Run optimization
autorag evaluate \
  --config config.yaml \
  --qa_data_path qa.parquet \
  --corpus_data_path corpus.parquet \
  --project_dir ./autorag_results

# Deploy best pipeline as API
autorag run_api \
  --trial_dir ./autorag_results/0 \
  --host 0.0.0.0 \
  --port 8000

# Run a single query against deployed pipeline
curl -X POST http://localhost:8000/v1/run \
  -H "Content-Type: application/json" \
  -d '{"query": "What is RAG?"}'

# Generate evaluation report
autorag report \
  --project_dir ./autorag_results \
  --output report.html
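The curl query above can also be issued from Python. This sketch only builds the request with the standard library and leaves the actual send commented out, since that needs a running server. The endpoint and payload mirror the curl example; `build_run_request` is a hypothetical helper, and the response format is not assumed.

```python
import json
import urllib.request


def build_run_request(
    query: str, host: str = "localhost", port: int = 8000
) -> urllib.request.Request:
    """Build a POST request equivalent to the curl call against /v1/run."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/run",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request("What is RAG?")
print(request.get_method(), request.full_url)

# Sending it requires the API server to be running:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.load(response))
```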

Configuration

Basic Config

# config.yaml
node_lines:
  - node_line_name: retrieve_node_line
    nodes:
      - node_type: retrieval
        strategy:
          metrics: [retrieval_f1, retrieval_recall]
          speed_threshold: 5
        modules:
          - module_type: bm25
          - module_type: vectordb
            embedding_model: openai
          - module_type: hybrid_rrf
            weight_range: [0.3, 0.5, 0.7]

      - node_type: passage_reranker
        strategy:
          metrics: [retrieval_f1, retrieval_recall]
        modules:
          - module_type: pass_reranker
          - module_type: cohere_reranker
            model: rerank-english-v3.0
          - module_type: cross_encoder
            model: cross-encoder/ms-marco-MiniLM-L-12-v2

  - node_line_name: generate_node_line
    nodes:
      - node_type: prompt_maker
        strategy:
          metrics: [bleu, rouge, meteor]
        modules:
          - module_type: fstring
            prompt: "Context: {retrieved_contents}\nQuestion: {query}\nAnswer:"

      - node_type: generator
        strategy:
          metrics: [bleu, rouge, meteor, sem_score]
        modules:
          - module_type: openai_llm
            llm: gpt-4o
            temperature: [0.0, 0.3, 0.7]
          - module_type: openai_llm
            llm: gpt-4o-mini
            temperature: [0.0, 0.5]
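A list-valued parameter such as `temperature: [0.0, 0.3, 0.7]` describes several candidates rather than one. The sketch below shows one plausible way such a module entry expands into individual configurations, using a Cartesian product over all list-valued parameters. This is an illustration of the idea and an assumption about the behavior, not AutoRAG's internal code.

```python
from itertools import product


def expand_module(module: dict) -> list[dict]:
    """Expand list-valued parameters into one config dict per combination."""
    keys = list(module)
    # Treat scalar values as single-element lists so product() handles both.
    value_lists = [
        value if isinstance(value, list) else [value] for value in module.values()
    ]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


# The generator modules from the basic config, written as Python dicts.
generator_modules = [
    {"module_type": "openai_llm", "llm": "gpt-4o", "temperature": [0.0, 0.3, 0.7]},
    {"module_type": "openai_llm", "llm": "gpt-4o-mini", "temperature": [0.0, 0.5]},
]

candidates = [
    config for module in generator_modules for config in expand_module(module)
]
for config in candidates:
    print(config)
```

The two generator entries yield five candidates in total (three for `gpt-4o`, two for `gpt-4o-mini`), which is worth keeping in mind when estimating API cost.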

Advanced Config with All Stages

# advanced_config.yaml
node_lines:
  - node_line_name: full_pipeline
    nodes:
      - node_type: retrieval
        strategy:
          metrics: [retrieval_f1, retrieval_recall, retrieval_ndcg]
          speed_threshold: 10
        top_k: 10
        modules:
          - module_type: bm25
            top_k: [5, 10, 20]
          - module_type: vectordb
            embedding_model: openai
            top_k: [5, 10, 20]
          - module_type: hybrid_rrf
            weight_range: [0.3, 0.5, 0.7]
            top_k: [5, 10, 20]
          - module_type: hybrid_cc
            normalize_method: [mm, tmm, z, dbsf]
            weight_range: [0.3, 0.5, 0.7]

      - node_type: passage_reranker
        strategy:
          metrics: [retrieval_f1, retrieval_recall]
        top_k: 5
        modules:
          - module_type: pass_reranker
          - module_type: cohere_reranker
            model: rerank-english-v3.0
          - module_type: cross_encoder
            model: cross-encoder/ms-marco-MiniLM-L-12-v2
          - module_type: sentence_transformer_reranker
            model: BAAI/bge-reranker-v2-m3

      - node_type: passage_compressor
        strategy:
          metrics: [retrieval_token_recall]
        modules:
          - module_type: pass_compressor
          - module_type: tree_summarize
            llm: gpt-4o-mini
          - module_type: refine
            llm: gpt-4o-mini

      - node_type: prompt_maker
        strategy:
          metrics: [bleu, rouge, meteor]
        modules:
          - module_type: fstring
            prompt: |
              Use the following context to answer the question.
              Context: {retrieved_contents}
              Question: {query}
              Answer:
          - module_type: long_context_reorder

      - node_type: generator
        strategy:
          metrics: [bleu, rouge, meteor, sem_score]
        modules:
          - module_type: openai_llm
            llm: gpt-4o
            temperature: [0.0, 0.5]
          - module_type: vllm
            model: meta-llama/Llama-3.1-8B-Instruct
            temperature: [0.0, 0.3]
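The `strategy` block of each node decides which module wins. A rough sketch of that selection, assuming candidates slower than `speed_threshold` are dropped and the rest are ranked by the mean of the listed metrics, looks like this. The `latency` key, the helper name `select_best`, and all numbers are made up for illustration.

```python
from statistics import mean
from typing import Optional


def select_best(
    results: list[dict], metrics: list[str], speed_threshold: float
) -> Optional[dict]:
    """Drop candidates over the latency budget, then pick the best mean score."""
    fast_enough = [r for r in results if r["latency"] <= speed_threshold]
    if not fast_enough:
        return None  # every candidate exceeded the threshold
    return max(fast_enough, key=lambda r: mean(r[m] for m in metrics))


# Illustrative numbers only; these are not real benchmark results.
results = [
    {"module": "bm25", "latency": 0.4, "retrieval_f1": 0.52, "retrieval_recall": 0.61},
    {"module": "vectordb", "latency": 1.2, "retrieval_f1": 0.58, "retrieval_recall": 0.70},
    {"module": "hybrid_cc", "latency": 12.0, "retrieval_f1": 0.66, "retrieval_recall": 0.78},
]

best = select_best(
    results, metrics=["retrieval_f1", "retrieval_recall"], speed_threshold=10
)
print(best["module"])
```

In this made-up example `hybrid_cc` scores highest but is excluded by the threshold, so `vectordb` is selected.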

Python API

from autorag.evaluator import Evaluator
from autorag.deploy import Runner

# Run evaluation
evaluator = Evaluator(
    qa_data_path="qa.parquet",
    corpus_data_path="corpus.parquet",
    project_dir="./autorag_results"
)
evaluator.start_trial("config.yaml")

# Deploy best pipeline
runner = Runner.from_trial_folder("./autorag_results/0")
answer = runner.run("What is retrieval-augmented generation?")
print(answer)

# Get detailed results
result = runner.run_with_details("What is RAG?")
print(f"Answer: {result['answer']}")
print(f"Retrieved docs: {result['retrieved_contents']}")
print(f"Scores: {result['retrieval_scores']}")

Advanced Usage

Custom Evaluation Metrics

from autorag.evaluation import register_metric

@register_metric("custom_accuracy")
def custom_accuracy(
    generated: list[str],
    gt: list[list[str]],
    **kwargs
) -> list[float]:
    scores = []
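    for gen, refs in zip(generated, gt):
        # 1.0 if any reference answer appears (case-insensitively) in the output;
        # any() also returns False for an empty reference list instead of raising.
        matched = any(ref.lower() in gen.lower() for ref in refs)
        scores.append(1.0 if matched else 0.0)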
    return scores
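The scoring rule itself is plain Python, so it can be checked on its own before wiring it into an evaluation. The standalone sketch below repeats the substring-match logic without the decorator; the function name `substring_accuracy` and the sample strings are illustrative.

```python
def substring_accuracy(generated: list[str], gt: list[list[str]]) -> list[float]:
    """Return 1.0 per answer that contains any reference string, else 0.0."""
    scores = []
    for gen, refs in zip(generated, gt):
        matched = any(ref.lower() in gen.lower() for ref in refs)
        scores.append(1.0 if matched else 0.0)
    return scores


generated = [
    "RAG stands for Retrieval-Augmented Generation.",
    "I am not sure.",
]
gt = [
    ["retrieval-augmented generation"],
    ["embedding distances", "vector similarity"],
]

scores = substring_accuracy(generated, gt)
print(scores)
```

Substring matching is strict: a correct paraphrase scores 0.0, so a metric like this is best paired with a semantic one such as `sem_score`.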

Chunk Optimization

# Optimize chunking strategy before RAG pipeline
autorag chunk \
  --config chunk_config.yaml \
  --raw_data raw_documents/ \
  --corpus_output corpus.parquet \
  --qa_output qa.parquet

# chunk_config.yaml
chunking:
  modules:
    - module_type: token
      chunk_size: [128, 256, 512, 1024]
      chunk_overlap: [0, 32, 64]
    - module_type: sentence
      chunk_size: [3, 5, 10]
      chunk_overlap: [0, 1, 2]
    - module_type: recursive
      chunk_size: [256, 512, 1024]
      chunk_overlap: [32, 64, 128]
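To see how `chunk_size` and `chunk_overlap` interact, here is a minimal sliding-window chunker. It assumes whitespace-separated tokens for simplicity; a real token chunker would use a model tokenizer, and `chunk_tokens` is a hypothetical helper rather than AutoRAG's implementation.

```python
def chunk_tokens(
    tokens: list[str], chunk_size: int, chunk_overlap: int
) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Assumption: whitespace splitting stands in for real tokenization.
tokens = "a b c d e f g h i j".split()
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each window advances by `chunk_size - chunk_overlap` tokens, so larger overlaps produce more chunks and a larger corpus to index.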

Environment Variables

export OPENAI_API_KEY=sk-...
export COHERE_API_KEY=...
export PINECONE_API_KEY=...
export AUTORAG_LOG_LEVEL=DEBUG
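A missing key usually surfaces only when the first module that needs it runs, so a quick pre-flight check can save a failed trial. The helper below is a hypothetical sketch; which variables are actually required depends on the modules in your config.

```python
import os
from typing import Mapping, Optional


def missing_env_vars(
    required: list[str], env: Optional[Mapping[str, str]] = None
) -> list[str]:
    """Return the names in `required` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# A fake environment is passed here so the example is reproducible;
# omit `env` to check the real process environment.
missing = missing_env_vars(
    ["OPENAI_API_KEY", "COHERE_API_KEY"],
    env={"OPENAI_API_KEY": "sk-test"},
)
print(missing)
```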

Troubleshooting

| Issue | Solution |
| --- | --- |
| Evaluation runs out of memory | Reduce search space, use fewer module combinations |
| API key rate limiting | Add delay parameter to generator config |
| Parquet schema mismatch | Ensure generation_gt is list of lists of strings |
| VectorDB index missing | Run autorag index before autorag evaluate |
| Metric computation fails | Check ground truth format matches metric requirements |
| Slow evaluation | Use speed_threshold to skip slow configurations |
| Deploy fails to load | Check project_dir contains valid trial results |
| CUDA out of memory | Set batch_size in embedding/reranker modules |

# View trial results
autorag dashboard --trial_dir ./autorag_results/0

# Export best config
autorag export-config \
  --project_dir ./autorag_results \
  --output best_config.yaml

# Compare trials
autorag compare \
  --project_dir ./autorag_results \
  --trials 0 1 2