Zum Inhalt springen

FlashRAG Cheat Sheet

Overview

FlashRAG is a Python toolkit designed for the reproduction and development of retrieval-augmented generation research. It provides a modular framework with unified interfaces across 32+ RAG methods, standardized evaluation on multiple benchmarks, and pre-built pipelines covering naive RAG, advanced RAG, and modular RAG architectures. The toolkit is ideal for researchers comparing RAG approaches and developing new methods.

FlashRAG includes pre-processed benchmark datasets (Natural Questions, TriviaQA, HotpotQA, etc.), a component library with swappable retrievers, generators, rerankers, and refiners, and comprehensive evaluation metrics. It supports both API-based and local models, making it accessible for academic research and prototyping.

Installation

pip install flashrag

# From source with all dependencies
git clone https://github.com/RUC-NLPIR/FlashRAG.git
cd FlashRAG
pip install -e ".[all]"

# Download pre-processed datasets
python -m flashrag.download --dataset nq --output_dir ./data
python -m flashrag.download --dataset triviaqa --output_dir ./data
python -m flashrag.download --dataset hotpotqa --output_dir ./data

Core Components

Pipeline Types

PipelineDescriptionKey Feature
SequentialPipelineStandard retrieve-then-readBaseline RAG
ConditionalPipelineAdaptive retrieval decisionSkips retrieval when not needed
BranchingPipelineMultiple retrieval pathsEnsemble strategies
LoopPipelineIterative retrievalMulti-hop reasoning
SelfRAGPipelineSelf-reflective generationCritique and refine
IRCoTPipelineChain-of-thought retrievalInterleaved reasoning

Basic Usage

from flashrag.config import Config
from flashrag.pipeline import SequentialPipeline
from flashrag.dataset import Dataset

# Load configuration
config = Config("config.yaml")

# Load dataset
dataset = Dataset(config, "nq")

# Create pipeline
pipeline = SequentialPipeline(config)

# Run evaluation
results = pipeline.run(dataset)
print(f"EM: {results['em']:.4f}")
print(f"F1: {results['f1']:.4f}")

Configuration

# config.yaml
data:
  dataset_name: nq
  dataset_path: ./data/nq
  split: test

retriever:
  retriever_type: dense
  model_path: facebook/contriever-msmarco
  corpus_path: ./corpus/wiki_2018.jsonl
  index_path: ./index/contriever_wiki
  top_k: 5
  batch_size: 64

reranker:
  reranker_type: cross_encoder
  model_path: cross-encoder/ms-marco-MiniLM-L-12-v2
  top_k: 3

generator:
  generator_type: api
  model: gpt-4o
  api_key: ${OPENAI_API_KEY}
  max_tokens: 256
  temperature: 0.0

  # Or local model
  # generator_type: hf
  # model_path: meta-llama/Llama-3.1-8B-Instruct
  # device: cuda
  # max_tokens: 256

evaluation:
  metrics: [em, f1, recall, precision]
  save_results: true
  output_dir: ./results

Retriever Configuration

Supported Retrievers

RetrieverTypeModel Examples
Dense (Bi-encoder)NeuralContriever, DPR, BGE, E5
SparseLexicalBM25, SPLADE
ColBERTLate-interactionColBERTv2
Multi-vectorNeuralColBERT, ME-BERT

Building Index

from flashrag.retriever import DenseRetriever

# Build dense index
retriever = DenseRetriever(
    model_path="facebook/contriever-msmarco",
    corpus_path="./corpus/wiki_2018.jsonl",
    batch_size=256,
    device="cuda"
)

# Build FAISS index
retriever.build_index(
    index_path="./index/contriever_wiki",
    index_type="flat",  # flat, ivf, hnsw
    gpu=True
)

# Search
results = retriever.search(
    queries=["What is machine learning?"],
    top_k=10
)

BM25 Retriever

from flashrag.retriever import BM25Retriever

bm25 = BM25Retriever(
    corpus_path="./corpus/wiki_2018.jsonl",
    index_path="./index/bm25_wiki"
)

bm25.build_index()
results = bm25.search(["What is Python?"], top_k=10)

RAG Methods

Self-RAG

from flashrag.pipeline import SelfRAGPipeline

config = Config("selfrag_config.yaml")
pipeline = SelfRAGPipeline(config)

# Self-RAG adds reflection tokens:
# [Retrieve] - decides whether to retrieve
# [IsREL] - judges relevance of retrieved passages
# [IsSUP] - checks if generation is supported
# [IsUSE] - evaluates overall utility
results = pipeline.run(dataset)

Iterative RAG (IRCoT)

from flashrag.pipeline import IRCoTPipeline

config = Config("ircot_config.yaml")
pipeline = IRCoTPipeline(config)

# IRCoT interleaves chain-of-thought with retrieval
# Each reasoning step generates a new query
results = pipeline.run(dataset, max_iterations=4)

REPLUG

from flashrag.pipeline import REPLUGPipeline

config = Config("replug_config.yaml")
pipeline = REPLUGPipeline(config)

# REPLUG ensembles over retrieved documents
# Each document generates independently, then merges
results = pipeline.run(dataset)

Advanced Usage

Custom Pipeline

from flashrag.pipeline import BasicPipeline
from flashrag.component import BaseRetriever, BaseGenerator

class CustomRAGPipeline(BasicPipeline):
    def __init__(self, config):
        super().__init__(config)
        self.retriever = self.build_retriever()
        self.generator = self.build_generator()

    def run_item(self, item):
        query = item["question"]

        # Custom retrieval logic
        docs = self.retriever.search([query], top_k=10)

        # Custom filtering
        filtered = [d for d in docs[0] if d["score"] > 0.5]

        # Custom prompt
        context = "\n".join([d["text"] for d in filtered[:5]])
        prompt = f"Answer based on context.\nContext: {context}\nQ: {query}\nA:"

        # Generate
        answer = self.generator.generate([prompt])[0]
        return {"pred": answer, "docs": filtered}

Batch Evaluation

from flashrag.config import Config
from flashrag.pipeline import SequentialPipeline
from flashrag.dataset import Dataset

datasets = ["nq", "triviaqa", "hotpotqa", "musique"]
results = {}

for ds_name in datasets:
    config = Config("config.yaml", overrides={"data.dataset_name": ds_name})
    dataset = Dataset(config, ds_name)
    pipeline = SequentialPipeline(config)
    results[ds_name] = pipeline.run(dataset)
    print(f"{ds_name}: EM={results[ds_name]['em']:.4f}, F1={results[ds_name]['f1']:.4f}")

Ablation Studies

# Compare retriever impact
for top_k in [1, 3, 5, 10, 20]:
    config = Config("config.yaml", overrides={"retriever.top_k": top_k})
    pipeline = SequentialPipeline(config)
    result = pipeline.run(dataset)
    print(f"top_k={top_k}: EM={result['em']:.4f}")

# Compare with/without reranking
for use_reranker in [True, False]:
    config = Config("config.yaml", overrides={"reranker.enabled": use_reranker})
    pipeline = SequentialPipeline(config)
    result = pipeline.run(dataset)
    print(f"Reranker={use_reranker}: EM={result['em']:.4f}")

Troubleshooting

IssueSolution
FAISS index OOMUse IVF index type instead of flat for large corpora
Slow dense retrievalBuild GPU index, increase batch size
Dataset download failsDownload manually from HuggingFace datasets
Generator timeoutIncrease max_retries and timeout in config
Corpus format errorEnsure JSONL with id, text, title fields
CUDA out of memoryReduce generator batch size or use quantized model
Metric computation errorCheck prediction format matches ground truth format
Index dimension mismatchRebuild index when changing embedding model
# Validate config
python -m flashrag.validate_config config.yaml

# Check corpus format
python -c "
import json
with open('corpus.jsonl') as f:
    line = json.loads(f.readline())
    print(line.keys())  # Should have: id, text, title
"