DSPy Cheat Sheet

Overview

DSPy (Declarative Self-improving Python) is a framework from Stanford that replaces hand-crafted prompts with composable Python modules. Instead of writing brittle prompt strings, you define signatures (input/output specs) and modules (reasoning strategies), then let optimizers automatically find the best prompts and few-shot examples for your task and model.

Key concepts: Signatures declare what a module does. Modules implement reasoning patterns (ChainOfThought, ReAct, etc.). Optimizers (formerly called teleprompters) tune prompts (instructions and few-shot demos) and, in some cases, model weights, using a training set and a metric.

Installation

# Core install
pip install dspy

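# With optional extras
pip install "dspy[all]"        # All integrations (quotes stop zsh from globbing the brackets)
pip install dspy chromadb      # With vector store support
pip install dspy anthropic     # Anthropic SDK (dspy.LM itself reaches Claude through LiteLLM)
pip install dspy google-generativeai  # Gemini SDK (likewise optional for dspy.LM)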

# Development / latest
pip install git+https://github.com/stanfordnlp/dspy.git

Configuration

import dspy

# --- OpenAI ---
lm = dspy.LM("openai/gpt-4o-mini", api_key="sk-...")
dspy.configure(lm=lm)

# --- Anthropic ---
lm = dspy.LM("anthropic/claude-3-5-haiku-20241022", api_key="sk-ant-...")
dspy.configure(lm=lm)

# --- Local / Ollama ---
lm = dspy.LM("ollama_chat/llama3.2", api_base="http://localhost:11434", api_key="")
dspy.configure(lm=lm)

# --- Together AI ---
lm = dspy.LM("together_ai/meta-llama/Llama-3-8b-chat-hf", api_key="...")
dspy.configure(lm=lm)

# --- Multiple LMs in one program ---
fast_lm = dspy.LM("openai/gpt-4o-mini")
powerful_lm = dspy.LM("openai/gpt-4o")
dspy.configure(lm=fast_lm)   # default

# Per-module override
module = dspy.ChainOfThought("question -> answer")
module.set_lm(powerful_lm)

# Caching is on by default (on-disk, under ~/.dspy_cache); turn it off for one LM:
uncached_lm = dspy.LM("openai/gpt-4o-mini", cache=False)

# Opt into experimental features
dspy.configure(lm=lm, experimental=True)

Core Concepts

Signatures

Signatures define the input/output contract of a module.

# Inline string signature
qa = dspy.Predict("question -> answer")

# Class-based signature with docs and field hints
class SentimentClassifier(dspy.Signature):
    """Classify the sentiment of a product review."""
    review: str = dspy.InputField(desc="The customer review text")
    sentiment: str = dspy.OutputField(desc="positive, negative, or neutral")
    confidence: float = dspy.OutputField(desc="Confidence score 0.0-1.0")

clf = dspy.Predict(SentimentClassifier)
result = clf(review="This product changed my life!")
print(result.sentiment, result.confidence)

# Signature with a list-valued input, e.g. passages gathered over several retrieval hops
class HopSignature(dspy.Signature):
    """Answer using retrieved context."""
    context: list[str] = dspy.InputField(desc="Retrieved passages")
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="Concise factual answer")

Built-in Modules

Module                       Description
dspy.Predict                 Direct prediction, no chain-of-thought
dspy.ChainOfThought          Adds a reasoning field before the answer
dspy.ChainOfThoughtWithHint  CoT with an optional hint input
dspy.ProgramOfThought        Generates and executes Python code
dspy.ReAct                   Reason + Act loop with tool calls
dspy.MultiChainComparison    Compares M reasoning attempts and produces a final answer
dspy.BestOfN                 Runs a module N times, keeps the best output by a reward function
dspy.Retry                   Retries with feedback on failure
dspy.Assert / dspy.Suggest   Enforce output constraints (deprecated in recent releases; see dspy.Refine / dspy.BestOfN)

Core API Reference

API                              Description
dspy.LM(model, **kwargs)         Create a language model client
dspy.configure(lm=...)           Set global defaults
dspy.Signature                   Base class for signatures
dspy.InputField(desc=...)        Declare an input field
dspy.OutputField(desc=...)       Declare an output field
dspy.Predict(sig)                Simplest prediction module
dspy.ChainOfThought(sig)         CoT reasoning module
dspy.ReAct(sig, tools=[...])     Tool-use agent module
dspy.Module                      Base class for custom modules
dspy.Example(**kwargs)           Training/test data point
dspy.Evaluate(devset, metric)    Run evaluation
dspy.teleprompt.*                Optimizers namespace (also exported at top level, e.g. dspy.BootstrapFewShot, dspy.MIPROv2)
program.save("path.json")        Save compiled program state
program.load("path.json")        Load saved state into a fresh instance of the same program

Advanced Usage

Custom Modules

import dspy

class RAGPipeline(dspy.Module):
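    def __init__(self, retriever, num_passages=3):
        super().__init__()
        # retriever: callable(query, k=...) returning passages with a .long_text attribute
        self.retrieve = retriever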
        self.num_passages = num_passages
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        passages = self.retrieve(question, k=self.num_passages)
        context = [p.long_text for p in passages]
        prediction = self.generate(context=context, question=question)
        return dspy.Prediction(answer=prediction.answer, passages=passages)

# Multi-hop reasoning pipeline
class MultiHopQA(dspy.Module):
    def __init__(self, retriever, hops=2):
        super().__init__()
        self.retrieve = retriever
        self.generate_query = [dspy.ChainOfThought("context, question -> search_query")
                               for _ in range(hops)]
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = []
        for hop in self.generate_query:
            query = hop(context=context, question=question).search_query
            passages = self.retrieve(query, k=3)
            context += [p.long_text for p in passages]
        return self.generate_answer(context=context, question=question)
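Both pipelines assume a retriever called as retriever(query, k=...) that returns objects with a .long_text attribute (the field name dspy.ColBERTv2 results use). For offline experiments, a hypothetical keyword-overlap stand-in with that interface could look like this; it is a toy ranker, not a DSPy API.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    long_text: str  # same attribute name the modules above read


class KeywordRetriever:
    """Hypothetical in-memory retriever ranking passages by word overlap with the query."""

    def __init__(self, corpus: list[str]):
        self.corpus = [Passage(text) for text in corpus]

    def __call__(self, query: str, k: int = 3) -> list[Passage]:
        query_words = set(query.lower().split())

        def overlap(passage: Passage) -> int:
            return len(query_words & set(passage.long_text.lower().split()))

        # sorted() is stable even with reverse=True, so ties keep corpus order
        return sorted(self.corpus, key=overlap, reverse=True)[:k]


retriever = KeywordRetriever([
    "Paris is the capital of France",
    "Berlin is in Germany",
    "The Eiffel Tower is in Paris",
])
top = retriever("capital of France", k=2)
```

With this in place, RAGPipeline(retriever=retriever) or MultiHopQA(retriever=retriever) can be exercised without a search backend.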

ReAct Agent with Tools

import dspy

def search_web(query: str) -> str:
    """Search the web and return top results."""
    # your search implementation
    return "search results..."

def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    # eval() runs arbitrary code: acceptable for a local demo, never for untrusted input
    return str(eval(expression))

agent = dspy.ReAct(
    "question -> answer",
    tools=[search_web, calculator],
    max_iters=5
)

result = agent(question="What is the population of France times 2?")
print(result.answer)
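The calculator tool above passes model output straight to eval(). A safer drop-in, sketched with the standard-library ast module, evaluates only numeric literals and arithmetic operators and returns errors as text so the agent can retry. MAX_EXPONENT is a hypothetical cap added so inputs like 9 ** 9 ** 9 cannot stall the loop. In recent DSPy versions ReAct builds the tool description from the function name, docstring, and type hints, so keep those informative.

```python
import ast
import operator

# Only these AST operator types are evaluated; anything else is rejected
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
MAX_EXPONENT = 100  # hypothetical cap on exponent size


def _eval_node(node: ast.AST):
    if isinstance(node, ast.Expression):
        return _eval_node(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value  # bools and strings are excluded by the exact type check
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        left, right = _eval_node(node.left), _eval_node(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
            raise ValueError("exponent too large")
        return _BIN_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def calculator(expression: str) -> str:
    """Evaluate an arithmetic expression using numbers and + - * / % ** only."""
    try:
        return str(_eval_node(ast.parse(expression, mode="eval")))
    except (SyntaxError, ValueError, ZeroDivisionError, OverflowError) as exc:
        # Errors go back to the agent as text so it can correct itself
        return f"Error: {exc}"
```

It plugs into the same call: dspy.ReAct("question -> answer", tools=[search_web, calculator], max_iters=5).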

Assertions and Suggestions

import dspy

class AnswerWithCitations(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought("context, question -> answer, citations")

    def forward(self, context, question):
        pred = self.generate(context=context, question=question)

        # Hard constraint: raises an assertion error when violated; after
        # .activate_assertions(), it first retries with this message as feedback
        dspy.Assert(
            len(pred.citations) > 0,
            "Answer must include at least one citation"
        )

        # Soft constraint: never halts the program; logs a warning
        # (with assertions activated, it also retries with feedback first)
        dspy.Suggest(
            len(pred.answer.split()) < 100,
            "Keep the answer concise (under 100 words)"
        )

        return pred
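Recent DSPy releases deprecate dspy.Assert / dspy.Suggest in favor of dspy.Refine and dspy.BestOfN, which rerun a module up to N times and stop once a reward function reward_fn(args, pred) -> float reaches a threshold (Refine also passes feedback between attempts). A hypothetical reward expressing the same two constraints as the module above is plain Python, so it can be checked offline with stand-in predictions:

```python
from types import SimpleNamespace


def citation_reward(args: dict, pred) -> float:
    """Hypothetical reward: 0.5 for citing something, 0.5 for staying under 100 words."""
    has_citation = bool(str(getattr(pred, "citations", "")).strip())
    is_concise = len(str(getattr(pred, "answer", "")).split()) < 100
    return 0.5 * has_citation + 0.5 * is_concise


# Offline check with stand-ins for dspy.Prediction (no LM call)
cited = SimpleNamespace(answer="Paris is the capital of France.", citations="[1]")
uncited = SimpleNamespace(answer="Paris is the capital of France.", citations="")
```

Wiring it up would look roughly like dspy.Refine(module=dspy.ChainOfThought("context, question -> answer, citations"), N=3, reward_fn=citation_reward, threshold=1.0); check the constructor against your installed version.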

Optimizers

BootstrapFewShot

import dspy
from dspy.teleprompt import BootstrapFewShot

# Define a metric function
def exact_match(example, prediction, trace=None):
    return example.answer.strip().lower() == prediction.answer.strip().lower()

def f1_score_metric(example, prediction, trace=None):
    # Bag-of-words F1 over unique lowercase tokens
    gold = set(example.answer.lower().split())
    pred = set(prediction.answer.lower().split())
    if not gold or not pred:
        return 0.0
    precision = len(gold & pred) / len(pred)
    recall = len(gold & pred) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Prepare training data
trainset = [
    dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
    dspy.Example(question="Who wrote Hamlet?", answer="Shakespeare").with_inputs("question"),
    # ... more examples
]

# Compile with BootstrapFewShot
optimizer = BootstrapFewShot(
    metric=exact_match,
    max_bootstrapped_demos=4,   # max demos generated by running the program on trainset
    max_labeled_demos=16,       # max raw trainset examples used directly as demos
)

compiled_program = optimizer.compile(
    student=RAGPipeline(retriever=my_retriever),
    trainset=trainset,
)

# Save compiled program
compiled_program.save("compiled_rag.json")
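The trace argument is None during evaluation and set while an optimizer is bootstrapping demos, so a common pattern is to return a graded score normally and a strict bool while compiling. A sketch with SQuAD-style normalization and count-based token F1; qa_metric, token_f1, and the 0.9 cutoff are illustrative choices, not DSPy APIs. dspy.Example and dspy.Prediction expose fields as attributes, so SimpleNamespace stand-ins are enough to test it offline.

```python
import re
import string
from collections import Counter
from types import SimpleNamespace


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(gold: str, pred: str) -> float:
    """F1 over token counts (repeated words count more than once)."""
    gold_tokens, pred_tokens = normalize(gold).split(), normalize(pred).split()
    overlap = sum((Counter(gold_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def qa_metric(example, prediction, trace=None):
    score = token_f1(example.answer, prediction.answer)
    if trace is not None:
        # Bootstrapping: only accept near-perfect traces as demos
        return score >= 0.9
    return score  # evaluation: graded score in [0, 1]


gold = SimpleNamespace(answer="Paris")
partial = SimpleNamespace(answer="The city of Paris.")
exact = SimpleNamespace(answer="paris!")
```

The same function can be passed as metric= to BootstrapFewShot, MIPROv2, or Evaluate.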

MIPRO (Advanced Optimizer)

from dspy.teleprompt import MIPROv2

optimizer = MIPROv2(
    metric=f1_score_metric,
    auto="medium",          # "light" | "medium" | "heavy"; preset search budget
    num_threads=8,
)

# With auto set, leave num_trials unset: the preset chooses it
compiled = optimizer.compile(
    student=my_module,
    trainset=trainset,
    valset=valset,          # optional validation set
    max_bootstrapped_demos=3,
    max_labeled_demos=5,
    requires_permission_to_run=False,
)
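valset is optional, but keeping held-out examples separate from the ones used to bootstrap demos gives a less optimistic score for each candidate. A hypothetical helper (split_examples is not part of DSPy) that makes a reproducible split from one labeled list:

```python
import random


def split_examples(examples, val_fraction=0.2, seed=0):
    """Return (trainset, valset) from a seeded shuffle of a copy of `examples`."""
    if not 0 < val_fraction < 1:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(examples)              # copy; the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # same seed, same split
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


trainset_part, valset_part = split_examples(range(10), val_fraction=0.2, seed=1)
```

Typical use: trainset, valset = split_examples(all_examples) before calling optimizer.compile(...).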

BootstrapFewShotWithRandomSearch

from dspy.teleprompt import BootstrapFewShotWithRandomSearch

optimizer = BootstrapFewShotWithRandomSearch(
    metric=exact_match,
    max_bootstrapped_demos=4,
    num_candidate_programs=10,  # programs to try
    num_threads=8,
)
compiled = optimizer.compile(student=my_module, trainset=trainset)

Evaluation

import dspy
from dspy.evaluate import Evaluate

devset = [
    dspy.Example(question="...", answer="...").with_inputs("question"),
    # ...
]

# Run evaluation
evaluate = Evaluate(
    devset=devset,
    metric=exact_match,
    num_threads=8,
    display_progress=True,
    display_table=5,        # show first 5 results
)

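# Average metric as a percentage (0-100). Newer releases return a result
# object instead; read its .score attribute there.
score = evaluate(compiled_program)
print(f"Score: {score:.1f}%")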

# Compare two programs
baseline_score = evaluate(baseline_program)
optimized_score = evaluate(optimized_program)
print(f"Improvement: {optimized_score - baseline_score:.1f} percentage points")

Common Workflows

Workflow 1: Build and Evaluate a QA System

import dspy
from dspy.teleprompt import BootstrapFewShot
from dspy.evaluate import Evaluate

# 1. Configure LM
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# 2. Define program
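class SimpleQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.cot = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.cot(question=question)

# 3. Prepare data: field names must match forward()'s parameters
# train_pairs / dev_pairs are your own lists of (question, answer) tuples
trainset = [dspy.Example(question=q, answer=a).with_inputs("question") for q, a in train_pairs]
devset = [dspy.Example(question=q, answer=a).with_inputs("question") for q, a in dev_pairs]

# 4. Optimize
def answer_in_prediction(example, pred, trace=None):
    return example.answer.lower() in pred.answer.lower()

optimizer = BootstrapFewShot(metric=answer_in_prediction)
compiled = optimizer.compile(SimpleQA(), trainset=trainset)

# 5. Evaluate
evaluate = Evaluate(devset=devset, metric=answer_in_prediction, num_threads=4)
print(evaluate(compiled))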

# 6. Save
compiled.save("my_qa_program.json")

Workflow 2: Inspect Compiled Prompts

# Show the most recent LM calls made by this program (prompts and completions)
compiled_program.inspect_history(n=3)

# View the LM history
dspy.inspect_history(n=5)

# Print each predictor's instructions and the few-shot demos the optimizer attached
for name, predictor in compiled_program.named_predictors():
    print(f"\n--- {name} ---")
    print(predictor.signature.instructions)
    for demo in predictor.demos:
        print("  Demo:", demo)

Tips and Best Practices

  • Start with dspy.Predict, upgrade to dspy.ChainOfThought once baseline is working — CoT adds latency and tokens.
  • Your metric is everything. DSPy optimizers hill-climb on the metric; a fuzzy or incorrect metric produces bad compiled programs.
  • Use .with_inputs() on every dspy.Example to tell DSPy which fields are inputs vs. labels.
  • BootstrapFewShot is the fastest of the optimizers shown here. Use MIPROv2 when you also need instruction optimization.
  • Save compiled programs with .save() and load in production — no optimizer needed at inference time.
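  • Assertions add retry loops, but the retries are capped: a failing dspy.Assert halts the program once they run out, while dspy.Suggest only logs and continues, so use dspy.Suggest for soft guidance.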
  • Thread count (num_threads) in evaluation/optimization maps directly to API concurrency — match your rate limits.
  • Caching is on by default; when debugging, create the LM with dspy.LM(..., cache=False), or delete the cache directory to start clean.
  • Modular design: break complex tasks into small DSPy modules and compose them; optimizers tune every predictor inside the composed program against your end-to-end metric.
  • Version your trainsets alongside compiled JSON programs for reproducibility.
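For the last tip, one lightweight option is to record a content hash of the trainset next to the compiled JSON. A sketch using only the standard library; trainset_fingerprint is a hypothetical helper, and for dspy.Example objects you would pass their dict form (e.g. [ex.toDict() for ex in trainset]). Key order inside each row does not affect the hash, but row order does.

```python
import hashlib
import json


def trainset_fingerprint(rows: list[dict]) -> str:
    """Short SHA-256 over the examples; key order inside each row does not matter."""
    canonical = json.dumps(rows, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


rows = [{"question": "Who wrote Hamlet?", "answer": "Shakespeare"}]
fp = trainset_fingerprint(rows)
```

Putting fp in the saved program's filename or a small metadata file makes it easy to tell which data produced a given compiled program.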