DSPy Cheat Sheet

Overview

DSPy (Declarative Self-improving Python) is a framework from Stanford that replaces hand-crafted prompts with composable Python modules. Instead of writing brittle prompt strings, you define signatures (input/output specs) and modules (reasoning strategies), then let optimizers automatically find the best prompts and few-shot examples for your task and model.

Key concepts: Signatures declare what a module does. Modules implement reasoning patterns (ChainOfThought, ReAct, etc.). Optimizers (formerly called teleprompters) tune the prompts/weights using a training set and metric.

Installation

# Core install
pip install dspy

# With optional extras
pip install dspy[all]          # All integrations
pip install dspy chromadb      # With vector store support
pip install dspy anthropic     # Anthropic models
pip install dspy google-generativeai  # Gemini models

# Development / latest
pip install git+https://github.com/stanfordnlp/dspy.git

Configuration

import dspy

# --- OpenAI ---
lm = dspy.LM("openai/gpt-4o-mini", api_key="sk-...")
dspy.configure(lm=lm)

# --- Anthropic ---
lm = dspy.LM("anthropic/claude-3-5-haiku-20241022", api_key="sk-ant-...")
dspy.configure(lm=lm)

# --- Local / Ollama ---
lm = dspy.LM("ollama/llama3.2", api_base="http://localhost:11434")
dspy.configure(lm=lm)

# --- Together AI ---
lm = dspy.LM("together_ai/meta-llama/Llama-3-8b-chat-hf", api_key="...")
dspy.configure(lm=lm)

# --- Multiple LMs in one program ---
fast_lm = dspy.LM("openai/gpt-4o-mini")
powerful_lm = dspy.LM("openai/gpt-4o")
dspy.configure(lm=fast_lm)   # default

# Per-module override
module = dspy.ChainOfThought("question -> answer")
module.set_lm(powerful_lm)

# Caching (enabled by default, stored in ~/.dspy_cache)
dspy.configure(lm=lm, experimental=True)   # opt into new features

Core Concepts

Signatures

Signatures define the input/output contract of a module.

# Inline string signature
qa = dspy.Predict("question -> answer")

# Class-based signature with docs and field hints
class SentimentClassifier(dspy.Signature):
    """Classify the sentiment of a product review."""
    review: str = dspy.InputField(desc="The customer review text")
    sentiment: str = dspy.OutputField(desc="positive, negative, or neutral")
    confidence: float = dspy.OutputField(desc="Confidence score 0.0-1.0")

clf = dspy.Predict(SentimentClassifier)
result = clf(review="This product changed my life!")
print(result.sentiment, result.confidence)

# Multi-hop signatures
class HopSignature(dspy.Signature):
    """Answer using retrieved context."""
    context: list[str] = dspy.InputField(desc="Retrieved passages")
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="Concise factual answer")

Built-in Modules

Module	Description
`dspy.Predict`	Direct prediction, no chain-of-thought
`dspy.ChainOfThought`	Adds reasoning field before answer
`dspy.ChainOfThoughtWithHint`	CoT with an optional hint input
`dspy.ProgramOfThought`	Generates and executes Python code
`dspy.ReAct`	Reason + Act with tool calls
`dspy.MultiChainComparison`	Generates N chains, picks best
`dspy.BestOfN`	Runs N completions, returns best
`dspy.Retry`	Retries with feedback on failure
`dspy.Assert` / `dspy.Suggest`	Enforce output constraints

Core API Reference

API	Description
`dspy.LM(model, **kwargs)`	Create a language model client
`dspy.configure(lm=...)`	Set global defaults
`dspy.Signature`	Base class for signatures
`dspy.InputField(desc=...)`	Declare an input field
`dspy.OutputField(desc=...)`	Declare an output field
`dspy.Predict(sig)`	Simplest prediction module
`dspy.ChainOfThought(sig)`	CoT reasoning module
`dspy.ReAct(sig, tools=[...])`	Tool-use agent module
`dspy.Module`	Base class for custom modules
`dspy.Example(**kwargs)`	Training/test data point
`dspy.Evaluate(devset, metric)`	Run evaluation
`dspy.teleprompt.*`	Optimizers namespace
`program.save("path.json")`	Save compiled program
`program.load("path.json")`	Load compiled program

Advanced Usage

Custom Modules

import dspy

class RAGPipeline(dspy.Module):
    def __init__(self, retriever, num_passages=3):
        self.retrieve = retriever
        self.num_passages = num_passages
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        passages = self.retrieve(question, k=self.num_passages)
        context = [p.long_text for p in passages]
        prediction = self.generate(context=context, question=question)
        return dspy.Prediction(answer=prediction.answer, passages=passages)

# Multi-hop reasoning pipeline
class MultiHopQA(dspy.Module):
    def __init__(self, retriever, hops=2):
        self.retrieve = retriever
        self.generate_query = [dspy.ChainOfThought("context, question -> search_query")
                               for _ in range(hops)]
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = []
        for hop in self.generate_query:
            query = hop(context=context, question=question).search_query
            passages = self.retrieve(query, k=3)
            context += [p.long_text for p in passages]
        return self.generate_answer(context=context, question=question)

ReAct Agent with Tools

import dspy

def search_web(query: str) -> str:
    """Search the web and return top results."""
    # your search implementation
    return "search results..."

def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    return str(eval(expression))

agent = dspy.ReAct(
    "question -> answer",
    tools=[search_web, calculator],
    max_iters=5
)

result = agent(question="What is the population of France times 2?")
print(result.answer)

Assertions and Suggestions

import dspy

class AnswerWithCitations(dspy.Module):
    def __init__(self):
        self.generate = dspy.ChainOfThought("context, question -> answer, citations")

    def forward(self, context, question):
        pred = self.generate(context=context, question=question)

        # Hard constraint — raises AssertionError if violated, triggers retry
        dspy.Assert(
            len(pred.citations) > 0,
            "Answer must include at least one citation"
        )

        # Soft constraint — logs warning, program continues
        dspy.Suggest(
            len(pred.answer.split()) < 100,
            "Keep the answer concise (under 100 words)"
        )

        return pred

Optimizers

BootstrapFewShot

import dspy
from dspy.teleprompt import BootstrapFewShot

# Define a metric function
def exact_match(example, prediction, trace=None):
    return example.answer.lower() == prediction.answer.lower()

def f1_score_metric(example, prediction, trace=None):
    gold = set(example.answer.lower().split())
    pred = set(prediction.answer.lower().split())
    if not pred:
        return 0.0
    precision = len(gold & pred) / len(pred)
    recall = len(gold & pred) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Prepare training data
trainset = [
    dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
    dspy.Example(question="Who wrote Hamlet?", answer="Shakespeare").with_inputs("question"),
    # ... more examples
]

# Compile with BootstrapFewShot
optimizer = BootstrapFewShot(
    metric=exact_match,
    max_bootstrapped_demos=4,   # max few-shot examples per module
    max_labeled_demos=16,       # max labeled examples to use
)

compiled_program = optimizer.compile(
    student=RAGPipeline(retriever=my_retriever),
    trainset=trainset,
)

# Save compiled program
compiled_program.save("compiled_rag.json")

MIPRO (Advanced Optimizer)

from dspy.teleprompt import MIPROv2

optimizer = MIPROv2(
    metric=f1_score_metric,
    auto="medium",          # "light" | "medium" | "heavy"
    num_threads=8,
)

compiled = optimizer.compile(
    student=my_module,
    trainset=trainset,
    valset=valset,          # optional validation set
    num_batches=20,
    max_bootstrapped_demos=3,
    max_labeled_demos=5,
    requires_permission_to_run=False,
)

BootstrapFewShotWithRandomSearch

from dspy.teleprompt import BootstrapFewShotWithRandomSearch

optimizer = BootstrapFewShotWithRandomSearch(
    metric=exact_match,
    max_bootstrapped_demos=4,
    num_candidate_programs=10,  # programs to try
    num_threads=8,
)
compiled = optimizer.compile(student=my_module, trainset=trainset)

Evaluation

import dspy
from dspy.evaluate import Evaluate

devset = [
    dspy.Example(question="...", answer="...").with_inputs("question"),
    # ...
]

# Run evaluation
evaluate = Evaluate(
    devset=devset,
    metric=exact_match,
    num_threads=8,
    display_progress=True,
    display_table=5,        # show first 5 results
)

score = evaluate(compiled_program)
print(f"Score: {score:.1f}%")

# Compare two programs
baseline_score = evaluate(baseline_program)
optimized_score = evaluate(optimized_program)
print(f"Improvement: {optimized_score - baseline_score:.1f}%")

Common Workflows

Workflow 1: Build and Evaluate a QA System

import dspy
from dspy.teleprompt import BootstrapFewShot
from dspy.evaluate import Evaluate

# 1. Configure LM
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# 2. Define program
class SimpleQA(dspy.Module):
    def __init__(self):
        self.cot = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.cot(question=question)

# 3. Prepare data
trainset = [dspy.Example(q="...", a="...").with_inputs("q") for ...]
devset = [...]

# 4. Optimize
optimizer = BootstrapFewShot(metric=lambda ex, pred, trace=None:
    ex.a.lower() in pred.answer.lower())
compiled = optimizer.compile(SimpleQA(), trainset=trainset)

# 5. Evaluate
evaluate = Evaluate(devset=devset, metric=..., num_threads=4)
print(evaluate(compiled))

# 6. Save
compiled.save("my_qa_program.json")

Workflow 2: Inspect Compiled Prompts

# See what prompts the optimizer generated
compiled_program.inspect_history(n=3)

# View the LM history
dspy.inspect_history(n=5)

# Print the compiled signature instructions
for name, module in compiled_program.named_predictors():
    print(f"\n--- {name} ---")
    print(module.extended_signature.instructions)
    for demo in module.demos:
        print("  Demo:", demo)

Tips and Best Practices

Start with dspy.Predict, upgrade to dspy.ChainOfThought once baseline is working — CoT adds latency and tokens.
Your metric is everything. DSPy optimizers hill-climb on the metric; a fuzzy or incorrect metric produces bad compiled programs.
Use .with_inputs() on every dspy.Example to tell DSPy which fields are inputs vs. labels.
BootstrapFewShot is the fastest optimizer. Use MIPROv2 when you need instruction optimization too.
Save compiled programs with .save() and load in production — no optimizer needed at inference time.
Assertions add retry loops; use dspy.Suggest for soft guidance to avoid infinite retries.
Thread count (num_threads) in evaluation/optimization maps directly to API concurrency — match your rate limits.
Caching is on by default; clear with dspy.settings.configure(cache_turn_on=False) when debugging.
Modular design — break complex tasks into small DSPy modules and compose them; optimizers tune each independently.
Version your trainsets alongside compiled JSON programs for reproducibility.