DSPy Cheat Sheet
Overview
DSPy (Declarative Self-improving Python) is a framework from Stanford that replaces hand-crafted prompts with composable Python modules. Instead of writing brittle prompt strings, you define signatures (input/output specs) and modules (reasoning strategies), then let optimizers automatically find the best prompts and few-shot examples for your task and model.
Key concepts: Signatures declare what a module does. Modules implement reasoning patterns (ChainOfThought, ReAct, etc.). Optimizers (formerly called teleprompters) tune the prompts/weights using a training set and metric.
Installation
```bash
# Core install
pip install dspy

# With optional extras
pip install "dspy[all]"               # All integrations (quoted so zsh does not glob the brackets)
pip install dspy chromadb             # With vector store support
pip install dspy anthropic            # Anthropic models
pip install dspy google-generativeai  # Gemini models

# Development / latest
pip install git+https://github.com/stanfordnlp/dspy.git
```
Configuration
```python
import dspy

# --- OpenAI ---
lm = dspy.LM("openai/gpt-4o-mini", api_key="sk-...")
dspy.configure(lm=lm)

# --- Anthropic ---
lm = dspy.LM("anthropic/claude-3-5-haiku-20241022", api_key="sk-ant-...")
dspy.configure(lm=lm)

# --- Local / Ollama (no real key needed) ---
lm = dspy.LM("ollama_chat/llama3.2", api_base="http://localhost:11434", api_key="")
dspy.configure(lm=lm)

# --- Together AI ---
lm = dspy.LM("together_ai/meta-llama/Llama-3-8b-chat-hf", api_key="...")
dspy.configure(lm=lm)

# --- Multiple LMs in one program ---
fast_lm = dspy.LM("openai/gpt-4o-mini")  # without api_key, the key is read from OPENAI_API_KEY
powerful_lm = dspy.LM("openai/gpt-4o")
dspy.configure(lm=fast_lm)  # global default

# Per-module override
module = dspy.ChainOfThought("question -> answer")
module.set_lm(powerful_lm)

# Caching is enabled by default (on disk, under ~/.dspy_cache); disable it per LM
uncached_lm = dspy.LM("openai/gpt-4o-mini", cache=False)

# Opt into experimental features
dspy.configure(lm=lm, experimental=True)
```
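DSPy's cache lives on disk and is keyed internally by the library. The in-memory sketch below only illustrates the general idea of request-keyed caching; `make_cached` and `fake_lm` are hypothetical names, not DSPy APIs. An identical request (same prompt and settings, in any keyword order) is answered from the cache, while any change to the settings produces a new call.

```python
import hashlib
import json
from typing import Callable


def make_cached(call: Callable[..., str]) -> Callable[..., str]:
    """Wrap an LM-like callable so an identical request is only sent once."""
    cache: dict[str, str] = {}

    def cached_call(**request) -> str:
        # Canonical JSON (sorted keys) so keyword order does not change the cache key
        key = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
        if key not in cache:
            cache[key] = call(**request)
        return cache[key]

    return cached_call


calls: list[str] = []


def fake_lm(prompt: str, temperature: float = 0.0) -> str:
    # Hypothetical stand-in for a real model call; records every real invocation
    calls.append(prompt)
    return prompt.upper()


cached_lm = make_cached(fake_lm)
cached_lm(prompt="hello", temperature=0.0)
cached_lm(temperature=0.0, prompt="hello")  # same request: answered from the cache
cached_lm(prompt="hello", temperature=0.7)  # different settings: a new call
```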
Core Concepts
Signatures
Signatures define the input/output contract of a module.
```python
import dspy

# Inline string signature
qa = dspy.Predict("question -> answer")

# Class-based signature: the docstring becomes the task instructions; fields carry hints
class SentimentClassifier(dspy.Signature):
    """Classify the sentiment of a product review."""

    review: str = dspy.InputField(desc="The customer review text")
    sentiment: str = dspy.OutputField(desc="positive, negative, or neutral")
    confidence: float = dspy.OutputField(desc="Confidence score 0.0-1.0")

clf = dspy.Predict(SentimentClassifier)
result = clf(review="This product changed my life!")
print(result.sentiment, result.confidence)

# Signature for one retrieval step in a multi-hop pipeline
class HopSignature(dspy.Signature):
    """Answer using retrieved context."""

    context: list[str] = dspy.InputField(desc="Retrieved passages")
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="Concise factual answer")
```
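Inline signatures such as `"context, question -> answer"` list input field names left of the arrow and output field names right of it. Those names become the keyword arguments you pass and the attributes on the result. The toy parser below makes that mapping explicit; `parse_signature` is hypothetical, not DSPy's own parser. It strips an optional `: type` suffix but, as a simplification, does not handle annotations that themselves contain commas (such as `dict[str, int]`).

```python
def parse_signature(spec: str) -> tuple[list[str], list[str]]:
    """Split 'a, b -> c' into (input names, output names); drops any ': type' suffix."""
    if spec.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {spec!r}")
    left, right = spec.split("->")

    def names(side: str) -> list[str]:
        # "question: str" -> "question"; empty pieces (e.g. trailing commas) are skipped
        return [part.split(":")[0].strip() for part in side.split(",") if part.strip()]

    inputs, outputs = names(left), names(right)
    if not outputs:
        raise ValueError("a signature needs at least one output field")
    return inputs, outputs


print(parse_signature("context, question -> answer"))
# (['context', 'question'], ['answer'])
```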
Built-in Modules
| Module | Description |
|---|---|
| `dspy.Predict` | Direct prediction, no chain-of-thought |
| `dspy.ChainOfThought` | Adds a reasoning field before the answer |
| `dspy.ChainOfThoughtWithHint` | CoT with an optional hint input |
| `dspy.ProgramOfThought` | Generates and executes Python code |
| `dspy.ReAct` | Reason + Act loop with tool calls |
| `dspy.MultiChainComparison` | Compares N candidate reasoning chains to produce a final answer |
| `dspy.BestOfN` | Runs a module up to N times, returns the best-scoring output |
| `dspy.Retry` | Retries with feedback on failure |
| `dspy.Assert` / `dspy.Suggest` | Enforce output constraints (deprecated since DSPy 2.6 in favor of `dspy.Refine` / `dspy.BestOfN`) |
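In recent releases, `dspy.BestOfN` takes a module, a count `N`, a reward function and a threshold. As we understand it, it returns the first output whose reward reaches the threshold and otherwise falls back to the highest-scoring one. The sketch below models only that selection rule, with hypothetical `generate` and `reward` callables; the real module re-runs a DSPy module with varied sampling rather than indexing into a list.

```python
from typing import Any, Callable


def best_of_n(
    generate: Callable[[int], Any],
    reward: Callable[[Any], float],
    n: int,
    threshold: float,
) -> Any:
    """Return the first candidate scoring >= threshold, else the highest-scoring one."""
    best, best_score = None, float("-inf")
    for attempt in range(n):
        candidate = generate(attempt)  # the attempt index stands in for a fresh sample
        score = reward(candidate)
        if score >= threshold:
            return candidate  # good enough: stop early and save the remaining calls
        if score > best_score:
            best, best_score = candidate, score
    return best


drafts = ["a very long rambling answer", "short answer", "ok"]
# Hypothetical reward: shorter answers score higher (1 / word count)
pick = best_of_n(lambda i: drafts[i], lambda s: 1.0 / len(s.split()), n=3, threshold=0.9)
print(pick)  # ok
```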
Core API Reference
| API | Description |
|---|---|
| `dspy.LM(model, **kwargs)` | Create a language model client |
| `dspy.configure(lm=...)` | Set global defaults |
| `dspy.Signature` | Base class for signatures |
| `dspy.InputField(desc=...)` | Declare an input field |
| `dspy.OutputField(desc=...)` | Declare an output field |
| `dspy.Predict(sig)` | Simplest prediction module |
| `dspy.ChainOfThought(sig)` | CoT reasoning module |
| `dspy.ReAct(sig, tools=[...])` | Tool-use agent module |
| `dspy.Module` | Base class for custom modules |
| `dspy.Example(**kwargs)` | Training/test data point |
| `dspy.Evaluate(devset=..., metric=...)` | Run evaluation |
| `dspy.teleprompt.*` | Optimizers namespace |
| `program.save("path.json")` | Save compiled program |
| `program.load("path.json")` | Load saved state into an instance of the same program class |
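`dspy.Example` splits its fields into inputs and labels via `.with_inputs()` and exposes them through `inputs()` and `labels()`. During evaluation, only the inputs are passed to the program, and the labels are left for the metric. The class below is a hypothetical stand-in (`ToyExample` is not part of DSPy) that reproduces just that split.

```python
from typing import Any


class ToyExample:
    """Hypothetical stand-in for dspy.Example: named fields plus a set of input keys."""

    def __init__(self, **fields: Any) -> None:
        self._fields = dict(fields)
        self._input_keys: frozenset = frozenset()

    def with_inputs(self, *keys: str) -> "ToyExample":
        missing = set(keys) - set(self._fields)
        if missing:
            raise KeyError(f"unknown fields: {sorted(missing)}")
        marked = ToyExample(**self._fields)  # return a copy; the original is unchanged
        marked._input_keys = frozenset(keys)
        return marked

    def inputs(self) -> dict:
        # What gets passed to the program as keyword arguments
        return {k: v for k, v in self._fields.items() if k in self._input_keys}

    def labels(self) -> dict:
        # Everything else: the gold values a metric compares against
        return {k: v for k, v in self._fields.items() if k not in self._input_keys}


ex = ToyExample(question="Who wrote Hamlet?", answer="Shakespeare").with_inputs("question")
print(ex.inputs())  # {'question': 'Who wrote Hamlet?'}
print(ex.labels())  # {'answer': 'Shakespeare'}
```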
Advanced Usage
Custom Modules
```python
import dspy

class RAGPipeline(dspy.Module):
    def __init__(self, retriever, num_passages=3):
        super().__init__()
        # retriever: any callable (query, k=...) returning passages with a .long_text attribute
        self.retrieve = retriever
        self.num_passages = num_passages
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        passages = self.retrieve(question, k=self.num_passages)
        context = [p.long_text for p in passages]
        prediction = self.generate(context=context, question=question)
        return dspy.Prediction(answer=prediction.answer, passages=passages)

# Multi-hop reasoning pipeline
class MultiHopQA(dspy.Module):
    def __init__(self, retriever, hops=2):
        super().__init__()
        self.retrieve = retriever
        # One query generator per hop; each is a separate predictor the optimizer can tune
        self.generate_query = [
            dspy.ChainOfThought("context, question -> search_query") for _ in range(hops)
        ]
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = []
        for hop in self.generate_query:
            query = hop(context=context, question=question).search_query
            passages = self.retrieve(query, k=3)
            context += [p.long_text for p in passages]  # accumulate evidence across hops
        return self.generate_answer(context=context, question=question)
```
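Both modules above assume only a retriever callable `retriever(query, k=...)` that returns passages with a `.long_text` attribute. For offline tests, a hypothetical keyword-overlap retriever like the one below satisfies that contract without a search server. The ranking is a crude stand-in for real retrieval, not a recommendation.

```python
import re
from types import SimpleNamespace


class KeywordRetriever:
    """Hypothetical offline retriever: ranks passages by word overlap with the query."""

    def __init__(self, corpus: list[str]) -> None:
        self.corpus = corpus

    @staticmethod
    def _tokens(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def __call__(self, query: str, k: int = 3) -> list[SimpleNamespace]:
        q = self._tokens(query)
        # sorted() is stable, so ties keep their corpus order
        ranked = sorted(self.corpus, key=lambda p: len(q & self._tokens(p)), reverse=True)
        # Expose .long_text, the attribute the modules above read from each passage
        return [SimpleNamespace(long_text=p) for p in ranked[:k]]


retriever = KeywordRetriever([
    "Paris is the capital of France.",
    "Hamlet is a tragedy by William Shakespeare.",
    "The Eiffel Tower is in Paris.",
])
top = retriever("What is the capital of France?", k=1)
print(top[0].long_text)  # Paris is the capital of France.
```

With DSPy installed and an LM configured, `RAGPipeline(retriever=retriever)` could then be exercised end to end without any external search service.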
ReAct Agent with Tools
```python
import dspy

def search_web(query: str) -> str:
    """Search the web and return top results."""
    # your search implementation
    return "search results..."

def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    # WARNING: eval() executes arbitrary Python; never expose it to untrusted input
    return str(eval(expression))

agent = dspy.ReAct(
    "question -> answer",
    tools=[search_web, calculator],
    max_iters=5,
)
result = agent(question="What is the population of France times 2?")
print(result.answer)
```
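Because the model chooses what to pass to `calculator`, running `eval()` on its output is risky. The sketch below is one possible drop-in replacement with the same name and signature. It walks the expression's syntax tree with Python's `ast` module and allows only numbers, arithmetic operators and parentheses. The operator whitelist and the exponent cap of 100 are our own illustrative choices.

```python
import ast
import operator

# Whitelisted operators; anything else in the expression is rejected
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node: ast.AST) -> float:
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        left, right = _eval_node(node.left), _eval_node(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > 100:
            raise ValueError("exponent too large")  # blocks runaways like 9 ** 9 ** 9
        return _BIN_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"unsupported element: {type(node).__name__}")


def calculator(expression: str) -> str:
    """Evaluate an arithmetic expression (numbers, + - * / % **, parentheses)."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_eval_node(tree.body))
    except (SyntaxError, ValueError, ZeroDivisionError, OverflowError) as exc:
        # Return the error as text so the agent can read it and adjust its next step
        return f"Error: {exc}"


print(calculator("68000000 * 2"))  # 136000000
```

Pass it in `tools=[search_web, calculator]` exactly as before. Returning errors as strings, rather than raising, keeps the agent loop alive when the model writes a bad expression.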
Assertions and Suggestions
```python
import dspy

class AnswerWithCitations(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought("context, question -> answer, citations")

    def forward(self, context, question):
        pred = self.generate(context=context, question=question)
        # Hard constraint: on failure, DSPy retries with this message as feedback,
        # then raises an error once the retries are used up
        dspy.Assert(
            len(pred.citations) > 0,
            "Answer must include at least one citation",
        )
        # Soft constraint: also retried, but the program continues if it still fails
        dspy.Suggest(
            len(pred.answer.split()) < 100,
            "Keep the answer concise (under 100 words)",
        )
        return pred

# Retries (backtracking) are enabled by activating assertions on the program
program = AnswerWithCitations().activate_assertions()
```
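The control flow behind hard and soft constraints can be modelled in plain Python. In the simplified sketch below, both kinds of failure trigger a retry with the failure messages passed back as feedback. Once the retry budget is spent, a failing hard constraint raises, and failing soft constraints are only reported. `run_with_constraints`, `ConstraintError` and `fake_generate` are hypothetical names, and this is not DSPy's implementation.

```python
from typing import Callable, Optional

Check = tuple[Callable[[str], bool], str]  # (predicate, feedback message)


class ConstraintError(Exception):
    """Raised when a hard constraint still fails after every retry."""


def run_with_constraints(
    generate: Callable[[Optional[str]], str],
    hard: list[Check],
    soft: list[Check],
    max_backtracks: int = 2,
) -> tuple[str, list[str]]:
    """Retry generate() with failure messages as feedback; return (output, unmet soft messages)."""
    if max_backtracks < 0:
        raise ValueError("max_backtracks must be >= 0")
    feedback: Optional[str] = None
    for _ in range(max_backtracks + 1):
        output = generate(feedback)
        failed_hard = [msg for check, msg in hard if not check(output)]
        failed_soft = [msg for check, msg in soft if not check(output)]
        if not failed_hard and not failed_soft:
            return output, []
        # Both kinds of failure trigger a retry, with the messages passed back in
        feedback = "; ".join(failed_hard + failed_soft)
    if failed_hard:
        raise ConstraintError(failed_hard[0])
    return output, failed_soft  # soft failures are reported, not fatal


def fake_generate(feedback: Optional[str]) -> str:
    # Hypothetical generator that only adds a citation after being told to
    return "Paris is the capital of France [1]." if feedback else "Paris is the capital of France."


answer, unmet = run_with_constraints(
    fake_generate,
    hard=[(lambda out: "[" in out, "Answer must include at least one citation")],
    soft=[(lambda out: len(out.split()) < 100, "Keep the answer concise (under 100 words)")],
)
print(answer)  # Paris is the capital of France [1].
```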
Optimizers
BootstrapFewShot
```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# Metrics take (example, prediction, trace=None) and return a bool or a float
def exact_match(example, prediction, trace=None):
    return example.answer.strip().lower() == prediction.answer.strip().lower()

def f1_score_metric(example, prediction, trace=None):
    # Token-set F1 between the gold and predicted answers
    gold = set(example.answer.lower().split())
    pred = set(prediction.answer.lower().split())
    if not gold or not pred:
        return 0.0  # also avoids dividing by zero on an empty gold answer
    overlap = len(gold & pred)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

# Prepare training data (.with_inputs marks the input fields; the rest are labels)
trainset = [
    dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
    dspy.Example(question="Who wrote Hamlet?", answer="Shakespeare").with_inputs("question"),
    # ... more examples
]

# Compile with BootstrapFewShot
optimizer = BootstrapFewShot(
    metric=exact_match,
    max_bootstrapped_demos=4,  # max bootstrapped (model-generated) demos per predictor
    max_labeled_demos=16,      # max labeled trainset examples used as demos
)
compiled_program = optimizer.compile(
    student=RAGPipeline(retriever=my_retriever),  # my_retriever: your retrieval callable
    trainset=trainset,
)

# Save compiled program
compiled_program.save("compiled_rag.json")
```
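Roughly, BootstrapFewShot runs a teacher (by default, the program itself) over the trainset and keeps the runs that pass the metric as few-shot demonstrations. The real optimizer records demos for every predictor in the trace and can also add raw labeled examples (`max_labeled_demos`). The sketch below keeps only the core filter. `bootstrap_demos` and the canned teacher are hypothetical, and examples are plain dicts.

```python
from typing import Callable, Optional


def bootstrap_demos(
    trainset: list[dict],
    teacher: Callable[[str], Optional[str]],
    metric: Callable[[dict, str], bool],
    max_bootstrapped_demos: int = 4,
) -> list[dict]:
    """Keep teacher outputs that pass the metric, up to the demo budget."""
    demos: list[dict] = []
    for example in trainset:
        if len(demos) >= max_bootstrapped_demos:
            break
        prediction = teacher(example["question"])
        if prediction is not None and metric(example, prediction):
            # A passing (input, output) pair becomes a few-shot demonstration
            demos.append({"question": example["question"], "answer": prediction})
    return demos


trainset = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "Who wrote Hamlet?", "answer": "Shakespeare"},
    {"question": "What is 2 + 2?", "answer": "4"},
]
# Hypothetical teacher with canned outputs; one of them is wrong
canned = {
    "What is the capital of France?": "Paris",
    "Who wrote Hamlet?": "Marlowe",
    "What is 2 + 2?": "4",
}
demos = bootstrap_demos(
    trainset,
    teacher=canned.get,
    metric=lambda ex, pred: ex["answer"].lower() == pred.lower(),
)
print([d["answer"] for d in demos])  # ['Paris', '4']
```

This is also why the metric matters so much: whatever it accepts becomes the examples the compiled program is shown.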
MIPROv2 (Advanced Optimizer)
```python
from dspy.teleprompt import MIPROv2

optimizer = MIPROv2(
    metric=f1_score_metric,
    auto="medium",  # "light" | "medium" | "heavy"; sets the search budget, including the number of trials
    num_threads=8,
)
compiled = optimizer.compile(
    student=my_module,  # my_module, trainset, valset: your program and data
    trainset=trainset,
    valset=valset,  # optional validation set
    max_bootstrapped_demos=3,
    max_labeled_demos=5,
    requires_permission_to_run=False,  # skip the interactive cost confirmation
)
```
BootstrapFewShotWithRandomSearch
```python
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

optimizer = BootstrapFewShotWithRandomSearch(
    metric=exact_match,
    max_bootstrapped_demos=4,
    num_candidate_programs=10,  # candidate programs to try
    num_threads=8,
)
compiled = optimizer.compile(student=my_module, trainset=trainset)
```
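Roughly, BootstrapFewShotWithRandomSearch builds several candidate programs with different bootstrapped demo sets and keeps the one that scores best on held-out data. The simplified sketch below samples random demo subsets and scores each with a validation function. It also scores the zero-shot program (no demos) as a baseline, so a bad demo set can never beat having none. `random_search` and `toy_score` are hypothetical.

```python
import random
from typing import Callable, Sequence


def random_search(
    candidate_pool: Sequence[str],
    score: Callable[[list[str]], float],
    num_candidate_programs: int = 10,
    demos_per_program: int = 4,
    seed: int = 0,
) -> tuple[list[str], float]:
    """Try random demo subsets and keep the one with the highest validation score."""
    rng = random.Random(seed)  # seeded so a run is reproducible
    best_demos: list[str] = []
    best_score = score(best_demos)  # zero-shot baseline: no demos at all
    k = min(demos_per_program, len(candidate_pool))
    for _ in range(num_candidate_programs):
        demos = rng.sample(list(candidate_pool), k)
        candidate_score = score(demos)
        if candidate_score > best_score:
            best_demos, best_score = demos, candidate_score
    return best_demos, best_score


pool = ["good demo 1", "good demo 2", "noisy demo 1", "noisy demo 2"]


def toy_score(demos: list[str]) -> float:
    # Hypothetical validation score: fraction of the two demo slots holding a "good" demo
    return sum(d.startswith("good") for d in demos) / 2


best, best_score = random_search(pool, toy_score, num_candidate_programs=20, demos_per_program=2)
print(best, best_score)
```

The cost grows linearly with `num_candidate_programs`, since every candidate needs a full validation pass.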
Evaluation
```python
import dspy
from dspy.evaluate import Evaluate

devset = [
    dspy.Example(question="...", answer="...").with_inputs("question"),
    # ...
]

# Run evaluation
evaluate = Evaluate(
    devset=devset,
    metric=exact_match,
    num_threads=8,
    display_progress=True,
    display_table=5,  # show first 5 results
)
score = evaluate(compiled_program)  # mean metric over devset as a percentage (a float in DSPy 2.x)
print(f"Score: {score:.1f}%")

# Compare two programs
baseline_score = evaluate(baseline_program)
optimized_score = evaluate(optimized_program)
print(f"Improvement: {optimized_score - baseline_score:+.1f} percentage points")
```
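Conceptually, an evaluation run calls the program on each dev example's inputs, applies the metric, and reports the mean as a percentage. The difference between two such scores is therefore in percentage points, and a float metric such as F1 contributes partial credit. The hypothetical `evaluate_percent` below is not DSPy's `Evaluate`. It runs examples on a thread pool sized by `num_threads`, which is why that setting translates directly into concurrent API calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Sequence


def evaluate_percent(
    program: Callable[..., Any],
    devset: Sequence[dict],
    input_keys: Sequence[str],
    metric: Callable[[dict, Any], float],
    num_threads: int = 8,
) -> float:
    """Average metric over devset, scaled to 0-100; examples run concurrently."""

    def score_one(example: dict) -> float:
        # Only the input fields are passed to the program; labels stay with the metric
        prediction = program(**{key: example[key] for key in input_keys})
        return float(metric(example, prediction))

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        scores = list(pool.map(score_one, devset))
    return 100.0 * sum(scores) / len(scores) if scores else 0.0


dev = [
    {"question": "2 + 2", "answer": "4"},
    {"question": "3 + 3", "answer": "6"},
    {"question": "1 + 1", "answer": "2"},
]
outputs = {"2 + 2": "4", "3 + 3": "6", "1 + 1": "3"}  # hypothetical program outputs; last is wrong
score = evaluate_percent(
    lambda question: outputs[question],
    dev,
    input_keys=["question"],
    metric=lambda ex, pred: ex["answer"] == pred,
    num_threads=2,
)
print(f"Score: {score:.1f}%")  # Score: 66.7%
```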
Common Workflows
Workflow 1: Build and Evaluate a QA System
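```python
import dspy
from dspy.teleprompt import BootstrapFewShot
from dspy.evaluate import Evaluate

# 1. Configure LM
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# 2. Define program
class SimpleQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.cot = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.cot(question=question)

# 3. Prepare data: field names must match forward()'s argument and the metric
qa_pairs = [
    ("What is the capital of France?", "Paris"),
    ("Who wrote Hamlet?", "Shakespeare"),
    # ... more (question, answer) pairs
]
examples = [dspy.Example(question=q, answer=a).with_inputs("question") for q, a in qa_pairs]
split = len(examples) // 2
trainset, devset = examples[:split], examples[split:]

# Lenient metric: the gold answer appears somewhere in the predicted answer
def answer_in_prediction(example, prediction, trace=None):
    return example.answer.lower() in prediction.answer.lower()

# 4. Optimize
optimizer = BootstrapFewShot(metric=answer_in_prediction)
compiled = optimizer.compile(SimpleQA(), trainset=trainset)

# 5. Evaluate
evaluate = Evaluate(devset=devset, metric=answer_in_prediction, num_threads=4)
print(evaluate(compiled))

# 6. Save
compiled.save("my_qa_program.json")
```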
Workflow 2: Inspect Compiled Prompts
```python
import dspy

# After running the program, show its last LM calls (full prompts, including demos, and completions)
compiled_program.inspect_history(n=3)

# Show the last LM calls made globally
dspy.inspect_history(n=5)

# Print each predictor's instructions and the demos the optimizer attached to it
for name, predictor in compiled_program.named_predictors():
    print(f"\n--- {name} ---")
    print(predictor.signature.instructions)
    for demo in predictor.demos:
        print("  Demo:", demo)
```
Tips and Best Practices
- Start with `dspy.Predict` and upgrade to `dspy.ChainOfThought` once the baseline is working; CoT adds latency and tokens.
- Your metric is everything. DSPy optimizers hill-climb on the metric; a fuzzy or incorrect metric produces bad compiled programs.
- Use `.with_inputs()` on every `dspy.Example` to tell DSPy which fields are inputs and which are labels.
- BootstrapFewShot is a fast, cheap starting point. Use MIPROv2 when you also need instruction optimization.
- Save compiled programs with `.save()` and load them in production; no optimizer is needed at inference time.
- Assertions add retry loops (bounded by the backtracking limit), so every failing `dspy.Assert` costs extra LM calls and can end in an error; use `dspy.Suggest` for soft guidance that should not halt the program.
- The thread count (`num_threads`) in evaluation and optimization maps directly to API concurrency; match it to your rate limits.
- Caching is on by default; when debugging, disable it per LM with `dspy.LM(..., cache=False)`.
- Modular design: break complex tasks into small DSPy modules and compose them; optimizers then tune each predictor's prompts and demos against your end-to-end metric.
- Version your trainsets alongside compiled JSON programs for reproducibility.
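One lightweight way to version a trainset alongside a saved program is to store a content hash of the data next to the JSON file written by `.save()`. The sketch below uses our own conventions: `trainset_fingerprint`, `write_manifest` and the manifest fields are hypothetical, and examples are assumed to be converted to plain dicts first. Sorting keys makes the hash independent of field order, so only real changes to the data change the fingerprint.

```python
import hashlib
import json
from pathlib import Path


def trainset_fingerprint(trainset: list[dict]) -> str:
    """Stable SHA-256 of the trainset: sorted keys, fixed separators, UTF-8."""
    canonical = json.dumps(trainset, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def write_manifest(program_path: str, trainset: list[dict], manifest_path: str) -> dict:
    """Record which data a saved program was compiled from."""
    manifest = {
        "program": program_path,
        "trainset_sha256": trainset_fingerprint(trainset),
        "num_examples": len(trainset),
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest
```

Committing the manifest with the compiled JSON lets you check later that a program was compiled from the trainset you think it was.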