DSPy's optimizers (formerly called "teleprompters") are the framework's most distinctive feature. Instead of manually tuning prompts, you provide a training set and a metric, and the optimizer automatically searches for effective prompt instructions, few-shot examples, and module configurations. This section covers three of the most important optimizers: BootstrapFewShot for example selection, MIPRO for joint instruction and example optimization, and BayesianSignatureOptimizer for Bayesian prompt search.
1. The Optimization Loop
Every DSPy optimizer follows the same high-level pattern: take a module, a training set, and a metric function, then search for the configuration that maximizes the metric. The optimizer modifies the module's internal prompts (not your Python code) to improve performance.
import dspy

# Step 1: Define your module
class QA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, context, question):
        return self.answer(context=context, question=question)

# Step 2: Prepare training data
trainset = [
    dspy.Example(
        context="Python was created by Guido van Rossum in 1991.",
        question="Who created Python?",
        answer="Guido van Rossum",
    ).with_inputs("context", "question"),
    dspy.Example(
        context="The Eiffel Tower is 330 meters tall.",
        question="How tall is the Eiffel Tower?",
        answer="330 meters",
    ).with_inputs("context", "question"),
    # ... more examples
]

# Step 3: Define a metric
def exact_match(example, prediction, trace=None):
    return example.answer.lower() == prediction.answer.lower()
The .with_inputs() call tells DSPy which fields are inputs and which are labels. Fields not marked as inputs (like answer) are treated as ground truth for the metric function.
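To make the input/label split concrete, here is a minimal conceptual sketch (not DSPy's actual implementation) of an Example-like class that partitions its fields once input keys are marked:

```python
# Conceptual sketch of how marking input keys splits an example's fields.
class Example:
    def __init__(self, **fields):
        self._fields = fields
        self._input_keys = set()

    def with_inputs(self, *keys):
        self._input_keys = set(keys)
        return self

    def inputs(self):
        # Fields the module receives at call time.
        return {k: v for k, v in self._fields.items() if k in self._input_keys}

    def labels(self):
        # Remaining fields serve as ground truth for the metric.
        return {k: v for k, v in self._fields.items() if k not in self._input_keys}

ex = Example(
    context="Python was created by Guido van Rossum in 1991.",
    question="When was Python created?",
    answer="1991",
).with_inputs("context", "question")
print(ex.labels())  # {'answer': '1991'}
```

Only the fields excluded from with_inputs() reach the metric as ground truth; everything else is fed to the module.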
DSPy optimizers modify the prompts inside your modules, not the module code itself. After optimization, your forward method is unchanged, but the internal prompt templates contain optimized instructions and curated few-shot examples. This is why DSPy calls it "compiling" a module.
2. BootstrapFewShot
BootstrapFewShot is the simplest and most commonly used optimizer. It runs your module on the training set, collects successful examples (those that pass the metric), and uses them as few-shot demonstrations in the prompt.
from dspy.teleprompt import BootstrapFewShot

# Create the optimizer
optimizer = BootstrapFewShot(
    metric=exact_match,
    max_bootstrapped_demos=4,  # Max few-shot examples per module
    max_labeled_demos=4,       # Max labeled examples to use
    max_rounds=1,              # Number of bootstrapping iterations
)

# Compile (optimize) the module
qa = QA()
compiled_qa = optimizer.compile(
    student=qa,
    trainset=trainset,
)

# The compiled module now includes few-shot examples in its prompts
result = compiled_qa(
    context="Mount Everest is 8,849 meters above sea level.",
    question="How high is Mount Everest?",
)
print(result.answer)  # "8,849 meters"
The "bootstrapping" process works as follows. The optimizer runs the unoptimized module on each training example. When the module produces a correct answer (according to the metric), that input/output pair is saved as a demonstration. These demonstrations are then included in the prompt as few-shot examples for future calls.
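The loop just described can be sketched in plain Python. The stand-in module, metric, and data below are toy placeholders, not DSPy internals:

```python
# A minimal sketch of the bootstrapping loop: run the unoptimized module,
# keep only the traces that pass the metric, use them as demonstrations.
def bootstrap_demos(module, trainset, metric, max_demos=4):
    demos = []
    for example in trainset:
        prediction = module(**example["inputs"])  # run the unoptimized module
        if metric(example, prediction):           # keep only passing traces
            demos.append({"inputs": example["inputs"], "output": prediction})
        if len(demos) >= max_demos:
            break
    return demos  # future prompts include these as few-shot examples

# Toy stand-ins to exercise the loop:
module = lambda question: question.split()[-1].rstrip("?")
metric = lambda ex, pred: ex["answer"] == pred
trainset = [
    {"inputs": {"question": "Name the planet Mars?"}, "answer": "Mars"},
    {"inputs": {"question": "Name the planet Venus?"}, "answer": "Pluto"},
]
print(len(bootstrap_demos(module, trainset, metric)))  # 1
```

Only the first example passes the metric, so only that trace is kept as a demonstration.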
3. BootstrapFewShotWithRandomSearch
This variant extends BootstrapFewShot by trying multiple random subsets of demonstrations and keeping the best-performing combination.
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

optimizer = BootstrapFewShotWithRandomSearch(
    metric=exact_match,
    max_bootstrapped_demos=4,
    max_labeled_demos=4,
    num_candidate_programs=10,  # Try 10 random demo subsets
    num_threads=4,              # Parallelize evaluation
)

compiled_qa = optimizer.compile(
    student=QA(),
    trainset=trainset,
    valset=valset,  # Optional validation set for selection
)
Random search adds cost (it evaluates multiple candidate programs) but typically finds better demonstrations than the basic version. Use it when you have a validation set for reliable selection.
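The core idea behind the random search is simple enough to sketch in a few lines. The `evaluate` function and toy data below are illustrative stand-ins for scoring a candidate program on a validation set:

```python
import random

# Sketch of random search over demo subsets: sample several candidates,
# score each on a validation set, keep the best-performing subset.
def random_search(demos, valset, evaluate, num_candidates=10, k=4, seed=0):
    rng = random.Random(seed)
    best_score, best_subset = -1.0, []
    for _ in range(num_candidates):
        subset = rng.sample(demos, min(k, len(demos)))
        score = evaluate(subset, valset)  # avg metric with this subset in the prompt
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score

# Toy stand-ins: "demos" are numbers, and higher-valued subsets score better.
demos = list(range(10))
evaluate = lambda subset, _valset: sum(subset) / len(subset)
best_subset, best_score = random_search(demos, None, evaluate)
print(len(best_subset))  # 4
```

Each candidate evaluation costs one full pass over the validation set, which is where the extra expense comes from.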
Start with BootstrapFewShot for rapid iteration. Once you have a working pipeline, switch to BootstrapFewShotWithRandomSearch for a few percentage points of additional accuracy. The extra cost is in optimization time (a one-time expense), not inference time.
4. MIPRO: Multi-prompt Instruction Proposal Optimizer
MIPRO optimizes both the instructions and the few-shot examples jointly. While BootstrapFewShot only selects examples, MIPRO also rewrites the task instructions in the prompt to improve performance.
from dspy.teleprompt import MIPRO

optimizer = MIPRO(
    metric=exact_match,
    num_candidates=10,    # Number of instruction candidates
    init_temperature=1.0, # Exploration temperature
    prompt_model=dspy.LM("openai/gpt-4o"),     # Model for generating instructions
    task_model=dspy.LM("openai/gpt-4o-mini"),  # Model for executing tasks
)

compiled_qa = optimizer.compile(
    student=QA(),
    trainset=trainset,
    num_trials=20,  # Total optimization trials
    max_bootstrapped_demos=3,
    max_labeled_demos=3,
)
MIPRO uses a two-model setup. The prompt_model (usually a stronger model like GPT-4o) generates candidate instructions. The task_model (which can be a smaller, cheaper model) evaluates those candidates on the training set. This lets you optimize prompts for a cheap model using the intelligence of an expensive one.
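The propose-then-score pattern can be sketched without any real models. The `propose` and `score` lambdas below are toy stand-ins for the prompt model and the task-model evaluation:

```python
# Sketch of MIPRO's two-model pattern: a strong "prompt model" proposes
# candidate instructions; each candidate is scored by running the cheap
# "task model" on the training set.
def optimize_instructions(propose, score, num_candidates=10):
    candidates = [propose(i) for i in range(num_candidates)]  # prompt-model calls
    scored = [(score(c), c) for c in candidates]              # task-model evaluations
    return max(scored)  # best (score, instruction) pair

# Toy stand-ins: longer "instructions" happen to score higher here.
propose = lambda i: "Answer concisely." + " Be precise." * i
score = lambda instruction: len(instruction.split())
best = optimize_instructions(propose, score, num_candidates=3)
print(best[0])  # 6
```

The real optimizer interleaves this with demo selection and uses metric scores rather than a toy length heuristic, but the division of labor between the two models is the same.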
5. BayesianSignatureOptimizer
The Bayesian optimizer treats prompt optimization as a Bayesian optimization problem. It models the relationship between prompt variations and performance, then uses this model to efficiently explore the prompt space.
from dspy.teleprompt import BayesianSignatureOptimizer

optimizer = BayesianSignatureOptimizer(
    metric=exact_match,
    n=20,  # Number of optimization iterations
    verbose=True,
)

compiled_qa = optimizer.compile(
    student=QA(),
    devset=trainset,  # Uses "devset" instead of "trainset"
    num_threads=4,
)

# The optimizer reports its progress:
# Trial 1/20: score=0.65
# Trial 2/20: score=0.70
# ...
# Trial 20/20: score=0.88
# Best score: 0.88
The Bayesian approach is sample-efficient: it typically finds good configurations in fewer trials than random search. However, it is sequential by nature (each trial informs the next), so it can be slower in wall-clock time despite fewer total evaluations.
All optimizers consume LLM tokens during the optimization process. A MIPRO run with 20 trials, 10 candidates, and 50 training examples might make thousands of LLM calls. Monitor your token usage during optimization. Consider using a cheaper model for the task_model during optimization, then deploying the optimized prompts with a production model.
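A back-of-envelope estimate makes the scale concrete. Assuming each trial evaluates every training example once and each instruction candidate costs one proposal call (a simplification of the real run's call pattern):

```python
# Rough call count for a MIPRO-style run with the figures mentioned above.
num_trials, num_candidates, train_size = 20, 10, 50

task_calls = num_trials * train_size  # one task-model call per example per trial
proposal_calls = num_candidates       # one prompt-model call per candidate

print(task_calls + proposal_calls)  # 1010
```

At even a few hundred tokens per call, that is hundreds of thousands of tokens for a single optimization run, which is why the task model's price matters more than the prompt model's.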
6. Saving and Loading Compiled Modules
After optimization, you should save the compiled module so you do not need to re-optimize every time your application starts.
# Save the compiled module
compiled_qa.save("optimized_qa.json")

# Load it later
loaded_qa = QA()
loaded_qa.load("optimized_qa.json")

# The loaded module has the same optimized prompts
result = loaded_qa(
    context="The Great Wall of China is over 13,000 miles long.",
    question="How long is the Great Wall?",
)
print(result.answer)
The saved file contains the optimized instructions and few-shot examples for each sub-module. Your Python code (the module class definition) must still be available at load time; only the prompt parameters are serialized.
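Conceptually, the saved state boils down to instructions and demos per sub-module. The structure below is an illustrative sketch, not DSPy's exact file layout:

```python
import json

# Illustrative shape of serialized prompt parameters: only instructions
# and few-shot demos are stored, never the Python class definition.
state = {
    "answer": {  # one entry per sub-module (here, the ChainOfThought predictor)
        "instructions": "Answer the question using only the given context.",
        "demos": [
            {
                "context": "Python was created by Guido van Rossum in 1991.",
                "question": "Who created Python?",
                "answer": "Guido van Rossum",
            },
        ],
    }
}

serialized = json.dumps(state, indent=2)
restored = json.loads(serialized)
print(restored == state)  # True
```

Because only these parameters round-trip through JSON, renaming a sub-module attribute or changing the signature between save and load can leave the stored state orphaned.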
7. Optimization Strategies for Production
Choosing the right optimizer and configuration for production requires balancing quality, cost, and iteration speed. Here is a decision framework.
- Rapid prototyping: Use BootstrapFewShot with a small training set (10 to 20 examples). Quick, cheap, and usually sufficient for a first pass.
- Quality improvement: Switch to BootstrapFewShotWithRandomSearch with a larger training set (50+ examples) and a validation set. The random search finds better example combinations.
- Maximum performance: Use MIPRO with a strong prompt model. The joint optimization of instructions and examples typically yields the best results.
- Constrained budgets: Use BayesianSignatureOptimizer when you need good results with minimal token expenditure. Its sample efficiency is ideal for expensive models.
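The framework above can be encoded as a small helper. The thresholds and budget labels are illustrative, not prescriptive:

```python
# A tiny chooser that mirrors the decision framework above.
def pick_optimizer(train_size: int, has_valset: bool, budget: str) -> str:
    if train_size < 20:
        return "BootstrapFewShot"              # rapid prototyping
    if budget == "low":
        return "BayesianSignatureOptimizer"    # constrained budgets
    if has_valset and budget == "medium":
        return "BootstrapFewShotWithRandomSearch"  # quality improvement
    return "MIPRO"                             # maximum performance

print(pick_optimizer(train_size=60, has_valset=True, budget="high"))  # MIPRO
```

In practice these are soft guidelines; teams often graduate through the optimizers in this order as a project matures.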
# A production optimization workflow
import dspy
from dspy.teleprompt import MIPRO
from dspy.evaluate import Evaluate

# 1. Load training and validation data
trainset = load_examples("train.jsonl")
valset = load_examples("val.jsonl")

# 2. Optimize with MIPRO
optimizer = MIPRO(metric=my_metric, num_candidates=15)
compiled = optimizer.compile(
    student=MyPipeline(),
    trainset=trainset,
    num_trials=30,
)

# 3. Evaluate on validation set
evaluator = Evaluate(devset=valset, metric=my_metric, num_threads=8)
score = evaluator(compiled)
print(f"Validation accuracy: {score:.1%}")

# 4. Save if quality meets threshold
if score > 0.85:
    compiled.save("production_pipeline.json")
    print("Deployed!")
Treat DSPy optimization like model training. Keep your training and validation sets separate. Version your compiled modules alongside your code. Re-optimize when you add new training data, change the underlying LLM, or modify your module architecture.