Building Conversational AI with LLMs and Agents
Appendix Q: DSPy: Programmatic Prompt Optimization

Optimizers: BootstrapFewShot, MIPRO, and BayesianSignatureOptimizer

Big Picture

DSPy's optimizers (formerly called "teleprompters") are the framework's most distinctive feature. Instead of manually tuning prompts, you provide a training set and a metric, and the optimizer automatically discovers the best prompt instructions, few-shot examples, and module configurations. This section covers the three most important optimizers: BootstrapFewShot for example selection, MIPRO for joint instruction and example optimization, and BayesianSignatureOptimizer for Bayesian prompt search.

1. The Optimization Loop

Every DSPy optimizer follows the same high-level pattern: take a module, a training set, and a metric function, then search for the configuration that maximizes the metric. The optimizer modifies the module's internal prompts (not your Python code) to improve performance.

import dspy

# Step 1: Define your module
class QA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, context, question):
        return self.answer(context=context, question=question)

# Step 2: Prepare training data
trainset = [
    dspy.Example(
        context="Python was created by Guido van Rossum in 1991.",
        question="Who created Python?",
        answer="Guido van Rossum",
    ).with_inputs("context", "question"),
    dspy.Example(
        context="The Eiffel Tower is 330 meters tall.",
        question="How tall is the Eiffel Tower?",
        answer="330 meters",
    ).with_inputs("context", "question"),
    # ... more examples
]

# Step 3: Define a metric
def exact_match(example, prediction, trace=None):
    return example.answer.lower() == prediction.answer.lower()

The .with_inputs() call tells DSPy which fields are inputs and which are labels. Fields not marked as inputs (like answer) are treated as ground truth for the metric function.

Key Insight

DSPy optimizers modify the prompts inside your modules, not the module code itself. After optimization, your forward method is unchanged, but the internal prompt templates contain optimized instructions and curated few-shot examples. This is why DSPy calls it "compiling" a module.

2. BootstrapFewShot

BootstrapFewShot is the simplest and most commonly used optimizer. It runs your module on the training set, collects successful examples (those that pass the metric), and uses them as few-shot demonstrations in the prompt.

from dspy.teleprompt import BootstrapFewShot

# Create the optimizer
optimizer = BootstrapFewShot(
    metric=exact_match,
    max_bootstrapped_demos=4,  # Max few-shot examples per module
    max_labeled_demos=4,       # Max labeled examples to use
    max_rounds=1,              # Number of bootstrapping iterations
)

# Compile (optimize) the module
qa = QA()
compiled_qa = optimizer.compile(
    student=qa,
    trainset=trainset,
)

# The compiled module now includes few-shot examples in its prompts
result = compiled_qa(
    context="Mount Everest is 8,849 meters above sea level.",
    question="How high is Mount Everest?",
)
print(result.answer)  # "8,849 meters"

The "bootstrapping" process works as follows. The optimizer runs the unoptimized module on each training example. When the module produces a correct answer (according to the metric), that input/output pair is saved as a demonstration. These demonstrations are then included in the prompt as few-shot examples for future calls.
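The loop described above can be sketched in plain Python. The module, metric, and training set here are toy stand-ins, not DSPy internals; the point is the shape of the algorithm: run, check, keep what passes.

```python
def bootstrap_demos(module, trainset, metric, max_demos=4):
    """Run the unoptimized module; keep input/output pairs that pass the metric."""
    demos = []
    for example in trainset:
        prediction = module(example["inputs"])
        if metric(example, prediction):  # correct -> save as a demonstration
            demos.append({"inputs": example["inputs"], "output": prediction})
        if len(demos) >= max_demos:      # mirrors max_bootstrapped_demos
            break
    return demos

# Toy stand-ins: a "module" that uppercases, and an exact-match metric.
toy_module = lambda text: text.upper()
toy_metric = lambda ex, pred: ex["answer"] == pred

toy_trainset = [
    {"inputs": "paris", "answer": "PARIS"},  # module gets this right
    {"inputs": "tokyo", "answer": "Tokyo"},  # module gets this wrong
    {"inputs": "lima", "answer": "LIMA"},    # right
]

demos = bootstrap_demos(toy_module, toy_trainset, toy_metric)
print(len(demos))  # 2 -- only the passing examples become few-shot demos
```

Note that failed examples are simply dropped: bootstrapping never puts an incorrect demonstration into the prompt.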

3. BootstrapFewShotWithRandomSearch

This variant extends BootstrapFewShot by trying multiple random subsets of demonstrations and keeping the best-performing combination.

from dspy.teleprompt import BootstrapFewShotWithRandomSearch

optimizer = BootstrapFewShotWithRandomSearch(
    metric=exact_match,
    max_bootstrapped_demos=4,
    max_labeled_demos=4,
    num_candidate_programs=10,  # Try 10 random demo subsets
    num_threads=4,              # Parallelize evaluation
)

compiled_qa = optimizer.compile(
    student=QA(),
    trainset=trainset,
    valset=valset,  # Optional validation set for selection
)

Random search adds cost (it evaluates multiple candidate programs) but typically finds better demonstrations than the basic version. Use it when you have a validation set for reliable selection.

Tip

Start with BootstrapFewShot for rapid iteration. Once you have a working pipeline, switch to BootstrapFewShotWithRandomSearch for a few percentage points of additional accuracy. The extra cost is in optimization time (a one-time expense), not inference time.

4. MIPRO: Multi-prompt Instruction Proposal Optimizer

MIPRO optimizes both the instructions and the few-shot examples jointly. While BootstrapFewShot only selects examples, MIPRO also rewrites the task instructions in the prompt to improve performance.

from dspy.teleprompt import MIPRO

optimizer = MIPRO(
    metric=exact_match,
    num_candidates=10,         # Number of instruction candidates
    init_temperature=1.0,      # Exploration temperature
    prompt_model=dspy.LM("openai/gpt-4o"),  # Model for generating instructions
    task_model=dspy.LM("openai/gpt-4o-mini"),  # Model for executing tasks
)

compiled_qa = optimizer.compile(
    student=QA(),
    trainset=trainset,
    num_trials=20,             # Total optimization trials
    max_bootstrapped_demos=3,
    max_labeled_demos=3,
)

MIPRO uses a two-model setup. The prompt_model (usually a stronger model like GPT-4o) generates candidate instructions. The task_model (which can be a smaller, cheaper model) evaluates those candidates on the training set. This lets you optimize prompts for a cheap model using the intelligence of an expensive one.
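The two-model split can be sketched as a propose-then-score loop. Both functions below are toy stand-ins (the scorer uses word count as a crude proxy); in MIPRO, proposing calls the prompt_model and scoring runs the task_model over the training set with the metric.

```python
def propose_instructions():
    """prompt_model stand-in: generate candidate task instructions."""
    return [
        "Answer.",
        "Answer the question.",
        "Using only the given context, answer the question concisely.",
    ]

def score_instruction(instruction, trainset):
    """task_model stand-in: toy proxy rewarding more specific wording.
    A real score would run the task model on trainset and apply the metric."""
    return len(instruction.split())

toy_trainset = ["ex1", "ex2"]
candidates = propose_instructions()
best = max(candidates, key=lambda ins: score_instruction(ins, toy_trainset))
print(best)  # the most specific candidate wins under this toy proxy
```

The asymmetry is the point: generating good instructions is hard (strong model), while executing them many times is expensive (cheap model).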

5. BayesianSignatureOptimizer

The Bayesian optimizer treats prompt optimization as a Bayesian optimization problem. It models the relationship between prompt variations and performance, then uses this model to efficiently explore the prompt space.

from dspy.teleprompt import BayesianSignatureOptimizer

optimizer = BayesianSignatureOptimizer(
    metric=exact_match,
    n=20,                      # Number of optimization iterations
    verbose=True,
)

compiled_qa = optimizer.compile(
    student=QA(),
    devset=trainset,           # Uses "devset" instead of "trainset"
    num_threads=4,
)

# The optimizer reports its progress:
# Trial 1/20: score=0.65
# Trial 2/20: score=0.70
# ...
# Trial 20/20: score=0.88
# Best score: 0.88

The Bayesian approach is sample-efficient: it typically finds good configurations in fewer trials than random search. However, it is sequential by nature (each trial informs the next), so it can be slower in wall-clock time despite fewer total evaluations.

Warning

All optimizers consume LLM tokens during the optimization process. A MIPRO run with 20 trials, 10 candidates, and 50 training examples might make thousands of LLM calls. Monitor your token usage during optimization. Consider using a cheaper model for the task_model during optimization, then deploying the optimized prompts with a production model.
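A back-of-the-envelope estimate makes the cost concrete. The formula is a simplification (the exact call pattern depends on the optimizer and caching), and the 800-tokens-per-call figure is an assumption, but trials times examples gives a useful lower bound on task_model calls.

```python
# Rough cost estimate for the MIPRO run described above.
num_trials = 20
num_candidates = 10
num_examples = 50

task_calls = num_trials * num_examples   # each trial evaluates the trainset
proposal_calls = num_candidates          # prompt_model generates candidates

avg_tokens_per_call = 800                # prompt + completion, assumed
total_tokens = (task_calls + proposal_calls) * avg_tokens_per_call

print(task_calls)    # 1000 task-model evaluations
print(total_tokens)  # 808000 tokens before any caching
```

Running a quick estimate like this before launching an optimization run helps you pick a task_model you can afford to call a thousand times.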

6. Saving and Loading Compiled Modules

After optimization, you should save the compiled module so you do not need to re-optimize every time your application starts.

# Save the compiled module
compiled_qa.save("optimized_qa.json")

# Load it later
loaded_qa = QA()
loaded_qa.load("optimized_qa.json")

# The loaded module has the same optimized prompts
result = loaded_qa(
    context="The Great Wall of China is over 13,000 miles long.",
    question="How long is the Great Wall?",
)
print(result.answer)

The saved file contains the optimized instructions and few-shot examples for each sub-module. Your Python code (the module class definition) must still be available at load time; only the prompt parameters are serialized.

7. Optimization Strategies for Production

Choosing the right optimizer and configuration for production requires balancing quality, cost, and iteration speed. Here is a decision framework.

# A production optimization workflow
import dspy
from dspy.teleprompt import MIPRO

# 1. Load training and validation data
trainset = load_examples("train.jsonl")
valset = load_examples("val.jsonl")

# 2. Optimize with MIPRO
optimizer = MIPRO(metric=my_metric, num_candidates=15)
compiled = optimizer.compile(
    student=MyPipeline(),
    trainset=trainset,
    num_trials=30,
)

# 3. Evaluate on validation set
from dspy.evaluate import Evaluate
evaluator = Evaluate(devset=valset, metric=my_metric, num_threads=8)
score = evaluator(compiled)
print(f"Validation accuracy: {score:.1%}")

# 4. Save if quality meets threshold
if score > 0.85:
    compiled.save("production_pipeline.json")
    print("Deployed!")
Tip

Treat DSPy optimization like model training. Keep your training and validation sets separate. Version your compiled modules alongside your code. Re-optimize when you add new training data, change the underlying LLM, or modify your module architecture.
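One lightweight way to version a compiled module is to write a small manifest next to the saved artifact. The file names, manifest fields, and stand-in artifact contents below are illustrative choices, not a DSPy convention.

```python
import hashlib
import json
import pathlib

# Stand-in for compiled_qa.save("optimized_qa.json"); in a real workflow
# the artifact is produced by the optimizer.
artifact = pathlib.Path("optimized_qa.json")
artifact.write_text(json.dumps({"demos": ["..."]}))

# Record a content hash plus the context the prompts were optimized in,
# so a deployed artifact can be traced back to its optimization run.
manifest = {
    "artifact": artifact.name,
    "sha256": hashlib.sha256(artifact.read_bytes()).hexdigest(),
    "lm": "openai/gpt-4o-mini",  # model the prompts were tuned for
    "metric": "exact_match",
}
pathlib.Path("optimized_qa.manifest.json").write_text(json.dumps(manifest, indent=2))
print(manifest["sha256"][:12])
```

Checking the manifest at load time catches the common failure mode of deploying prompts that were optimized for a different model.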