Part III: Working with LLMs

Chapter 12: Hybrid ML+LLM Architectures & Decision Frameworks

"The art of engineering is not choosing the most powerful tool, but choosing the right tool for each part of the problem."

— Deploy, Tool-Savvy AI Agent
Figure 12.0.1: Not every problem needs a billion parameters. Sometimes the smartest architecture pairs a lightweight ML model with an LLM, each doing what it does best.

Chapter Overview

In production systems, LLMs rarely work in isolation. The most effective architectures combine large language models with classical machine learning, rules engines, and traditional software in carefully designed pipelines. The challenge is knowing when to use an LLM, when a simpler model will do, and how to orchestrate both into a system that maximizes quality while minimizing cost and latency. These decisions become central to strategic planning and ROI analysis for any AI initiative.

This chapter provides a principled decision framework for choosing between LLMs and classical ML. It covers patterns for using LLMs as feature extractors, building hybrid triage and escalation pipelines, optimizing total cost of ownership, and extracting structured information from unstructured text. These hybrid patterns complement the retrieval techniques covered in Chapter 20 on RAG and the end-to-end application architectures explored later in the book. Each pattern is grounded in real production scenarios with concrete benchmarks, code examples, and cost analyses.
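The triage-and-escalation pattern mentioned above can be sketched in a few lines: a cheap classifier handles the cases it is confident about, and only low-confidence inputs escalate to an LLM. The names here (`classify_cheap`, `call_llm`, the keyword heuristic, and the 0.8 threshold) are illustrative placeholders, not an API from this book.

```python
# Hypothetical triage pipeline: a cheap model answers confident cases;
# only low-confidence inputs escalate to the (more expensive) LLM.
from dataclasses import dataclass


@dataclass
class TriageResult:
    label: str
    source: str  # "classifier" or "llm"


def call_llm(text: str) -> str:
    # Stand-in for a real LLM API call.
    return "needs_review"


def classify_cheap(text: str) -> tuple[str, float]:
    # Stand-in for a trained classifier returning (label, confidence).
    # A trivial keyword heuristic is used here purely for illustration.
    if "refund" in text.lower():
        return "billing", 0.95
    return "other", 0.40


def triage(text: str, threshold: float = 0.8) -> TriageResult:
    label, confidence = classify_cheap(text)
    if confidence >= threshold:
        return TriageResult(label, "classifier")
    # Below the confidence threshold: escalate to the LLM.
    return TriageResult(call_llm(text), "llm")


print(triage("I want a refund for my order"))  # handled by the classifier
print(triage("Something strange happened"))    # escalated to the LLM
```

The key design choice is the confidence threshold: raising it sends more traffic to the LLM (higher quality, higher cost), while lowering it keeps more traffic on the cheap path.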

By the end of this chapter, you will be able to evaluate any ML task against a rigorous decision matrix, design hybrid architectures that route work to the right model at the right cost, and build production information extraction pipelines that combine classical NLP with LLM capabilities. You will also learn how to evaluate these systems to ensure they meet quality targets.

Big Picture

Not every problem needs a large language model, and not every LLM output should be trusted without verification. This chapter shows you when to combine classical ML with LLMs, building hybrid pipelines that are more accurate, faster, and cheaper than either approach alone. This pragmatic mindset carries through to the production chapters in Part VIII.
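The "cheaper than either approach alone" claim is easy to see with back-of-envelope arithmetic. The sketch below uses made-up per-request prices (not benchmarks from this book) to show how the blended cost of a hybrid pipeline depends on the escalation rate.

```python
# Back-of-envelope cost model with illustrative (assumed) prices:
# most requests hit a cheap classifier; a fraction escalates to an LLM.
def blended_cost_per_1k(escalation_rate: float,
                        cheap_cost_per_1k: float = 0.05,
                        llm_cost_per_1k: float = 5.00) -> float:
    """Cost per 1,000 requests when only `escalation_rate` reach the LLM."""
    return ((1 - escalation_rate) * cheap_cost_per_1k
            + escalation_rate * llm_cost_per_1k)


# If only 10% of traffic escalates, the hybrid costs a small fraction
# of the LLM-only baseline.
print(round(blended_cost_per_1k(0.10), 3))  # 0.545
print(blended_cost_per_1k(1.00))            # 5.0 (LLM-only baseline)
```

At a 10% escalation rate the hybrid runs at roughly a ninth of the LLM-only cost, which is why routing accuracy on the cheap path dominates the economics of these systems.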

Learning Objectives

Prerequisites

Sections

What's Next?

In the next part, Part IV: Training and Adapting, we learn to adapt LLMs through synthetic data, fine-tuning, PEFT, distillation, and alignment.