Part III: Working with LLMs

Chapter 10: Working with LLM APIs

"The best interface is the one that disappears."

Pip Pip, Invisibly Helpful AI Agent
Figure 10.0.1: Think of an LLM API as a hotel concierge: you describe what you need, hand over your key, and the magic happens behind the desk.

Chapter Overview

Large language models are only as useful as the interface through which you access them. For the vast majority of production applications, that interface is an API: a set of HTTP endpoints exposed by OpenAI, Anthropic, Google, or an open-source serving framework. Knowing how to call these APIs correctly, efficiently, and reliably is a core skill for any engineer building with LLMs.

This chapter covers the full lifecycle of working with LLM APIs. We begin with the landscape of providers and their architectural differences, then move into structured output techniques and tool integration patterns that let models interact with external systems (a prerequisite for building AI agents). Finally, we tackle the engineering challenges of running LLM calls in production: routing across providers, caching, retry strategies, circuit breakers, cost management, and observability.
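Several of the production concerns listed above (retries, rate limits, transient failures) share one core pattern: exponential backoff with jitter. A minimal sketch, assuming a hypothetical `call_with_retries` helper where `fn` stands in for any LLM API call and `retryable` lists the exception types your client library raises for timeouts or HTTP 429/5xx errors:

```python
import random
import time

def call_with_retries(fn, max_attempts=4, base_delay=0.5, retryable=(TimeoutError,)):
    """Call fn(), retrying transient failures with exponential backoff and jitter.

    Hypothetical helper: `fn` is any zero-argument callable wrapping an LLM
    API request; `retryable` names the exceptions worth retrying.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts; surface the error to the caller
            # Exponential backoff: base_delay, 2x, 4x, ... plus random jitter
            # so many clients do not retry in lockstep.
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.1)
            time.sleep(delay)

# Simulated flaky endpoint: fails twice, then succeeds.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"

result = call_with_retries(flaky, base_delay=0.01)
print(result, "after", attempts["n"], "attempts")
```

Circuit breakers build on the same idea but go one step further: after repeated failures they stop calling the provider entirely for a cooldown period, rather than retrying each individual request.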

Big Picture

For most practitioners, LLM APIs are the primary interface to model capabilities. This chapter teaches you to work with chat completions, manage rate limits, handle errors gracefully, and optimize costs. These API patterns form the backbone of every application built in Parts V and VI.
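Concretely, a chat completion call boils down to posting a JSON body with a model name and a list of role-tagged messages. A minimal sketch of the request shape, following the OpenAI-style chat completions format (the `build_chat_request` helper and the model name are illustrative; real calls also need an `Authorization: Bearer <API_KEY>` header):

```python
import json

def build_chat_request(user_prompt, system_prompt=None,
                       model="gpt-4o-mini", temperature=0.2):
    """Build an OpenAI-style chat completions request body.

    Hypothetical helper; the resulting dict would be POSTed as JSON to a
    chat completions endpoint such as /v1/chat/completions.
    """
    messages = []
    if system_prompt:
        # The system message sets behavior; the user message carries the task.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return {"model": model, "messages": messages, "temperature": temperature}

body = build_chat_request(
    "Summarize this ticket in one sentence.",
    system_prompt="You are a concise support assistant.",
)
print(json.dumps(body, indent=2))
```

The `messages` list is also where conversation history lives: to continue a dialogue, you append the model's previous reply as an `assistant`-role message and send the whole list back.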

Learning Objectives

Prerequisites

Sections

What's Next?

In the next chapter, Chapter 11: Prompt Engineering, we learn the techniques for crafting effective prompts, from few-shot examples to chain-of-thought reasoning.