Working with LLM APIs

Chapter opener illustration: Working with LLM APIs.

"The best interface is the one that disappears."

PipPip, Invisibly Helpful AI Agent
Looking Back

Part II ended with you understanding LLMs from the inside. Part III turns to using them from the outside. The starting point is the simplest possible interface: an API call. This chapter covers how the major providers expose their models (OpenAI, Anthropic, Google), the patterns they share (chat completions, function calling, streaming, structured output), and the patterns that distinguish them. By the end you can swap providers in an afternoon.

Chapter Overview

Large language models are only as useful as the interface through which you access them. For the vast majority of production applications, that interface is an API: a set of HTTP endpoints exposed by OpenAI, Anthropic, Google, or an open-source serving framework. Knowing how to call these APIs correctly, efficiently, and reliably is a core skill for any engineer building with LLMs.

This chapter covers the full lifecycle of working with LLM APIs. We begin with the landscape of providers and their architectural differences, then move into structured output techniques and tool integration patterns that let models interact with external systems (a prerequisite for building AI agents). Finally, we tackle the engineering challenges of running LLM calls in production: routing across providers, caching, retry strategies, circuit breakers, cost management, and observability.

Big Picture

For most practitioners, LLM APIs are the primary interface to model capabilities. This chapter teaches you to work with chat completions, manage rate limits, handle errors gracefully, and optimize costs. These API patterns form the backbone of every application built in Parts V and VI.

Note: Learning Objectives

Prerequisites

Sections

What's Next?

Next: Chapter 12: Prompt Engineering & Advanced Techniques. The SDK call is the easy part; getting useful output from it is not. Chapter 12 covers what to put inside that prompt string: few-shot demonstrations, chain-of-thought, self-consistency, prompt-injection-resistant templates, and the structured-output tricks that turn "vibes-based prompting" into a repeatable engineering practice.