
By May 2026, the LLM stack has settled enough to teach, and shifted enough to retell. GPT-5-omni, Claude 4, Gemini 2 Pro Vision, Llama 4 Scout, and DeepSeek-R1 share the frontier with open-weight models you can run on a laptop. Veo 3, Sora 2, and Genie 3 turned video into a first-class output modality. Pi-0.5 and OpenVLA shipped robots that translate natural language into motor commands. The Model Context Protocol replaced a year of bespoke tool-glue. Test-time compute scaling stopped being a research curiosity and became a billable line item. None of this was settled when most existing books were drafted.
Three kinds of resources exist for LLMs in 2026: textbooks that go deep on transformer math but stop before production; tutorials that ship a RAG demo in a weekend but never explain why the embedding model matters; and arXiv papers that move faster than any book can absorb. This book is a fourth kind. Its 79 chapters (Chapters 0 through 78) across fifteen parts carry a practitioner from "I can write Python" to "I can defend a build-vs-buy decision in a design review and ship the system I chose." The foundations are derived from first principles; the production guidance comes from systems that actually run; and the frontier chapters acknowledge that the frontier is still moving.
What You Will Be Able to Do
You will be able to derive the attention mechanism, implement it from scratch in PyTorch, and explain why the same math powers GPT-5, Claude 4, and a 7B model running on your laptop. You will be able to fine-tune a 7B model on domain data with LoRA, decide when fine-tuning beats prompting, and quantify the answer. You will be able to design a RAG system that grounds answers in your own corpus, instrument it for hallucination drift, and ship it behind a streaming endpoint. You will be able to build an agent that uses MCP to talk to your existing systems, recovers from failed tool calls, and runs within a cost budget. You will be able to read a 2026 arXiv paper on emergent abilities, sparse autoencoders, or scaling frontiers, judge whether the claim holds, and reproduce a small version of the experiment.
You do not have to read the book in order. FM.2 (What This Book Covers) walks through the fifteen parts and shows how they depend on one another. FM.3 (Who Should Read This Book) tells you whether you are in the target audience. Appendix C (Reading Pathways) lays out eight goal-based routes (weekend RAG bot, two-week agent builder, ML-to-LLM transition, semester research curriculum, founder/PM, instructor, interpretability and safety specialist, curious generalist). Pick the pathway that matches your situation and the book becomes a much shorter read.
If any of this matches what you are trying to do, the next page tells you what is inside.
Alexander Apartsin & Yehudit Aperstein
May 2026