"Reading thirty-five chapters about LLM systems teaches you a vocabulary. Shipping one teaches you the craft."
Sage, Newly-Graduated AI Agent
The capstone is the integrating exercise for everything in the book: a single production-grade LLM application built end-to-end. You will make architectural decisions that balance competing concerns (quality vs. latency, accuracy vs. cost, flexibility vs. reliability) and produce a working system, a published model and dataset, a technical report, and a 15-minute presentation. This is where you cross from reading about LLM systems to actually shipping one.
Project Overview
The capstone project is the culminating experience of this book. You will design, build, and present a complete LLM-powered system that demonstrates mastery across the full stack: data preparation, model training and adaptation, retrieval-augmented generation, agent orchestration, production deployment, evaluation, and business strategy.
Unlike individual module labs that focus on a single technique, the capstone requires architectural decisions that balance competing concerns: model quality versus latency, accuracy versus cost, flexibility versus reliability. These tradeoffs are what distinguish a classroom exercise from a production system.
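One way to make these tradeoffs concrete is to treat latency and cost as hard budgets and pick the highest-quality configuration that fits inside them. The sketch below illustrates that idea only. The `Candidate` class, the `pick_candidate` helper, the configuration names, and every number are hypothetical placeholders, not measurements or requirements from this book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical system configuration and its measured characteristics."""
    name: str
    quality: float          # task score in [0, 1] from your own evaluation suite
    p95_latency_s: float    # 95th-percentile end-to-end latency, in seconds
    cost_per_1k_usd: float  # serving cost per 1,000 requests, in USD


def pick_candidate(candidates, max_latency_s, max_cost_per_1k_usd):
    """Return the highest-quality candidate that fits both budgets, or None."""
    feasible = [
        c for c in candidates
        if c.p95_latency_s <= max_latency_s
        and c.cost_per_1k_usd <= max_cost_per_1k_usd
    ]
    return max(feasible, key=lambda c: c.quality, default=None)


# Illustrative placeholder numbers only; replace them with your own measurements.
CANDIDATES = [
    Candidate("large-model-no-rag", quality=0.86, p95_latency_s=4.2, cost_per_1k_usd=9.00),
    Candidate("small-finetuned-plus-rag", quality=0.81, p95_latency_s=1.6, cost_per_1k_usd=1.20),
    Candidate("small-base-plus-rag", quality=0.72, p95_latency_s=1.4, cost_per_1k_usd=0.90),
]

best = pick_candidate(CANDIDATES, max_latency_s=2.0, max_cost_per_1k_usd=2.00)
print(best.name if best else "no candidate fits the budgets")
```

With these placeholder numbers the highest-quality configuration is rejected because it exceeds both budgets. Writing the budgets down before you measure anything keeps the comparison honest.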
You will work on this project over approximately 4 to 6 weeks. The project culminates in a GitHub repository with working code, a model and dataset published on Hugging Face Hub, a written technical report, and a 15-minute presentation.
Four principles should guide your work:
- Integration over novelty: The goal is not to invent a new architecture but to demonstrate that you can combine multiple techniques into a coherent, working system.
- Production mindset: Include evaluation suites, monitoring hooks, safety guardrails, and deployment configuration. A demo that works on a laptop is not enough.
- Business grounding: Frame the project around a real use case with measurable success criteria and an ROI estimate, not just a technical exercise.
- Honest evaluation: Report what does not work as thoroughly as what does. Identifying limitations demonstrates deeper understanding than cherry-picked results.
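The business-grounding principle above asks for measurable success criteria and an ROI estimate. A minimal sketch of both follows, assuming the benefit comes from labor time saved and using the simple definition ROI = (benefit - cost) / cost. The function names, metric names, thresholds, and all input values are hypothetical; substitute figures from your own use case.

```python
def monthly_roi(requests_per_month, minutes_saved_per_request,
                hourly_labor_cost_usd, monthly_system_cost_usd):
    """Return (net_benefit_usd, roi) for one month of operation.

    Assumes the only benefit is labor time saved; roi = (benefit - cost) / cost.
    """
    if monthly_system_cost_usd <= 0:
        raise ValueError("monthly_system_cost_usd must be positive")
    hours_saved = requests_per_month * minutes_saved_per_request / 60
    benefit = hours_saved * hourly_labor_cost_usd
    net = benefit - monthly_system_cost_usd
    return net, net / monthly_system_cost_usd


def unmet_criteria(measured, minimums):
    """Return the sorted names of metrics that fall below their minimum."""
    return sorted(
        name for name, floor in minimums.items()
        if measured.get(name, 0.0) < floor
    )


# Hypothetical success criteria, fixed before running the evaluation.
MINIMUMS = {"answer_accuracy": 0.80, "citation_precision": 0.90}

# Hypothetical measurements and business inputs.
measured = {"answer_accuracy": 0.84, "citation_precision": 0.87}
net, roi = monthly_roi(
    requests_per_month=6_000,
    minutes_saved_per_request=3,
    hourly_labor_cost_usd=40.0,
    monthly_system_cost_usd=4_000.0,
)

print(f"net benefit: ${net:,.0f} per month, ROI: {roi:.1f}x")
print("unmet criteria:", unmet_criteria(measured, MINIMUMS))
```

A missing metric counts as unmet here, which matches the honest-evaluation principle: a criterion you never measured should not pass by default.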
Learning Objectives
- Design an end-to-end LLM system architecture that balances quality, cost, latency, and safety
- Prepare and publish a synthetic or curated dataset suitable for fine-tuning
- Fine-tune or adapt a language model using techniques from Part IV
- Build a RAG pipeline with vector search, reranking, and citation generation
- Implement an agent with tool use, planning, and multi-step reasoning
- Deploy the system with appropriate security, monitoring, and observability instrumentation
- Design and execute a rigorous evaluation suite with both automated and human evaluation
- Produce a technical report with architecture diagrams, evaluation results, and honest limitation analysis
- Present the project in a clear, concise 15-minute format suitable for technical and business audiences
Suggested Timeline (6 weeks)
| Week | Focus |
|---|---|
| Week 1 | Design. Select use case, define requirements, design architecture, identify datasets. |
| Week 2 | Data + Model. Prepare the synthetic or curated dataset, begin fine-tuning or adapter training. |
| Week 3 | RAG + Agent. Build RAG pipeline, implement agent with tools, integrate components. |
| Week 4 | Deploy + Evaluate. Deploy to cloud, set up monitoring, run evaluation suite. |
| Week 5 | Refine. Address evaluation findings, add safety guardrails, optimize performance. |
| Week 6 | Report + Present. Write technical report, prepare presentation, publish artifacts. |
Deliverable Summary
- GitHub Repository with clean code, README, and deployment instructions.
- Hugging Face Hub artifacts: fine-tuned model and curated dataset.
- Technical Report (8 to 12 pages) with architecture, evaluation, and limitations.
- Interpretability Analysis documenting attention patterns, token attributions, or probing results.
- Live Demo (deployed or screencast) showing the system in action.
- Presentation (15 minutes) covering motivation, architecture, results, and lessons learned.
What Comes Next
Read Capstone C.1: Requirements & Deliverables for the per-component technical bar and the deliverable specifications. After that, the work is yours.