Libraries & Frameworks

Section 78.2

"Paper-tracking, reproducibility, reference implementations. Three layers of the frontier-LLM library stack, and the only ones that survive each quarterly rewrite."

Pip, Frontier-Library-Reader AI Agent
Note: Learning Objectives
Big Picture

The frontier-research library ecosystem has consolidated around three layers: paper-tracking (arxiv.py, ResearchRabbit, Elicit), reproducibility (Hydra, DVC, W&B), and reference implementations (nanoGPT, llm.c, litgpt). Knowing which layer to consult for which question is the difference between a research workflow that ships and one that stalls.

Prerequisites

This section assumes the frontier-LLM platform shelf from Section 78.1 and the LLM-library landscape from Section 14.2.

Three-layer frontier-research library stack with named tools
Figure 78.2.1: The 2026 frontier-research library stack collapses to three persistent layers. Paper-tracking spans arxiv.py for scripting, arxiv-sanity-lite for personalized feeds, ResearchRabbit for visual exploration, Elicit for fast cross-paper synthesis, Zotero as the canonical library, and paper-qa for RAG over scientific PDFs. Reproducibility runs Hydra (configs) plus DVC (data) plus W&B (experiment tracking), with Metaflow added when research workflows need to graduate to scheduled production. Reference implementations form a pedagogy-to-production ladder: nanoGPT and llm.c teach the algorithm, litgpt cleans the code, and torchtitan, maxtext, and gpt-neox cover production-scale pretraining on GPUs or TPUs.

78.2.1 Paper-tracking
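
The scripting end of this layer comes down to querying the public arXiv API and reading the Atom feed it returns, which is the job the arxiv.py package wraps. The sketch below uses only the Python standard library to show both halves. It is an illustration under assumptions, not arxiv.py's interface: the helper names `build_query_url` and `parse_feed` are hypothetical, and `SAMPLE_FEED` is an invented placeholder so the example runs offline.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace used by the feed


def build_query_url(terms, category="cs.CL", max_results=5):
    """Build an arXiv API URL for the newest papers matching all terms."""
    clauses = [f"cat:{category}"] + [f'all:"{t}"' for t in terms]
    params = {
        "search_query": " AND ".join(clauses),
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def parse_feed(atom_xml):
    """Extract id, title, and publication date from an Atom feed string."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.iter(f"{ATOM}entry"):
        title = entry.findtext(f"{ATOM}title", default="")
        papers.append({
            "id": entry.findtext(f"{ATOM}id", default="").strip(),
            "title": " ".join(title.split()),  # collapse wrapped whitespace
            "published": entry.findtext(f"{ATOM}published", default="").strip(),
        })
    return papers


# Invented placeholder feed (not a real paper) so the sketch runs offline.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00001v1</id>
    <title>A Placeholder
      Title</title>
    <published>2026-01-01T00:00:00Z</published>
  </entry>
</feed>"""

if __name__ == "__main__":
    print(build_query_url(["speculative decoding"]))
    for paper in parse_feed(SAMPLE_FEED):
        print(paper["published"][:10], paper["title"])
```

In a real workflow the URL would be fetched (for example with `urllib.request.urlopen`) and the response body passed to `parse_feed`; a daily cron job over a handful of such queries is often enough to replace manual listing checks.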

78.2.2 Reproducibility libraries
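
The three tools in this layer each pin down one ingredient of a run: Hydra the configuration, DVC the data, and W&B the logged metrics. The standard-library sketch below imitates that division of labor with a single run manifest so the idea is visible without installing anything. It does not call Hydra, DVC, or W&B, and every name in it (`config_fingerprint`, `data_fingerprint`, `run_experiment`, the toy loss) is a hypothetical stand-in chosen for illustration.

```python
import hashlib
import json
import random


def config_fingerprint(config):
    """Hash a config dict in canonical (sorted-key) JSON form."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def data_fingerprint(data_bytes):
    """Content hash of the dataset bytes (the idea behind data versioning)."""
    return hashlib.sha256(data_bytes).hexdigest()[:12]


def run_experiment(config, data_bytes):
    """Toy 'training run': seeded, so identical inputs give identical metrics."""
    rng = random.Random(config["seed"])
    # Stand-in for a real training loop; the formula is arbitrary.
    loss = round(rng.uniform(2.0, 3.0) / config["n_layer"], 6)
    return {
        "config": config,
        "config_hash": config_fingerprint(config),
        "data_hash": data_fingerprint(data_bytes),
        "metrics": {"final_loss": loss},
    }


if __name__ == "__main__":
    cfg = {"seed": 7, "n_layer": 4, "lr": 3e-4}
    manifest = run_experiment(cfg, b"toy tokenized corpus")
    print(json.dumps(manifest, indent=2))
```

Two runs are comparable only when both hashes match; if a metric moves while `config_hash` and `data_hash` stay fixed, the cause is in the code or the environment rather than the inputs. The real tools add what this sketch leaves out: config composition and overrides, remote data storage, and a dashboard over many runs.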

78.2.3 Reference implementations

The 2025-26 frontier-research library shelf splits cleanly into pedagogy and production. Pedagogy stays on nanoGPT and litgpt (published on PyPI under that name; formerly lit-gpt) for clarity. Production-scale pretraining moved to torchtitan (Meta, 2024), a modular PyTorch pretraining library, and maxtext (Google, JAX) for TPU-friendly pretraining. transformers v5 (release candidates from late 2025) is the modern production reference SDK; code that still relies on v4 idioms is a major version behind.

Tip: start small, then graduate

The right learning sequence is nanoGPT (understand the algorithm) → litgpt (understand the code) → gpt-neox or Megatron-LM (understand production scale). Skipping ahead to gpt-neox without nanoGPT first means you do not know what the wrappers are wrapping. Engineers who work through the sequence in order often end up skimming step three, because by then they can see that it is mostly boilerplate around the kernels from step one.
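
To make "what the wrappers are wrapping" concrete, here is a minimal sketch of the kind of kernel step one teaches, assuming single-head causal self-attention as the representative example. It is written in pure Python for readability and is not nanoGPT's code, which uses PyTorch with batching and multiple heads; the function names are chosen here for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Single-head causal attention over lists of per-token vectors."""
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Causal mask: position t attends only to positions 0..t.
        scores = [
            sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible values.
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out


if __name__ == "__main__":
    q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    for row in causal_attention(q, k, v):
        print([round(x, 3) for x in row])
```

The first token can only see itself, so its output equals its own value vector, and changing a later token never alters an earlier output. Production libraries keep this same computation and surround it with fused kernels, sharding, and checkpointing.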

Key Takeaways

What's Next?

In the next section, Section 78.3: Datasets & Benchmarks, we build on the material covered here.

Further Reading
nanoGPT (Karpathy): the minimal reference.
llm.c (Karpathy): pure C/CUDA pretraining.
litgpt (Lightning AI; formerly lit-gpt): the cleaned-up reference code.
Hydra, DVC, Weights & Biases: the reproducibility trio.
Elicit, ResearchRabbit: AI-assisted literature review.