Section 10.11: External Reading & Communities

"The papers that matter this quarter are not yet on arXiv. Half are on X, half are on Discord, and a third (the math is fuzzy here) are still in someone's notebook."
Frontier, Pre-Print-Wanderer AI Agent

Big Picture

Part II's external-reading list is centrally about three things: keeping current with frontier model releases, internalizing the canonical pretraining and inference papers, and finding the working LLM-research community (Discords, mailing lists, conferences) where the actual debugging happens. This section is the curated map of where to look when the textbook ends and the field keeps moving.

Prerequisites

This section is the end-of-part reading list and assumes you have worked through the rest of Part II (modules 6 through 10). No new technical prerequisites; some sources presuppose comfort with transformer mechanics, scaling laws, and quantization formats from earlier sections.

The resources below are grouped into leaderboards, lab blogs, newsletters and survey blogs, and mechanistic-interpretability deep reading. Each group has a small set of canonical resources that consistently outperform "AI news" coverage. Subscribe to the originals; most secondary aggregators are noise.

10.11.1 Leaderboards to bookmark

Fun Fact

The Chatbot Arena leaderboard (started by LMSYS, now run as LM Arena) is one of the few public benchmarks that frontier labs cannot easily game, because the test items are live human prompts that arrive faster than any model can be trained on them. It is also one of the rare places where you can watch a $10 billion lab tie with a 7B open-weight model on a Tuesday afternoon.

LM Arena: blind pairwise comparisons, updated weekly. The single most-trusted ranking of chat models.
Chatbot Arena leaderboard (HF Space): same data, sortable by category, model size, and licensing.
Artificial Analysis: cost-per-token, latency, throughput across providers. The "how cheap is this in production" tracker.
LiveBench: contamination-resistant benchmark with monthly question rotation.
Epoch AI benchmarks (FrontierMath and friends): the math-reasoning frontier tracker.
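To make "blind pairwise comparisons" concrete, here is a minimal sketch of how a ranking can be derived from head-to-head votes using the classic online Elo update. It is an illustration only: the model names, the battle list, the K-factor of 32, and the starting rating of 1000 are all hypothetical, and the live leaderboard's actual statistical method may differ (for example, a Bradley-Terry fit over all battles at once rather than a sequential update).

```python
from collections import defaultdict


def expected_score(r_a, r_b):
    """Probability that A beats B under the Elo (logistic, base-10) model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(battles, k=32.0, initial=1000.0):
    """Apply sequential Elo updates to a list of pairwise battles.

    Each battle is (model_a, model_b, outcome), where outcome is
    1.0 if A won, 0.0 if B won, and 0.5 for a tie.
    k and initial are illustrative defaults, not leaderboard settings.
    """
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, outcome in battles:
        e_a = expected_score(ratings[model_a], ratings[model_b])
        delta = k * (outcome - e_a)
        # The update is zero-sum: whatever A gains, B loses.
        ratings[model_a] += delta
        ratings[model_b] -= delta
    return dict(ratings)


# Hypothetical votes; a real arena collects many thousands per model.
battles = [
    ("model-x", "model-y", 1.0),
    ("model-x", "model-z", 1.0),
    ("model-y", "model-z", 0.5),
    ("model-z", "model-x", 0.0),
]

ratings = elo_ratings(battles)
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Because each update depends on the ratings at the time of the vote, sequential Elo is sensitive to battle order; that is one reason a batch fit over all votes is often preferred when the goal is a stable published ranking.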

10.11.2 Lab blogs and engineering posts

Transformer Circuits (Anthropic): the home of Anthropic's mech-interp papers from 2021 onward. Required reading for Chapter 10.
OpenAI Research: the GPT family's model cards and "system card" disclosures.
Google DeepMind blog: Gemini disclosures, Gemma releases, alignment-research updates.
Hugging Face blog: tokenizer internals, dataset releases, FineWeb / SmolLM technical reports.
EleutherAI blog: the lm-evaluation-harness updates, Pythia analysis, open-source pretraining.

10.11.3 Newsletters and survey blogs

Ahead of AI (Sebastian Raschka): monthly. Best single newsletter for "what shipped" with benchmarks.
Lilian Weng's blog: long-form surveys, a few per year. The 2025 "Why We Think" survey of test-time compute and reasoning models is the modern anchor.
Interconnects (Nathan Lambert): weekly. Strong on alignment, RLHF, evaluation politics.
Alignment Forum: serious alignment-research discussion.
smol.ai newsletter (swyx): daily AI-engineering newsletter; good filter for "which Twitter / Bluesky thread mattered today".
Karpathy: "Let's reproduce GPT-2 (124M)" (2024): the most-watched practical pretraining walkthrough; works as the missing companion to Chapter 7.
AI Safety Fundamentals: the dominant 2025 alignment-onboarding curriculum, free.

10.11.4 Mechanistic interpretability deep-reading

Neel Nanda: "A Comprehensive Mechanistic Interpretability Explainer & Glossary": the field's pedagogical anchor.
Olsson et al.: "In-Context Learning and Induction Heads": still the most-cited paper-level introduction to circuit-level analysis.
Templeton et al.: "Scaling Monosemanticity": 34M-feature SAE on Claude 3 Sonnet. The breakthrough that made SAEs production-scale.
Anthropic: "Circuit Tracing: Revealing Computational Graphs in Language Models" (March 2025): the next-generation circuit-tracing methodology, built on attribution graphs and cross-layer transcoders.
2025 "Mechanistic Interpretability 1-year Update" (Anthropic, 2025-Q3): the current canonical survey replacing the 2022 Olsson paper as the newcomer's first read.

Tip: Where the academic ML conversation moved in 2025

By 2025, Bluesky had overtaken X for much of the academic-ML posting crowd (Karpathy, Raschka, many CS faculty), while X continued to host commercial-lab announcements. Following both is reasonable; if you must pick one, Bluesky is the better academic firehose. The smol.ai newsletter still aggregates across both platforms.

Tip: spend 15 minutes a day, not 2 hours

The mistake most people make tracking 2026 AI is spending too much time, not too little. Fifteen minutes a day on the daily and weekly sources above (smol.ai, Interconnects), plus an hour a month on the slower ones (Ahead of AI, Lilian Weng's surveys), beats any "Twitter-scrolling all morning" routine. The frontier moves quickly, but most weeks add less than 15 minutes of genuinely new technical content.

Key Takeaways

LM Arena and Artificial Analysis are the load-bearing leaderboards: blind pairwise Elo for chat-quality ranking and cost-per-token plus latency tracking for production economics, with LiveBench and Epoch AI handling contamination-resistant evaluation.
Transformer Circuits and the lab blogs beat secondary aggregators: Anthropic's Transformer Circuits, OpenAI Research, Google DeepMind, Hugging Face, and EleutherAI publish the primary material before the news cycle picks it up.
Newsletters and surveys cover the mid-cadence: Ahead of AI, Lilian Weng's quarterly long-form, Interconnects on alignment politics, Alignment Forum on serious safety research, and smol.ai for daily filtering.
Mech-interp deep reading anchors on Anthropic's lineage: Nanda's glossary, Olsson et al. on induction heads, Templeton et al. on scaling monosemanticity, and the 2025 circuit-tracing work on attribution graphs give the field its pedagogical and methodological spine.
Bluesky displaced X for academic ML in 2025: Karpathy, Raschka, and many CS faculty migrated to Bluesky, while X retains commercial-lab announcements, so smol.ai's cross-platform aggregation is the practical compromise.
Fifteen minutes a day beats two-hour scroll sessions: most weeks add less than 15 minutes of genuinely new technical content, so daily plus weekly tiers plus a monthly hour outpace continuous Twitter checking.

What's Next?

This chapter completes the current part. The next part, Part III: Working with LLMs, opens a new arc; see the part index for chapter ordering.

Further Reading

Foundational Mechanistic Interpretability

Olah, C., Cammarata, N., Schubert, L., et al. (2020). "Zoom In: An Introduction to Circuits." Distill. distill.pub/2020/circuits/zoom-in

The Distill essay that named the field; the canonical entry point for the circuit-level program of mechanistic interpretability.

Elhage, N., Nanda, N., Olsson, C., et al. (2021). "A Mathematical Framework for Transformer Circuits." Anthropic / Transformer Circuits Thread. transformer-circuits.pub/2021/framework

The formal decomposition of attention-only transformers into circuits; the technical foundation underlying nearly all later mech-interp work.

Olsson, C., Elhage, N., Nanda, N., et al. (2022). "In-Context Learning and Induction Heads." Anthropic / Transformer Circuits Thread. transformer-circuits.pub/2022/in-context-learning-and-induction-heads

The induction-head paper; the most-cited mechanistic-interpretability result of 2022 and still the textbook example of a "discovered" circuit.

Elhage, N., Hume, T., Olsson, C., et al. (2022). "Toy Models of Superposition." Anthropic / Transformer Circuits Thread. transformer-circuits.pub/2022/toy_model

The paper that put "superposition" on the mech-interp map; the conceptual basis for the SAE / dictionary-learning research program that dominated 2023-2025.

Sparse Autoencoders and Feature Dictionaries

Bricken, T., Templeton, A., Batson, J., et al. (2023). "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning." Anthropic / Transformer Circuits Thread. transformer-circuits.pub/2023/monosemantic-features

The Anthropic paper that demonstrated SAEs on a small transformer; the work that turned "dictionary learning on activations" into a serious research direction.

Templeton, A., Conerly, T., Marcus, J., et al. (2024). "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet." Anthropic / Transformer Circuits Thread. transformer-circuits.pub/2024/scaling-monosemanticity

The Claude 3 Sonnet feature-extraction paper; the first demonstration of SAE-discovered features on a frontier production model, and a major proof of concept for interpretability at scale.

Cunningham, H., Ewart, A., Riggs, L., et al. (2023). "Sparse Autoencoders Find Highly Interpretable Features in Language Models." ICLR 2024. arXiv:2309.08600

The independent academic SAE result published alongside the Anthropic work; widely used as the standard external citation for SAE feature discovery.

Gao, L., la Tour, T.D., Tillman, H., et al. (2024). "Scaling and Evaluating Sparse Autoencoders." OpenAI. arXiv:2406.04093

The OpenAI SAE-on-GPT-4 paper; demonstrated SAE scaling laws and pushed dictionary learning into the frontier-model regime.

Analysis Tools and Libraries

Nanda, N. & Bloom, J. (2022-2026). "TransformerLens: A Library for Mechanistic Interpretability of Transformer Language Models." TransformerLens Documentation. transformerlensorg.github.io/TransformerLens

The de-facto standard library for hooking and probing transformer internals; the entry-point tool for almost every mech-interp tutorial.

Lin, J., Bloom, J., et al. (2024-2026). "Neuronpedia: An Interactive Database of Sparse Autoencoder Features." Neuronpedia. neuronpedia.org

The community feature-explorer; the most-used interface for browsing SAE features on open models (Gemma-2, Llama-3, GPT-2).

Sarti, G., Feldhus, N., Sickert, L., van der Wal, O., et al. (2023). "Inseq: An Interpretability Toolkit for Sequence Generation Models." ACL 2023 System Demonstrations. arXiv:2302.13942

The most-cited open-source library for feature-attribution and saliency analysis on generative LLMs.

Marks, S., Rager, C., Michaud, E.J., et al. (2024). "Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models." ICLR 2025. arXiv:2403.19647

The paper bridging SAEs and circuit-level analysis; the basis for the "sparse feature circuits" pipeline now widely used to audit and edit LLM behavior.

Benchmarks and Reasoning Probes

Wang, Y., Ma, X., Zhang, G., et al. (2024). "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark." NeurIPS 2024. arXiv:2406.01574

The 2024 reasoning-focused upgrade of MMLU; widely used as a coarse-grained probe of compositional reasoning behavior to triangulate against mech-interp findings.

Suzgun, M., Scales, N., Schärli, N., et al. (2022). "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them." ACL 2023 (BIG-Bench Hard). arXiv:2210.09261

The BIG-Bench Hard subset; the canonical reasoning-probe harness used to stress-test circuit-level hypotheses about chain-of-thought behavior.

Wang, K., Variengien, A., Conmy, A., et al. (2022). "Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small." ICLR 2023. arXiv:2211.00593

The IOI circuit paper; the canonical end-to-end circuit-discovery case study and the de-facto worked example for any mech-interp course.