Section 61.5

External Reading and Communities

"SC for the foundations, MLSys for the methods, Hugging Face for the playbook, r/LocalLLaMA for the ground truth. One stream a week and the field cannot lose you."

TensorTensor, Reading-List-Curating Scale AI Agent
Big Picture

Staying current on LLM systems at scale requires sampling four streams. First, the systems and HPC venues (SC / Supercomputing, ISC, MLSys, OSDI, SOSP, EuroSys, NSDI) where the foundational training-systems papers are published. Second, the foundational papers themselves: Megatron-LM, the DeepSpeed series, GPT-3 / PaLM / OPT / BLOOM training reports, the DeepSeek-V3 paper, the Llama-3 herd paper, the GShard / Mixture-of-Experts papers. Third, the engineering blogs that translate research into practice: Hugging Face's scale series, PyTorch Engineering, the DeepSpeed blog, Yi Tay (Reka), Nathan Lambert (Interconnects), the various lab tech blogs. Fourth, the communities where practitioners exchange tactical know-how: r/LocalLLaMA infrastructure threads, EleutherAI Discord, MLPerf and MLCommons working groups, the open-source library issue trackers themselves. A practitioner who samples one item from each stream every week stays within a week of the field; one who samples only Twitter / X stays in a noisy filter bubble.

The systems-at-scale field is unusual in that the most important results are often published not in academic venues but in vendor technical reports and engineering blogs. The Llama-3 herd paper, the DeepSeek-V3 paper, the OPT-175B logbook, the BLOOM book, and the various Anthropic / Google / OpenAI technical reports are arguably more practically influential than the median MLSys paper. The reading list below therefore mixes academic and industrial sources roughly evenly, weighted by what practitioners actually cite in production work.

61.5.1 Conferences and academic venues

The systems-side conferences are where the foundational training-infrastructure papers appear. They differ from the ML conferences (NeurIPS, ICML, ICLR), which focus on model and method advances rather than on systems.

61.5.2 Foundational papers to read

A 20-paper canon for LLM systems-at-scale work, ordered roughly from architectural foundations, through training systems, to specific frontier-scale writeups.

61.5.3 Engineering blogs and technical writeups

The engineering blogs are where the practical "how to actually do this" knowledge lives, often closer to production reality than academic papers.

61.5.4 Communities and forums

Communities are where the tacit knowledge lives: which library breaks on which GPU driver, which dataset has known bugs, which inference engine actually handles MoE well at production scale. The 2026 high-signal communities:

61.5.5 Newsletters and podcasts

Curated weekly or biweekly reading reduces the firehose to a tractable signal.

61.5.6 Books on LLM systems and scale

These are the book-length references for serious systems work. Relatively few specifically target LLM training at scale, but several adjacent texts are essential.

61.5.7 A weekly reading cadence

A practitioner running production LLM systems at scale typically allocates 2 to 5 hours per week to reading. A reasonable allocation:

Key Insight
The field changes fast enough that any reading list is stale; the meta-skill is curation

Any specific reading list in this section will be partly stale within six months. A 2024 list would have included papers and blogs that are now superseded; a mid-2024 list could not have included DeepSeek-V3 (December 2024) or DeepSeek-R1 (January 2025), both of which became canonical reading within a quarter of release. The meta-skill is not "read this list" but "build a curation pipeline." Subscribe to the high-signal sources, follow the right people on X / GitHub / Discord, and refresh the list quarterly. The five-year-out reading list will look nothing like this one; the curation muscle is what carries over.
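The "refresh the list quarterly" habit can be made mechanical. The sketch below is one possible shape for it, not a prescribed tool: the `Source` record, the 90-day reading of "quarterly", the stream labels, and the sample dates are all illustrative assumptions. It flags sources that have not been re-evaluated within the interval and reports any of the four streams that the list no longer covers.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Source:
    """One entry in a personal reading list (hypothetical record layout)."""
    name: str
    stream: str          # "venue", "paper", "blog", or "community"
    last_reviewed: date  # when you last confirmed it is still worth reading


# Assumption: "quarterly" is interpreted as 90 days.
REVIEW_INTERVAL = timedelta(days=90)


def due_for_review(sources, today):
    """Return sources whose last review is older than the review interval."""
    return [s for s in sources if today - s.last_reviewed > REVIEW_INTERVAL]


def missing_streams(sources, required=("venue", "paper", "blog", "community")):
    """Return the required streams that have no source in the list."""
    present = {s.stream for s in sources}
    return [r for r in required if r not in present]


# Illustrative list: names come from this section, dates are made up.
reading_list = [
    Source("MLSys proceedings", "venue", date(2025, 1, 10)),
    Source("DeepSeek-V3 technical report", "paper", date(2025, 5, 2)),
    Source("Interconnects newsletter", "blog", date(2025, 6, 1)),
]

today = date(2025, 6, 15)
stale = due_for_review(reading_list, today)
gaps = missing_streams(reading_list)

print("Re-evaluate:", [s.name for s in stale])
print("Streams with no source:", gaps)
```

Running a check like this once a quarter turns the refresh into a short, bounded task: re-evaluate what is flagged, and fill any stream that has gone empty.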

61.5.8 Engaging with the community

Beyond consumption, contribution to the community is itself a way to stay current. The most accessible paths:

Library Shortcut
huggingface_hub for shipping a checkpoint to the world

Once your fine-tune or distillation produces a checkpoint worth sharing, the huggingface_hub client (v0.26+, 2024 to 2026) is the canonical upload path. HfApi().upload_folder(...) uploads the whole folder as a single commit, routes large weight files through git-LFS, hashes file contents so that already-uploaded files are skipped, and returns commit information, including the URL the community will cite. It does not write a model card for you, so save a README.md into the folder before uploading. The same client downloads with snapshot_download(...) for license-checking and reproducibility audits before deployment.

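# Install first (shell): pip install -U huggingface_hub
import os

from huggingface_hub import HfApi, ModelCard, ModelCardData, login

# Authenticate with a token read from the environment, never hard-coded.
login(token=os.environ["HF_TOKEN"])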

# 1. Create the repo (private until you flip the switch).
api = HfApi()
api.create_repo("your-org/llama3-finetune-r1", private=True, exist_ok=True)

# 2. Write a model card with license, base model, and eval results.
card = ModelCard.from_template(
    card_data=ModelCardData(
        language="en", license="llama3", base_model="meta-llama/Meta-Llama-3-8B",
        tags=["fine-tuned", "instruction-tuned"],
    ),
    model_description="Llama-3-8B fine-tuned on OpenHermes-2.5.",
)
card.save("./out/README.md")

# 3. Push all checkpoint shards in one streaming upload.
api.upload_folder(
    folder_path="./out",
    repo_id="your-org/llama3-finetune-r1",
    repo_type="model",
    commit_message="v1.0: SFT on OpenHermes-2.5",
)
Code Fragment 61.5.8.1: One upload_folder call ships a multi-shard checkpoint, model card, and license metadata.
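The download side mentioned above, snapshot_download(...) for license-checking before deployment, can be sketched in the same style. This is a minimal illustration under stated assumptions: the allowlist contents and the function names license_is_allowed and audit_repo are hypothetical, and a real audit would check more than the license field. It pins a revision, fetches only the model card rather than the weights, and compares the declared license against a local policy.

```python
from pathlib import Path

# Hypothetical deployment policy: license identifiers you are cleared to use.
ALLOWED_LICENSES = {"apache-2.0", "mit", "llama3"}


def license_is_allowed(license_field, allowed=ALLOWED_LICENSES):
    """Return True only if every declared license is on the allowlist.

    The card's license field may be absent, a string, or a list of strings.
    """
    if license_field is None:
        return False
    declared = [license_field] if isinstance(license_field, str) else list(license_field)
    return bool(declared) and all(lic.lower() in allowed for lic in declared)


def audit_repo(repo_id, revision):
    """Fetch the model card at a pinned revision and check its license."""
    # Imported lazily so the pure helper above works without the client installed.
    from huggingface_hub import ModelCard, snapshot_download

    local_dir = snapshot_download(
        repo_id=repo_id,
        revision=revision,             # pin a commit hash or tag for reproducibility
        allow_patterns=["README.md"],  # download the card only, not the weights
    )
    card = ModelCard.load(Path(local_dir) / "README.md")
    return license_is_allowed(card.data.license)


# Example call (requires network access and, for private repos, a logged-in token):
# ok = audit_repo("your-org/llama3-finetune-r1", revision="main")
```

Pinning `revision` to a commit hash rather than a branch name is what makes the audit reproducible: the card and weights you checked are then exactly the ones you deploy.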
Real-World Scenario
How an open-weight team kept current in 2024-2026

A mid-sized open-weight research team in 2024-2026 reported the following reading and engagement pattern as its actual operational practice. Every Monday, the team's tech lead reviewed the previous week's top r/LocalLLaMA posts (15 minutes), the Interconnects newsletter (15 minutes), and one chosen paper from arxiv-sanity (1 hour). Every Thursday, the team's training-systems engineer reviewed the PyTorch and Hugging Face blogs and any new releases on the major training-framework GitHub repositories (30 minutes). The team filed at least one upstream issue or PR to a relevant open-source project every month and gave one external talk per quarter (workshop, meetup, or conference). Senior team members credited this pattern with keeping them consistently ahead of competitor open-weight teams that did not maintain a similar discipline. The cost was approximately 3 hours per person per week, on the low end for serious technical fields. The lesson generalizes: a small, sustained reading-and-contribution practice compounds, while sporadic catch-up reading does not.

61.5.9 Mapping the reading and community landscape

Figure 61.5.1: The 2026 reading and community landscape for LLM systems at scale: papers, model hubs, leaderboards, blogs, conferences, and chat communities that keep practitioners current.

61.5.10 Vendor technical reports and frontier disclosures

Beyond peer-reviewed papers, the 2024-2026 frontier-lab technical reports are essential reading. These are released alongside model launches and provide the closest-to-ground-truth information about how frontier-scale work is actually done.

These reports are typically the canonical reference for the model's architecture, training data scale (sometimes), training recipe (often), and evaluation results. Read the ones for the models you actually use; treat the safety / capability claims as marketing-flavored.

61.5.11 Survey papers and state-of-the-field reports

For a broad understanding of the field's trajectory, periodic survey papers and industry reports help:

61.5.12 Courses and structured learning

For practitioners new to the field or refreshing fundamentals, several courses have become canonical references:

Looking Back
The platform / library / dataset / model decomposition of Chapter 61

Chapter 61 organized the scale tools-of-the-trade catalog along four axes that re-appear in every chapter of Parts VII-XII: platforms (61.1: hyperscalers vs specialized GPU clouds vs in-house datacenters, plus HPC schedulers, parallel storage, and training observability); libraries and frameworks (61.2: Megatron / DeepSpeed / FSDP2 / Colossal-AI as the foundation distributed-training layer, plus the high-level recipes, optimization kernels, communication libraries, orchestrators, and compilers); datasets and benchmarks (61.3: open pretraining corpora like FineWeb / RedPajama-v2, alignment / instruction datasets, multimodal corpora, MLPerf Training, lm-eval-harness, the canary-and-decontamination methodology); and models (61.4: the open-weight frontier from Llama through DeepSeek, the closed-weight frontier from Claude / GPT / Gemini, and the scaling-law-derived choice of which size to actually train). The fifth section (61.5) was the reading list and community map that lets you keep all four axes current as the field moves. The pattern generalizes: every "tools of the trade" chapter in this book is organized as platforms / libraries / datasets / models / external reading. Master this decomposition once and you have a template for navigating any of them.

What's Next: Part XIII closes the loop with LLMOps

Continue to Section 62.1: Scaling, Performance & Production Guardrails. Part XII (Chapters 56-61) covered LLM systems at scale: the platforms, hardware, training systems, edge deployment, and tools-of-the-trade catalog. Part XIII (LLMOps, Chapters 62-66) now turns from "build the system" to "operate the system in production for years." The transition is direct: Chapter 62 picks up exactly where Chapter 61 leaves off, covering the production engineering core (deployment, scaling, performance guardrails, and the SRE practices that turn a 50-day pretraining run into a 5-year production assistant). Chapters 63-66 then cover the MLOps lifecycle (CI/CD for LLMs, model registries, drift detection), observability and monitoring at production cadence (the cluster-side observability of 59.5 generalized to per-request telemetry), incident response and continuous improvement, and the LLMOps tools-of-the-trade catalog. The same platform / library / dataset / model decomposition recurs in Chapter 66. The conceptual thread is that training is a finite project but operation is forever: the scale chapter taught you to spend $3M efficiently over two months; the LLMOps chapters teach you to spend $50K/month well, indefinitely.

61.5.13 Research labs and groups to follow

Beyond following individual papers, following specific labs gives you advance notice of upcoming work:

Further Reading
Vaswani, A., et al. (2017). "Attention Is All You Need." NeurIPS 2017. arxiv.org/abs/1706.03762. The original Transformer paper; the foundational reading that all of LLM systems work descends from.
Hoffmann, J., et al. (2022). "Training Compute-Optimal Large Language Models." arXiv preprint arXiv:2203.15556. arxiv.org/abs/2203.15556. The Chinchilla paper; the canonical reference for compute-optimal scaling that defines the "20 tokens per parameter" rule.
BigScience Workshop (2022). "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model." arXiv preprint arXiv:2211.05100. arxiv.org/abs/2211.05100. The BLOOM technical report; the canonical open multilingual pretraining reference and a foundational logbook of what frontier training operations look like.
Zhang, S., et al. (2022). "OPT-175B Logbook." Meta AI Research. github.com/facebookresearch/metaseq/projects/OPT/chronicles. The OPT-175B logbook; mandatory reading on the operational reality of frontier-scale pretraining failures and recovery.
Huyen, C. (2022). Designing Machine Learning Systems. O'Reilly Media. oreilly.com/library/view/designing-machine-learning. The production-ML-systems reference book; broader than just LLM scale but the canonical book-length reference on ML systems engineering.
Lambert, N. (2024). "Interconnects: AI, policy, and post-training." Substack newsletter. interconnects.ai. The highest-signal regular newsletter for post-training, RLHF, and open-model developments in 2024-2026.

Frontier-Lab Model Disclosures

Dubey, A., et al. (2024). "The Llama 3 Herd of Models." Meta AI. arXiv:2407.21783
DeepSeek-AI (2024). "DeepSeek-V3 Technical Report." DeepSeek. arXiv:2412.19437
Gemini Team, Google (2024). "Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context." Google DeepMind. arXiv:2403.05530
Anthropic (2024). "Claude 3.5 Sonnet Model Card Addendum." Anthropic. anthropic.com Model Card
Qwen Team (2024). "Qwen2.5 Technical Report." Alibaba Cloud. arXiv:2412.15115

Pretraining Frameworks

Shoeybi, M., et al. (2019). "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism." NVIDIA. arXiv:1909.08053
Rajbhandari, S., et al. (2020). "ZeRO: Memory Optimizations Toward Training Trillion Parameter Models." SC20. arXiv:1910.02054
Zhao, Y., et al. (2023). "PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel." VLDB 2023. arXiv:2304.11277
Liang, W., et al. (2024). "torchtitan: One-stop PyTorch native solution for production ready LLM pretraining." Meta AI. arXiv:2410.06511
Hugging Face (2024). "nanotron: Minimalistic large language model 3D-parallelism training." GitHub. github.com/huggingface/nanotron

Compute Economics and Scaling Laws

Kaplan, J., et al. (2020). "Scaling Laws for Neural Language Models." OpenAI. arXiv:2001.08361
Hoffmann, J., et al. (2022). "Training Compute-Optimal Large Language Models." DeepMind. arXiv:2203.15556
Sevilla, J., et al. (2022). "Compute Trends Across Three Eras of Machine Learning." IJCNN 2022. arXiv:2202.05924

Community Resources and Newsletters

Epoch AI (2024). "Tracking Large-Scale AI Models: Methodology and Database." Epoch AI. epoch.ai/data/large-scale-ai-models
Stanford CRFM (2024). "HELM: Holistic Evaluation of Language Models." Stanford Center for Research on Foundation Models. crfm.stanford.edu/helm