External Reading & Communities

Section 19.6

Part IV's literature falls into three groups: the academic papers that introduce each algorithm, the practitioner blog posts that explain what works in practice, and the open-source communities that ship the recipes.

19.6.1 Foundational papers

19.6.2 Tutorials and recipes

19.6.3 Communities

Tip: A working pattern

Treat Hugging Face's alignment-handbook and AllenAI's open-instruct as your reference recipes. Read them before writing your own training script, and reuse roughly 90 percent of what they do. The remaining 10 percent should be a single, named, deliberate change, so that any difference in results can be attributed to that change.
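The single-change discipline is easy to enforce mechanically. The sketch below assumes a recipe can be represented as a flat Python dict. The keys and values in `BASE_RECIPE`, and the helpers `derive_experiment` and `diff`, are hypothetical illustrations, not taken from alignment-handbook or open-instruct.

```python
from copy import deepcopy

# Hypothetical base recipe: keys and values are illustrative placeholders,
# not copied from alignment-handbook or open-instruct.
BASE_RECIPE = {
    "learning_rate": 5e-7,
    "beta": 0.1,
    "num_train_epochs": 1,
    "max_length": 1024,
}


def derive_experiment(base: dict, name: str, key: str, value) -> dict:
    """Return a copy of `base` with exactly one named, deliberate change."""
    if key not in base:
        raise KeyError(f"{key!r} is not in the base recipe")
    if base[key] == value:
        raise ValueError(f"{key!r} already equals {value!r}; nothing would change")
    config = deepcopy(base)  # never mutate the shared reference recipe
    config[key] = value
    # Record the name and the change so results can be attributed to it.
    config["experiment_name"] = name
    config["changed"] = {key: (base[key], value)}
    return config


def diff(base: dict, config: dict) -> dict:
    """Return every base hyperparameter whose value differs in `config`."""
    return {k: (base[k], config[k]) for k in base if base[k] != config[k]}


run = derive_experiment(BASE_RECIPE, "beta-0.05", "beta", 0.05)
print(run["experiment_name"], diff(BASE_RECIPE, run))
```

Checking `diff` against the base before launching a run is a cheap way to confirm that exactly one hyperparameter moved.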

What's Next?

In the next section, Section 19.7: Hugging Face Datasets and Tokenizers, we build on the material covered here.

Further Reading

Practitioner Guides

Karpathy, A. (2024). "Let's build the GPT Tokenizer." YouTube. Reference walkthrough on training a BPE tokenizer.
Karpathy, A. (2024). "Let's reproduce GPT-2 (124M)." YouTube. Reference end-to-end pretraining walkthrough.

Communities

EleutherAI (2024). "EleutherAI Discord and Research Forum." eleuther.ai. One of the largest open-source LLM research communities.