Models

Section 74.4

74.4.1 Domain-specific models

Key Insight
The 2024-2026 recipe: continued pretraining + general post-training

The pattern that consistently produces good vertical models in 2024-2026 is not "train a domain model from scratch" but "continue pretraining a strong general base on a domain corpus, then run ordinary general-purpose post-training (supervised fine-tuning, SFT, followed by direct preference optimization, DPO)". Models built the first way (BloombergGPT, Galactica) were beaten on their own benchmarks by frontier general models within about a year. Models built the second way (Med-PaLM 2, the OpenBioLLM family) stay competitive longer because they inherit the capabilities of the frontier base they start from.

Figure 74.4.1: The two recipes for a vertical-domain LLM, evaluated by their 2024-2026 track record. The from-scratch approach (BloombergGPT 50B for finance, Galactica for science, BiomedBERT for biomedicine) trains a transformer from random initialization on a corpus built around the domain. The resulting models tend to be beaten on their own benchmarks by general frontier models (Claude, GPT) within roughly a year, because they inherit no frontier reasoning. The continued-pretrain approach (Med-PaLM 2 on PaLM 2, Llama-3-OpenBioLLM, Qwen3-Coder) takes a strong general base, continues pretraining on the domain corpus, then runs standard SFT + DPO post-training. These models stay competitive longer because they inherit frontier capability and specialize only at the margin. This is the dominant 2024-2026 recipe in every vertical (Section 74.4 Key Insight).
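The DPO step in the recipe above is the one that is least obvious from its name, so a small worked example may help. The following is a minimal sketch of the standard DPO objective for a single preference pair, assuming the summed log-probabilities of the chosen and rejected responses have already been computed under both the model being trained and a frozen reference copy (typically the SFT checkpoint). The function names, argument names, and the default `beta` are illustrative choices, not taken from any particular training library.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value; tuned per run in practice
) -> float:
    """DPO loss for one (chosen, rejected) response pair.

    Each *_logp argument is the summed token log-probability of a
    response given the prompt, under the policy or the reference model.
    """
    # How much more the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the policy favors the chosen response more
    # strongly (relative to the reference) than the rejected one.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Policy identical to the reference: the margin is 0, the loss is log 2.
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy has shifted toward the chosen response: the loss is smaller.
    print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

When the policy and the reference agree exactly, the loss is log 2 (about 0.693) regardless of `beta`. Training pushes it below that value by widening the gap between chosen and rejected responses.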

74.4.2 Comparing the vertical models

Table 74.4.1a: Vertical-specific models (2026).

| Industry | Domain model | Access | Note |
|----------|--------------|--------|------|
| Medical | Med-PaLM 2 / OpenBioLLM | API / open | Often outperformed by GPT-5/Opus on USMLE |
| Finance | FinGPT | Open | Niche; frontier models usually win |
| Legal | Legal-RoBERTa | Open (encoder) | Used for classification, not generation |
| Code | Qwen3-Coder / CodeLlama | Open | Strong on code-specific benchmarks |
| Science | SciBERT | Open (encoder) | Older; useful for citation-classification |

Key Insight
Vertical models rarely beat frontier general models

As of 2026, frontier general-purpose models (GPT-5, Claude Opus, Gemini 2.5) outperform domain-specific models on most domain benchmarks, including MedQA and LegalBench. The exception is workloads that depend on deep proprietary data or require hard privacy guarantees; there, a domain-tuned open model is usually the better choice.
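The rule of thumb above can be written down as a small decision helper. This is only a sketch of the heuristic as stated in this section. The `Workload` fields and the returned labels are hypothetical names chosen for illustration, and a real selection process would also weigh cost, latency, and measured accuracy on the team's own evaluation set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a deployment's constraints."""
    needs_proprietary_data: bool  # depends on deep in-house data
    needs_hard_privacy: bool      # data may not leave the organization


def recommend_model_class(workload: Workload) -> str:
    """Apply the section's heuristic: default to a frontier general
    model unless proprietary data or hard privacy rules it out."""
    if workload.needs_proprietary_data or workload.needs_hard_privacy:
        return "domain-tuned open model"
    return "frontier general model"


if __name__ == "__main__":
    print(recommend_model_class(Workload(False, False)))
    print(recommend_model_class(Workload(False, True)))
```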

What's Next?

In the next section, Section 74.5: External Reading & Communities, we build on the material covered here.

Further Reading

Industry Models

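Singhal, K., et al. (2023). "Large Language Models Encode Clinical Knowledge." Nature 620, 172-180. arXiv:2212.13138. Introduces Med-PaLM; reference clinical-domain LLM.
Wu, S., et al. (2023). "BloombergGPT: A Large Language Model for Finance." arXiv:2303.17564. Reference financial-domain LLM.
Cui, J., et al. (2023). "ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases." arXiv:2306.16092. Reference open legal-domain LLM.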