Libraries & Frameworks

Section 74.2
Big Picture

Every vertical LLM application (healthcare, finance, legal, education) needs a small layer of glue between the model and the industry's existing data formats, standards, and domain corpora. This section catalogs that glue layer for each major vertical: FHIR clients and biomedical NLP toolkits in healthcare, the SEC EDGAR corpus and FinBERT family in finance, CourtListener and Legal-BERT in law, and learning-platform integrations in education. These are not LLM libraries themselves; they are the connector libraries and Software Development Kits (SDKs) that let an LLM application talk to the systems and corpora the industry already runs on. Pick from this list when you need to read electronic health records, ingest 10-K filings, retrieve case law, or wire into an LMS without writing the parser yourself.

Each industry has its own ecosystem of connectors and SDKs. An SDK is a packaged set of code, samples, and documentation that lets you call a service from your application. The libraries below are the most commonly used integration points for building LLM applications in each vertical.

74.2.1 Healthcare libraries

Figure 74.2.1: The vertical NLP library stack as it consolidated by 2026. Healthcare layers spaCy, scispaCy, medspaCy, and BiomedBERT on top of FHIR; finance layers FinBERT, FinGPT, and OpenBB on top of SEC EDGAR plus licensed market-data APIs (Polygon, Databento); legal layers Legal-BERT and Voyage-law on top of CourtListener; education uses LTI, Common Cartridge, and LMS APIs as its integration layer. The pattern under every vertical is the same: a free or low-cost domain corpus, a domain encoder, and a frontier LLM wrapper. The lesson of 2024-25 is that pure-domain LLMs (BloombergGPT, Galactica) are usually overtaken by frontier general models after about a year, so the standard recipe is to continue pretraining a strong general base model on the domain corpus and then apply the usual supervised fine-tuning (SFT) and Direct Preference Optimization (DPO).
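To make the FHIR layer concrete, the sketch below flattens a FHIR Patient resource, already parsed from JSON, into a single line that could be placed in a prompt. It relies only on standard Patient fields (resourceType, name, gender, birthDate); the function name patient_summary and the sample record are illustrative assumptions, and a real application would normally fetch the resource through a FHIR client library rather than hard-code it.

```python
from datetime import date


def patient_summary(patient: dict, today: date) -> str:
    """Flatten a parsed FHIR Patient resource into one prompt-ready line."""
    if patient.get("resourceType") != "Patient":
        raise ValueError("expected a FHIR Patient resource")

    # Patient.name is a list of HumanName entries; this sketch uses the first.
    names = patient.get("name") or [{}]
    first = names[0]
    parts = list(first.get("given", [])) + [first.get("family", "")]
    full_name = " ".join(p for p in parts if p) or "unknown"

    # birthDate may be partial (YYYY or YYYY-MM); only compute age for full dates.
    birth = patient.get("birthDate")
    age = "unknown"
    if birth and len(birth) == 10:
        born = date.fromisoformat(birth)
        had_birthday = (today.month, today.day) >= (born.month, born.day)
        age = str(today.year - born.year - (0 if had_birthday else 1))

    gender = patient.get("gender", "unknown")
    return f"Patient {full_name}, gender {gender}, age {age}"


# Illustrative record, not real patient data.
example = {
    "resourceType": "Patient",
    "id": "example",
    "name": [{"family": "Chalmers", "given": ["Peter", "James"]}],
    "gender": "male",
    "birthDate": "1974-12-25",
}

print(patient_summary(example, today=date(2026, 1, 1)))
# prints: Patient Peter James Chalmers, gender male, age 51
```

Passing today explicitly keeps the function deterministic, which makes it easier to test than one that reads the system clock.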

74.2.2 Finance libraries
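As a small illustration of the 10-K ingestion step mentioned in the section introduction, the sketch below builds the URL of a company's EDGAR submissions file and picks the 10-K rows out of its column-oriented filings.recent block. The helper names and the sample payload are assumptions made for this example, and the accession numbers and file names are invented. No network call is made here; automated requests to EDGAR are expected to send a descriptive User-Agent header.

```python
def submissions_url(cik: int) -> str:
    """URL of the EDGAR submissions JSON; the CIK is zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{cik:010d}.json"


def annual_reports(submissions: dict) -> list[dict]:
    """Return the 10-K rows from the parallel arrays under filings.recent."""
    recent = submissions["filings"]["recent"]
    rows = zip(
        recent["form"],
        recent["accessionNumber"],
        recent["filingDate"],
        recent["primaryDocument"],
    )
    return [
        {"accession": acc, "filed": filed, "document": doc}
        for form, acc, filed, doc in rows
        if form == "10-K"
    ]


# Invented payload with the same shape as a submissions response.
sample = {
    "filings": {
        "recent": {
            "form": ["8-K", "10-K", "10-Q", "10-K"],
            "accessionNumber": ["A-4", "A-3", "A-2", "A-1"],
            "filingDate": ["2025-02-01", "2024-11-01", "2024-08-01", "2023-11-03"],
            "primaryDocument": ["8k.htm", "10k-2024.htm", "10q.htm", "10k-2023.htm"],
        }
    }
}

print(submissions_url(320193))
# prints: https://data.sec.gov/submissions/CIK0000320193.json
print([r["filed"] for r in annual_reports(sample)])
# prints: ['2024-11-01', '2023-11-03']
```

Keeping URL construction and row filtering as separate pure functions lets the filtering logic be tested against a saved payload without touching the network.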

74.2.4 Education libraries
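For the LMS integration mentioned in the section introduction, an LTI 1.3 launch arrives as a signed token whose claims identify the user, course, and activity. The sketch below assumes the token has already been verified and decoded into a dictionary, and reduces it to the few fields a tutoring prompt might need. The function name launch_context and the sample claims are hypothetical; the claim and role URIs follow the LTI 1.3 naming scheme.

```python
LTI_CLAIM = "https://purl.imsglobal.org/spec/lti/claim/"
ROLE_PREFIX = "http://purl.imsglobal.org/vocab/lis/v2/membership#"


def launch_context(claims: dict) -> dict:
    """Reduce already-verified LTI 1.3 launch claims to prompt-relevant fields."""
    if claims.get(LTI_CLAIM + "message_type") != "LtiResourceLinkRequest":
        raise ValueError("not a resource link launch")

    roles = claims.get(LTI_CLAIM + "roles", [])
    course = claims.get(LTI_CLAIM + "context", {})
    link = claims.get(LTI_CLAIM + "resource_link", {})
    return {
        "user_id": claims.get("sub"),
        "course_title": course.get("title"),
        "activity_title": link.get("title"),
        # Keep only course-membership roles and strip the vocabulary prefix.
        "roles": sorted(
            r[len(ROLE_PREFIX):] for r in roles if r.startswith(ROLE_PREFIX)
        ),
    }


# Hypothetical claims, as they might look after signature verification.
sample_claims = {
    "sub": "user-123",
    LTI_CLAIM + "message_type": "LtiResourceLinkRequest",
    LTI_CLAIM + "version": "1.3.0",
    LTI_CLAIM + "roles": [ROLE_PREFIX + "Learner"],
    LTI_CLAIM + "context": {"id": "course-42", "title": "Intro to Statistics"},
    LTI_CLAIM + "resource_link": {"id": "link-7", "title": "Week 3 problem set"},
}

print(launch_context(sample_claims))
```

Signature verification is deliberately left out of this sketch; in practice it would be handled by a JWT or LTI library before any claim is trusted.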

One 2024-25 trend in vertical AI is worth flagging: open-source domain-specific LLMs (OpenMed, BioMedLM, and MedFound for medicine; FinGPT for finance) are useful but rarely beat frontier general models on the same vertical tasks. The standard recipe in 2026 is to continue pretraining a general base model on a domain corpus and then apply standard general-purpose post-training, rather than training a domain model from scratch.

What's Next?

In the next section, Section 74.3: Datasets & Benchmarks, we build on the material covered here.
