The table below provides a side-by-side overview for rapid reference when choosing a model for a project.
| Model | Params (Total / Active) | Context | Open? | Vision | Reasoning |
|---|---|---|---|---|---|
| GPT-4o | ~200B (est.) | 128K | No | Yes | Good |
| o3 | Undisclosed | 200K | No | Yes | Excellent |
| o4-mini | Undisclosed | 200K | No | Yes | Excellent |
| Claude 4 Sonnet | Undisclosed | 200K | No | Yes | Very Good |
| Claude 4 Opus | Undisclosed | 200K | No | Yes | Excellent |
| Gemini 2.5 Pro | Undisclosed | 1M | No | Yes | Excellent |
| Gemini 2.0 Flash | Undisclosed | 1M | No | Yes | Good |
| Llama 3.1 405B | 405B (dense) | 128K | Yes | No | Good |
| Llama 4 Maverick | ~400B / 17B active | 1M | Yes | Yes | Good |
| Mixtral 8x22B | 141B / 39B active | 64K | Yes | No | Fair |
| DeepSeek-V3 | 671B / 37B active | 128K | Yes | No | Good |
| DeepSeek-R1 | 671B / 37B active | 128K | Yes | No | Excellent |
| Qwen 2.5 72B | 72B (dense) | 128K | Yes | No | Good |
| QwQ-32B | 32B (dense) | 128K | Yes | No | Very Good |
| Phi-4 | 14B (dense) | 16K | Yes | No | Very Good |
| Gemma 3 27B | 27B (dense) | 128K | Yes | Yes | Fair |
This appendix reflects the model landscape as of early 2026. New model releases occur frequently, and specifications, pricing, and capabilities shift with each release. Always verify details against the official model documentation and release announcements. Benchmark scores are intentionally omitted because they become outdated within weeks and can be misleading due to data contamination.
Start with your constraints: (1) Can you send data to a third-party API, or do you need self-hosting? (2) What is your latency budget? (3) What is your cost ceiling per request? (4) Do you need vision, long context, or strong reasoning? These four questions will narrow the field to 2-3 candidates. Then prototype with your actual data and measure what matters for your specific use case.
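As a rough illustration, the four screening questions can be encoded as a filter over the comparison table. The `Model` dataclass, the `MODELS` subset, and the `shortlist` helper below are hypothetical stand-ins for this appendix's table, not any official dataset or API:

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    open_weights: bool   # can be self-hosted (question 1)
    context_k: int       # context window, in thousands of tokens
    vision: bool
    reasoning: str       # "Fair" | "Good" | "Very Good" | "Excellent"

# Hypothetical subset of the comparison table above.
MODELS = [
    Model("Gemini 2.5 Pro", False, 1000, True, "Excellent"),
    Model("Llama 4 Maverick", True, 1000, True, "Good"),
    Model("DeepSeek-R1", True, 128, False, "Excellent"),
    Model("Phi-4", True, 16, False, "Very Good"),
]

def shortlist(models, need_self_host, min_context_k=0,
              need_vision=False, min_reasoning="Fair"):
    """Apply the four screening questions to narrow the field."""
    rank = ["Fair", "Good", "Very Good", "Excellent"]
    return [
        m.name for m in models
        if (not need_self_host or m.open_weights)
        and m.context_k >= min_context_k
        and (not need_vision or m.vision)
        and rank.index(m.reasoning) >= rank.index(min_reasoning)
    ]

# Example: self-hosting required, strongest reasoning tier needed.
print(shortlist(MODELS, need_self_host=True, min_reasoning="Excellent"))
# → ['DeepSeek-R1']
```

The filter only produces the shortlist; the prototyping step with your own data is still where the final decision gets made.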
Documentation Frameworks: Model Cards, Datasheets, and Data Cards
The model cards in this appendix follow a tradition of structured documentation for ML artifacts. Three complementary frameworks have emerged as standards, each addressing a different artifact and audience. Understanding their differences helps teams choose the right documentation strategy for their projects.
Model Cards (Mitchell et al., 2019)
Model cards document the model itself: intended use cases, performance metrics disaggregated by demographic group, known limitations, and ethical considerations. Originally proposed at the FAT* conference (now FAccT) in 2019, model cards are now standard on Hugging Face, where every model repository includes a card rendered from a structured README.md. Model cards answer the question: "Should I use this model for my task, and what should I watch out for?"
Datasheets for Datasets (Gebru et al., 2021)
Datasheets document the training or evaluation data behind a model. The framework organizes documentation into seven sections: motivation (why the dataset was created), composition (what the data contains), collection process (how it was gathered and by whom), preprocessing (cleaning, filtering, labeling steps), uses (intended and prohibited applications), distribution (how the dataset is shared), and maintenance (who maintains it and how to report issues). Datasheets answer the question: "Can I trust this data, and is it appropriate for my use case?"
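The seven sections above can be turned into a fill-in skeleton. The sketch below is a minimal illustration: the section prompts paraphrase the framework's questions, and the `render_datasheet` helper is a hypothetical convenience, not part of any library:

```python
# The seven datasheet sections (Gebru et al.), with paraphrased prompts.
DATASHEET_SECTIONS = {
    "Motivation": "Why was the dataset created, and by whom?",
    "Composition": "What do the instances contain? Any sensitive data?",
    "Collection process": "How was the data gathered, and over what timeframe?",
    "Preprocessing": "What cleaning, filtering, or labeling was applied?",
    "Uses": "What applications are intended, and which are prohibited?",
    "Distribution": "How is the dataset shared?",
    "Maintenance": "Who maintains it, and how are issues reported?",
}

def render_datasheet(answers: dict) -> str:
    """Render answers as a Markdown datasheet, marking unanswered sections."""
    lines = ["# Datasheet"]
    for section, prompt in DATASHEET_SECTIONS.items():
        lines.append(f"\n## {section}")
        lines.append(f"<!-- {prompt} -->")
        lines.append(answers.get(section, "_TODO_"))
    return "\n".join(lines)

print(render_datasheet({"Motivation": "Benchmark for retrieval QA."}))
```

Keeping the prompts as comments in the rendered output means reviewers see exactly which question each answer is supposed to address.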
Data Cards (Google, 2022)
Google's Data Cards Playbook extends the datasheet concept with a more structured, template-driven approach designed for enterprise adoption. Data cards include quantitative summaries (dataset size, label distributions, demographic breakdowns) alongside qualitative descriptions, making them easier to generate semi-automatically from metadata. The playbook provides fillable templates and review checklists that integrate into MLOps workflows.
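The quantitative-summary portion of a data card can often be computed directly from the data, which is what makes semi-automatic generation feasible. The sketch below derives a label distribution with only the standard library; the records and field names are invented for illustration:

```python
from collections import Counter

# Hypothetical labeled records.
records = [
    {"text": "great product", "label": "positive"},
    {"text": "terrible support", "label": "negative"},
    {"text": "okay overall", "label": "neutral"},
    {"text": "love it", "label": "positive"},
]

def label_distribution(rows, field="label"):
    """Per-label counts and proportions -- a typical data-card summary."""
    counts = Counter(r[field] for r in rows)
    total = sum(counts.values())
    return {k: (v, round(v / total, 3)) for k, v in sorted(counts.items())}

print(len(records), label_distribution(records))
# → 4 {'negative': (1, 0.25), 'neutral': (1, 0.25), 'positive': (2, 0.5)}
```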
Comparison: Documentation Frameworks
| Framework | Artifact Type | Primary Audience | Key Sections | Adoption |
|---|---|---|---|---|
| Model Cards | Trained model | Downstream developers, auditors | Intended use, metrics by group, limitations, ethical considerations | Widespread (Hugging Face, major providers) |
| Datasheets | Dataset | Researchers, data curators | Motivation, composition, collection, preprocessing, distribution, maintenance | Growing (academic standard, NeurIPS requirement) |
| Data Cards | Dataset | Enterprise ML teams, compliance | Quantitative summaries, schema, provenance, sensitivity labels | Moderate (Google ecosystem, enterprise adoption) |
Operationalizing Documentation in Training Pipelines
Documentation should not be a manual afterthought. Modern MLOps pipelines can generate documentation artifacts automatically. Hugging Face's huggingface_hub library provides ModelCard and DatasetCard classes that populate templates from training metadata (metrics, hyperparameters, dataset statistics), and Google's Data Cards Playbook templates are designed to be filled from schema information and summary statistics computed directly from the data files. The goal is to make documentation a build artifact: generated during training, versioned alongside the model weights, and reviewed during the deployment approval process.
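To make the "documentation as build artifact" idea concrete, here is a library-free sketch that turns training-run metadata into a Markdown model card with YAML front matter. The metadata fields and the `build_model_card` helper are illustrative assumptions; in a real pipeline, huggingface_hub's ModelCard class would render a richer template from the same inputs:

```python
def build_model_card(meta: dict) -> str:
    """Generate a minimal model card from training-run metadata."""
    front = "\n".join(f"{k}: {meta[k]}" for k in ("license", "base_model"))
    metrics = "\n".join(f"- {k}: {v}" for k, v in meta["metrics"].items())
    hparams = "\n".join(f"- {k}: {v}" for k, v in meta["hyperparameters"].items())
    return (
        f"---\n{front}\n---\n\n"
        f"# {meta['name']}\n\n"
        f"## Training metrics\n{metrics}\n\n"
        f"## Hyperparameters\n{hparams}\n"
    )

# Illustrative metadata, as a training script might emit it.
meta = {
    "name": "sentiment-classifier-v2",
    "license": "apache-2.0",
    "base_model": "distilbert-base-uncased",
    "metrics": {"accuracy": 0.91, "f1": 0.89},
    "hyperparameters": {"lr": 2e-5, "epochs": 3},
}
print(build_model_card(meta))
```

Because the card is derived from the same metadata the training run logs, committing it next to the model weights keeps the two from drifting apart.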
Tools for Documentation
- Hugging Face Dataset Cards: Every dataset on the Hub includes a structured card with YAML metadata (task type, languages, license) and freeform sections. The datasets library can auto-generate skeleton cards from dataset metadata.
- Google Data Cards Playbook: Provides PDF and digital templates, a facilitator guide for team workshops, and example cards for reference datasets.

Both tools lower the barrier to producing useful documentation, though human review remains essential for nuanced content like limitation descriptions and ethical considerations.
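For reference, the YAML metadata block at the top of a Hub dataset card looks roughly like this. The field names follow the Hub's dataset card metadata schema as commonly used; the values are invented for illustration:

```yaml
---
license: cc-by-4.0
language:
  - en
task_categories:
  - text-classification
size_categories:
  - 10K<n<100K
tags:
  - sentiment
---
```

This block is machine-readable, which is what powers the Hub's search filters and makes the structured half of a card cheap to generate automatically.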
Documentation is a living artifact. Automate what you can (statistics, schema, performance metrics), but reserve human judgment for what you must (limitations, ethical considerations, known biases). Schedule quarterly reviews of model and dataset cards, especially after retraining or data pipeline changes. Stale documentation is worse than no documentation because it creates false confidence.