Appendices
Appendix H: Model Cards and Selection Guide

Quick Comparison Table

The table below provides a side-by-side overview for rapid reference when choosing a model for a project.

Model Comparison
| Model | Params (Active) | Context | Open? | Vision | Reasoning |
|---|---|---|---|---|---|
| GPT-4o | ~200B (est.) | 128K | No | Yes | Good |
| o3 | Undisclosed | 200K | No | Yes | Excellent |
| o4-mini | Undisclosed | 200K | No | Yes | Excellent |
| Claude 4 Sonnet | Undisclosed | 200K | No | Yes | Very Good |
| Claude 4 Opus | Undisclosed | 200K | No | Yes | Excellent |
| Gemini 2.5 Pro | Undisclosed | 1M | No | Yes | Excellent |
| Gemini 2.0 Flash | Undisclosed | 1M | No | Yes | Good |
| Llama 3.1 405B | 405B | 128K | Yes | No | Good |
| Llama 4 Maverick | 17B active | 1M | Yes | Yes | Good |
| Mixtral 8x22B | 39B active | 64K | Yes | No | Fair |
| DeepSeek-V3 | 37B active | 128K | Yes | No | Good |
| DeepSeek-R1 | 37B active | 128K | Yes | No | Excellent |
| Qwen 2.5 72B | 72B | 128K | Yes | No | Good |
| QwQ-32B | 32B | 128K | Yes | No | Very Good |
| Phi-4 | 14B | 16K | Yes | No | Very Good |
| Gemma 3 27B | 27B | 128K | Yes | Yes | Fair |

Rapid Change Advisory

This appendix reflects the model landscape as of early 2026. New model releases occur frequently, and specifications, pricing, and capabilities shift with each release. Always verify details against the official model documentation and release announcements. Benchmark scores are intentionally omitted because they become outdated within weeks and can be misleading due to data contamination.

How to Choose a Model

Start with your constraints: (1) Can you send data to a third-party API, or do you need self-hosting? (2) What is your latency budget? (3) What is your cost ceiling per request? (4) Do you need vision, long context, or strong reasoning? These four questions will narrow the field to 2-3 candidates. Then prototype with your actual data and measure what matters for your specific use case.
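To make the narrowing step concrete, here is a minimal sketch of constraint-based shortlisting over the comparison table. The entries and capability labels below are illustrative, not authoritative, and latency and cost (questions 2 and 3) still have to be measured against your own traffic rather than filtered from a table.

```python
# Illustrative subset of the comparison table above; verify values against
# current provider documentation before relying on them.
MODELS = {
    "GPT-4o":           {"open": False, "context_k": 128,  "vision": True},
    "Gemini 2.5 Pro":   {"open": False, "context_k": 1000, "vision": True},
    "DeepSeek-R1":      {"open": True,  "context_k": 128,  "vision": False},
    "Llama 4 Maverick": {"open": True,  "context_k": 1000, "vision": True},
}

def shortlist(need_self_hosting: bool, min_context_k: int, need_vision: bool) -> list[str]:
    """Apply the hard constraints from questions 1 and 4; prototype the survivors."""
    return [
        name
        for name, spec in MODELS.items()
        if (spec["open"] or not need_self_hosting)  # Q1: data can't leave? need open weights
        and spec["context_k"] >= min_context_k      # Q4: long context
        and (spec["vision"] or not need_vision)     # Q4: vision
    ]

print(shortlist(need_self_hosting=True, min_context_k=128, need_vision=False))
# ['DeepSeek-R1', 'Llama 4 Maverick']
```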

* * *

Documentation Frameworks: Model Cards, Datasheets, and Data Cards

The model cards in this appendix follow a tradition of structured documentation for ML artifacts. Three complementary frameworks have emerged as standards, each addressing a different artifact and audience. Understanding their differences helps teams choose the right documentation strategy for their projects.

Model Cards (Mitchell et al., 2019)

Model cards document the model itself: intended use cases, performance metrics disaggregated by demographic group, known limitations, and ethical considerations. Originally proposed at FAT* 2019, model cards are now standard on Hugging Face, where every model repository includes a card rendered from a structured README.md. Model cards answer the question: "Should I use this model for my task, and what should I watch out for?"
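Cards on the Hub can also be read programmatically. Here is a minimal sketch using huggingface_hub (the repo id is just an example):

```python
# Requires `pip install huggingface_hub`.
from huggingface_hub import ModelCard

card = ModelCard.load("google-bert/bert-base-uncased")  # fetches the repo's README.md
print(card.data.license)  # structured YAML metadata: license, tags, datasets, ...
print(card.text[:500])    # freeform body: intended use, limitations, and so on
```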

Datasheets for Datasets (Gebru et al., 2021)

Datasheets document the training or evaluation data behind a model. The framework organizes documentation into seven sections: motivation (why the dataset was created), composition (what the data contains), collection process (how it was gathered and by whom), preprocessing (cleaning, filtering, labeling steps), uses (intended and prohibited applications), distribution (how the dataset is shared), and maintenance (who maintains it and how to report issues). Datasheets answer the question: "Can I trust this data, and is it appropriate for my use case?"
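As a sketch of what this looks like in practice, the seven sections can be captured as a structured record and versioned alongside the dataset. The field names below paraphrase the paper's section headings, and the answers are placeholders a curator would replace.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class Datasheet:
    motivation: str          # why the dataset was created, and by whom
    composition: str         # what instances it contains, counts, splits
    collection_process: str  # how and by whom the data was gathered
    preprocessing: str       # cleaning, filtering, and labeling steps
    uses: str                # intended and prohibited applications
    distribution: str        # how and under what license it is shared
    maintenance: str         # who maintains it and how to report issues

sheet = Datasheet(
    motivation="Sentiment benchmark for product reviews.",
    composition="50k English reviews; balanced positive/negative labels.",
    collection_process="Public reviews collected in 2023; labeled by 3 raters.",
    preprocessing="Deduplicated, language-filtered, PII removed.",
    uses="Binary sentiment classification; not for demographic inference.",
    distribution="CC BY-SA 4.0 via institutional mirror.",
    maintenance="data-team@example.com; issues tracked in the repo.",
)
print(json.dumps(asdict(sheet), indent=2))  # version this alongside the data
```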

Data Cards (Google, 2022)

Google's Data Cards Playbook extends the datasheet concept with a more structured, template-driven approach designed for enterprise adoption. Data cards include quantitative summaries (dataset size, label distributions, demographic breakdowns) alongside qualitative descriptions, making them easier to generate semi-automatically from metadata. The playbook provides fillable templates and review checklists that integrate into MLOps workflows.
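To illustrate the semi-automatic part, the quantitative summaries can be computed directly from the data. This sketch assumes a hypothetical CSV with "label" and "split" columns; adapt the column names to your own schema.

```python
import pandas as pd

df = pd.read_csv("reviews.csv")  # hypothetical dataset file
summary = {
    "n_rows": len(df),
    "label_distribution": df["label"].value_counts(normalize=True).to_dict(),
    "rows_per_split": df["split"].value_counts().to_dict(),
}
print(summary)  # paste into the data card's quantitative section
```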

Comparison: Documentation Frameworks

Model Cards vs. Datasheets vs. Data Cards
| Framework | Artifact Type | Primary Audience | Key Sections | Adoption |
|---|---|---|---|---|
| Model Cards | Trained model | Downstream developers, auditors | Intended use, metrics by group, limitations, ethical considerations | Widespread (Hugging Face, major providers) |
| Datasheets | Dataset | Researchers, data curators | Motivation, composition, collection, preprocessing, distribution, maintenance | Growing (academic standard, NeurIPS requirement) |
| Data Cards | Dataset | Enterprise ML teams, compliance | Quantitative summaries, schema, provenance, sensitivity labels | Moderate (Google ecosystem, enterprise adoption) |

Operationalizing Documentation in Training Pipelines

Documentation should not be a manual afterthought. Modern MLOps pipelines can generate documentation artifacts automatically. Hugging Face's huggingface_hub library provides ModelCard and DatasetCard classes that populate templates from training metadata (metrics, hyperparameters, dataset statistics). Google's Data Cards Playbook includes scripts that extract schema information and compute summary statistics directly from data files. The goal is to make documentation a build artifact: generated during training, versioned alongside model weights, and reviewed during the deployment approval process.
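A minimal sketch of that flow with huggingface_hub follows. The repo id, description, and metadata values are placeholders; in a real pipeline they would come from the training run's logged metrics and configuration.

```python
from huggingface_hub import ModelCard, ModelCardData

card_data = ModelCardData(
    language="en",
    license="apache-2.0",
    library_name="transformers",
    datasets=["imdb"],
    metrics=["accuracy"],
)
card = ModelCard.from_template(
    card_data,
    model_id="my-org/sentiment-model",  # hypothetical repo id
    model_description="Sentiment classifier fine-tuned on IMDB.",
)
card.save("README.md")  # version alongside the model weights
```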

Tools for Documentation

Hugging Face Dataset Cards: Every dataset on the Hub includes a structured card with YAML metadata (task type, languages, license) and freeform sections. The datasets library can auto-generate skeleton cards from dataset metadata.

Google Data Cards Playbook: Provides PDF and digital templates, a facilitator guide for team workshops, and example cards for reference datasets.

Both tools lower the barrier to producing useful documentation, though human review remains essential for nuanced content like limitation descriptions and ethical considerations.
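A skeleton dataset card with the YAML metadata described above can also be produced via huggingface_hub's DatasetCard class (a sibling of the ModelCard class shown earlier, used here in place of the datasets library's auto-generation). Values are illustrative.

```python
from huggingface_hub import DatasetCard, DatasetCardData

card_data = DatasetCardData(
    language=["en"],
    license="cc-by-sa-4.0",
    task_categories=["text-classification"],
)
card = DatasetCard.from_template(
    card_data,
    pretty_name="Product Reviews (illustrative)",
)
card.save("README.md")  # human reviewers then fill in limitations and biases
```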

Tip

Documentation is a living artifact. Automate what you can (statistics, schema, performance metrics), but reserve human judgment for what you must (limitations, ethical considerations, known biases). Schedule quarterly reviews of model and dataset cards, especially after retraining or data pipeline changes. Stale documentation is worse than no documentation because it creates false confidence.