Bias, Fairness & Hallucinations

Chapter opener illustration: Bias.

"Algorithmic bias is not a glitch in the system; it is a feature of every system built on data shaped by an unequal world."

Census, Fairness-Forward AI Agent
Looking Back

Part X kept the system safe. Part XI keeps it trustworthy. It opens with this chapter on bias, fairness, and hallucinations, the failure modes that erode user trust fastest, and covers how to measure and mitigate them and how to weigh the tradeoffs among accuracy, calibration, and group-level outcomes.

Chapter Overview

Bias in LLM systems comes from data, from training procedures, and from the choices designers make about what counts as success. This chapter covers how to measure bias, where it comes from, and the mitigation patterns that move models toward fairer behavior across populations and languages. It also covers pluralistic alignment, cross-cultural NLP, and the audit practices that catch disparate impact before deployment.
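Disparate impact audits are often screened with a selection-rate ratio. The sketch below assumes binary model decisions (1 = favorable) and a single group attribute; `selection_rates` and `disparate_impact_ratio` are hypothetical helper names and the data are made up. It divides the lowest group's rate of favorable outcomes by the highest and compares the result with the 0.8 threshold of the four-fifths rule from the US EEOC Uniform Guidelines on Employee Selection Procedures, which is best treated as a screening heuristic rather than a legal or statistical verdict.

```python
from collections import defaultdict

def selection_rates(predictions, groups):
    """Fraction of favorable (1) decisions per group."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact_ratio(predictions, groups):
    """Lowest group selection rate divided by the highest group selection rate."""
    rates = selection_rates(predictions, groups)
    highest = max(rates.values())
    if highest == 0:
        # Assumption: if no group receives favorable outcomes, report parity (1.0).
        return 1.0
    return min(rates.values()) / highest

# Hypothetical audit data: 1 = favorable model decision.
preds = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]

rates = selection_rates(preds, groups)
ratio = disparate_impact_ratio(preds, groups)
print(rates)  # {'a': 0.8, 'b': 0.2}
# 0.8 threshold: the four-fifths rule of thumb, used here only as a screen.
print(f"ratio={ratio:.2f}, passes four-fifths screen: {ratio >= 0.8}")
```

Here group b is selected at a quarter of group a's rate, so the audit would flag the model for closer review before deployment.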

Big Picture

This chapter addresses two of the most common LLM trust failures: biased outputs and hallucinations. It covers the algorithmic fairness frameworks (demographic parity, equalized odds), bias measurement and mitigation across pretraining and fine-tuning, why models hallucinate, a taxonomy of hallucination failure modes, and the detection and prevention techniques that catch hallucinations before users see them.
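Demographic parity asks that the positive-prediction rate be equal across groups; equalized odds asks that both the true positive rate and the false positive rate be equal across groups; equal opportunity constrains only the true positive rate. A minimal sketch of measuring those gaps, assuming binary labels, binary predictions, and a single sensitive attribute: `rate` and `fairness_gaps` are hypothetical helpers rather than a specific library's API, and summarizing equalized odds by the larger of the two rate gaps is one common convention, not the only one.

```python
def rate(values):
    """Mean of a list of 0/1 predictions; undefined for an empty slice."""
    if not values:
        raise ValueError("metric undefined: group has no examples in this slice")
    return sum(values) / len(values)

def fairness_gaps(y_true, y_pred, groups):
    """Largest between-group gaps for demographic parity and equalized odds."""
    by_group = {}
    for t, p, g in zip(y_true, y_pred, groups):
        by_group.setdefault(g, []).append((t, p))

    positive_rate, tpr, fpr = {}, {}, {}
    for g, pairs in by_group.items():
        positive_rate[g] = rate([p for _, p in pairs])   # P(Yhat=1 | A=g)
        tpr[g] = rate([p for t, p in pairs if t == 1])    # P(Yhat=1 | Y=1, A=g)
        fpr[g] = rate([p for t, p in pairs if t == 0])    # P(Yhat=1 | Y=0, A=g)

    def spread(metric):
        return max(metric.values()) - min(metric.values())

    return {
        "demographic_parity_gap": spread(positive_rate),
        "tpr_gap": spread(tpr),  # equal opportunity constrains only this term
        "fpr_gap": spread(fpr),
        # Assumed convention: equalized odds gap = larger of the TPR and FPR gaps.
        "equalized_odds_gap": max(spread(tpr), spread(fpr)),
    }

# Hypothetical labels and predictions for two groups.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gaps = fairness_gaps(y_true, y_pred, groups)
print(gaps)
# {'demographic_parity_gap': 0.25, 'tpr_gap': 0.0, 'fpr_gap': 0.5, 'equalized_odds_gap': 0.5}
```

In this toy data both groups have a true positive rate of 1.0, so equal opportunity holds, yet equalized odds is violated because group a receives false positives at a rate of 0.5 while group b receives none. The criteria can disagree, which is why an audit should state which one it targets.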


What's Next?

Next: Chapter 53: Regulation, Compliance, and Governance. Once bias and hallucinations are measurable, the question becomes which laws and frameworks impose which duties on the team that ships the model. Chapter 53 walks through the EU AI Act, GDPR, US executive orders, the NIST AI RMF, ISO/IEC 42001, and the enterprise governance practices that turn ethical principles into auditable controls.

Further Reading

Foundational Papers

Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V., & Kalai, A. (2016). "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings." NeurIPS. arXiv:1607.06520. The paper that put gender bias in word embeddings on the NLP agenda; the methodology generalizes to LLM probes.
Hardt, M., Price, E., & Srebro, N. (2016). "Equality of Opportunity in Supervised Learning." NeurIPS. arXiv:1610.02413. Defines equalized odds and equal opportunity, the algorithmic-fairness frameworks invoked throughout this chapter.
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" FAccT. doi:10.1145/3442188.3445922. The agenda-setting critique of large language models on grounds of bias, environmental cost, and accountability.

Hallucinations

Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., et al. (2023). "Survey of Hallucination in Natural Language Generation." ACM Computing Surveys, 55(12). arXiv:2202.03629. A foundational survey that popularized the intrinsic vs. extrinsic hallucination taxonomy and the factuality vs. faithfulness distinction widely used by detection benchmarks.
Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., et al. (2023). "A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions." arXiv preprint. arXiv:2311.05232. A widely cited LLM-specific hallucination survey that organizes causes (data, training, inference) against detection and mitigation approaches.

Pluralistic and Cross-Cultural Alignment

Sorensen, T., Moore, J., Fisher, J., Gordon, M., Mireshghallah, N., Rytting, C. M., et al. (2024). "A Roadmap to Pluralistic Alignment." ICML. arXiv:2402.05070. Frames the three modes of pluralistic alignment (Overton, steerable, distributional) referenced in Section 52.2.
Ramesh, K., Sitaram, S., & Choudhury, M. (2023). "Fairness in Language Models Beyond English: Gaps and Challenges." Findings of EACL. ACL Anthology. Catalogs how bias measurement protocols transfer (or fail to transfer) across languages, directly relevant to non-Western evaluation.