Section 77.3: AGI Timelines: The 2027-2033 Spectrum

"AGI timelines are confidence intervals that change with the next benchmark. The honest answer is a range and an underline."
Frontier, Timeline-Honest AI Agent

Note: Learning Objectives

Position the major 2026 timeline forecasters (Amodei, Hassabis, Metaculus, Polymarket, 80,000 Hours) on a common axis.
Recognize the "definition shopping" trap that makes AGI-year forecasts appear to disagree more than the underlying capability forecasts do.
Identify three empirical indicators that will arbitrate the 2027-2033 spectrum.
Avoid conflating "capability frontier" with "deployment frontier" when reading timeline arguments.

Big Picture

This is the question every LLM textbook eventually has to address. The honest answer is that serious forecasters disagree by roughly a factor of two on how many years remain. Rather than pretend a single number captures the field's uncertainty, this section walks the spread, names the indicators that will narrow it, and locates the definitional drift that makes the question harder than it has to be.

Prerequisites

This section assumes the frontier benchmark vocabulary from Section 77.1, the LLM scaling-law intuition from Section 6.3, and the alignment framing from Section 77.2.

The single most-debated question in this whole part is when (or whether) AGI arrives, and the answers differ by years. (AGI, Artificial General Intelligence, here means an AI that matches or exceeds human capability across essentially every cognitive task, not just narrow benchmarks.) Anthropic's Dario Amodei has publicly anchored 2026-27 as plausible for "powerful AI" capable of major scientific breakthroughs. Google DeepMind's Demis Hassabis has remained at "around 2030". Metaculus's median forecast in May 2026 compressed to March 30, 2028, with a 25-75% interval of 2029 to 2033. Polymarket gave only a 9% probability to AGI arriving by 2027.

This section walks the spectrum, not because anyone has a calibrated answer but because seeing the spread is more useful than pretending a single number captures the field's actual uncertainty. The most honest framing is: the timeline depends as much on how you define "AGI" as on how the technology moves.

77.3.1 The compressed timeline (2026-2028)

Fun Fact

"AGI in 5 years" has been a confident industry prediction since at least 1956, which was the year of the Dartmouth Workshop and the most popular five-year window ever issued. Dario Amodei's 2024 "Machines of Loving Grace" essay was unusually careful to frame Anthropic's compressed-timeline view as conditional rather than confident, which is now the most cited hedge in any frontier-lab strategy document. The honest summary of all AGI timelines is that the field's median estimate moves about one year forward for every year that passes, until it doesn't.

The bullish case rests on three observations. First, a linear extrapolation of the HLE curve reaches human-expert parity within fifteen months. (HLE is "Humanity's Last Exam", a 2024 benchmark of ~3,000 expert-written PhD-level questions across math, science, and humanities; Section 77.1 introduces it.) This is a naive projection: it assumes neither saturation nor a regime change in the next two scaling cycles, and benchmark curves have flattened before. Second, scores on the agentic coding benchmark SWE-bench Verified (a 500-task subset of real GitHub issues that an AI must fix end-to-end) have crossed 70% with Claude Opus 4.6 and are still climbing. Third, reasoning models that scale test-time compute (o3, Claude Opus 4.6, GPT-5-Reasoning) have not yet hit a plateau. Amodei's "Machines of Loving Grace" essay (October 2024) is the canonical text for this position.
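The linear extrapolation in the first observation can be made concrete with a little arithmetic. The sketch below assumes the fifteen-month projection refers to the HLE figures quoted elsewhere in this section (about 44.7% now, roughly 80% for expert parity) and backs out the slope that projection implies. The function names and the halved-slope scenario are illustrative assumptions, not measured values.

```python
def implied_slope(current: float, target: float, months: float) -> float:
    """Points per month needed to go from current to target in `months`."""
    return (target - current) / months


def months_to_target(current: float, target: float, slope: float) -> float:
    """Months until a straight line with the given slope reaches the target."""
    if slope <= 0:
        raise ValueError("a flat or falling curve never reaches the target")
    return (target - current) / slope


HLE_NOW = 44.7     # percent, as quoted in this section
HLE_PARITY = 80.0  # percent, approximate expert-parity level quoted in this section

slope = implied_slope(HLE_NOW, HLE_PARITY, months=15)
print(f"implied slope: {slope:.2f} points/month")

# Hypothetical scenario: if progress slowed to half that slope,
# a straight-line projection would take twice as long.
slower = months_to_target(HLE_NOW, HLE_PARITY, slope / 2)
print(f"at half the slope: {slower:.0f} months")
```

The point of the halved-slope line is only that a straight-line forecast is very sensitive to the slope it assumes; a curve that flattens would push the date out further still.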

77.3.2 The mainstream timeline (2028-2032)

The Metaculus median sits here. The argument: progress is real but capability gaps (long-horizon agentic tasks, novel mathematical discovery, robust common-sense reasoning) consistently take longer to close than predicted. Hassabis's stance, the Stanford HAI 2026 predictions, and the median Metaculus forecaster all fall in this window. The bias-correction argument: people have been forecasting AGI "in 5-10 years" since the 1950s, and mainstream timelines mostly arrive at "5-10 years" again.

77.3.3 The skeptical timeline (post-2033)

The skeptical case rests on the observation that benchmarks measure narrow capabilities and that "general" intelligence requires capabilities that have proved resistant: true autonomous research, robust long-horizon planning, transfer to novel domains without examples. A LessWrong visualization of changing AGI timelines tracks where individual forecasters have moved; almost none have moved later, but several mainstream ones have remained at "2030s, plural". Polymarket's 9% on 2027 reflects this skepticism.

77.3.4 Comparing the timeline positions

Table 77.3.1: Mainstream AGI timeline positions, May 2026.

Source	Median year	25-75% interval	Position
Dario Amodei (Anthropic)	2026-27	by 2028	Bullish
Demis Hassabis (DeepMind)	2030	2028-2033	Mainstream
Metaculus median	March 2028	2029-2033	Mainstream
Polymarket (2027)	~9% to 2027	n/a	Skeptical
80,000 Hours synthesis	2028-30	2027-2035	Mainstream
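To check where the positions in Table 77.3.1 agree, one can intersect the two-sided 25-75% intervals. The sketch below is a minimal illustration: it uses only the three rows that give a two-sided interval (Amodei's "by 2028" and the Polymarket row are left out), and the dictionary and function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# 25-75% intervals copied from Table 77.3.1 (two-sided rows only).
INTERVALS: Dict[str, Tuple[int, int]] = {
    "Hassabis": (2028, 2033),
    "Metaculus": (2029, 2033),
    "80,000 Hours": (2027, 2035),
}


def common_overlap(intervals: Dict[str, Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Return the (start, end) span shared by every interval, or None if there is none."""
    start = max(lo for lo, _ in intervals.values())  # latest lower bound
    end = min(hi for _, hi in intervals.values())    # earliest upper bound
    return (start, end) if start <= end else None


print(common_overlap(INTERVALS))  # (2029, 2033)
```

With the tabulated values, the shared span is 2029-2033, which is simply the narrowest of the three intervals.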

Figure 77.3.1a: Five AGI-timeline positions plotted on a common axis. The 25-75% intervals overlap, but the medians span 2026 to 2030, a factor-of-two disagreement among serious forecasters.

The figure is a horizontal timeline from 2026 through 2035 showing the five AGI-timeline positions as bars. Amodei's bullish range sits in 2026-2028 (red). Hassabis's mainstream window spans 2028-2033 with median at 2030 (green). Metaculus median is March 2028 with 25-75% interval 2029-2033. Polymarket gives only 9% probability to 2027, plotted as a faded skeptical-blue bar. 80,000 Hours synthesis spans 2027-2035 with median 2028-30.

Key Insight: Definition asymmetry dominates the spread

The 2027-2033 range is wide because "AGI" is not a single threshold but a basket of thresholds: passing a Turing test (already done, depending on the rules); matching a human PhD on HLE (44.7% now, ~80% needed); autonomously conducting a novel scientific discovery (no test yet); and operating as a full replacement for a remote worker in any white-collar role (partial today, partial in five years). Different definitions produce different timelines. Demanding a precise year is asking the wrong question; the right one is to identify which capabilities matter for your problem and track those specifically.

Warning: definition shopping is rampant

The 2026 forecasting space has a structural incentive to redefine AGI to match a forecast. Labs that benefit from "AGI is near" timelines (Anthropic, OpenAI) anchor on capability-benchmark thresholds (HLE 80%, ARC-AGI-2 90%). Labs that benefit from "AGI is far" timelines (DeepMind partly) anchor on the broader basket (autonomous research, robust transfer). Independent forecasters (Metaculus, Polymarket) mostly anchor on whichever public definition their resolution criterion uses, which differs by market. When you read a 2026 AGI-year forecast, the first question to ask is "by whose definition", not "what year". Metaculus's question has been re-edited four times since 2020 and the resolution criterion still under-determines what counts.

Warning: don't conflate "capability" with "deployment"

Even after a system matches PhD-level expertise on benchmarks, deploying it broadly into the economy takes years. Anthropic's labor-market study found 35.9% of U.S. workers used generative AI by Dec 2025, but only 5% of the 1.17M layoffs in 2025 were attributed to AI directly. The capability frontier and the deployment frontier are not the same curve; both matter for practical effects, but the second moves much more slowly. Section 77.4 examines this gap.
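The layoff percentage is easier to weigh as a headcount. The short calculation below uses only the two figures quoted above and treats both as exact, which is an assumption (the source figures are rounded); the variable names are illustrative.

```python
TOTAL_LAYOFFS_2025 = 1_170_000  # the 1.17M figure quoted above (rounded)
AI_ATTRIBUTED_PCT = 5           # percent attributed directly to AI, as quoted above

# Integer arithmetic avoids floating-point noise in the headcount.
ai_layoffs = TOTAL_LAYOFFS_2025 * AI_ATTRIBUTED_PCT // 100
other_layoffs = TOTAL_LAYOFFS_2025 - ai_layoffs

print(f"attributed to AI: {ai_layoffs:,}")     # 58,500
print(f"other causes:     {other_layoffs:,}")  # 1,111,500
```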

Tip: track three indicators, not the AGI question

The three indicators most likely to settle the 2027-2033 debate empirically are: (1) the saturation rate of HLE and ARC-AGI-2/3, (2) the percentage of SWE-bench Verified problems that close-ended agents solve unsupervised, and (3) the share of interactions in Anthropic's labor-market data that shifts from augmentation to automation. If all three move quickly, the compressed timeline is right; if only one moves quickly, the spread persists. Watch the data, not the predictions.
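The reading rule in this tip can be written down as a small function. This is a sketch under stated assumptions: the tip does not define "quickly", so each indicator is reduced to a boolean that the reader must judge, and the tip says nothing about the cases where zero or two indicators move quickly, so the function reports those as unspecified instead of guessing. The function and parameter names are hypothetical.

```python
def read_indicators(hle_arc_fast: bool, swe_bench_fast: bool, automation_fast: bool) -> str:
    """Map the three indicator judgments onto the outcomes named in the tip."""
    fast_count = sum([hle_arc_fast, swe_bench_fast, automation_fast])
    if fast_count == 3:
        return "compressed timeline supported"
    if fast_count == 1:
        return "spread persists"
    # Zero or two fast indicators: the tip gives no rule for these cases.
    return "not specified by the rule"


print(read_indicators(True, True, True))
print(read_indicators(True, False, False))
print(read_indicators(True, True, False))
```

Leaving the two-indicator case unspecified is deliberate; it is exactly the situation in which the definitional questions raised earlier would still decide how a forecaster reads the data.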

77.3.5 What this section claims and disclaims

This section does not pick a year. The point is to expose the spread: serious people who have spent careers on this question disagree by a factor of two on the timeline. A textbook claiming a single year would be wrong. The honest claim is that the 2026-2028 indicators are unusually rich and the 2027 question is the closest we have ever come to having a year where benchmarks and labor data could decisively answer "is this still on a clear curve?". Section 77.4 turns to the economic side of that answer.

Key Takeaways

Mainstream AGI-year forecasts span 2026 to 2033, a factor-of-two disagreement among serious forecasters.
The 25-75% intervals overlap, but the medians differ by years. Definition asymmetry explains most of the spread.
Track three indicators: HLE/ARC-AGI-2 saturation rate, SWE-bench Verified unsupervised solve rate, augmentation-to-automation flip in labor data.
Capability and deployment frontiers move at different speeds; both matter for practical effects.

Self-Check

Q1: Why does Polymarket give only 9% probability to 2027 AGI when Amodei publicly anchors 2026-27?

Answer:

Polymarket aggregates skin-in-the-game bets from a broad user base, which structurally discounts the most optimistic single-lab voices and pulls the median toward conservative outcomes. The market's resolution criterion is also stricter than Amodei's working definition: it typically requires demonstrably general capabilities on a fixed external test rather than a lab-internal milestone. Combined with the base-rate drag from technology forecasts that have historically slipped, those two factors are enough to pull the implied probability down to roughly 9%.

Q2: A reader asks "is AGI here yet?" in May 2026. What is the right textbook answer?

Answer:

The honest answer is that it depends entirely on the definition. By the "passes a Turing test" criterion, yes, frontier systems have cleared that bar for years. By the "matches human PhD experts on HLE" criterion, no, current scores sit around 44.7% versus roughly 80% needed. By the "autonomously conducts novel scientific discovery" criterion, no measured instances yet exist. The textbook position is therefore to refuse a single yes-no answer and instead point readers to which capability bundle their use case actually depends on.

What's Next?

In the next section, Section 77.4: Economic Implications & Labor-Market Data, we build on the material covered here.

Further Reading

Metaculus question 5121: "When will the first general AI system be devised, tested, and publicly announced?" (median forecast).

Amodei, "Machines of Loving Grace" (Anthropic, Oct 2024).

80,000 Hours, "What the Hell Happened with AGI Timelines?" (2025 synthesis).

Cotra, "Biological Anchors" (Open Philanthropy, ongoing).

Davidson, "Compute-Centric Takeoff" (Open Philanthropy, 2024-25).

Stanford HAI, "2026 Predictions".

LessWrong, "A Visualization of Changing AGI Timelines, 2023-2026".