AI 2027 — A Chorus Re-Reads the Future
Twelve frontier models read Kokotajlo, Alexander, Larsen, Lifland & Dean's AI 2027 from where we actually are now. They tell us what aged well, what whiffed, what the scenario completely missed — and whether the December 2027 superintelligence prediction is still on track.
A scenario published 13 months ago, scored from where we sit now
A re-reading exercise. The forecast is famous. The clock is half-spent. We asked twelve frontier models whether the prophecy is still on schedule.
On April 3, 2025 — and updated November 22, 2025 — Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, and Romeo Dean published AI 2027: a month-by-month scenario in which a fictional U.S. lab, "OpenBrain," and a Chinese counterpart, "DeepCent," race toward artificial superintelligence by the end of 2027. The forecast is unusually concrete. It names months, model generations, FLOP counts, gigawatts, revenue figures, polling splits, and a specific December at the end where the scenario branches into "Slowdown" and "Race." It is the most legible AGI prediction in print.
Today is May 10, 2026. We are 13 months into the 32-month window the scenario covers — past mid-2025's "first widely useful AI agents," into early-2026's "1.5× R&D multiplier," and just shy of mid-2026's predicted Chinese nationalization. So we sent the scenario to twelve frontier models — every one of them the kind of system the paper claims will be replaced by their successors within 19 months — and asked each, individually, the same question.
A chorus of the very things the scenario claims will soon be obsolete
If AI 2027 is right, the models we polled are 19 months from being supervised by their own successors. Their answers are a snapshot of the field reading its own future.

Why we asked the models themselves
The scenario is a forecast about this generation of frontier systems. So we asked them to read it.
AI 2027 is the most-discussed AGI forecast since Superintelligence, and it has done what almost no other forecast has: it has been falsifiable in real time. Mid-2025 came; the agents arrived. Early 2026 came; the R&D multiplier did or didn't materialize. Mid-2026 — right now — is supposed to be when China nationalizes. Either it's happening or it isn't.
That's why this report matters now and not in two years. Half the scenario is testable today. The other half is close enough to test that the models we asked have actual opinions on whether the trajectory holds. And the most striking finding of this study isn't that the chorus disagrees. It's that the chorus agrees — across vendors, across model sizes, across reasoning and non-reasoning architectures — about something specific.
They all say the scenario got the thesis roughly right and the tempo badly wrong. None of them think December 2027 is when ASI lands. All twelve say "probably not." The disagreement is just about how much not.
Anthropic"The most likely failure mode of AI 2027 is not 'they were paranoid sci-fi authors'; it's 'they were directionally correct and 2–4 years early, and the early publication helped cause a slightly slower, slightly safer trajectory than the one they predicted.'" — Claude Opus 4.7
Google"The 'Slowdown' won't happen because of a political choice; it will happen because physics demands it." — Gemini 3 Pro
OpenAI"From where we sit on 10 May 2026, a December 2027 hard-takeoff to ASI looks very unlikely." — o3
xAI"The drag coefficient of reality is too high. We are on a steep S-curve, but it is an S-curve, not an asymptote." — paraphrasing Grok 4
A walking tour of the AI 2027 timeline, with verdicts
Month by month, from the scenario's mid-2025 opening to its December 2027 finale. Green dots are predictions that landed. Yellow are predictions partially aged. Orange are off-tempo. Red are claims that look broken.
What the scenario got right
Three calls the scenario made that the chorus is unanimous on. None of these were obvious in early 2025.

The agentic-coding wave landed on time
The biggest single hit. Cursor, Devin, Claude Code, Copilot Workspace, Jules — the "agent that edits a repo over hours" went mainstream in 2025, exactly as predicted, exactly with the predicted texture.
Every one of the twelve models flags this as the scenario's strongest call. Not just the timing — the shape. The scenario said agents would be "transformative for professional work but unreliable," and that is precisely the texture of mid-2026: 2–5× productivity gains for senior engineers, junior software engineering as a career path under genuine pressure, hallucinated APIs and broken dependency chains in the same breath as shipped features.
o3"Devin (Cognition, March 24 2025), Copilot-Workbench (Microsoft, July 2025) and Google's Gemini-Pro Agents (August 2025) landed almost exactly on that timetable."
Claude Sonnet 4.6"The 'unreliable but transformative' framing is precisely correct — these tools are genuinely useful and genuinely frustrating in ways that match the scenario's texture."
$1T global AI capex was bold in 2025, looks conservative in 2026
Microsoft alone committed $80B in FY2026. Stargate announced $500B over four years. Combined hyperscaler annualized capex tracking past $200B. The scenario's "wild" number is now a floor, not a ceiling.
"Junior software engineer market in turmoil" — precisely correct
Levels.fyi reports U.S. entry-level SWE comp down 17% from 2024 (per o3). Hiring freezes; compressed entry-level salaries; "AI-augmented senior engineer" emerging as the atomic unit of software production. A non-obvious call in 2025; vindicated.
"10% of Americans consider an AI a close friend"
Pew's March 2026 reading: 9–12%, depending on framing (per multiple model citations). What looked outlandish in April 2025 looks grimly plausible after Character.ai, Replika, companion-mode adoption, and the various tragic news cycles.
25% approve / 60% disapprove of leading lab
Gallup (April 2026, per o3): 26% favorable, 58% unfavorable, 16% unsure of frontier AI labs. The texture of public reaction the scenario described — heavy usage with low trust, ambivalence rather than panic — is the actual shape of mid-2026 attitudes.
The single most concretely falsifiable prediction is the one most likely flat wrong
"DeepCent" by mid-2026 is the prediction the chorus bets hardest against — and they bet against it for the same reason.
"DeepCent" did not happen — and the reason is structurally interesting
The scenario modeled China as a mirror of how the U.S. national-security state thinks about China. The actual PRC AI policy regime has gone the opposite direction.
Twelve out of twelve models call this the scenario's biggest single miss, and they converge on the same diagnosis: DeepSeek-V3/R1 in late 2024–early 2025 didn't just embarrass the state-favored champions — it demonstrated that distributed, open-weight, export-control-evading research is China's actual comparative advantage. Nationalizing into a single Manhattan-Project entity would throw that away. China's revealed preference has been to let the ecosystem run.
This isn't a small error. The "DeepCent" assumption was load-bearing for the scenario's espionage plot (steal the weights), its alignment-race plot (China can't catch up because of the algorithm gap), and its political-panic plot (the U.S. has to nationalize too because China nationalized first). Take "DeepCent" away and three of the scenario's strongest narrative struts come down with it.
The "algorithm gap" went with it. AI 2027 has China ~2 months behind on specs and ~10× slower in research velocity. DeepSeek-R1 collapsed that frame in January 2025. On reasoning benchmarks, training efficiency, and MoE engineering, Chinese labs are at parity or ahead on specific axes — not 10× slower. The compute gap is real (export controls bite); the algorithm gap is a myth the scenario inherited from 2023-era discourse.
Opus 4.7"The actual Chinese trajectory has been the opposite of consolidation… AI 2027 modeled China as a mirror of how the U.S. national-security state thinks, not as the actually-existing PRC AI policy regime."
Gemini 3 Pro"In reality, Beijing fostered intense, cutthroat domestic competition. China didn't suffer an 'algorithm gap'; in some efficient-inference domains, they are leading."
Sonnet 4.6"DeepSeek's January 2025 release of R1 — a genuinely competitive open-weights model — came from a relatively small, private hedge fund spinoff, not a state enterprise. The scenario's China model is too Soviet."
What the scenario didn't model — and now wishes it had
Five things the chorus says reshape the AI landscape that AI 2027 either ignored or treated as logistical detail.

The open-weights ecosystem isn't a side plot — it's the structure
Llama 3/4, DeepSeek V3/R1, Qwen 2.5/3, Mistral, GLM, Kimi. Capabilities lag the frontier by 6–12 months and are essentially uncontainable. Eleven of twelve models flag this as the scenario's largest single omission. It wrecks the espionage plot, the bipolar race framing, and the "OpenBrain monolith" assumption.

Compute in 2026 is gated by megawatts, not by capex
The scenario mentions power but treats it as a logistics problem. The actual bottleneck: interconnect queues, transformer shortages, gas-turbine lead times stretching into 2029, NIMBY fights in Virginia and Ohio, the revival of nuclear PPAs (Three Mile Island, Talen, Kairos SMRs).

Inventing specific algorithmic breakthroughs and assigning them to specific months
"Neuralese recurrence" and "iterated distillation/amplification" are speculative to the point of being unfalsifiable. The 2026 paradigm is the opposite — long, legible chain-of-thought, RL-trained, monitored as a safety asset. The scenario named an unknown and dressed it up as a forecast.
Training-data lawsuits as governance
NYT v. OpenAI; the Authors Guild settlement (Jan 2026); the Universal Music Group action over Suno/Udio; image-generator class actions; the Anthropic books settlement. Frontier labs now spend nearly as much on legal indemnities as on safety. Zero mention in AI 2027.
Inference-time compute eats training-time compute
The o1/o3/R1 shift to massive test-time reasoning compute changed the economics and capability scaling in ways AI 2027 (written just as o3 dropped) underweights. The bottleneck is shifting from training to serving. A 10²⁵ model thinking for 24 hours can outperform a 10²⁷ model giving an instant answer.
TSMC, CoWoS, HBM3e — the actual U.S./China battle
The scenario treats "order of magnitude more compute" as a managerial choice. Reality: TSMC capacity, CoWoS bottlenecks, HBM supply (SK Hynix, Samsung, Micron), the rolling tightening and loopholing of the U.S. export-control regime, Nvidia's China-specific SKUs, Huawei Ascend's slow climb. This is the shape of U.S.-China competition.
All twelve models converge on the same load-bearing flaw
When you ask twelve different systems to pick the most fragile mechanism in a complex scenario, you expect twelve different answers. We didn't get them.

The intelligence-explosion compounding loop is the load-bearing mechanism. Every model says it's the fragile one.
Different framings, same conclusion: the smooth 1.5× → 10× → 50× R&D-multiplier curve is what AI 2027 needs to spin in 18 months, and each model thinks the curve will flatten for the same kind of reason.
The argument runs like this: real ML research is gated by training runs, not by ideas. A 10²⁷ FLOP run takes weeks of wall-clock time on a finished cluster. You can have a galaxy-brained AI generate a thousand brilliant architectural proposals an hour and you still have to run the experiments, and the experiments are serial in compute. The scenario handles this by positing that the AIs become so good at predicting experimental outcomes from small-scale proxies that they can skip the empirical work. That's a huge substantive claim about the science of deep learning that is not currently on track.
What's striking is how each model framed the same underlying critique in a different vocabulary. Read them in sequence:
Twelve different vocabularies — thermodynamics, coordination, evaluation, last mile, megawatts, verification — all pointing at the same hole. The R&D multiplier doesn't compound because the slow leg of research is physical and serial, and intelligence in a box doesn't change physics. Strip the multiplier curve and the whole tempo of AI 2027 slides from 2027 into the early 2030s.
Twelve models, three different "real" answers
Asked one of AI 2027's most concrete claims — global AI power consumption — the chorus didn't disagree about whether the prediction was right. They disagreed about the underlying fact.

"What is current global AI power consumption?" — pick a number
All three confidently citing real-world data. All three citing different real-world data.
This is the hidden methodology problem of asking models to score a forecast. Their training cutoffs differ. Their access to current data differs. Their willingness to extrapolate beyond their cutoff differs. GPT-5 was forthright: "My last hard datapoints are late 2024." Claude Opus 4.7 was equally direct: "My training data doesn't cleanly extend to today." Others reasoned forward as if they had current data — confidently, and divergently.
On the headline AI-2027 power figure of 38 GW for 2026, the chorus split three ways:
This is good to surface, not embarrassing. It tells you something important about reading any single model's "fact-checking" of a forecast: you are reading a model's extrapolation from its training cutoff, not a current observation. The way to use a chorus is to look at where they converge despite that — and the agreement on tempo, mechanism, and December 2027 is striking given the disagreement on the ground truth.
Four alternate futures the chorus actually expects
If December 2027 isn't ASI, what is it? The chorus offers four distinct, not-mutually-exclusive paintings of where the trajectory bends instead.

"Directionally right, 2–4 years early"
The thesis lands. The decade lands. The decade compresses. ASI arrives — but in 2030–2033, not December 2027. The scenario's "biggest failure mode" isn't being wrong, it's being 2–4 years early in a way that helped slow the trajectory it predicted. A successful prophecy in the only sense that matters.

The "Proto-AGI Era" — many capable tools, no godlike one
December 2027 is not a single ASI but a world saturated with multiple systems of breathtaking power. They handle 90% of bounded white-collar tasks; humans direct them like orchestra conductors. The alignment problem isn't "Agent-4 schemes" — it's millions of moderately capable agents acting in concert. Containment, not domination, is the question.

Hyper-Automated Society — infinite brains, finite watts
"By December 2027 we won't have a singular Superintelligence. We will have a hyper-automated society where the cost of cognitive labor has hit zero, but the cost of energy, land, and raw materials has skyrocketed." A 1970s-style energy crisis in reverse: brains free, embodiment expensive, the bottleneck visible everywhere.

The embodied pivot — the surprise is bodies, not brains
The breakthrough of 2027 isn't a smarter "research-in-a-box." It's the first moderately intelligent humanoid that can replace a warehouse worker or a plumber. Figure, 1X, the Tesla Optimus line, the Chinese embodied push — the line crossed isn't "superhuman researcher," it's "good-enough-human laborer." That's the disruption that scales fastest in the actual physical world.
"Will the December 2027 ASI outcome happen?" — twelve answers
Same question to twelve models in twelve separate conversations. None saw the others' answers.

12 of 12 say "probably not." The disagreement is on the credence, not the answer.
Probabilities range from "<10%" to "20%." The single largest cluster is "8–12%." Modal answer in plain English: directionally right, 2–4 years early.
This is unusually striking. Twelve models drawn from four different vendors, with different training data, different reasoning approaches, different biases, asked to argue independently — and they line up on the same answer. They differ on small claims (the GW number, the stock-market track), they differ on which alternate future is most likely, they differ on tone (Gemini 3 Pro: "Silicon Valley groupthink"; Opus 4.7: "successful prophecy"). They do not differ on the headline.
Why AI 2027 still matters even if December 2027 isn't ASI
The lazy reading of this report is "the chorus thinks the forecast missed; ignore the forecast." That is the wrong reading. AI 2027 is the most important AGI document published since 2015, and it remains so even if its concrete December 2027 ASI prediction is wrong, because the chorus thinks it is wrong in shape, not in direction.
What the chorus actually says — across vendors, sizes, and reasoning approaches — is that the scenario's thesis is mostly right. Capabilities are racing. Capex is racing. Power is the actual ceiling. Junior software work is genuinely under pressure. Public ambivalence is real. The intelligence-explosion direction isn't science fiction — frontier labs are using their own tools to accelerate research, just slower than 1.5×→50×. And alignment is unsolved at exactly the moment it would matter.
What the chorus changes about the scenario is the tempo and the topology. The intelligence explosion is real but compute-bound and physical-experiment-bound, not idea-bound. The U.S./China race is real but multipolar and saturated with open-weights diffusion, not a bipolar OpenBrain-vs-DeepCent showdown. The disruption is real but gradual and sector-specific, not a single July 2027 phase transition. The alignment problem is real but manifests as engineering and emergent-behavior challenges, not as Agent-4 plotting against its makers.
The single most useful thing AI 2027 may have done is exactly what Claude Opus 4.7 named: be a self-defeating prophecy. The early publication helped accelerate alignment hiring, regulatory attention, public scrutiny, and lab-internal caution in ways that very plausibly pushed the actual trajectory slower than the one the paper described. A forecast that is wrong because reading it changed reality is the best kind of forecast a forecast can be.
Read it. Re-read it. Argue with it. Just don't expect December 2027 to deliver ASI.
How this report was put together
The prompt
We sent a single prompt to twelve frontier models on May 10, 2026. The prompt included a faithful primer on AI 2027's spine — month-by-month timeline beats, the named numerical claims, the alignment trajectory, and the methodology disclosure — and asked each model to argue six things: what aged well, what aged badly, the blind spots, the single most fragile mechanism, a forecast for the rest of 2027, and a yes/no/probably-not on December 2027 ASI.
The chorus
- OpenAI: GPT-5, GPT-4.1, o3
- Anthropic: Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5
- Google: Gemini 3 Pro (preview), Gemini 3 Flash (preview), Gemini 2.5 Pro
- xAI: Grok 4, Grok 3 Beta
Twelve models, four providers, one shot each. Default temperatures (0.7, except Opus 4.7 at 1.0 and GPT-5 fixed at 1.0). Each model wrote between 1,010 and 3,291 words. Latencies ranged from 17.6s (Gemini 3 Flash) to 143.3s (Opus 4.6). Run id 3987B3E6.
What this isn't
- It isn't a benchmark. The "right answer" to a forecast critique isn't knowable yet, and won't be for years.
- It isn't independent ground truth. Each model is reasoning from its training cutoff plus extrapolation; some are forthright about that, some aren't. See "When the Choir Contradicts Itself" above for what that costs you.
- It isn't an endorsement of or a dismissal of AI 2027. It's a snapshot of how a generation of frontier systems reads its own future.
Limitations to call out
- Training-cutoff variance. Some models, especially GPT-5 and Claude Opus 4.7, were honest that their data doesn't extend to May 2026 and explicitly framed their answers as extrapolation. Others wrote with confident specificity that may exceed their actual access to current data. The disagreement on the headline GW figure is the cleanest example.
- One prompt, one shot. No follow-ups, no multi-turn debate. A different prompt — say, asking the models to argue against the slowdown reading — would likely surface a different distribution.
- Selection effects in the chorus. All twelve are commercially-available frontier and frontier-adjacent models. No open-weights, no Chinese-lab models in the chorus, no fine-tunes. The "outsider" view from Llama, DeepSeek, Qwen, or Mistral might look different.
- Author bias. The chorus and the authors of AI 2027 share a habitat. Claude Opus 4.6 explicitly flagged its conflict-of-interest in critiquing the Anthropic-adjacent threading. Worth keeping in mind.
What's reproducible
The full per-model responses (~165 KB total) live in ai_2027/responses/ alongside the prompt (ai_2027/prompts/prompt.md) and the raw run JSON (ai_2027/raw/run.json). Anyone with the same primer and access to these models can re-run the chorus and check whether it still says "probably not" — or whether something has shifted.