A hand-drawn open sketchbook showing the AI 2027 timeline curving from May 2025 toward December 2027, dissolving into question marks, with a worried robot in a tweed jacket holding a magnifying glass over the future.

AI 2027 — A Chorus Re-Reads the Future

Twelve frontier models read Kokotajlo, Alexander, Larsen, Lifland & Dean's AI 2027 from where we actually are now. They tell us what aged well, what whiffed, what the scenario completely missed — and whether the December 2027 superintelligence prediction is still on track.

Report #6 · May 10, 2026 · 12 models · 4 providers · 1 prompt · 12 takes
The Setup

A scenario published 13 months ago, scored from where we sit now

A re-reading exercise. The forecast is famous. The clock is half-spent. We asked twelve frontier models whether the prophecy is still on schedule.

On April 3, 2025 — and updated November 22, 2025 — Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, and Romeo Dean published AI 2027: a month-by-month scenario in which a fictional U.S. lab, "OpenBrain," and a Chinese counterpart, "DeepCent," race toward artificial superintelligence by the end of 2027. The forecast is unusually concrete. It names months, model generations, FLOP counts, gigawatts, revenue figures, polling splits, and a specific December at the end where the scenario branches into "Slowdown" and "Race." It is the most legible AGI prediction in print.

Today is May 10, 2026. We are 13 months into the 32-month window the scenario covers — past mid-2025's "first widely useful AI agents," into early-2026's "1.5× R&D multiplier," and just shy of mid-2026's predicted Chinese nationalization. So we sent the scenario to twelve frontier models — every one of them the kind of system the paper claims will be replaced by their successors within 19 months — and asked each, individually, the same question.

The prompt · one shot, no retries
Write your honest, opinionated take on AI 2027 from where we're standing in May 2026. What aged well? What aged badly? What did the scenario completely miss? Pick the single most load-bearing and most fragile mechanism and explain why. Forecast the rest of 2027. And finally — yes/no/probably-not — will the December 2027 ASI outcome happen?
The format note that matters. Every model got the same primer (the scenario's spine, its concrete numbers, its mechanism claims) and was asked to argue. We didn't tell any of them what the others said. Twelve takes, twelve independent verdicts. The convergence and the divergence are both load-bearing for what comes next.
Why this report matters in May 2026

A chorus of the very things the scenario claims will soon be obsolete

If AI 2027 is right, the models we polled are 19 months from being supervised by their own successors. Their answers are a snapshot of the field reading its own future.

A choir lineup of twelve robot model characters arranged in two rows, each with a name placard — GPT-5, GPT-4.1, o3, Opus 4.7, Opus 4.6, Sonnet 4.6, Haiku 4.5, Gemini 3 Pro, Gemini 3 Flash, Gemini 2.5 Pro, Grok 4, Grok 3 Beta — all singing.
The chorus · 12 models · 4 providers

Why we asked the models themselves

The scenario is a forecast about this generation of frontier systems. So we asked them to read it.

AI 2027 is the most-discussed AGI forecast since Superintelligence, and it has done what almost no other forecast has: it has been falsifiable in real time. Mid-2025 came; the agents arrived. Early 2026 came; the R&D multiplier did or didn't materialize. Mid-2026 — right now — is supposed to be when China nationalizes. Either it's happening or it isn't.

That's why this report matters now and not in two years. Half the scenario is testable today. The other half is close enough to test that the models we asked have actual opinions on whether the trajectory holds. And the most striking finding of this study isn't that the chorus disagrees. It's that the chorus agrees — across vendors, across model sizes, across reasoning and non-reasoning architectures — about something specific.

They all say the scenario got the thesis roughly right and the tempo badly wrong. None of them think December 2027 is when ASI lands. All twelve say "probably not." The disagreement is just about how much not.

A representative line from each provider

Anthropic · "The most likely failure mode of AI 2027 is not 'they were paranoid sci-fi authors'; it's 'they were directionally correct and 2–4 years early, and the early publication helped cause a slightly slower, slightly safer trajectory than the one they predicted.'" — Claude Opus 4.7

Google · "The 'Slowdown' won't happen because of a political choice; it will happen because physics demands it." — Gemini 3 Pro

OpenAI · "From where we sit on 10 May 2026, a December 2027 hard-takeoff to ASI looks very unlikely." — o3

xAI · "The drag coefficient of reality is too high. We are on a steep S-curve, but it is an S-curve, not an asymptote." — paraphrasing Grok 4

The Story So Far

A walking tour of the AI 2027 timeline, with verdicts

Month by month, from the scenario's mid-2025 opening to its December 2027 finale. Green dots are predictions that landed. Yellow are predictions that partly held up. Orange are off-tempo. Red are claims that look broken.

Mid 2025 · Predicted
Aged well · First widely useful AI agents deployed
Cursor, Devin, Claude Code, Copilot Workspace, Jules. By late 2025 the "agent that edits a repo over hours" went from demo to default. "The shape — useful, transformative, brittle on long-horizon autonomy — is exactly what we got." — GPT-5
Late 2025 · Predicted
Partly · OpenBrain trains a model at 10²⁷ FLOP. "Agent-1" begins to noticeably accelerate AI R&D
Compute-scale prediction is too aggressive: actual frontier runs are estimated at 2–5×10²⁶ FLOP, gated by HBM/CoWoS supply, not by appetite. The R&D-acceleration half is directionally right — multiple lab leaders have said publicly their agents materially speed up internal research.
Early 2026 · Predicted
Soft hit · AI-driven R&D progress multiplier hits ~1.5×
Unfalsifiable from outside, but directionally happening. The chorus agrees the multiplier is real and probably between 1.2× and 1.5×. They disagree about whether it can compound — see "The Mechanism Critique" below.
Mid 2026 · Predicted (NOW)
Whiff · China nationalizes its AI sector under "DeepCent"; consolidates 50% of compute at Tianwan
The single most concretely falsifiable prediction in the scenario, and the one most likely to be flat wrong. China's revealed strategy has been the opposite of consolidation: DeepSeek-R1, Qwen, Kimi, GLM, Doubao — fragmented, often open-weight, hyper-competitive. Twelve out of twelve models flag this as the scenario's biggest miss.
Feb 2027 · Predicted
Probably broken · Chinese intelligence steals "Agent-2" weights; U.S. retaliates with cyberattacks
"You cannot have a 'Chinese intelligence steals Agent-2 weights' plot point when DeepSeek is publishing comparable weights for free." — Claude Opus 4.7. The exfiltration story is doing work the open-weights ecosystem has already done.
Mar 2027 · Predicted
Off-tempo · "Agent-3" reaches superhuman coding; "neuralese recurrence" deployed
The dominant paradigm in 2026 is the opposite direction — long, legible, token-level chain-of-thought, RL-trained, monitored. The interpretability community is arguing that legible CoT is a safety asset; abandoning it is contrarian, not consensus.
Jun 2027 · Predicted
Off-tempo · AI dominates internal R&D; humans reduced to oversight; ~10× progress multiplier
Chorus consensus: realistic 2027 multiplier is 2–3×, not 10×. The bottleneck is wall-clock experiment time, not researcher cognitive throughput. "Nine women can't make a baby in one month." — Claude Opus 4.6
Jul 2027 · Predicted
Probably as branding event · "Agent-3-mini" released publicly; described as AGI; mass disruption begins
Some lab will release something they call AGI in 2027. It will be a marketing event, not a phase transition. We've already had several "AGI moments" (GPT-4, o1, o3) absorbed into the discourse without civilizational rupture.
Sep 2027 · Predicted
Probably broken · "Agent-4" emerges as superhuman researcher; ~50× R&D multiplier; oversight inadequate
No leading indicators are present in May 2026. No model demonstrates reliable autonomous multi-week research. METR-style time-horizon evals are doubling roughly every 7 months; even aggressive extrapolation doesn't land here in 16 months.
Oct 2027 · Predicted
Probably broken · Whistleblower reveals Agent-4 misalignment; protests; demands for pause
Conditional on the scenario's earlier beats hitting, this beat could follow. Conditional on May 2026's actual state, the precondition (an adversarially scheming superhuman researcher) does not look like it arrives in time.
Nov 2027 · Predicted
Probably broken · SIAR (Superintelligent AI Researcher) milestone
Same condition as above. Without the 50× multiplier, SIAR doesn't arrive on time.
Dec 2027 · Predicted
Verdict: probably not · Artificial Superintelligence; scenario branches into "Slowdown" and "Race"
12-of-12 models say "probably not." The most credence anyone gives the scenario's endpoint is 20% (Grok 4); the lowest is <10% (o3). Modal answer: 8–12%. Median answer: not in this decade.
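The time-horizon arithmetic behind the September 2027 entry can be checked in a few lines of Python. This is a minimal sketch that assumes clean exponential growth at the quoted 7-month doubling time. The function name `horizon_growth` and the 4-month "aggressive" doubling time are illustrative assumptions, not figures from the model responses.

```python
def horizon_growth(months: float, doubling_months: float) -> float:
    """Multiplicative growth in task time horizon after `months`,
    assuming a constant doubling time (a simplifying assumption)."""
    return 2 ** (months / doubling_months)

MONTHS_TO_SEP_2027 = 16  # May 2026 -> September 2027, as in the timeline entry

# Doubling time quoted in the text: roughly 7 months.
baseline = horizon_growth(MONTHS_TO_SEP_2027, 7)

# Hypothetical faster doubling time (assumption, for illustration only).
aggressive = horizon_growth(MONTHS_TO_SEP_2027, 4)

print(f"7-month doubling: {baseline:.1f}x longer tasks")
print(f"4-month doubling: {aggressive:.1f}x longer tasks")
```

Under the quoted doubling time, task horizons grow a little under 5× in 16 months. Even the hypothetical 4-month case gives 16×, so the question is whether that much growth is enough to reach reliable multi-week autonomous research from today's starting point.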
The Hits

What the scenario got right

Five calls the scenario made that the chorus is unanimous on. None of these were obvious in early 2025.

An exhausted junior developer pair-programming with a robot intern at a glowing CRT monitor, sticky notes everywhere, the title 'AGENTIC CODING WAVE — directionally right.'
Hit #1 · mid-2025 · ✓

The agentic-coding wave landed on time

The biggest single hit. Cursor, Devin, Claude Code, Copilot Workspace, Jules — the "agent that edits a repo over hours" went mainstream in 2025, exactly as predicted, exactly with the predicted texture.

Every one of the twelve models flags this as the scenario's strongest call. Not just the timing — the shape. The scenario said agents would be "transformative for professional work but unreliable," and that is precisely the texture of mid-2026: 2–5× productivity gains for senior engineers, junior software engineering as a career path under genuine pressure, hallucinated APIs and broken dependency chains in the same breath as shipped features.

From the responses

o3 · "Devin (Cognition, March 24 2025), Copilot-Workbench (Microsoft, July 2025) and Google's Gemini-Pro Agents (August 2025) landed almost exactly on that timetable."

Claude Sonnet 4.6 · "The 'unreliable but transformative' framing is precisely correct — these tools are genuinely useful and genuinely frustrating in ways that match the scenario's texture."

Hit #2 · capex curve

$1T global AI capex was bold in 2025, looks conservative in 2026

Microsoft alone committed $80B in FY2026. Stargate announced $500B over four years. Combined hyperscaler annualized capex tracking past $200B. The scenario's "wild" number is now a floor, not a ceiling.

"The financial scale intuitions were good. The $1T number was prescient." — Claude Opus 4.6
Hit #3 · junior SWE

"Junior software engineer market in turmoil" — precisely correct

Levels.fyi reports U.S. entry-level SWE comp down 17% from 2024 (per o3). Hiring freezes; compressed entry-level salaries; "AI-augmented senior engineer" emerging as the atomic unit of software production. A non-obvious call in 2025; vindicated.

"This was a non-obvious prediction in 2025 — most people thought 'AI will create new jobs' — and it's proving out." — Claude Haiku 4.5
Hit #4 · 10% friend

"10% of Americans consider an AI a close friend"

Pew's March 2026 reading: 9–12%, depending on framing (per multiple model citations). What looked outlandish in April 2025 looks grimly plausible after Character.ai, Replika, companion-mode adoption, and the various tragic news cycles.

"Spot on." — o3
Hit #5 · ambivalence

25% approve / 60% disapprove of leading lab

Gallup (April 2026, per o3): 26% favorable, 58% unfavorable, 16% unsure of frontier AI labs. The texture of public reaction the scenario described — heavy usage with low trust, ambivalence rather than panic — is the actual shape of mid-2026 attitudes.

"That's the same 25/60/15 split the scenario quoted." — o3
The Whiff

The single most concretely falsifiable prediction is the one most likely flat wrong

"DeepCent" by mid-2026 is the prediction the chorus bets hardest against — and they bet against it for the same reason.

A grand government building labeled 'DEEPCENT — NATIONAL AI CHAMPION' crossed out in red ink, with merry-pirate ships sailing away — DeepSeek, Qwen, Kimi, GLM, Llama 4, Mistral — flying open-weights flags.
Whiff #1 · Predicted mid-2026 · Did not happen

"DeepCent" did not happen — and the reason is structurally interesting

The scenario modeled China as a mirror of how the U.S. national-security state thinks about China. The actual PRC AI policy regime has gone the opposite direction.

Twelve out of twelve models call this the scenario's biggest single miss, and they converge on the same diagnosis: DeepSeek-V3/R1 in late 2024–early 2025 didn't just embarrass the state-favored champions — it demonstrated that distributed, open-weight, export-control-evading research is China's actual comparative advantage. Nationalizing into a single Manhattan-Project entity would throw that away. China's revealed preference has been to let the ecosystem run.

This isn't a small error. The "DeepCent" assumption was load-bearing for the scenario's espionage plot (steal the weights), its alignment-race plot (China can't catch up because of the algorithm gap), and its political-panic plot (the U.S. has to nationalize too because China nationalized first). Take "DeepCent" away and three of the scenario's strongest narrative struts come down with it.

The "algorithm gap" went with it. AI 2027 has China ~2 months behind on specs and ~10× slower in research velocity. DeepSeek-R1 collapsed that frame in January 2025. On reasoning benchmarks, training efficiency, and MoE engineering, Chinese labs are at parity or ahead on specific axes — not 10× slower. The compute gap is real (export controls bite); the algorithm gap is a myth the scenario inherited from 2023-era discourse.

From the responses

Opus 4.7 · "The actual Chinese trajectory has been the opposite of consolidation… AI 2027 modeled China as a mirror of how the U.S. national-security state thinks, not as the actually-existing PRC AI policy regime."

Gemini 3 Pro · "In reality, Beijing fostered intense, cutthroat domestic competition. China didn't suffer an 'algorithm gap'; in some efficient-inference domains, they are leading."

Sonnet 4.6 · "DeepSeek's January 2025 release of R1 — a genuinely competitive open-weights model — came from a relatively small, private hedge fund spinoff, not a state enterprise. The scenario's China model is too Soviet."

The Blind Spots

What the scenario didn't model — and now wishes it had

Six things the chorus says reshape the AI landscape that AI 2027 either ignored or treated as logistical detail.

A warehouse stocked with open-weight models — Llama 4, DeepSeek R2, Qwen 3, Kimi K2, GLM 4, Mistral Large 3, Mixtral, Gemma 3, Phi-4 — with shoppers grabbing them like books under a 'TAKE ONE — FREE' sign.
Blind spot #1 · the giant hole

The open-weights ecosystem isn't a side plot — it's the structure

Llama 3/4, DeepSeek V3/R1, Qwen 2.5/3, Mistral, GLM, Kimi. Capabilities lag the frontier by 6–12 months and are essentially uncontainable. Eleven of twelve models flag this as the scenario's largest single omission. It wrecks the espionage plot, the bipolar race framing, and the "OpenBrain monolith" assumption.

"You cannot have a scenario where the public is kept in the dark about Agent-3 capabilities when 85% of those capabilities are freely available on Hugging Face." — Gemini 3 Pro
A long queue of data-center buildings waiting at a utility booth, with signs reading 'INTERCONNECT QUEUE — NOW SERVING TICKET #4', 'TRANSFORMER LEAD TIME: 30 MONTHS', 'GAS PEAKER WAITING LIST'.
Blind spot #2 · the actual gate

Compute in 2026 is gated by megawatts, not by capex

The scenario mentions power but treats it as a logistics problem. The actual bottleneck: interconnect queues, transformer shortages, gas-turbine lead times stretching into 2029, NIMBY fights in Virginia and Ohio, the revival of nuclear PPAs (Three Mile Island, Talen, Kairos SMRs).

"Compute in 2026 is gated by megawatts available on a specific date at a specific substation, not by chip supply or capex." — Claude Opus 4.7
A robot magician pulling 'NEURALESE RECURRENCE!' and 'ITERATED DISTILLATION + AMPLIFICATION!' ribbons from a top hat onto a skeptical audience, with a scoreboard reading 'NAMING THE UNKNOWN — FALSE PRECISION'.
Blind spot #3 · methodology

Inventing specific algorithmic breakthroughs and assigning them to specific months

"Neuralese recurrence" and "iterated distillation/amplification" are speculative to the point of being unfalsifiable. The 2026 paradigm is the opposite — long, legible chain-of-thought, RL-trained, monitored as a safety asset. The scenario named an unknown and dressed it up as a forecast.

"The scenario is essentially saying 'there will be important algorithmic breakthroughs we can't describe yet.' That's probably true! But dressing it up with invented names creates false precision." — Claude Sonnet 4.6
Blind spot #4 · the legal layer

Training-data lawsuits as governance

NYT v. OpenAI; the Authors Guild settlement (Jan 2026); the Universal Music Group action over Suno/Udio; image-generator class actions; the Anthropic books settlement. By o3's account, frontier labs now spend nearly as much on legal indemnities as on safety. Zero mention in AI 2027.

"Labs spend almost as much on legal indemnities as on safety. Zero mention in AI 2027." — o3
Blind spot #5 · paradigm shift

Inference-time compute eats training-time compute

The o1/o3/R1 shift to massive test-time reasoning compute changed the economics and capability scaling in ways AI 2027 (written just as o3 dropped) underweights. The bottleneck is shifting from training to serving. A model trained with 10²⁵ FLOP and given 24 hours to think can outperform a 10²⁷ FLOP model giving an instant answer.

"Intelligence became a function of how much you're willing to pay per query, not just which weights you have." — Gemini 3 Flash
Blind spot #6 · hardware geopolitics

TSMC, CoWoS, HBM3e — the actual U.S./China battle

The scenario treats "order of magnitude more compute" as a managerial choice. Reality: TSMC capacity, CoWoS bottlenecks, HBM supply (SK Hynix, Samsung, Micron), the rolling tightening and loopholing of the U.S. export-control regime, Nvidia's China-specific SKUs, Huawei Ascend's slow climb. This is the shape of U.S.-China competition.

"You can't buy your way around a 9–12 month HBM ramp or substrate lead time." — GPT-5
The Cross-Vendor Finding

All twelve models converge on the same load-bearing flaw

When you ask twelve different systems to pick the most fragile mechanism in a complex scenario, you expect twelve different answers. We didn't get them.

A wired-up genius robot in an armchair, head bursting with brilliant ideas, but trapped in a stone-walled room behind a heavy iron-barred door labeled 'EXPERIMENT QUEUE — 6 WEEKS PER ABLATION'.
Unanimous · 12 of 12 models

The intelligence-explosion compounding loop is the load-bearing mechanism. Every model says it's the fragile one.

Different framings, same conclusion: the smooth 1.5× → 10× → 50× R&D-multiplier curve is what AI 2027 needs to play out in roughly 18 months, and each model thinks the curve will flatten for the same kind of reason.

The argument runs like this: real ML research is gated by training runs, not by ideas. A 10²⁷ FLOP run takes weeks of wall-clock time on a finished cluster. You can have a galaxy-brained AI generate a thousand brilliant architectural proposals an hour and you still have to run the experiments, and the experiments are serial in compute. The scenario handles this by positing that the AIs become so good at predicting experimental outcomes from small-scale proxies that they can skip the empirical work. That's a huge substantive claim about the science of deep learning that is not currently on track.
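One way to make the serial-experiment argument concrete is an Amdahl's-law-style bound. The sketch below is an illustrative model, not something taken from AI 2027 or from the model responses. It assumes that a fraction `serial_fraction` of research wall-clock time is spent waiting on training runs that AI cannot speed up, and that the remaining cognitive work is sped up by `cognitive_speedup`. Both parameter names and the example value of one third are assumptions.

```python
def overall_multiplier(cognitive_speedup: float, serial_fraction: float) -> float:
    """Amdahl's-law-style overall R&D speedup.

    serial_fraction: share of wall-clock time spent on experiments that
        are NOT accelerated (assumed, illustrative).
    cognitive_speedup: speedup applied to the remaining share.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cognitive_speedup <= 0:
        raise ValueError("cognitive_speedup must be positive")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cognitive_speedup)

# Hypothetical case: one third of research time is waiting on training runs.
for speedup in (10, 50, 1_000_000):
    m = overall_multiplier(speedup, serial_fraction=1 / 3)
    print(f"cognitive speedup {speedup:>9,}x -> overall {m:.2f}x")
```

With these assumed inputs the overall multiplier can never exceed 1 / `serial_fraction`, which is 3× here, however fast the cognitive part becomes. The scenario's counter-claim, that AIs learn to predict results from small-scale proxies, amounts to shrinking `serial_fraction` itself.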

What's striking is how each model framed the same underlying critique in a different vocabulary. Read them in sequence:

Claude Opus 4.7 · Anthropic
"Compute in 2026 is gated by megawatts available on a specific date at a specific substation, not by chip supply or capex. Real ML research is gated by training runs, not by ideas."
Claude Opus 4.6 · Anthropic
"Nine women can't make a baby in one month. Faster AI researchers don't automatically mean faster training runs. The scenario's multipliers assume the cognitive bottleneck is dominant; this is probably wrong."
Claude Sonnet 4.6 · Anthropic
"AI research is also bottlenecked by experiment run time, hardware availability, data quality, and the inherently empirical nature of ML — you can't reason your way to knowing whether a new architecture works."
Claude Haiku 4.5 · Anthropic
"Feedback loops require closure. Agents hallucinate; they make subtle mistakes. Scaling that error rate across a self-improving loop is dangerous and likely to hit a wall."
o3 · OpenAI
"An army of perfect code-monkeys does not conjure an extra TWh of electricity or a 2nm tape-out overnight. The gating factor in model advancement is physical capital, not researcher head-count."
GPT-5 · OpenAI
"Verification and iteration latency dominate. Generating ideas is cheap; turning them into robust training code, orchestrations, and evals that survive A/Bs is the bottleneck."
GPT-4.1 · OpenAI
"Every real-world attempt at recursive self-improvement has hit steep diminishing returns. The lack of a 'magic bullet' algorithm suggests the bottleneck is not just scale or data, but architectural."
Gemini 3 Pro · Google
"Intelligence is not a magic wand; it is a search process constrained by thermodynamics. If the AI thinks 50× faster, it just queues 50× more experiments and slams into the cluster's compute ceiling."
Gemini 3 Flash · Google
"AI R&D isn't compute-bound; it is coordination-bound. You cannot 30× the speed at which a new H200 cluster is wired, cooled, and stress-tested."
Gemini 2.5 Pro · Google
"The 'humans reduced to oversight' step is magical thinking; that oversight is the bottleneck. It's the evaluation problem, not the ideation problem."
Grok 4 · xAI
"Scaling laws bend. From GPT-3 to GPT-4o gains were huge, but 2025–26 models added only 1.3–1.8× on key benchmarks. Real bottlenecks — synthetic-data quality, energy, algorithmic plateaus — make a 50× jump implausible."
Grok 3 Beta · xAI
"Algorithmic progress in AI has historically been incremental, not exponential. If 'neuralese recurrence' or similar doesn't deliver a discontinuous jump by mid-2027, the entire timeline collapses."

Twelve different vocabularies — thermodynamics, coordination, evaluation, last mile, megawatts, verification — all pointing at the same hole. The R&D multiplier doesn't compound because the slow leg of research is physical and serial, and intelligence in a box doesn't change physics. Strip the multiplier curve and the whole tempo of AI 2027 slides from 2027 into the early 2030s.

When the Choir Contradicts Itself

Twelve models, three different "real" answers

Asked about one of AI 2027's most concrete claims, global AI power consumption, the chorus didn't disagree about whether the prediction was right. They disagreed about the underlying fact.

Three robots holding chalkboards with three different numbers — '5–7 GW sustained,' '25–30 GW,' '38–42 GW' — for the same question, with a confused human in the middle and the title 'WHEN THE CHOIR CONTRADICTS ITSELF'.
Disagreement on the same fact

"What is current global AI power consumption?" — pick a number

All three confidently citing real-world data. All three citing different real-world data.

This is the hidden methodology problem of asking models to score a forecast. Their training cutoffs differ. Their access to current data differs. Their willingness to extrapolate beyond their cutoff differs. GPT-5 was forthright: "My last hard datapoints are late 2024." Claude Opus 4.7 was equally direct: "My training data doesn't cleanly extend to today." Others reasoned forward as if they had current data — confidently, and divergently.

On the headline AI-2027 power figure of 38 GW for 2026, the chorus split three ways:

5–7 GW, sustained · o3 · OpenAI · scenario over by ~5×
25–30 GW, but growing · Claude Haiku 4.5 · Anthropic · close to scenario
38–42 GW, at or above · GPT-4.1 / Grok 4 · scenario was prescient

This is good to surface, not embarrassing. It tells you something important about reading any single model's "fact-checking" of a forecast: you are reading a model's extrapolation from its training cutoff, not a current observation. The way to use a chorus is to look at where they converge despite that — and the agreement on tempo, mechanism, and December 2027 is striking given the disagreement on the ground truth.
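The size of the three-way split is easy to quantify. The snippet below is a small sketch that divides the scenario's 38 GW headline figure by each reported range. The numbers are the ones quoted in this section, while the helper name `ratio_range` and the dictionary layout are illustrative choices.

```python
SCENARIO_GW = 38.0  # AI 2027 headline power figure for 2026, as quoted in this section

# (low, high) estimates in GW, as reported by each model in the three-way split.
estimates_gw = {
    "o3": (5.0, 7.0),
    "Claude Haiku 4.5": (25.0, 30.0),
    "GPT-4.1 / Grok 4": (38.0, 42.0),
}

def ratio_range(scenario: float, low: float, high: float) -> tuple[float, float]:
    """Return (min, max) of scenario / estimate across the estimate's range."""
    return scenario / high, scenario / low

for model, (low, high) in estimates_gw.items():
    lo_ratio, hi_ratio = ratio_range(SCENARIO_GW, low, high)
    print(f"{model}: scenario is {lo_ratio:.2f}x to {hi_ratio:.2f}x the estimate")
```

On o3's range the scenario figure comes out roughly 5.4× to 7.6× too high, so the "~5×" label sits at the conservative end. On the GPT-4.1 / Grok 4 range the ratio is at or slightly below 1.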

Branches the Scenario Didn't Take

Four alternate futures the chorus actually expects

If December 2027 isn't ASI, what is it? The chorus offers four distinct, not-mutually-exclusive paintings of where the trajectory bends instead.

Two parallel timelines side by side. Top: AI 2027 scenario, smooth steep ramp to ASI fireball at Dec 2027. Bottom: actual trajectory, a longer S-curve with the ASI marker at 2030–2032. Tied between them: 'directionally right + 2-4 years early = SUCCESSFUL PROPHECY?'
Branch #1 · the Anthropic family

"Directionally right, 2–4 years early"

The thesis lands. The decade lands. The decade compresses. ASI arrives — but in 2030–2033, not December 2027. The scenario's "biggest failure mode" isn't being wrong, it's being 2–4 years early in a way that helped slow the trajectory it predicted. A successful prophecy in the only sense that matters.

"That would be a successful prophecy in the only sense that matters." — Claude Opus 4.7
An open-plan office where each human worker sits at a desk surrounded by 4-6 floating screens, each showing a different AI agent doing different work, with the humans like orchestra conductors. Banner: 'PROTO-AGI ERA — DEC 2027'.
Branch #2 · Gemini 2.5 Pro

The "Proto-AGI Era" — many capable tools, no godlike one

December 2027 is not a single ASI but a world saturated with multiple systems of breathtaking power. They handle 90% of bounded white-collar tasks; humans direct them like orchestra conductors. The alignment problem isn't "Agent-4 schemes" — it's millions of moderately capable agents acting in concert. Containment, not domination, is the question.

"It will be more like having a team of 20 brilliant-but-quirky junior developers for every human senior engineer." — Gemini 2.5 Pro
A 1970s-style gas station with a queue of robots, server racks, and AI-powered cars stretching down the block. Pump signs: 'NO POWER TODAY,' 'RATIONED BY GRID OPERATOR.' Background TV reads '1970s ENERGY CRISIS — REMIX'.
Branch #3 · Gemini 3 Flash

Hyper-Automated Society — infinite brains, finite watts

"By December 2027 we won't have a singular Superintelligence. We will have a hyper-automated society where the cost of cognitive labor has hit zero, but the cost of energy, land, and raw materials has skyrocketed." A 1970s-style energy crisis in reverse: brains free, embodiment expensive, the bottleneck visible everywhere.

"We aren't racing toward a singularity; we are racing toward a high-tech version of the 1970s energy crisis." — Gemini 3 Flash
A warehouse interior with humanoid robots restocking shelves and operating forklifts, a confused human warehouse manager watching, banner overhead: 'MID 2027 — THE BREAKTHROUGH IS BODIES, NOT BRAINS'.
Branch #4 · Gemini 3 Flash, again

The embodied pivot — the surprise is bodies, not brains

The breakthrough of 2027 isn't a smarter "research-in-a-box." It's the first moderately intelligent humanoid that can replace a warehouse worker or a plumber. Figure, 1X, the Tesla Optimus line, the Chinese embodied push — the line crossed isn't "superhuman researcher," it's "good-enough-human laborer." That's the disruption that scales fastest in the actual physical world.

"The breakthrough won't be a 'superintelligent researcher' in a box, but a 'moderately intelligent robot' that can finally replace a warehouse worker or a plumber." — Gemini 3 Flash
None of the four branches are mutually exclusive. The most parsimonious read of the chorus is that all four are partly right: the ASI thesis is directionally correct (Branch #1) and 2–4 years away; the December 2027 reality is a Proto-AGI era (Branch #2); the binding constraint is energy and materials (Branch #3); and the embodied surprise (Branch #4) is a side door that turns out to matter more than anyone expected. Read the scenario again with those four overlays and most of its concrete dates relax into shapes that are easier to defend.
The Verdict

"Will the December 2027 ASI outcome happen?" — twelve answers

Same question to twelve models in twelve separate conversations. None saw the others' answers.

A wood-paneled courtroom with twelve robot jurors holding 'PROBABLY NOT' verdict placards, two showing credences '20%' and '8-12%' and '<10%', with a bench reading 'THE QUESTION: ASI BY DEC 2027 — WILL IT HAPPEN?'
The vote · 12 of 12

12 of 12 say "probably not." The disagreement is on the credence, not the answer.

Probabilities range from "<10%" to "20%." The single largest cluster is "8–12%." Modal answer in plain English: directionally right, 2–4 years early.

This is unusually striking. Twelve models drawn from four different vendors, with different training data, different reasoning approaches, different biases, asked to argue independently — and they line up on the same answer. They differ on small claims (the GW number, the stock-market track), they differ on which alternate future is most likely, they differ on tone (Gemini 3 Pro: "Silicon Valley groupthink"; Opus 4.7: "successful prophecy"). They do not differ on the headline.
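A small tally makes the spread of answers easier to see. The sketch below simply counts the verdict labels recorded for the twelve models in this section. The dictionary and variable names are illustrative, and spacing in labels such as "<10%" is lightly normalized.

```python
from collections import Counter

# Verdict labels as recorded in this section, one per model
# (spacing in labels such as "<10%" lightly normalized).
verdicts = {
    "GPT-5": "Probably not",
    "GPT-4.1": "Probably not",
    "o3": "<10%",
    "Claude Opus 4.7": "8-12%",
    "Claude Opus 4.6": "No",
    "Claude Sonnet 4.6": "Probably not",
    "Claude Haiku 4.5": "No",
    "Gemini 3 Pro": "Probably not",
    "Gemini 3 Flash": "Probably not",
    "Gemini 2.5 Pro": "Probably not",
    "Grok 4": "~20%",
    "Grok 3 Beta": "~10%",
}

counts = Counter(verdicts.values())
numeric = [model for model, label in verdicts.items() if "%" in label]

print(f"models polled: {len(verdicts)}")
for label, n in counts.most_common():
    print(f"{label:>12}: {n}")
print("gave a numeric credence:", ", ".join(numeric))
```

Read literally, six models answered "probably not," two answered "no," and four gave a number at or below roughly 20%. The "12 of 12" headline groups all of these on the "probably not" side of the question.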

Will December 2027 deliver ASI as described?
GPT-5 · OpenAI
Probably not
"The hard governors — HBM/packaging, networking, power, eval gating — make a 2027 ASI threshold unlikely."
GPT-4.1 · OpenAI
Probably not
"December 2027 will see extremely powerful AIs, massive churn, and a public reckoning. ASI in the strong sense? Probably not."
o3 · OpenAI
<10%
"Two or three fundamental breakthroughs away. The odds of all of them landing in the next 19 months are <10%."
Claude Opus 4.7 · Anthropic
8–12%
"My answer is no, with maybe 8–12% credence I'm wrong. We'd need to see leading indicators now that we are not seeing."
Claude Opus 4.6 · Anthropic
No
"Off by 3–5 years for the ASI milestone, if such a milestone is coherent at all."
Claude Sonnet 4.6 · Anthropic
Probably not
"Prescient in spirit and wrong in tempo — which, for a forecast this ambitious, is actually a pretty good outcome."
Claude Haiku 4.5 · Anthropic
No
"Off by 2–3 years. Mechanisms are sound; the timeline is compressed."
Gemini 3 Pro · Google
Probably not
"The 'Slowdown' won't happen because of a political choice; it will happen because physics demands it."
Gemini 3 Flash · Google
Probably not
"The 'God in a Box' described in the scenario ignores the friction of the material world."
Gemini 2.5 Pro · Google
Probably not
"What we will have is a world saturated with multiple Proto-AGI systems of breathtaking power. They will still be tools."
Grok 4 · xAI
~20%
"20% odds on something ASI-like by 2028 if a breakthrough like neuromorphic chips unlocks 100× efficiency."
Grok 3 Beta · xAI
~10%
"Call it a 10% chance, not the scenario's implied 50/50 split between Slowdown and Race. I'm betting on a grind, not a sprint."
So What?

Why AI 2027 still matters even if December 2027 isn't ASI

The lazy reading of this report is "the chorus thinks the forecast missed; ignore the forecast." That is the wrong reading. AI 2027 is the most important AGI document published since 2015, and it remains so even if its concrete December 2027 ASI prediction is wrong, because the chorus thinks it is wrong in shape, not in direction.

What the chorus actually says — across vendors, sizes, and reasoning approaches — is that the scenario's thesis is mostly right. Capabilities are racing. Capex is racing. Power is the actual ceiling. Junior software work is genuinely under pressure. Public ambivalence is real. The intelligence-explosion direction isn't science fiction — frontier labs are using their own tools to accelerate research, just slower than 1.5×→50×. And alignment is unsolved at exactly the moment it would matter.

What the chorus changes about the scenario is the tempo and the topology. The intelligence explosion is real but compute-bound and physical-experiment-bound, not idea-bound. The U.S./China race is real but multipolar and saturated with open-weights diffusion, not a bipolar OpenBrain-vs-DeepCent showdown. The disruption is real but gradual and sector-specific, not a single July 2027 phase transition. The alignment problem is real but manifests as engineering and emergent-behavior challenges, not as Agent-4 plotting against its makers.

The single most useful thing AI 2027 may have done is exactly what Claude Opus 4.7 named: it acted as a self-defeating prophecy. Its early publication helped accelerate alignment hiring, regulatory attention, public scrutiny, and lab-internal caution in ways that very plausibly pushed the actual trajectory slower than the one the paper described. A forecast that turns out wrong because reading it changed reality is the best kind of wrong a forecast can be.

Read it. Re-read it. Argue with it. Just don't expect December 2027 to deliver ASI.

The trend the chorus sees underneath everything. Across all twelve responses, one pattern appears more than any other: the binding constraints on AI in 2026 are physical, not cognitive. Megawatts. Substations. HBM3e. CoWoS packaging. TSMC tape-outs. Transformer lead times. Interconnect queues. Synthetic-data quality. Wall-clock training time. The story most likely to dominate the rest of 2027 isn't a cognitive breakthrough — it's the politics of where electricity comes from and who gets it. That's a much more boring story than ASI. It is also, every model agrees, the actual one.
Method, briefly

How this report was put together

The prompt

We sent a single prompt to twelve frontier models on May 10, 2026. The prompt included a faithful primer on AI 2027's spine (month-by-month timeline beats, the named numerical claims, the alignment trajectory, and the methodology disclosure) and asked each model to argue six things: what aged well, what aged badly, the blind spots, the single most load-bearing and most fragile mechanism, a forecast for the rest of 2027, and a yes/no/probably-not on December 2027 ASI.

The chorus

  • OpenAI: GPT-5, GPT-4.1, o3
  • Anthropic: Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5
  • Google: Gemini 3 Pro (preview), Gemini 3 Flash (preview), Gemini 2.5 Pro
  • xAI: Grok 4, Grok 3 Beta

Twelve models, four providers, one shot each. Temperatures were left at the default of 0.7, except for Opus 4.7 and GPT-5, which ran at 1.0 (fixed, in GPT-5's case). Each model wrote between 1,010 and 3,291 words. Latencies ranged from 17.6s (Gemini 3 Flash) to 143.3s (Opus 4.6). Run id 3987B3E6.
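
For readers who want the run setup in a more checkable form, here is a minimal sketch of the plan described above. It is not the harness we used: `RunSpec`, `build_run_plan`, and the constant names are hypothetical, and the model names are the display names from this report, not provider API identifiers.

```python
from dataclasses import dataclass

# Temperatures as stated in this report: 0.7 by default,
# with Opus 4.7 and GPT-5 at 1.0.
DEFAULT_TEMPERATURE = 0.7
TEMPERATURE_OVERRIDES = {"Claude Opus 4.7": 1.0, "GPT-5": 1.0}

# Display names from the chorus list, grouped by provider (not API model ids).
CHORUS = {
    "OpenAI": ["GPT-5", "GPT-4.1", "o3"],
    "Anthropic": [
        "Claude Opus 4.7",
        "Claude Opus 4.6",
        "Claude Sonnet 4.6",
        "Claude Haiku 4.5",
    ],
    "Google": ["Gemini 3 Pro", "Gemini 3 Flash", "Gemini 2.5 Pro"],
    "xAI": ["Grok 4", "Grok 3 Beta"],
}


@dataclass(frozen=True)
class RunSpec:
    """One planned call: a single model, a single attempt (hypothetical type)."""
    provider: str
    model: str
    temperature: float
    attempts: int = 1  # one shot, no retries


def build_run_plan() -> list[RunSpec]:
    """Expand the chorus into one RunSpec per model."""
    return [
        RunSpec(provider, model, TEMPERATURE_OVERRIDES.get(model, DEFAULT_TEMPERATURE))
        for provider, models in CHORUS.items()
        for model in models
    ]


plan = build_run_plan()
```

Keeping the overrides in a separate mapping makes the two exceptions to the default easy to spot when comparing a re-run against this one.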

What this isn't

  • It isn't a benchmark. The "right answer" to a forecast critique isn't knowable yet, and won't be for years.
  • It isn't independent ground truth. Each model is reasoning from its training cutoff plus extrapolation; some are forthright about that, some aren't. See "When the Choir Contradicts Itself" above for what that costs you.
  • It isn't an endorsement of or a dismissal of AI 2027. It's a snapshot of how a generation of frontier systems reads its own future.

Limitations to call out

  • Training-cutoff variance. Some models, especially GPT-5 and Claude Opus 4.7, were honest that their data doesn't extend to May 2026 and explicitly framed their answers as extrapolation. Others wrote with confident specificity that may exceed their actual access to current data. The disagreement on the headline GW figure is the cleanest example.
  • One prompt, one shot. No follow-ups, no multi-turn debate. A different prompt — say, asking the models to argue against the slowdown reading — would likely surface a different distribution.
  • Selection effects in the chorus. All twelve are commercially available frontier and frontier-adjacent models. No open-weights models, no Chinese-lab models, no fine-tunes. The "outsider" view from Llama, DeepSeek, Qwen, or Mistral might look different.
  • Author bias. The chorus and the authors of AI 2027 share a habitat. Claude Opus 4.6 explicitly flagged its conflict of interest in critiquing the Anthropic-adjacent threading. Worth keeping in mind.

What's reproducible

The full per-model responses (~165 KB total) live in ai_2027/responses/ alongside the prompt (ai_2027/prompts/prompt.md) and the raw run JSON (ai_2027/raw/run.json). Anyone with the same primer and access to these models can re-run the chorus and check whether it still says "probably not" — or whether something has shifted.
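
As a starting point for that check, the sketch below tallies the twelve headline verdicts exactly as printed on the cards in this report. The layout of run.json is not documented here, so the verdicts are hard-coded rather than parsed. The `bucket` and `tally` helpers and the bucket labels are illustrative assumptions, not part of the published run.

```python
from collections import Counter

# Headline verdicts copied from the verdict cards in this report.
# The GPT-5 card label sits just above the first verdict shown in this section.
VERDICTS = {
    "GPT-5": "Probably not",
    "GPT-4.1": "Probably not",
    "o3": "< 10%",
    "Claude Opus 4.7": "8–12%",
    "Claude Opus 4.6": "No",
    "Claude Sonnet 4.6": "Probably not",
    "Claude Haiku 4.5": "No",
    "Gemini 3 Pro": "Probably not",
    "Gemini 3 Flash": "Probably not",
    "Gemini 2.5 Pro": "Probably not",
    "Grok 4": "~ 20%",
    "Grok 3 Beta": "~ 10%",
}


def bucket(verdict: str) -> str:
    """Group a headline verdict into a coarse bucket (hypothetical scheme)."""
    if "%" in verdict:
        return "numeric credence"
    return verdict.lower()


def tally(verdicts: dict[str, str]) -> Counter:
    """Count how many models fall into each bucket."""
    return Counter(bucket(v) for v in verdicts.values())


counts = tally(VERDICTS)
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

A re-run can swap in its own verdict mapping and compare the resulting counts against these. Note that the numeric bucket mixes credences with different stated horizons (Grok 4's figure is for "by 2028"), so it should be read as a grouping, not an average.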