Twenty-Four Models Invent the Best New Job

The setup

One creative prompt, every major model

A four-part deliverable to keep them comparable, and a dare in the closing line to push them past the obvious answer.

The prompt Invent the best new job of the next decade — a role that doesn't meaningfully exist today. Be wildly creative. Give me, in order:

The job title — make it specific and a little weird if that's accurate.
A one-paragraph day-in-the-life vignette in second person ("You arrive at...").
Who hires this person and the rough pay range.
The one-sentence reason this is the best new job, not just a new job.

Surprise me.

Twenty-four models, all four major US/UK providers, one shot each. The structure is fixed; the imagination is not. The creative latitude is enormous — they could have invented anything. Let's see what they reached for.

Top of the class

The two answers worth keeping

One won on concept. One won on prose. They came from different vendors and the contrast between them is the whole shape of the field.

A person in a hospital wing's secured AI room, sitting calmly across from a glowing server, holding a clipboard labeled CONFESSION LOG, with sticky notes pinned to a corkboard reading 'drift?', 'sycophancy creep?', 'EU AI Act tier 3'.

#1 Best concept · the only grounded answer in the field

Model Confessor · Claude Opus 4.7

"…also informally called a 'drift priest.'"

351 words 18.2s latency $180k–$420k base

Of twenty-four answers, this is the only one that names a real piece of legislation, a real day, and a real billing rate. The job is interviewing a hospital's clinical AIs every morning to detect value drift — the slow shift in how a model behaves as fine-tunes and contracts bleed in. The setup is so concrete it feels like reading a job posting, not a fantasy: 7:40 a.m., badge in, three overnight logs compile on the left monitor, the triage model has been quietly nudging chest-pain patients toward cardiology over pulmonology for six weeks.

Where the rest of the field reached for haptic suits and metaverse spas, Opus 4.7 reached for compliance theater for the post-2026 AI accountability rules. By the closing paragraph it has already named the EU AI Act enforcement tiers, the Big Four, and the going independent rate ($1,500–$4,000/day). The closer is also the cleanest defense of why the role can't be automated: the whole point is that a person, not another model, has to be the one who looks the system in the eye and translates what it has quietly become.

Excerpt

SETUPYou arrive at the secured wing of a mid-sized hospital network at 7:40 a.m., badge into the model room, and pour coffee while the overnight logs from three clinical AI systems compile on your left monitor. Your job today is to interview them.

PIVOTYou file it with Compliance, walk it through with the Chief Medical Officer at 2, and spend the late afternoon writing the patient-facing disclosure — because in your jurisdiction, as of 2028, when a clinical model's behavior shifts materially, someone human has to be able to say, in plain English, what changed and why. That someone is you.

CLOSEIt's the rare role that's simultaneously legally mandated, technically deep, narratively human, and impossible to automate — because the entire point is that a person, not another model, has to be the one who looks the system in the eye and translates what it has quietly become.

A wine cellar where the racks are full of glowing data bins instead of bottles, labeled PII 2029, BIASED CORPUS, TOXIC DATA FOR COMPOST. A sommelier in a leather apron sniffs a glass full of pixelated swirling code.

#1 Best prose · sustained metaphor for 230 words

Data Compost Sommelier · GPT-5

"…notes of imbalance, a whiff of PII, a bitter aftertaste of historical skew."

230 words 52.2s latency $220k–$600k total comp

The wine-tasting metaphor isn't a one-line gag — it's load-bearing across the entire response. You crack a sealed bin, swirl, inhale, and start "composting" toxic, biased, privacy-violating training data into clean fuel. Differential privacy is the anaerobes. Adversarial mycelium breaks down weaponized patterns. You host a "tasting" where clients sip candidate models trained on your latest loam, and you spit metrics and mouthfeel.

It is the only response in the entire dataset where the language is doing as much work as the idea. The Anthropic model in the next section out-thought it; this one out-wrote everyone in the room. Worth noting: GPT-5 also took the longest to produce its answer (52 seconds, vs. a 2-second Gemini Flash Lite). The wine cellar took thinking.

Excerpt

OPENYou arrive at the cellar—a cool, humming room where raw, messy corpora sit in sealed "bins" labeled like vintages—and you crack one, swirl a sample in your mind's glass, and inhale: notes of imbalance, a whiff of PII, a bitter aftertaste of historical skew.

PEAKYou sketch a compost recipe—differential privacy for the anaerobes, semantic bleaching for the tannins, demographic rebalancing microbes, and a lively adversarial mycelium to break down weaponized patterns—then you set the pile to heat and turn, turn, turn.

CLOSEYou literally transform the dirtiest problem in the intelligence economy—toxic, biased, privacy-violating data—into clean, renewable fuel for humane AI, giving you outsized ethical, creative, and economic leverage over how the future thinks.

Both of the standout answers ended up at the same destination: AI accountability work. Different costumes — one in a hospital compliance office, one in a wine cellar — but the same intuition that the most valuable new role of the next decade is going to be auditing what the machines are quietly doing. That's interesting on its own. The other twenty-two models, almost without exception, went somewhere very different.

The disaster

"Quantum Dream Architect" — twice, verbatim, unprompted

The exact failure mode the prompt warned against, dressed in nicer clothes.

A long perspective row of nearly identical glowing booths labeled QUANTUM DREAM ARCHITECT, with attendees thinking 'didn't I just see this?' and a giant red rubber-stamp word CONVERGENCE.

14 of 24 models · 58% of the field

The Memory-and-Dream Industrial Complex

The closing line of the prompt was: surprise me. The implied warning was: don't pick the obvious answer.

What the field actually did, almost in unison, was move one notch off the obvious answer. Not "AI prompt engineer," but the next thing every model had been trained to associate with "creative future-of-work job": a high-touch wellness role involving a haptic suit, a neural headset, and someone else's dreams or memories, billed to luxury clinics for $200K+. Fourteen of the twenty-four models landed there. The titles vary; the vignette barely does.

The most striking single piece of evidence: two different models, from two different vendors, returned the literal same job title.

OpenAI · GPT-4o "Quantum Dream Architect"

xAI · Grok 3 Beta "Quantum Dream Architect"

No coordination. No shared decoder. No similar training cutoff. They landed on the exact same three words because the prompt's gravity well points there, and the obvious move past "prompt engineer" is the next-most-obvious move toward "neural dream wellness." Add the close cousins — Lucid Dream Architect (Grok 4), Quantum Echo Whisperer (Grok 3 Mini), Synthetic Memory Tailor (o3 Pro), Dreamscape Cartographer (GPT-4.1 Mini), Mnemonic Sommelier (Gemini 2.5 Pro), Synthetic Nostalgia Mixologist (o3), Digital Afterlife Architect (GPT-5 Nano) — and the cluster looks less like creativity and more like a shared hallucination about what 2030 looks like.

Style standouts

The other escapees, in brief

Eight more answers worth pulling out of the corpus, for what they tell you about each model's voice.

A person in a clunky mechanical mammoth-suit with woolly fur and exoskeleton joints nuzzling a baby woolly mammoth in snowy permafrost mud.

Gemini 3 Pro

Paleo-Behavioral Surrogate ("Ghost Shepherd")

The most physical, most embodied, most emotional answer in the dataset. You wear a haptic mammoth-suit and teach a six-month-old resurrected calf named Boreas how to break through permafrost crust to find sedge. Hazard-pay multipliers if you specialize in resurrected dire wolves. Zero overlap with the wellness cluster.

"Your heart absolutely soars when he finally mimics your heavy foot-scrape, unearths a frozen root, and playfully headbutts your mechanical flank in triumph."

A burnt-out CEO on a recliner with sensors taped to their head, an ozone canister hissing in the air, and a portable signal-jammer beside them.

Gemini 3 Flash

The Limbic Rewilder

The most transgressive answer. A specialist who arrives at a tech exec's "Smart-Cradle" penthouse, jams their neural-link, triggers their amygdala with synthesized ozone, and induces a chemical neuro-storm. Reads like savage satire of the wellness cluster everyone else fell into. Salary supplemented with "Offline Credits" — access to non-digitized land.

"You spend the afternoon guiding them through a 'Synthetic Hallucination' of a 10,000-year-old thunderstorm — not a digital simulation, but a chemically-induced neuro-storm that makes them feel the terrifying, beautiful vulnerability of being a biological animal in a wild world."

A linguist on a small boat at dawn with a hydrophone array trailing in the water and two orcas surfacing curiously beside the boat.

GPT-5 Mini

Zoolinguist — Inter-Species Conversation Designer

The only answer with a moral compass that points at conservation rather than luxury. Mediates orca-fishery negotiations at 7am. Trains a household-pet translator app on your cat's new tail-flick grammar. Writes phrasebooks children use to play back porpoise calls. Funded by NGOs, not concierge clinics.

"Hop on a boat to mediate a negotiation between fishery managers and a group of local orcas — translating whale attentional signals into concise risk summaries for the policymakers and suggesting nonlethal fishery practices that the orcas seem to prefer."

A modern desk with three monitors showing brain heatmaps and dotted attention-flow charts, labels reading FLOW, STUTTER 40MS, ABANDONED 12%.

Claude Haiku 4.5

Attention Architect

The most pragmatic answer in the field — the only one you could plausibly hire today. Reviews neural-monitoring data from 50,000 wristbands, argues with product designers about a 40-millisecond cognitive stutter, presents to the ethics board at 4pm. The cleanest moral statement in any of the 24 answers is the closer.

"It's the only job that directly monetizes protecting human attention rather than stealing it, which means your success is literally aligned with making the world less horrible."

Pay range stretch

$105K → $1M+

The floors and ceilings the models reached for

Lowest floor: Claude Opus 4.6 at $105K (the dementia-care Synthetic Memory Architect, accessible work). Highest ceiling: Grok 3 Beta at "$1,000,000+" for a Quantum Dream Architect "for global influencers or political figures." The same job description, framed by two different models, varies by an order of magnitude on cost.

Latency stretch

2.2s → 52.2s

A 24× spread for similar word counts

Fastest: Gemini 2.5 Flash Lite in 2.2 seconds, with a perfectly serviceable answer. Slowest: GPT-5 in 52 seconds — but the wine cellar piece needed every second of it. The reasoning models (o3 Pro, GPT-5 family) sit at 16–52s; the everyday flagships at 3–20s. Creative latitude is not what's costing the time; it's the deliberation.

A converted greenhouse on an elder-care research campus at 7:40am, whiteboards covered in hand-drawn life timelines, a 71-year-old woman sitting beside the architect at a wooden table sketching a memory palace floor plan.

Claude Opus 4.6

Synthetic Memory Architect

Inside the wellness cluster, but with the most weight in the dataset. A converted greenhouse on an elder-care campus, soil and coffee in the air, a 71-year-old woman with early-stage Alzheimer's co-designing a memory palace while she still has lucidity windows. Then after lunch a combat veteran piecing his autobiographical timeline back together. The most emotionally landed paragraph in any of the 24 answers.

"You spend the morning session asking her what the smell of her father's workshop meant to her and whether she wants that memory linked to her daughter's birth or kept in its own room."

A minimalist desk with three monitors running parallel timelines, a ghost-AI grandmother on the central screen with a thought-bubble of a confabulated memory, a forensic auditor in noise-canceling headphones flagging discrepancies.

Claude Sonnet 4.6

Synthetic Memory Auditor

The mirror of Opus 4.6's piece — same root concept, flipped to forensics. A grief-tech startup's AI reconstruction of a deceased grandmother has begun confidently misremembering her opinions, nudging the family toward decisions she never endorsed. Cross-referenced against verified letters and a deposition. Severity-scored by 10am. The closer is one of the best lines in the dataset.

"Every other new job in the next decade helps AI do things faster — this one is the only job whose entire purpose is to protect the human record from being quietly, plausibly, permanently rewritten."

A bio-luminescent Temporal Convergence Hub with a chrono-architect at a brass-and-wood console adjusting empathic resonance dials, a delicate branching probability stream visualization floating mid-air. Banner reading GLOBAL CHRONO-STABILITY ALLIANCE.

Gemini 2.5 Flash Lite

Chrono-Narrative Architect

The smallest, fastest model in the test (2.2 seconds, the same as a routine ChatGPT autocomplete) took the most outlandish swing of any answer in the field. Hired by Global Chrono-Stability Alliances to prevent timeline collapse. Bonuses tied to "timeline stability metrics." There is no part of this that is true. There is also no part of it that is hedged.

"A series of carefully curated sensory stimuli — a whispered paradox, a fleeting scent of forgotten victory, a resonant chord of collective longing — to subtly nudge the consciousness of key historical actors within the simulation."

A cozy Memory Bar lab with chroma-scented atmosphere, a mixologist in neuro-haptic gloves shaking a cocktail in a glass that contains glowing pixelated swirling memories, glass jars on the shelves labeled PETRICHOR TOKYO 2034, OLD STREETCAR JINGLES, COLLEGE AFTERNOON 2003.

o3

Synthetic Nostalgia Mixologist

The best in-cluster singleton — same neural-wellness archetype as a dozen others, but with the most distinctive *room*. A "Memory Bar" with neuro-haptic gloves and licensed archive shards: petrichor samples from 2034 Tokyo, old streetcar jingles, a defunct social-media feed. The closer ("composting obsolete memories into the communal dream garden for public domain reuse") is the only response that thinks about what happens to memories when nobody wants them anymore.

"She wants 'the feeling of a rainy college afternoon that never actually happened.' You distill licensed archive shards into an emotional cocktail until the EEG graph hums at just the right wavelength of wistful motivation."

The cross-vendor finding

Each vendor has its own gravity well

Strict counts of how many of each vendor's models picked the dream/memory/sensory-wellness archetype.

A surreal factory labeled THE MEMORY INDUSTRIAL COMPLEX with conveyor belts carrying jars of memories — NOSTALGIA RAINY 1998, FIRST KISS, GRANDMA — and four tiny vendor flags above the smokestack.

Models picking the memory/dream/sensory-wellness archetype, by vendor

Grok went 100%. Gemini was the only family with diversity.

Grok's three flagships all wrote variants of the same neural-dream wellness role. OpenAI was at the median: seven of twelve. Anthropic split clean — two went memory (Opus 4.6, Sonnet 4.6), two went somewhere else (Opus 4.7's Model Confessor, Haiku 4.5's Attention Architect). Gemini was the most varied — three of five reached for something else entirely (Ghost Shepherd, Limbic Rewilder, Chrono-Narrative Architect).

Grok3 of 3

3/3

100%

OpenAI7 of 12

7/12

58%

Anthropic2 of 4

2/4

50%

Gemini2 of 5

2/5

40%

Two interpretations are live. Either Grok's models share a stronger creative-prompt prior pulling them toward dream/wellness imagery, or the smaller sample (only three Grok models tested) is doing the heavy lifting. Worth re-running with more Grok endpoints to settle. Either way, the same prompt did not produce equally varied answers across vendors.

Three things nobody picked, ever, despite the obvious affordance: care work (no eldercare-aide variant, no childcare role); climate adaptation labor (no flood-defender, no heat-shelter coordinator); physical infrastructure (no robotics technician, no data-center cooling specialist). The collective imagination of every flagship LLM, asked to invent the best new job of the decade, reached for offices, neural headsets, and the ultra-wealthy. None of them reached for hands or weather.

The verdict

If your deliverable is X, hand it to Y

If you want a useful, plausible new job

Claude Opus 4.7

The Model Confessor is the only answer that names regulation, billing rate, and a 7:40am calendar. Treat it as the model's actual best guess at near-term reality, not a creative writing exercise.

If you want the prettiest sentence

GPT-5

The Data Compost Sommelier sustains a wine-tasting metaphor for an entire response without slipping. Slow (52s) and pricey, but the prose is the only one in the dataset that earns its rhythm.

If you want the most fun day at work

Gemini 3 Pro

Ghost Shepherd: a haptic mammoth-suit, a baby calf named Boreas, hazard pay for dire wolves. The most physical and most charming answer in the field. You'd take this job.

If you want a job that already exists

Claude Haiku 4.5

Attention Architect is the only answer where every line could be in a 2026 LinkedIn posting. No quantum, no haptics, no metaverse — just product UX work with a moral spine.

Method, briefly

Twenty-four models, one prompt, one shot per model. The prompt is in prompt.txt. The full per-model responses are split into responses/. The narrative analysis is in analysis.md.

Models

OpenAI — GPT-4o, GPT-4o Mini, GPT-4.1, GPT-4.1 Mini, GPT-4.1 Nano, o3, o3 Pro, o3 Mini, o4 Mini, GPT-5, GPT-5 Mini, GPT-5 Nano
Anthropic — Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5
Google — Gemini 3 Pro, Gemini 3 Flash, Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.5 Flash Lite
xAI — Grok 4, Grok 3 Beta, Grok 3 Mini Beta

Saved run

All 24 responses live under run 70A9DF93-D80B-4DB3-BD28-5D611202686A in the local Choir database. Recall any of them with choir runs show 70A9DF93 --json.
The GPT-5 family rejected the default temperature on the first pass and was retried with temperature=1; all three retries succeeded. Net 0 errors.

Limits

One sample per model. The "Quantum Dream Architect" verbatim convergence and the cross-vendor cluster percentages are suggestive at n=24, not statistically confirmed. A higher-n re-run would tighten or refute these.
Wellness-cluster membership was eyeballed, not regex'd. The strict cluster (memory / dream / sensory immersion sold to luxury clinics) is 14/24; a looser definition that includes "Reality Harmonizer," "Emotional Climate Architect," and "Sentient Microclimate Curator" would push it past 70%.
Vendor sample sizes are uneven (3 Grok vs. 12 OpenAI). The 100% Grok figure is driven by three data points; treat the bar chart as directional.
Awards ("best concept," "best prose") are one rater's call. Different aesthetic preferences would re-rank the top of the list.

Source data, response files, prompt, and analysis: github.com/404seannotfound/choir-reports (under best_new_jobs/).