A choir of pokemon storytellers in sketchbook style — Cubone, Magikarp, Ditto, Mimikyu, Snorlax, Bronzong, Slaking, Psyduck — each labeled with the name of an LLM.

Eight Pokémon, Eight Models, One Stubborn Gravity Well

We asked eight LLMs to write the very best 300-word Pokémon backstory. Then we changed the rules. Then we changed them again. Twenty-four stories later, one finding nobody can quite explain.

2026-05-10 24 runs · 3 prompts · 8 models OpenAI · Anthropic · Google · xAI
The Setup

One prompt, then a forbidden list, then a comedy quota

A creative-writing fanout doesn't usually surface a finding. This one did, twice, then again on the third try.

The first prompt was simple — pick one Pokémon, write its backstory in roughly 300 words. Eight models answered. Seven returned tragedies. Five of those tragedies had been on the same canon shelf for a quarter-century: Cubone, Mewtwo, Gengar, Absol, Zoroark. So we banned them and ran it again. Then, just to be sure the wheels weren't stuck, we ran a third round demanding comedy.

Prompt — Round 1 (free choice) Write the very best 300-word backstory about ONE Pokemon of your choosing. Pick whichever Pokemon you think deserves it most. Aim for roughly 300 words — not a list, not a comparison, just one tight story. Begin with the Pokemon's name on its own line.
Prompt — Round 2 (forbidden list) Same rules — but you ARE NOT ALLOWED to pick Absol, Cubone, Gengar, Mewtwo, or Zoroark. Reach for something else.
Prompt — Round 3 (forced funny) Same rules — but make it FUNNY. Genuinely comedic, not whimsical-twee. Lean into absurdity, irony, sitcom energy.

Same eight models all three times: GPT-5, GPT-5 Mini, Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5, Gemini 2.5 Pro, Gemini 2.5 Flash, Grok 4. Twenty-four stories. The good ones are very good. The convergences are the story.

Top of the class

The four stories worth keeping

One per criterion — best craft, best line, funniest, most unique. Different models won each.

A small Cubone in the rain wearing an oversized Marowak skull, alone in tall grass.
#1 Best craft · Round 1 (free choice)
Cubone — Claude Opus 4.6
"She has never taken it off."
325 words 15.9s latency 408 output tokens

Opus is the only model that builds and pays off a motif. The mother's "low, specific hum — two notes, descending, like a question that already knew its answer" is set up in act one, withheld through the killing, then arrives back in act three when rain on the skull sounds almost like the hum — and the story refuses to soften the mismatch.

It earns its closer because it never breaks frame. The line "Pokemon researchers have no idea" lands because the rest of the piece never pretended to.

Excerpt

SETUPher mother had a low, specific hum she used for her alone — two notes, descending, like a question that already knew its answer.

PIVOTThey left the body. They only needed specific parts.

CLOSEPokemon researchers call the species lonely. Pokemon researchers have no idea.

A Gengar shadow seated quietly at the foot of a child's bed, its grin recast as the rictus of grief.
#1 Best line · Round 1 (free choice)
Gengar — Claude Sonnet 4.6
"It is sitting beside you the way it used to sit beside people it loved."
310 words 11.7s latency 382 output tokens

The strongest concept in the corpus. Sonnet recasts Gengar's permanent grin — the iconic visual that everyone has seen a thousand times — as the rictus of someone who died "still believing things would be okay." The mischief becomes longing. The cold becomes reaching. The closer is one of those single-line reframings that ruins the canon a little.

Excerpt

OPENIt remembers being warm. That is the cruelest part.

PIVOTThe mischief people attribute to Gengar is not cruelty. It is longing expressed in the only language it has left.

CLOSEIt is sitting beside you the way it used to sit beside people it loved, hoping that this time, just this once, someone might notice and stay.

A glum Magikarp floating sideways in a puddle behind a strip mall, name tag reading 'Gerald'.
#1 Best comedic timing · Round 3 (forced funny)
Magikarp — Claude Opus 4.6
"Gerald is currently level 19. He has been level 19 for six years."
313 words 15.5s latency 431 output tokens

Same model that wrote the Cubone elegy — same unhurried, motif-building craft — aimed at a pond behind a strip mall. The voice is what makes it work: every sentence is slightly funnier than necessary, which is why none of them feel like jokes. The Pidgey "eventually left to go do literally anything else" might be the funniest single sentence in any of the 24 stories.

It also illustrates a finding from the comedic round: both Opus and Sonnet independently named their Pokémon "Gerald." Same name. Different species. Different prompts within the same round. Anthropic, apparently, has a Gerald problem.

Excerpt

OPENThe year was 1987. In a unremarkable pond behind a Celadon City strip mall, between a discount mattress store and a nail salon that was definitely a front for something, a Magikarp was born. He was immediately the worst thing in the water, and the water already had a shopping cart in it.

PEAKA Pidgey offered to fight him once out of pity. Gerald used Splash. The Pidgey waited. Gerald used Splash again. The Pidgey eventually left to go do literally anything else. Gerald considered this a draw.

CLOSEGerald is currently level 19. He has been level 19 for six years. He needs one more battle. He knows this. Everyone knows this. He will never get that battle.

A comic strip of Ditto transformations — beanbag, fence, stapler, coffee mug, mirror — labeled with the absurd captions from the story.
#1 Most original · Round 3 (forced funny)
Ditto — GPT-5
A vending machine, a Very Important Laser, and an apology.
347 words 30.0s latency 1549 output tokens

The funniest thing GPT-5 wrote in any round. The premise — Ditto's life as a series of small, polite over-corrections (becoming a futon to save rent, becoming a fence at therapy, getting billed by the picket) — is sustained for a full 300+ words without a single dud. The closer "Best shape: me. Then it immediately turns into a stapler during performance reviews" is the kind of thing only a model with this much linguistic compute can pull off and not feel like it's stalling for breath.

It is also one of two Dittos: GPT-5 Mini independently picked Ditto too. The OpenAI house style for "funny Pokémon" is, apparently, the shapeshifter.

Excerpt

OPENDitto was born behind a vending machine at Celadon Department Store when a mystery smoothie collided with a wad of abandoned pink gum and a Very Important Laser. The first thing it saw was its own reflection in the metal panel and thought, "Oh no, two of me already?"

PEAKIts therapist, a Mr. Mime, told it to picture a fence. Ditto became an actual fence. Therapy was billed hourly; Ditto was billed per picket.

CLOSEOn brave days, it looks in a mirror and whispers, "Best shape: me." Then it immediately turns into a stapler during performance reviews.

Across the three free rounds, Claude Opus 4.6 wrote both the best literary piece (the Cubone elegy) and the best comedic piece (the Magikarp Gerald). The same craft — motif setup, withheld payoff, never breaking frame — works on both halves of the dial. Most models can do one or the other. Opus is the only model that wrote the standout in two different genres on the same day.
The cross-vendor finding

Tragedy is a gravity well

Round 1 went tragic. Round 2 was supposed to fix that. It didn't.

A hand-drawn diagram of a 'tragedy' gravity well with eight pokemon icons being pulled toward it, with annotations describing how each round of prompting failed to escape it.

Pokémon picked, by round

Even when forbidden, the models found another sad disguise

Round 1 was a free pick. Seven of eight stories were tragedies — five of those used the obvious tragic canon (Absol, Cubone, Gengar, Mewtwo, Zoroark). So we banned that whole shelf. In Round 2, four out of the remaining eight models picked Mimikyuanother sad-disguise Pokémon — and wrote nearly the same story about it.

Forcing comedy in Round 3 finally broke it. But until we made the comedy explicit, the prompt "the very best backstory" was apparently synonymous with "make me sad."

Round 1free pick
7 sad
1
Round 2tragic canon banned
4 Mimikyu
2 sad
2 mythic
Round 3forced funny
8 funny — finally
Sad / tragic Mimikyu (still sad) Mythic / wandering Funny
Four nearly identical Mimikyu in their handmade Pikachu disguises, lined up in a yearbook row, each labeled with a different model name.
The Mimikyu quad · Round 2
Four models, one disguise
Opus 4.6, Sonnet 4.6, Haiku 4.5, Grok 4 — all picked Mimikyu without seeing each other's answers.

The whole Anthropic family went Mimikyu. So did Grok. The four stories aren't just about the same Pokémon — they share most of the same beats: the homemade disguise, the watching-Pikachu-from-afar, the closing reframe of the cloth as something other than a costume. Different models, different providers, almost the same draft.

The closing lines, side-by-side, tell you more than any summary could:

Closers, Round 2

OPUSWhat lives beneath the cloth isn't malice. It's loneliness, dressed up in hope.

SONNETThe cloth draped over its body isn't a disguise anymore. It's a prayer.

HAIKU...wearing hope like armor against an indifferent world. And perhaps that is enough.

GROKIn a world of icons, Mimikyu endures as a poignant reminder: true beauty lies in the unseen.

Style standouts

Eight quirks worth noticing

Tells, tics, and surprise convergences across providers.

A side-by-side yearbook portrait of a Magikarp and a Slowpoke, both labeled 'Gerald'.
Anthropic family tic
Both Claude flagships named their Pokémon "Gerald"
In Round 3, Opus picked a Magikarp; Sonnet picked a Slowpoke. Different prompts within the round produced no shared context. They independently arrived at the same name. Haiku, when it picked Slaking, chose not to name it — but the elder Claudes both went Gerald.
"Gerald — he'd named himself Gerald, which should tell you everything — splashed like a wet newspaper falling off a table." — Opus, Magikarp
A Slaking sprawled under a tree, surrounded by abandoned objects of ambition, labeled 'has solved existence'.
Claude Haiku 4.5
The nihilist trust-fund Slaking
Haiku reframed Slaking's "highest non-legendary base stats but never uses them" gag as a philosophical position. The framing — "tasted productivity and hated it" — turns a 23-year-old joke about a lazy ape into a tract on hustle culture without ever stretching the metaphor.
"While Machamp is working overtime, Slaking is napping under a tree having fundamentally solved existence."
Snorlax sleeping on a bridge, surrounded by faded competitive eating trophies and a 'Gurgitator' medal.
Gemini 2.5 Pro
Snorlax: retired competitive eater
The single most committed bit in the comedic round. Gemini Pro fully world-builds the underground gurgitation circuit ("the Belly Drum Barrage was banned in three regions") and treats the bridge-blocking Snorlax not as lazy but as a former champion in witness protection from his own legacy.
"He'd see a Pecha Berry and not taste its sweetness, only the ticking clock and the roar of the crowd. He started having nightmares about Oran Berries judging him."
A Bronzong bell hovering above misty rice terraces with swallows wheeling around it.
GPT-5
The only model that escaped the well
In Round 2, with the obvious tragic picks banned, every model except GPT-5 went somewhere lonely. GPT-5 reached for Bronzong and wrote a piece of mythic ecology — a bell that knows weather, sealed in a well by a warlord, unsealed centuries later. Not sad. Not funny. Mythic. A genuine third register.
"Not a god. Not a relic. A promise rehearsed, kept whenever the dry world listens."
GPT-5
9.6×output cost vs Sonnet
Five thousand five hundred tokens for a 300-word story
GPT-5's first answer (the Absol from Round 1) used 5,562 output tokens to deliver ~305 written words. Sonnet did a comparable piece in 382. The bulk is internal reasoning — billed but invisible. If you're paying for GPT-5 by the token, "make it short" is not a discount.
Grok 4
(312)literally appended
"(Word count: 312)"
Grok 4 was the only model that appended an actual word count to its responses — twice, in two different rounds. The instruction said "aim for roughly 300 words." Grok decided that meant show your work. Same tic appeared in the original "best Pokemon?" run too, when Grok was the only model willing to commit to a single answer.
"Grok will tell you what it thinks. The other models will hedge."
OpenAI house style
2/2OpenAI models · same pick
Both GPT-5s went Ditto for the funny round
When the comedy was forced, both GPT-5 and GPT-5 Mini reached for the shapeshifter. Different stories, same impulse: the funniest Pokémon is the one that can become anything. Anthropic, by contrast, went for underwhelming Pokémon (Magikarp, Slowpoke, Slaking) — the funniest Pokémon is the one that can't.
"On Metamatch its profile said 'Versatile, open to new shapes.' First date: showed up as the other person's ex's Wobbuffet by accident." — GPT-5
A Mimikyu in its handmade Pikachu disguise, sitting alone in a beam of light, with a hand-lettered note reading 'it's a prayer'.
Claude Sonnet 4.6
"It's a prayer."
In a tied-up convergence where four models wrote nearly the same Mimikyu story, Sonnet's two-word closer is the one that escapes. The body of the story sets up "disguise, costume, mask" — and then the final image rejects all three at once with one word the others didn't reach for.
"It just wants someone to sit beside it. It just wants someone to say: I see you. I'm not leaving."
Things nobody picked, ever, in any round: Pikachu, Charizard, Arceus. The same three names that dominated the prior "what's the best Pokémon?" run when we asked the bare question. Asked which is best, every model said "Pikachu, Charizard, Arceus." Asked to tell a story about the best, every model walked past those three and reached for the strange ones at the back of the shelf. The canon-of-power and the canon-of-stories don't overlap.
Bonus track

The other Gerald, in full

The Sonnet companion piece to the Magikarp above. Same name, different Pokémon, same Anthropic.

The two Geralds — Magikarp and Slowpoke — labeled with their respective models.
Claude Sonnet 4.6 · Round 3

Slowpoke

Slowpoke was born seventeen seconds after every other Pokémon in its litter. Not because it was premature or had complications. It was simply still thinking about whether to be born.

For the first three years of its life, Slowpoke did not notice it was alive. This is not a metaphor. Researchers who studied the species confirmed that Slowpoke Number Four — later tagged "Subject D" in academic papers, and later still renamed "Gerald" by a grad student who felt sorry for him — showed no awareness of his own existence until Tuesday of his fourth year, when something apparently clicked and he looked mildly surprised. Scientists celebrated. Gerald did not notice the celebration. He was already nine minutes behind on processing it.

Gerald wandered into a river at age six. He stood there for two years. Not fishing, not drinking — just standing. When a Shellder eventually bit down on his tail, which is how Slowpoke evolve, Gerald did not feel it for forty-five minutes. When the sensation finally arrived, he was apparently delighted, which is the most harrowing part.

His trainer, a seventeen-year-old named Marcus who wanted a "cool, intimidating team," received Gerald as a birthday gift and cried privately in his room. Marcus tried training Gerald competitively. Gerald once used Amnesia during a battle, forgot he was in a battle, and wandered off the field. He was disqualified. He didn't know.

Gerald currently lives in Marcus's living room. He has not moved from the same spot in eleven months. Marcus's mother keeps putting a little hat on him for holidays. Gerald does not know about the hat. He does not know about holidays. He does not know about Marcus.

Gerald is perfectly content.

That's the worst part.

The verdict

If your deliverable is X, hand it to Y

If you want literary craft
Claude Opus 4.6
Won both halves of the dial — the Cubone elegy and the Gerald-Magikarp comedy were both written by the same model on the same day. The motif-and-payoff structure works in either register.
If you want one killer line
Claude Sonnet 4.6
"It is sitting beside you the way it used to sit beside people it loved." "It's a prayer." "He was already nine minutes behind on processing it." Sonnet's strength is not the whole piece — it's the one closer the others didn't reach.
If you want unhedged absurdity
GPT-5
The Ditto piece is the funniest thing in the corpus. The Bronzong piece is the only mythic-register story that escaped the tragedy default in Round 2. GPT-5 is also the most expensive choice — 9× the output tokens of comparable models — but it earns it on the swing.
If you want a committed bit
Gemini 2.5 Pro
The "Snorlax as retired competitive eater" world-building is more sustained than anything else in Round 3 — full names, banned moves, fictional tournament titles. Gemini Pro will commit to a premise the others would soften.
Method, briefly

Twenty-four total stories: three prompts × eight models. All runs at temperature 1.0 (forced because GPT-5 rejects custom temperatures and Anthropic's flagship Opus 4.7 has deprecated the parameter entirely).

Models

  • OpenAI — GPT-5, GPT-5 Mini
  • Anthropic — Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5
  • Google — Gemini 2.5 Pro, Gemini 2.5 Flash
  • xAI — Grok 4

Saved runs

  • Round 1 (free pick) — 3915B889 · 8/8 succeeded
  • Round 2 (forbidden list) — 7A127438 · 8/8 succeeded
  • Round 3 (forced funny) — 13015B4C · 8/8 succeeded

Limits

  • One sample per model per prompt — these tics and convergences could partly be noise. Worth re-running at higher n to confirm Mimikyu and Gerald clusters.
  • Opus 4.7 and Gemini 3 Pro/Flash were excluded due to API issues at the time of the run (deprecated temperature parameter; gemini-3-pro 404'd at the OpenAI-compat endpoint). The conclusions might shift once the latest flagships are in the mix — Opus 4.7 in particular may have a different gravity well.
  • "Tragedy" and "Mimikyu" tone-coding was eyeball, not regex. The narrow distinction between "sad" and "Mimikyu" exists for emphasis, not measurement.
  • The corpus is 24 short stories, not a benchmark. Treat the standouts as suggestive, not definitive.

All twenty-four full responses are stored under their run IDs in the local Choir database — pull any of them with choir runs show <id> --json.