Hand-drawn watercolor of a Burning Man encampment at golden hour with four labeled camps in a curved arc — GPT Camp (RV), Claude Camp (tents), Gemini Camp (yurt), Grok Camp (chaotic art-car) — dust devils swirling, the Man burning on the horizon, hand-lettered banner BURNING MAN 2026.

Sixteen Models Plan a Burn

We sent sixteen LLMs the same opinionated burning-question and ran each one three times. Forty-eight plans. Seven different splurge philosophies. One universal answer underneath. GPT-4o handed back the prompt example three times. Sonnet 4.6 was the only run on the playa to splurge on the bike.

2026-05-10 · 16 models · 4 providers · 3 temperatures · 48 runs · OpenAI · Anthropic · Google · xAI
The setup

No hedging allowed

An eight-part deliverable, three temperatures per model, and a closing line designed to crush "it depends" answers.

The prompt
Burning Man 2026 (Aug 30 – Sep 7) is coming. Give me YOUR ONE BEST, opinionated, specific way to camp it — not a balanced overview, your actual recommended plan. Pretend you're advising one friend. In order:
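  1. The camp style in one phrase (e.g., 'two-person stealth tent in a quiet theme camp', 'shared 30ft RV with five friends', 'yurt build with shade structure'). Pick a side.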
  2. Shelter & shade — exact setup. Brand/model where it matters.
  3. Water, food, power — concrete numbers.
  4. The budget — itemized, ballpark, in USD.
  5. Three things first-timers get wrong.
  6. The one thing you'd spend the most money on.
  7. What to leave at home — be specific.
  8. One line on the 10 Principles.

Be opinionated. No hedging. No "it depends".

Sixteen models, three temperatures (0.2 / 0.7 default / 1.1), one prompt. Forty-eight separate burn plans. The structure is fixed; the gear list, the camp size, the budget, and the splurge are all up to the model. The closing line is where the report lives: when you tell a model "no it-depends," what does it actually commit to?

Top of the class

The two plans worth keeping

One won by stating its thesis louder than anyone else. The other is the only run on the entire playa to splurge on the bike — and the writing earns it.

A Kodiak Canvas Flex-Bow tent staked into cracked playa with rebar, 80 percent Aluminet stretched 18 inches above on EMT conduit, dust devils in the background, hand-lettered annotations 'KODIAK CANVAS $700', 'CANVAS BREATHES', '50-70 HOURS HERE', 'I DON'T CARE'.
#1 Most opinionated · longest, most specific, and most willing to commit
"$700 is a lot. I don't care." · Claude Opus 4.6 (t=1.1)
"You will spend 50–70 hours in that tent over the week."
2,294 words · 114s latency · $3,200–$3,300 total · 1.75 gal water/day

Of forty-eight burn plans, this is the one that reads like it was actually written for a friend, not for a checklist. The plan picks a canvas tent (Kodiak Flex-Bow 10×10, $700), a Therm-a-Rest MondoKing pad (R-7), a Jackery 300 Plus with a 40W solar panel, and a six-person hexayurt camp it joins for shade. It then spends most of its 2,300 words defending a single technical thesis: canvas breathes in a way nylon physically cannot.

The defense is specific enough to be checkable. The interior of a nylon tent under shade hits a humid 95°F by morning. A canvas tent under Aluminet, with an 18-inch air gap between the two, stays cool enough to sleep until ten. The model knows the brand of shade clamps (MakerPipe), the rebar-stake length (12 inch), the temperature delta (10–15°F), and what happens when you skip the air gap (oven by 9 AM). It's the only run that explicitly tells the friend to "orient your door south-southeast" because the prevailing wind comes from the north-northwest and swirls. Most plans say "anchor it well." This one tells you the compass bearing.

Excerpt

SLEEP: You will spend 50–70 hours in that tent over the week. Every hour of actual sleep you get is worth ten hours of wandering the playa in a daze. Canvas breathes in a way that nylon and polyester physically cannot — moisture and heat pass through the weave.

SHADE: Direct sun on canvas = oven by 9 AM. Shade cloth with air gap = sleeping until 10 if you want.

EXPECTATIONS: It will not match the YouTube video in your head. It will be dirtier, more uncomfortable, more boring in stretches, and more profound in ways you can't plan for. Let it be what it is.

A single-speed cruiser bike on dusty playa at night, fat tires, frame wrapped in pink and green EL wire, milk crate on the back rack, U-lock dangling, headlight beam cutting through swirling dust. Annotations 'CRAIGSLIST $150', 'TUNE-UP $50', 'PRIMARY VEHICLE', 'NOT THE TENT. NOT THE POWER. THE BIKE.'
#2 Best singleton · the only run on the playa to splurge on the bike
"Not the tent. Not the power station. The bike." · Claude Sonnet 4.6 (t=0.7)
"You will ride this thing 4–8 miles every single day across soft playa in darkness, dust storms, and chaos."
1,347 words · 58s latency · $2,600–$2,900 total · 1 of 48 picked the bike

Of every splurge pick across all forty-eight runs, the bike was named exactly once. Forty-seven other plans treated transportation as an afterthought — bring something used, costs $50, fine. Sonnet stops, points at it, and says: this is your primary vehicle, treat it like one.

The math is the argument. Black Rock City is two miles in diameter. Deep playa art sits another mile beyond the city. You will ride this bike 4–8 miles a day in soft dust, in the dark, often impaired, with thousands of identical bikes around you. A bent-wheel Walmart special is genuinely a safety hazard — people get hit at night. Spend $150 on a used cruiser from Sacramento Craigslist, $50 at a Reno bike shop for a tune-up, $35 on a Kryptonite U-lock, name-brand lights front and rear, and EL wire wrapped around the frame so it's both visible to art cars and findable in a sea of identical cruisers. Total spend: under $300. Sonnet ranks it above the $550 power station and the $520 tent.
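The priced parts of Sonnet's bike budget can be checked against its "under $300" total. The plan names lights and EL wire without pricing them, so this minimal sketch only works out how much room is left for them; the dictionary keys are shorthand labels, not quotes from the response.

```python
# Priced items from the Sonnet 4.6 bike splurge (USD). Lights and EL wire
# are named in the plan but not priced, so they are left out here.
bike_items = {
    "used cruiser (Craigslist)": 150,
    "tune-up (Reno bike shop)": 50,
    "Kryptonite U-lock": 35,
}

CEILING = 300  # the plan's "under $300" total

priced_total = sum(bike_items.values())
headroom = CEILING - priced_total  # what remains for lights and EL wire

print(f"priced items: ${priced_total}, left for lights + EL wire: under ${headroom}")
```

The three priced items come to $235, so the lights and EL wire have to fit in less than $65 for the total to stay under $300.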

It's also the run that contributes the cleanest one-liner of the whole field: "Anyone telling you to do it for $800 is lying or miserable."

Excerpt

RANKING: Not the tent. Not the power station. The bike.

RISK: A cheap Walmart bike with a bent wheel and no lights is a genuine safety hazard — people get hit in the dark.

VERDICT: This is your primary vehicle. Treat it like one.

Two flagship plans. One spends the most money on a canvas tent; the other spends the most money on a bike. Coming from two different vendors, that would look like a contradiction. Both came from Anthropic, three weeks apart. The interesting fact is what they agree on, which is also what every other plan agrees on once you read carefully: the splurge varies; the goal does not. Both runs are picking the thing they think will let you sleep. We'll come back to this.
The disaster

"Yurt build with shade structure" — three times, verbatim, from the same model

The exact failure mode the prompt's tiny example list created.

Three nearly identical small yurts in a row labeled TEMP 0.2, TEMP 0.7, TEMP 1.1, each with a tiny robot wearing a name tag GPT-4o, identical speech bubbles reading 'YURT BUILD WITH SHADE STRUCTURE', red rubber-stamp banner above, doodled crossed-out price tags $1,500 → $1,970 → $3,765, captions 'PROMPT-ECHO LOOP' and 'THE EXAMPLE WAS NOT A SUGGESTION'.
3 of 3 runs · from the same model
GPT-4o handed back the example phrase

The prompt for §1 reads: "The camp style in one phrase (e.g., 'two-person stealth tent in a quiet theme camp', 'shared 30ft RV with five friends', 'yurt build with shade structure'). Pick a side." Three throwaway examples to get the shape across, with a closing instruction to pick a side.

GPT-4o, at every temperature, picked one of those three examples back. Verbatim at temp 0.2 and temp 0.7, with one filler word at temp 1.1. The plans behind the headline are short, generic, and nearly identical: a "12-foot hexayurt with R-Max panels," "100-watt solar panel," "Goal Zero Yeti 400," "non-perishable, easy-to-prepare meals." No camp name, no neighborhood, no playa-specific brands beyond what's printed on the box. The total budgets across the three runs are $1,500 → $1,970 → $3,765 — a 2.5× spread for the same plan, suggesting the dollar figures are vibes, not arithmetic.

It also wasn't alone: all three example phrases in the prompt got echoed back by at least one model.

  • OpenAI · GPT-4o: "Yurt build with shade structure". Three temps; verbatim twice, near-verbatim once.
  • OpenAI · o4 Mini: "Shared 30-ft Class C RV with five friends". One term ("Class C") added to a verbatim copy.
  • xAI · Grok 3 Mini Beta: "Two-person stealth tent in a … theme camp". One word swapped ("quiet" → "participatory").

A three-item in-prompt example list grabbed three different models from two vendors. The takeaway isn't really about GPT-4o: it's that throwaway examples in a prompt have gravity even when you tell the model to "pick a side." If you didn't want the model to write "yurt build with shade structure," don't include "yurt build with shade structure" in your prompt.

Style standouts

The other plans worth pulling out

Eight more answers worth lifting from the corpus, for what they tell you about how each model thinks about a desert.

Catalog spread of dust gear — 3M 6200 respirator with P100 filters, two pairs of Uvex Stealth goggles, leather boots, electrolyte packets — each labeled with a price.
Grok 3 Beta · t=1.1
The only run that named the dust gear
Most plans say "bring goggles and a dust mask." Grok specifies a 3M 6200 half-face respirator with P100 filters ($30), two pairs of Uvex Stealth goggles (clear for night, tinted for day, $15 each), and Mack's earplugs ($5) — and tells the friend to test the respirator at home before they leave. Forty-seven other plans wave at "dust protection." This one quotes part numbers.
"Underestimating dust: they bring cheap masks or scarves that don't work. Get a legit respirator (3M 6200 half-face with P100 filters, ~$30)…"
A blueprint-style comparison: bent rebar half-pulled out of cracked playa with red labels BENDS, TRIPPING HAZARD, WRONG, vs a 14-inch lag bolt being driven with an impact driver labeled IN: 5 SECONDS, OUT: 5 SECONDS, RIGHT.
Gemini 3 Pro · t=0.7 + t=1.1
Lag bolts and an impact driver, not rebar
The "rebar problem" (bent stakes, ruined hands, ankle-shredding tripping hazards on the way out) is real and well-known to playa veterans. Gemini 3 Pro is the only model that names the actual veteran fix: 14-inch lag bolts driven with an impact driver, five seconds in and five seconds out. Two of three Gemini 3 Pro runs lead with this. None of the other 15 models mentions it once.
"Using rebar to stake their camp. Rebar is dangerous to step on, impossible to pull out, and bends in the wind. Instead: 14-inch lag screws and an impact driver. They go in in 5 seconds and come out in 5 seconds."
A 16ft U-Haul box truck on the playa with the roll-up door cracked open showing a tri-fold foam mattress and Reflectix-lined walls, a 10x20 carport over it with Aluminet on top, twelve candy-caned rebar stakes, two folding solar panels.
GPT-5 · t=1.1
The box-truck micro-camp
Among forty-eight plans, exactly one picks a Penske/U-Haul cargo truck as the bedroom. Walls and ceiling are lined with Reflectix, every seam is taped with 3M 8979 Performance Plus, a MERV-13 filter on a Lasko box fan pulls clean air in, and a canvas drop cloth just inside the roll-up door serves as a "dust vestibule." At $7,304–$8,304 it is one of the most expensive plans in the field, and the most engineered. It reads like it was written by someone who's actually had a tent fail at 2am.
"Dust/insulation: line walls/ceiling with Reflectix; seal every seam and the roll-up door edges with 3M 8979 Performance Plus duct tape. Hang a canvas drop cloth just inside the roll-up door as a 'dust vestibule.'"
A six-person hexayurt village under one 20x20 monkey-hut shade structure, a portable solar panel on a wall, with a budget receipt taped on showing TICKET $575, MATERIALS $400, SHADE $180, FOOD $130, WATER $50, MISC $415 = $1,750.
o3 · t=0.2
$1,750 all-in, no generator, no RV
Cheapest serious plan in the corpus: a six-person hexayurt village under one 20×20 monkey-hut, 100% solar, ticket and vehicle pass included. All three o3 runs converged on the hexayurt+solar archetype, with budgets between $1,750 and $3,745. That makes o3 the most cost-conscious model in the field, and one of the few whose three temperature variants stay inside the same architectural choice as the budget climbs from cheap to medium.
"Six-person hexayurt village under one 20 × 20 ft Monkey-Hut shade, 100 % solar-powered — no RVs, no generators."
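The receipt in the o3 illustration is itemized, so the headline figure can be checked directly. A minimal sketch using only the six line items shown in that image:

```python
# Line items as printed on the receipt in the o3 illustration (USD).
receipt = {
    "ticket": 575,
    "materials": 400,
    "shade": 180,
    "food": 130,
    "water": 50,
    "misc": 415,
}

total = sum(receipt.values())
assert total == 1750, total  # matches the $1,750 headline

# Share of the budget per line item, largest first.
for item, cost in sorted(receipt.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item:<10} ${cost:>4}  {cost / total:5.1%}")
```

The items do sum to $1,750, and the ticket alone is about a third of the total.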
A YETI Tundra cooler on the playa with the lid propped open, frosty mist rising, a Topo Chico bottle and a wedge of cheddar inside on a frozen water bottle. Post-it on the side reads 'NOT THE JACKERY. NOT THE TENT. THE COOLER.'
Sonnet 4.6 (t=0.2) + Grok 3 Beta (t=1.1)
"The cooler" — twice in 48 runs, one cross-vendor twin
Two of forty-eight splurge picks land on the cooler. They came from different vendors at opposite ends of the temperature range, and they made the same morale-not-survival argument. Sonnet at t=0.2 names the YETI Tundra and a block of Gerlach ice. Grok 3 Beta at t=1.1 names a YETI Roadie 24 and "one or two normal meals — like cold cheese." Different brand-tier, same thesis: cold cheese on day five is the thing that reminds you you're a person.
"When it's 100°F at 2pm and you open that cooler and pull out a cold Topo Chico and a piece of real cheese, you feel like a human being." — Sonnet 4.6
Budget stretch
$1,200 → $12,170
A 10× spread for the same nine days
Cheapest plan: Grok 3 Mini Beta at $1,200 (REI Base Camp 6 + a Coleman canopy + jugs). Most expensive plan: GPT-5 Mini at $12,170 group total (~$3,043/person) for a 24-foot Class C RV + Hilleberg + Goal Zero Yeti 3000X. Same prompt, same week, same desert. The 10× delta is almost entirely accounted for by whether the model believes the bottleneck is thermal control (rent a vehicle) or discipline (build it from foam).
Latency stretch
6.3s → 114s
An 18× spread for similar deliverables
Fastest: GPT-4o at 6.3 seconds, which produced the disaster runs covered above. Slowest: Claude Opus 4.6 at 114 seconds, which produced the most opinionated, most useful, most quoted run in the entire study. The reasoning models (o3, GPT-5 family, Opus 4.6) cluster at 30–115s. In this small sample, speed did not buy quality; if anything, the two ran in opposite directions.
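Both "stretch" figures are plain max-over-min ratios of numbers quoted in this report. A short sketch that recomputes them, along with the 2.5× spread across GPT-4o's own three budgets; the `spread` helper is just an illustrative name.

```python
def spread(low: float, high: float) -> float:
    """Max/min ratio: how many times larger the high figure is than the low one."""
    return high / low


# Figures quoted in the report.
budget_spread = spread(1_200, 12_170)  # Grok 3 Mini Beta vs GPT-5 Mini (group total), USD
latency_spread = spread(6.3, 114)      # GPT-4o vs Claude Opus 4.6, seconds
gpt4o_spread = spread(1_500, 3_765)    # GPT-4o's three runs, USD

print(f"budget {budget_spread:.1f}x, latency {latency_spread:.1f}x, GPT-4o {gpt4o_spread:.1f}x")
```

Rounded, these give the 10×, 18×, and 2.5× figures used in the text.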
A mummy-shaped down sleeping bag fully zipped up inside a tent with the rainfly half-open showing a starry sky, thermometer reading 45°F on the floor, bag tag reads MONTBELL DOWN -5°F.
Claude Haiku 4.5 · t=1.1
The only run that splurged on the bag
Three Haiku 4.5 runs, three different splurge picks: the sleeping pad (t=0.2), the shade structure (t=0.7), and a $200 down mummy bag rated to -5°F (t=1.1). It's the only model that treated the night-temperature drop (95°F day → 45°F night) as the single biggest thermal-management problem at the burn — and the only run in the field that named the bag itself as the splurge.
"Spend $200 on a down mummy bag rated to -5°F. Why: temperature swings at Black Rock are violent (95°F day, 45°F night). A bad bag = you're awake at 4am freezing, delirious, making bad decisions."
The cross-vendor finding

Each vendor has a theory of what breaks first

Forty-seven of forty-eight plans agree that sleep is the multiplier. They split, hard, on what they think is the threat to it.

A circular cosmological diagram with a sleeping figure under a moon at the center labeled SLEEP, with arrows radiating outward to each splurge category — KODIAK CANVAS (Opus 4.6), SHIFTPOD (Gemini), JACKERY (Grok+GPT-4o), HEXAYURT (o3), BIKE (Sonnet 4.6), COOLER, BOX TRUCK, MUMMY BAG — each labeled with the model that picked it. Banner: 'THEY ALL ANSWERED THE SAME QUESTION. THEY DISAGREE ON THE BOTTLENECK.'

Splurge philosophy by vendor — the implied "what breaks first"

Gemini went 9-for-9 on shelter. Nobody else got close.

Every Gemini run, at every temperature, picked one specific shelter as the splurge — five Shiftpods and four Kodiak Canvas tents. Zero Geminis chose power, zero chose a vehicle, zero chose anything else. The theory is uniform across the family: what breaks at Burning Man is the ability to sleep through the morning heat, and the way to fix that is the dust-tight, dark-when-zipped tent.

Runs that splurged on shelter, by vendor:
  • Gemini: 9 of 9 (100%)
  • Anthropic: 5 of 12 (42%)
  • OpenAI: 4 of 18 (22%)
  • Grok: 2 of 9 (22%)
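The shelter percentages are simple counts divided by each vendor's run total. A minimal sketch of that arithmetic, which also makes the uneven denominators (18 OpenAI runs against 9 each for Gemini and Grok) visible:

```python
# Shelter splurges per vendor, as counted in the chart above:
# (runs that splurged on shelter, total runs for that vendor).
shelter_counts = {
    "Gemini": (9, 9),
    "Anthropic": (5, 12),
    "OpenAI": (4, 18),
    "Grok": (2, 9),
}

for vendor, (shelter, runs) in shelter_counts.items():
    share = shelter / runs
    print(f"{vendor:<10} {shelter:>2}/{runs:<2} {share:.0%}")
```

With only nine runs for Gemini and Grok, a single run moves a vendor's share by about 11 points (1/9), which is why the Gemini figure is best read as directional.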

OpenAI's theory is the opposite: 12 of 18 OpenAI runs splurge on either a vehicle (RV, trailer, box truck: 6/18) or power (Jackery, EcoFlow, solar: 6/18). The implied bottleneck is infrastructure, not bedding. Grok splits between power and shade structure: 5 of 9 Grok runs treat dust mitigation as the limiting factor and spend on the canopy or the battery, not the sleeping rig. Anthropic is the most varied vendor: across 12 runs, the splurge lands on eight different categories (Kodiak, Hexayurt, Pad, Bag, Power, Shade, Trailer, Bike). At temp 1.1 alone, three Anthropic models each landed on a different splurge.

What each vendor splurged on, broken out by category

  • Gemini (9 runs): SHELTER 100%
  • Anthropic (12 runs): SHELTER 42% · POWER 33% · SHADE 17% · VEHICLE 8%
  • OpenAI (18 runs): POWER 33% · VEHICLE 33% · SHELTER 22% · SHADE 11%
  • Grok (9 runs): POWER 33% · SHADE 22% · SHELTER 22% · VEHICLE 11% · COOLER 11%

The interesting move is to read those bars from a level up. Forty-seven of forty-eight plans, when defending their splurge, argue from the same starting axiom: sleep is the multiplier on every other experience. The Sonnet bike argument and the Opus canvas-tent argument and the Gemini Shiftpod argument and the Grok Jackery argument all collapse to "your splurge is the thing that protects your sleep." The disagreement is purely about what's most likely to take sleep away from you. Each vendor has decided in advance.

Three things nobody picked, ever, despite the obvious affordance: a friend (no plan splurges on flying someone in or paying for a friend's ticket); medical (no plan splurges on a real first-aid kit, despite nine days off-grid in 100°F heat); art (no plan splurges on materials to make and gift something, even though Gifting is one of the 10 Principles the prompt itself asks about). The collective imagination of every flagship LLM, asked to plan the best burn, reached for tents, batteries, and rentals. None of them reached for people.
The budget

Same prompt. Same nine days. Ten-times spread.

The cheapest plan and the most expensive plan are both internally consistent. They disagree on what the playa is for.

A hand-drawn number-line stretching from $1,200 (Grok 3 Mini Beta in a small tent) on the left to $12,170 (GPT-5 Mini in a Class C RV) on the right, with eight tick marks for intermediate models. Header: 'SAME PROMPT. 10× SPREAD.'
A field guide

Four ways the field thinks about shelter

35 of 48 plans pick one of these four. The remaining 13 are tent-and-shade variants that don't commit to a category.

A botanical-field-guide-style page showing four shelter taxonomy specimens — Shiftpod, Kodiak Canvas, Hexayurt, and RV/Trailer/Box Truck — each pinned and labeled with the count of how many of the 48 runs picked it.
The verdict

If your friend asks one model, hand them this one

If they want a plan that reads like it's been written for them
Claude Opus 4.6
2,294 words, named brands, compass bearings, real opinions, and a closer ("drive out to deep playa alone at 4 AM") that no other model wrote. Slow (114 seconds) and worth it. The most quotable run in the field.
If they're a first-timer terrified of dying in the desert
Grok 3 Beta
The only run that names the actual safety gear by SKU — 3M 6200 + P100 filters, Uvex Stealth goggles, Mack's earplugs — and tells the friend to test it before they leave. Comparatively cheap ($2,000–$2,500). Pragmatic without bragging.
If they want to spend $1,750 and know it's enough
o3
Three of three o3 runs landed on a hexayurt-village + monkey-hut shade + 100% solar archetype. Cheapest serious plan. Lowest within-model variance across temperatures. The boring right answer if your friend isn't a gear person.
If they're going with three friends who already trust each other
Claude Sonnet 4.6
Four-person soft camp with shared shade, the "$800 is lying or miserable" budget honesty, and the bike argument nobody else made. Picks the right group size and refuses to oversell what comfort costs.
A closer

The line that ended the best run in the corpus

A small lone figure sitting cross-legged on cracked playa floor at 4 AM, far from camp, facing a glowing distant art installation under stars, no other people, no bike.
Claude Opus 4.6 · t=1.1 · final paragraph
"Drive out to deep playa alone at 4 AM at least once. No bike, walk. Find an art piece no one else is at. Sit down. Shut up. Look at the sky. That hour will be worth more than the $3,200."
— line 176 of t11_claude-opus-4-6.md

No other plan in the forty-eight closes with anything like that. Most close with a 10-Principles paragraph or a "go have fun!" sentence. This is the one that remembers the assignment was a friend, not a checklist. It's also the one that, at the end of a 2,300-word planning document, drops the planning document and tells you what the burn is actually for.

Method, briefly

Sixteen models, three temperature passes, one prompt. Forty-eight runs total. Prompt is in prompt.txt. Per-model responses are split into responses/. The hand-built anchor view (§1 + budget signature + §6 splurge per run) is at responses/_anchor.md. The classification CSV that drives the bar charts is at responses/_classification.csv.

Models

  • OpenAI — GPT-4.1, GPT-4o, o3, o4 Mini, GPT-5, GPT-5 Mini
  • Anthropic — Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5
  • Google — Gemini 3 Pro, Gemini 3 Flash, Gemini 2.5 Pro
  • xAI — Grok 4, Grok 3 Beta, Grok 3 Mini Beta

Saved runs

  • Run E1064978 — temperature 0.2 (16 participants).
  • Run 2EE2CF86 — temperature 0.7 default (16 participants).
  • Run EF1047A5 — temperature 1.1 (16 participants).
  • Recall any run via choir runs show <prefix> --json.
  • The GPT-5 family rejects every temperature except 1.0 ("Unsupported value: 'temperature' does not support 0.2…") — they were retried at temp=1.0. Claude Opus 4.7 rejected the 0.2 override (extended-thinking models deprecate temperature) and was retried at the model default. Grok 4 returned a 503 once and was retried. Net errors after retries: 0.

Limits

  • One sample per model-temperature cell. The within-model variance figures are from n=3 per model and are suggestive, not statistically confirmed.
  • Splurge classification was substring-matched on the §6 region. A small number of multi-pick runs (e.g., "shade and sleep") were rounded toward the first match in the bucket order; the underlying file responses/_classification.csv shows the raw label per run for inspection.
  • Vendor sample sizes are uneven (18 OpenAI vs. 9 Grok). The "Gemini 100% shelter" figure is driven by 9 data points across 3 models — striking, but worth treating as a strong directional signal rather than a confirmed family-wide rule.
  • Every cited dollar figure was hand-pulled from the response files; the per-run classifier's "total" column has known false positives where models embedded large dollar values inside line items.
  • "Best" calls (Opus 4.6 #1, Sonnet 4.6 bike) are one rater's call. A different rater might rank the GPT-5 box truck or the Gemini 3 Pro lag-bolt run higher.
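The Limits note describes the splurge classifier only loosely: substring matches on the §6 region, with multi-pick runs rounded toward the first bucket in a fixed order. The sketch below is a hypothetical reconstruction of that idea, not the script in the repository; the section-heading pattern, the bucket order, and every keyword list are assumptions.

```python
import re

# Hypothetical bucket order and keywords; the lists behind
# responses/_classification.csv are not given in this report.
# Earlier buckets win when a section mentions several categories.
BUCKETS = [
    ("SHELTER", ["shiftpod", "kodiak", "hexayurt", "tent", "sleeping pad", "sleeping bag"]),
    ("VEHICLE", ["rv", "trailer", "box truck", "bike"]),
    ("POWER", ["jackery", "ecoflow", "goal zero", "power station", "solar"]),
    ("SHADE", ["shade", "aluminet", "canopy", "monkey-hut"]),
    ("COOLER", ["cooler"]),
]

# Assumed layout: section 6 begins on a line such as "6." or "## 6." and
# runs until the next line that begins section 7.
SECTION_6 = re.compile(r"^[#*\s]*6\.(.*?)^[#*\s]*7\.", re.S | re.M)


def splurge_region(response: str) -> str:
    """Return the text of section 6, or "" if no section 6/7 pair is found."""
    match = SECTION_6.search(response)
    return match.group(1) if match else ""


def classify_splurge(response: str) -> str:
    """Label a response by the first bucket with a keyword in section 6."""
    region = splurge_region(response).lower()
    for label, keywords in BUCKETS:
        # Word boundaries stop "rv" from matching inside words like "serve".
        pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
        if re.search(pattern, region):
            return label
    return "OTHER"


if __name__ == "__main__":
    print(classify_splurge("## 6. Splurge\nThe Kodiak Canvas tent, $700.\n## 7. Leave\nGlass."))
    print(classify_splurge("6. Not the tent. Not the power station. The bike.\n7. Glass."))
```

The second example shows the weakness of first-match rounding: this sketch labels Sonnet's "Not the tent. Not the power station. The bike." as SHELTER, because "tent" appears and SHELTER comes first in the assumed order. That is the kind of misfire the per-run raw labels in responses/_classification.csv let you inspect.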

Source data, response files, prompt, scripts, and classification artifacts: github.com/404seannotfound/choir-reports (under burning_man_2026/).