This Is Why We Can't Have Nice Things
A children's fable. An old woman tells the children of her town the story of the last wagon — and the day she, as a small girl certain she'd watched enough, skipped the steps that kept it whole. 9 chapters, 73 illustrations, 38-minute audiobook with 11 narrator voices, all generated from a four-sentence brief.
The numbers
A four-sentence brief became a 38-minute illustrated audiobook. Here's the count of everything that got generated along the way.
Cost breakdown
Two ways to read this. Out-of-pocket is what actually hit a credit card. At-full-rates is what a fresh account with no credits or subscriptions would have paid for the same work.
| Bucket | Volume | Out-of-pocket | At-full-rates |
|---|---|---|---|
| OpenAI image generation (gpt-image-1): cover + 7 portraits + 8 style swatches + 73 chapter beats + 5 cover candidates | 94 images | $0 (credit grant) | ~$8 |
| Claude Code orchestration (claude-opus-4-6): the agent loop that runs the pipeline. Heavy prompt caching brings effective rate to ~$3/M. | 7.7M tokens | $0 (Claude Pro/Code subscription) | ~$23 |
| ElevenLabs audiobook (eleven_multilingual_v2): 296 voice-tagged TTS segments across 11 voices | 19k credits | $0 (within monthly 30k credit subscription) | ~$3.45 |
| Choir LLM calls (claude-opus-4-7 + grok-4 + gemini-3-pro-preview): arc fan-out, judging, chapter weave, beat ID, audiobook segmentation | ~60 calls · ~400k tokens | ~$5 | ~$6 |
| Image generation, other providers (grok-imagine + imagen-4): comic, print, and decorative style swatches | 14 images | ~$1 | ~$1 |
| Total this book | | ~$5 | ~$42 |
What absorbed the cost. OpenAI gives new accounts standing credit grants ($30 every year, $50–60 to startup accounts), and image generation is the only place this book leans on them heavily. The ElevenLabs Creator-tier subscription includes 30,000 character credits per month; the audiobook used 19k. Claude Code is a flat monthly subscription that covers all the agent's planning, file editing, and tool calls, so the 7.7M tokens above never show up as a per-token charge. The only line truly metered against my balance is the choir fan-out / weave / judge work. That spend was shared across this book, two other books I made in the same window, and 22 choir-reports comparison studies, and it came to about $30 total across everything.
What a fresh account would pay. ~$42 is the at-full-rates number: what you'd spend if you opened brand-new accounts at every provider, with no credits and no monthly subscriptions, and rebuilt this exact book. That's the technology's real price; the $5 number just shows how good a deal credit grants and subscriptions are.
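The at-full-rates column can be checked with a few lines of arithmetic. The sketch below copies the table's own rounded estimates into a dictionary (the key names are made up for illustration) and derives the per-unit rates those figures imply, so they can be compared against the list prices quoted in the tooling section.

```python
# Figures copied from the cost table above; every value is the table's
# own rounded estimate, and the key names are illustrative only.
FULL_RATE_USD = {
    "openai_images": 8.00,
    "claude_code_orchestration": 23.00,
    "elevenlabs_audiobook": 3.45,
    "choir_llm_calls": 6.00,
    "other_image_providers": 1.00,
}


def total_full_rate(costs):
    """Sum the at-full-rates column, rounded to cents."""
    return round(sum(costs.values()), 2)


def implied_rate(cost_usd, units):
    """Cost per unit implied by a bucket's total and its volume."""
    return cost_usd / units


total = total_full_rate(FULL_RATE_USD)
# 7.7M orchestration tokens -> dollars per million tokens
claude_per_million = implied_rate(FULL_RATE_USD["claude_code_orchestration"], 7.7)
# 19k ElevenLabs credits -> dollars per thousand credits
elevenlabs_per_thousand = implied_rate(FULL_RATE_USD["elevenlabs_audiobook"], 19)
# 94 OpenAI images -> dollars per image
openai_per_image = implied_rate(FULL_RATE_USD["openai_images"], 94)

print(f"total at full rates:        ${total:.2f}")
print(f"Claude, per 1M tokens:      ${claude_per_million:.2f}")
print(f"ElevenLabs, per 1k credits: ${elevenlabs_per_thousand:.3f}")
print(f"OpenAI, per image:          ${openai_per_image:.3f}")
```

The rows sum to $41.45, which the table rounds to ~$42. The implied rates land close to the quoted ones: about $3 per million tokens for the cached Claude Code loop, about $0.18 per 1,000 ElevenLabs credits, and between $0.04 and $0.17 per OpenAI image.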
Timing
Wall clock time for each stage. Everything that could parallelize did. Total time from "go" to finished audiobook: roughly half an hour.
| Stage | What runs | Wall clock |
|---|---|---|
| 0 · Interview | Capture the seed brief | ~10 s |
| 1+2 · Arcs + judge | 18 parallel arc generations, then 12 parallel judges | ~4 min |
| 3 · Spread | 20 parallel timeline / characters / factors / cover calls + 5 cover renders | ~2 min |
| 4 · Style gallery | 22 style swatches across 11 vibrant styles, batched in 12s | ~2 min |
| 5 · Preview site | Static HTML generation (10 pages) | ~30 s |
| 6 · Lock-in | 1 final cover + 7 portraits in chosen style, parallel | ~90 s |
| 7 · Chapter weave | 9 chapters × 2 models = 18 parallel weaves | ~2 min |
| 8 · Beat illustration | 9 parallel beat-ID calls, then 73 illustrations batched in 12s | ~5 min |
| 9 · Compile | novel.html assembly + Chrome-headless PDF | ~3 min |
| 10 · Audiobook | 9 segmentations + 296 ElevenLabs TTS calls + ffmpeg stitch | ~3 min |
| Total | Everything end to end | ~25 – 35 min |
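Stages 4 and 8 both mention work "batched in 12s". A minimal sketch of that pattern follows, assuming the phrase means batches of twelve requests in flight at once. `render_beat` is a hypothetical stand-in for the real image-generation call, and the actual pipeline may batch differently.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: "batched in 12s" means twelve requests in flight per batch.
BATCH_SIZE = 12


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def render_beat(beat_id):
    """Hypothetical stand-in for one image-generation call."""
    return f"beat-{beat_id:03d}.png"


def render_all(beat_ids, batch_size=BATCH_SIZE):
    """Run each batch in parallel, finishing one batch before starting the next."""
    outputs = []
    for batch in chunked(beat_ids, batch_size):
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            # pool.map preserves input order within the batch
            outputs.extend(pool.map(render_beat, batch))
    return outputs


beats = list(range(1, 74))  # the 73 chapter beats
batches = chunked(beats, BATCH_SIZE)
files = render_all(beats)
print(len(batches), "batches,", len(files), "files")
```

With 73 beats and a batch size of 12, this gives six full batches and a final batch of one. Capping concurrency this way is a simple way to stay under provider rate limits while still running most of the work in parallel.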
The process
Ten stages, each a separate slash command. The early ones generate a menu; the late ones commit to picks and bake the cake.
0 · Capture the seed (/scriptorium-interview)
brief.md: working title, concept, voice references (Jon Klassen, William Steig, Mo Willems), POV, tone, length, what to avoid, visual aesthetic preferences. Every downstream prompt reads this brief.
1 + 2 · Wide arc fan-out + judging (/scriptorium-arcs)
3 · Spread the survivors (/scriptorium-spread)
4 · Visual style gallery (/scriptorium-styles)
5 · Preview website (/scriptorium-preview)
6 · Lock-in (/scriptorium-lock)
7 · Weave the chapters (/scriptorium-chapter N)
8 · Illustrate the beats (/scriptorium-illustrate N)
9 · Compile the novel (/scriptorium-compile)
10 · Audiobook (this session)
multilingual_v2 renders 296 audio segments using 11 premade voices (George for the narrator, Bill for grandpa, Jessica for young Pip, etc.). ffmpeg concat stitches segments per chapter with 0.4 s pauses, and joins chapters with 1.5 s pauses.
The tooling
Every step is a shell-out, not a library binding — the pipeline is a stack of bash scripts and Python helpers. Sibling of choir.
choir
Sean's CLI for routing prompts to any model across providers. Single-model: plain text out. Multi-model: JSON. --save persists a comparison run; choir runs compare appends a judge summary later.
Anthropic Opus 4.7 · OpenAI GPT-5 · xAI Grok 4 · Google Gemini 3 Pro
Claude Opus 4.7
Primary writer. Won every chapter judge pass against Grok 4. Used everywhere story-craft matters — arcs, spread, chapter weave, beat identification, audiobook segmentation.
~$15 / 1M input · $75 / 1M output
Grok 4 · Gemini 3 Pro
Used for diversity in fan-outs (Grok) and as a second judge (Gemini), since GPT-5 was unavailable in this run. Grok arcs scored consistently below Opus's; the judge quality from Gemini matched Opus's.
~$3 / 1M (Grok) · ~$1.25 / 1M (Gemini)
OpenAI gpt-image-1
All 73 chapter beats and all 7 character portraits. Cleanest rendering of "folk-art painted wood with stenciled borders" of the three providers. Outputs PNG at 1024×1024.
~$0.04 – $0.17 per image depending on quality tier
xAI grok-imagine · Google Imagen 4
Used for specific style families in the gallery — Grok for newspaper-comic and crayon-wax; Imagen for stained glass, linocut, and lantern-ink. Picked per provider based on which one renders that tradition best.
~$0.07 (Grok) · ~$0.04 (Imagen) per image
ElevenLabs multilingual_v2
11 voices from the premade catalog: George (warm British storyteller, tagged for narrative_story) as narrator; Bill, Lily, Jessica, Sarah, Charlie, Laura, Will, Liam, Matilda, Roger for the cast. Voice settings: stability 0.55, similarity_boost 0.78.
~$0.18 – $0.30 per 1,000 characters
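For reference, one TTS segment with those voice settings might be requested roughly like this. This is a sketch, not the pipeline's actual helper: it assumes the public ElevenLabs REST endpoint shape (`/v1/text-to-speech/{voice_id}` with an `xi-api-key` header), and the voice ID, API key, and sample text are placeholders. The request is only built here, never sent.

```python
import json
import urllib.request

# Assumption: the public ElevenLabs REST endpoint; check the current API docs.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"

# Voice settings as stated in the write-up.
VOICE_SETTINGS = {"stability": 0.55, "similarity_boost": 0.78}


def build_tts_request(text, voice_id, api_key):
    """Build (but do not send) a POST request for one voice-tagged segment."""
    payload = {
        "text": text,
        "model_id": "eleven_multilingual_v2",
        "voice_settings": VOICE_SETTINGS,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: substitute a real voice ID and key before sending.
request = build_tts_request(
    "Sample line of narration.", "VOICE_ID_PLACEHOLDER", "API_KEY_PLACEHOLDER"
)
body = json.loads(request.data)
print(request.get_method(), request.full_url)
print(body["voice_settings"])
```

Passing the result to `urllib.request.urlopen(request)` would return the audio bytes for that segment. Keeping the builder separate from the network call makes it easy to inspect payloads for all 296 segments before spending any credits.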
ffmpeg
Per-chapter concat list with mp3 segments and a 0.4 s anullsrc silence between segments. Then chapters concat with 1.5 s silences. Bitrate normalized to 128 kbps mono.
Free
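The stitch described above can be sketched as three small helpers that build the ffmpeg invocations and the concat list. File names are hypothetical, and the 44.1 kHz sample rate for the silence clip is an assumption: the concat demuxer expects every input to share the same audio parameters, so the silence should match whatever the TTS segments actually use. The commands are only constructed here; pass them to `subprocess.run` to execute.

```python
def silence_command(seconds, out_path):
    """ffmpeg command that renders `seconds` of mono silence from anullsrc."""
    return [
        "ffmpeg", "-y",
        "-f", "lavfi", "-i", "anullsrc=r=44100:cl=mono",  # sample rate assumed
        "-t", str(seconds),
        "-c:a", "libmp3lame", "-b:a", "128k",
        out_path,
    ]


def concat_list(parts, gap_path):
    """Text of a concat-demuxer list with the gap clip between consecutive parts."""
    lines = []
    for index, part in enumerate(parts):
        if index > 0:
            lines.append(f"file '{gap_path}'")
        lines.append(f"file '{part}'")
    return "\n".join(lines) + "\n"


def concat_command(list_path, out_path):
    """ffmpeg command that joins the listed files into one 128 kbps mono mp3."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", list_path,
        "-ac", "1", "-c:a", "libmp3lame", "-b:a", "128k",
        out_path,
    ]


# Hypothetical file names for one chapter with three segments.
segments = ["ch01_seg001.mp3", "ch01_seg002.mp3", "ch01_seg003.mp3"]
chapter_list = concat_list(segments, "silence_0.4.mp3")
print(chapter_list)
print(" ".join(silence_command(0.4, "silence_0.4.mp3")))
print(" ".join(concat_command("ch01.txt", "ch01.mp3")))
```

The same `concat_list` helper covers the second pass: feed it the finished chapter files and a 1.5 s silence clip to produce the full audiobook.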
Chrome --headless --print-to-pdf
Print CSS hides the audio players and PDF nav bar; chapter pages use page-break-before, and illustrations use page-break-inside: avoid. Output is 213 MB because the 73 illustrations are full-resolution PNG.
Free
Read it / hear it
Per-chapter mp3s are inside the novel page — each chapter has its own <audio> player above the text. The chapter audio files are also in /books/cant-have-nice-things/audio/ if you want to drop them into a podcast app or pull them down individually.