Books · Generated end-to-end with LLMs and image/audio models

This Is Why We Can't Have Nice Things

A children's fable. An old woman tells the children of her town the story of the last wagon — and the day she, as a small girl certain she'd watched enough, skipped the steps that kept it whole. 9 chapters, 73 illustrations, 38-minute audiobook with 11 narrator voices, all generated from a four-sentence brief.

The numbers

A four-sentence brief became a 38-minute illustrated audiobook. Here's the count of everything that got generated along the way.

6,707 words
9 chapters
108 images rendered
73 final illustrations
38:22 audiobook duration
11 distinct voices
296 TTS segments
~86 LLM calls

Cost breakdown

Two ways to read this. Out-of-pocket is what actually hit a credit card. At-full-rates is what a fresh account with no credits or subscriptions would have paid for the same work.

Out-of-pocket: ~$5
At full provider rates: ~$42
| Bucket | Volume | Out-of-pocket | At-full-rates |
| --- | --- | --- | --- |
| OpenAI image generation (gpt-image-1): cover + 7 portraits + 8 style swatches + 73 chapter beats + 5 cover candidates | 94 images | $0 (credit grant) | ~$8 |
| Claude Code orchestration (claude-opus-4-6): the agent loop that runs the pipeline. Heavy prompt caching brings effective rate to ~$3/M. | 7.7M tokens | $0 (Claude Pro/Code subscription) | ~$23 |
| ElevenLabs audiobook (eleven_multilingual_v2): 296 voice-tagged TTS segments across 11 voices | 19k credits | $0 (within monthly 30k credit subscription) | ~$3.45 |
| Choir LLM calls (claude-opus-4-7 + grok-4 + gemini-3-pro-preview): arc fan-out, judging, chapter weave, beat ID, audiobook segmentation | ~60 calls · ~400k tokens | ~$5 | ~$6 |
| Image generation, other providers (grok-imagine + imagen-4): comic, print, and decorative style swatches | 14 images | ~$1 | ~$1 |
| Total this book | | ~$5 | ~$42 |

What absorbed the cost. OpenAI gives new accounts standing credit grants ($30 every year, $50–60 to startup accounts), and image generation is the only place this book leans on them heavily. The ElevenLabs Creator-tier subscription includes 30,000 character credits per month; the audiobook used 19k. Claude Code is a flat monthly subscription that covers all the agent's planning, file editing, and tool calls, so the 7.7M tokens above never show up as a per-token charge. The only line truly metered against my balance is the choir fan-out / weave / judge work. Split across this book, two other books I made in the same window, and 22 choir-reports comparison studies, that work came to about $30 total.

What a fresh account would pay. ~$42 is the at-full-rates number — what you'd spend if you opened brand-new accounts at every provider with no credits and no monthly subscriptions, and rebuilt this exact book. That's the technology's real price; the $5 number is just how good a deal credit grants and subscriptions are.
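The at-full-rates total can be re-derived from the table rows. The sketch below is only a sanity check on the arithmetic: the dollar figures are the rounded values from the table, and the function name is made up for illustration.

```python
# At-full-rates figures copied from the cost table (USD, already rounded there).
full_rate = {
    "OpenAI image generation": 8.00,
    "Claude Code orchestration": 23.00,
    "ElevenLabs audiobook": 3.45,
    "Choir LLM calls": 6.00,
    "Image generation (other providers)": 1.00,
}


def orchestration_cost(tokens_millions: float, rate_per_million: float) -> float:
    """Token volume (in millions) times the effective cached rate per million."""
    return tokens_millions * rate_per_million


total = sum(full_rate.values())
print(f"sum of rows: ~${total:.2f}")
print(f"orchestration: ~${orchestration_cost(7.7, 3.0):.2f}")
```

The rows sum to about $41.45, which the page rounds up to ~$42, and 7.7M tokens at ~$3/M gives roughly the ~$23 shown for orchestration.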

Timing

Wall clock time for each stage. Everything that could parallelize did. Total time from "go" to finished audiobook: roughly half an hour.

| Stage | What runs | Wall clock |
| --- | --- | --- |
| 0 · Interview | Capture the seed brief | ~10 s |
| 1+2 · Arcs + judge | 18 parallel arc generations, then 12 parallel judges | ~4 min |
| 3 · Spread | 20 parallel timeline / characters / factors / cover calls + 5 cover renders | ~2 min |
| 4 · Style gallery | 22 style swatches across 11 vibrant styles, in batches of 12 | ~2 min |
| 5 · Preview site | Static HTML generation (10 pages) | ~30 s |
| 6 · Lock-in | 1 final cover + 7 portraits in chosen style, parallel | ~90 s |
| 7 · Chapter weave | 9 chapters × 2 models = 18 parallel weaves | ~2 min |
| 8 · Beat illustration | 9 parallel beat-ID calls, then 73 illustrations in batches of 12 | ~5 min |
| 9 · Compile | novel.html assembly + Chrome-headless PDF | ~3 min |
| 10 · Audiobook | 9 segmentations + 296 ElevenLabs TTS calls + ffmpeg stitch | ~3 min |
| Total | Everything end to end | ~25 – 35 min |
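Reading "batched in 12s" as groups of twelve concurrent jobs, the batching pattern might look roughly like the sketch below. It is not the pipeline's actual code: `run_in_batches` and `fake_render` are hypothetical names, and the render function is a stand-in for a real image API call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_in_batches(items: Iterable[T], worker: Callable[[T], R],
                   batch_size: int = 12) -> List[R]:
    """Run worker over items, at most batch_size at a time, preserving order."""
    pending = list(items)
    results: List[R] = []
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        # Each batch finishes completely before the next one starts.
        with ThreadPoolExecutor(max_workers=batch_size) as pool:
            results.extend(pool.map(worker, batch))
    return results


def fake_render(beat_id: int) -> str:
    # Hypothetical stand-in for one image-generation request.
    return f"beat-{beat_id:03d}.png"


files = run_in_batches(range(73), fake_render)
print(len(files), files[0], files[-1])
```

With 73 illustrations and a batch size of 12, this runs six full batches and one final batch of a single image.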

The process

Ten steps. The first nine are each a separate slash command; the audiobook step ran in this session. The early ones generate a menu; the late ones commit to picks and bake the cake.

0 · Capture the seed (/scriptorium-interview)
A short conversation captures the inspirer's intent in brief.md: working title, concept, voice references (Jon Klassen, William Steig, Mo Willems), POV, tone, length, what to avoid, visual aesthetic preferences. Every downstream prompt reads this brief.
1 + 2 · Wide arc fan-out + judging (/scriptorium-arcs)
Six structural shapes (tragic, comic, mystery, picaresque, ironic, wildcard) × three premium models (Opus, GPT-5, Grok) = up to 18 candidate arcs. Then two independent judges (Opus + Gemini) score each on inventiveness, coherence, stakes, illustrability, and voice match. The judges name each candidate with a two-word evocative name (Porch Rituals, Threadbare Cloth, Silent Axle, Apple Hill, Quiet Porch) and rank them.
3 · Spread the survivors (/scriptorium-spread)
For each of the top 5 arcs: a chapter-by-chapter timeline, named subplots, named pivotal events, themes / motifs ("factors"), a full character roster with portrait briefs, and a cover-art subject. All five arcs spread in parallel.
4 · Visual style gallery (/scriptorium-styles)
22 sample illustrations across 11 named styles in 5 families — comics, kid-book, print, decorative, and project-specific. Each style is rendered twice using the same representative scene so the comparison is purely about style. The catalog is filtered by the brief: here, no watercolor or sketch families.
5 · Preview website (/scriptorium-preview)
A browsable static site assembles the 5 arcs (with Gantt charts of their subplots), the style gallery, and a six-step lock-in wizard with localStorage state. The inspirer reviews and picks.
6 · Lock-in (/scriptorium-lock)
The inspirer commits to: one spine arc, optional named elements mixed in from other arcs, one visual style, the cover, and final character roster. Final cover and character portraits get re-rendered in the chosen style. For this book: Porch Rituals as the spine, with mix-ins from all four other arcs — the colored cloths (from Threadbare Cloth), the "scattering its bright bones" image (Apple Hill), the basket-of-pieces resolution (Silent Axle), and the "Mara died last winter" line (Quiet Porch). Style: Wagon Folk (project-specific folk-art on painted wood).
7 · Weave the chapters (/scriptorium-chapter N)
Each chapter gets fanned out across two models (Opus + Grok), judged for narrative momentum / voice / continuity / prose / scene shape, and the winner promoted. All 9 chapters can be woven in parallel because each chapter prompt embeds the full timeline, cast, arc, and mix-ins.
8 · Illustrate the beats (/scriptorium-illustrate N)
For each chapter, the illustration director identifies 6+ distinct narrative beats — opening, early, midchapter, turning, climax, closing — each with a specific scene subject and hand-lettered annotations. Each beat is rendered once in the locked style. Beats are interleaved with the prose in the chapter page, graphic-novel-meets-prose.
9 · Compile the novel (/scriptorium-compile)
Cover + table of contents + 9 chapters with beats inline + back matter, all in one parchment-toned HTML. A Chrome-headless print pass produces the PDF.
10 · Audiobook (this session)
Each chapter is segmented by Claude into voice-tagged dialogue / narration pairs. ElevenLabs multilingual_v2 renders 296 audio segments using 11 premade voices (George for the narrator, Bill for grandpa, Jessica for young Pip, etc.). ffmpeg concat stitches segments per chapter with 0.4 s pauses, and joins chapters with 1.5 s pauses.
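The stitching step can be sketched as a concat list plus two ffmpeg invocations. This is a minimal illustration, not the pipeline's script: the helper names and file names are hypothetical, and the 44.1 kHz sample rate is an assumption. The commands are built as argument lists and not executed here.

```python
from typing import List


def concat_list(segments: List[str], silence: str) -> str:
    """Return ffmpeg concat-demuxer list text with `silence` between segments."""
    lines: List[str] = []
    for i, seg in enumerate(segments):
        if i > 0:
            lines.append(f"file '{silence}'")
        lines.append(f"file '{seg}'")
    return "\n".join(lines) + "\n"


def silence_cmd(seconds: float, out_path: str) -> List[str]:
    # anullsrc is ffmpeg's built-in silent source; the sample rate is assumed.
    return ["ffmpeg", "-y", "-f", "lavfi", "-i", "anullsrc=r=44100:cl=mono",
            "-t", str(seconds), "-b:a", "128k", out_path]


def stitch_cmd(list_path: str, out_path: str) -> List[str]:
    # Re-encode to 128 kbps mono so the joined file has uniform parameters.
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_path,
            "-ac", "1", "-b:a", "128k", out_path]


# Hypothetical file names, for illustration only.
chapter_01 = concat_list(
    ["ch01_seg001.mp3", "ch01_seg002.mp3", "ch01_seg003.mp3"],
    "silence_0.4s.mp3",
)
print(chapter_01, end="")
```

The same `concat_list` helper would serve for joining chapters, with a 1.5 s silence file passed in place of the 0.4 s one.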

The tooling

Every step is a shell-out, not a library binding: the pipeline is a stack of bash scripts and Python helpers. It is a sibling of choir.

Fan-out + judging

choir

Sean's CLI for routing prompts to any model across providers. Single-model: plain text out. Multi-model: JSON. --save persists a comparison run; choir runs compare appends a judge summary later.

Anthropic Opus 4.7 · OpenAI GPT-5 · xAI Grok 4 · Google Gemini 3 Pro
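The fan-out grid and judge ranking from stages 1 + 2 can be illustrated without choir itself. The sketch below only shows the shape of the computation: the model strings are labels rather than API model IDs, the averaging scheme is an assumption about how judge scores might be combined, and the toy scores are invented placeholders, not results from this run.

```python
from itertools import product
from statistics import mean

SHAPES = ["tragic", "comic", "mystery", "picaresque", "ironic", "wildcard"]
MODELS = ["opus", "gpt-5", "grok"]  # labels only, not API model IDs
CRITERIA = ["inventiveness", "coherence", "stakes", "illustrability", "voice_match"]


def fan_out(shapes, models):
    """One candidate arc per (shape, model) pair."""
    return list(product(shapes, models))


def rank(candidates, scores_by_judge):
    """Sort by the mean, across judges, of each judge's mean criterion score."""
    def overall(candidate):
        return mean(
            mean(judge_scores[candidate][c] for c in CRITERIA)
            for judge_scores in scores_by_judge.values()
        )
    return sorted(candidates, key=overall, reverse=True)


def flat(score):
    # Same score on every criterion, to keep the toy data short.
    return dict.fromkeys(CRITERIA, score)


print(len(fan_out(SHAPES, MODELS)))

# Toy data: invented scores for two placeholder models.
toy = fan_out(["tragic", "comic"], ["model_x", "model_y"])
toy_scores = {
    "judge_a": {toy[0]: flat(8), toy[1]: flat(6), toy[2]: flat(7), toy[3]: flat(5)},
    "judge_b": {toy[0]: flat(9), toy[1]: flat(5), toy[2]: flat(7), toy[3]: flat(4)},
}
print(rank(toy, toy_scores))
```

Six shapes by three models yields the "up to 18" candidates; with one model unavailable the same grid would produce 12.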

Text generation

Claude Opus 4.7

Primary writer. Won every chapter judge pass against Grok 4. Used everywhere story-craft matters — arcs, spread, chapter weave, beat identification, audiobook segmentation.

~$15 / 1M input · $75 / 1M output

Text generation (alt)

Grok 4 · Gemini 3 Pro

Used for diversity in fan-outs (Grok) and as a second judge (Gemini), since GPT-5 was unavailable in this run. Grok arcs scored consistently below Opus's; the judge quality from Gemini matched Opus's.

~$3 / 1M (Grok) · ~$1.25 / 1M (Gemini)

Image generation (primary)

OpenAI gpt-image-1

All 73 chapter beats and all 7 character portraits. Cleanest rendering of "folk-art painted wood with stenciled borders" of the three providers. Outputs PNG at 1024×1024.

~$0.04 – $0.17 per image depending on quality tier

Image generation (style variety)

xAI grok-imagine · Google Imagen 4

Used for specific style families in the gallery — Grok for newspaper-comic and crayon-wax; Imagen for stained glass, linocut, and lantern-ink. Picked per provider based on which one renders that tradition best.

~$0.07 (Grok) · ~$0.04 (Imagen) per image

Text-to-speech

ElevenLabs multilingual_v2

11 voices from the premade catalog: George (warm British storyteller, tagged for narrative_story) as narrator; Bill, Lily, Jessica, Sarah, Charlie, Laura, Will, Liam, Matilda, Roger for the cast. Voice settings: stability 0.55, similarity_boost 0.78.

~$0.18 – $0.30 per 1,000 characters
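For reference, a single TTS request with these settings might be assembled as below. This is a sketch against the public ElevenLabs text-to-speech REST endpoint, not the pipeline's own helper: `build_tts_request` is a hypothetical name, the voice ID, API key, and sample text are placeholders, and the request is constructed but never sent.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) one text-to-speech request for a single segment."""
    payload = {
        "text": text,
        "model_id": "eleven_multilingual_v2",
        # Voice settings as quoted in the card above.
        "voice_settings": {"stability": 0.55, "similarity_boost": 0.78},
    }
    return urllib.request.Request(
        f"{API_BASE}/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholders only: substitute a real voice ID and key before sending.
req = build_tts_request("Sample narration line.", "VOICE_ID_PLACEHOLDER", "API_KEY_PLACEHOLDER")
print(req.get_method(), req.full_url)
```

Sending it with `urllib.request.urlopen(req)` would return the audio bytes for that segment; one such call per segment accounts for the 296 TTS calls.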

Stitching

ffmpeg

Per-chapter concat list with mp3 segments and a 0.4 s anullsrc silence between segments. Chapters are then concatenated with 1.5 s silences between them. Bitrate normalized to 128 kbps mono.

Free

PDF

Chrome --headless --print-to-pdf

Print CSS hides the audio players and PDF nav bar; chapter pages get a page-break-before rule, and illustrations get page-break-inside: avoid. Output is 213 MB because the 73 illustrations are embedded as full-resolution PNGs.

Free

Read it / hear it

Per-chapter mp3s are inside the novel page — each chapter has its own <audio> player above the text. The chapter audio files are also in /books/cant-have-nice-things/audio/ if you want to drop them into a podcast app or pull them down individually.