Books · Generated end-to-end with LLMs and image/audio models

This Is Why We Can't Have Nice Things

A children's fable. An old woman tells the children of her town the story of the last wagon, and of the day she, as a small girl certain she'd watched enough, skipped the steps that kept it whole. 9 chapters, 73 illustrations, and a 38-minute audiobook with 11 voices, all generated from a four-sentence brief.

Read the book → 🎧 Download audiobook (35 MB)

The numbers

A four-sentence brief became a 38-minute illustrated audiobook. Here's the count of everything that got generated along the way.

Metric	Value
Words	6,707
Chapters	9
Images rendered	108
Final illustrations	73
Audiobook duration	38:22
Distinct voices	11
TTS segments	296
LLM calls	~86

Cost breakdown

Two ways to read this. Out-of-pocket is what actually hit a credit card. At-full-rates is what a fresh account with no credits or subscriptions would have paid for the same work.

Out-of-pocket: ~$5

At full provider rates: ~$42

Bucket	Volume	Out-of-pocket	At-full-rates
OpenAI image generation — `gpt-image-1` cover + 7 portraits + 8 style swatches + 73 chapter beats + 5 cover candidates	94 images	$0 credit grant	~$8
Claude Code orchestration — `claude-opus-4-6` the agent loop that runs the pipeline. Heavy prompt caching brings effective rate to ~$3/M.	7.7M tokens	$0 Claude Pro/Code subscription	~$23
ElevenLabs audiobook — `eleven_multilingual_v2` 296 voice-tagged TTS segments across 11 voices	19k credits	$0 within monthly 30k credit subscription	~$3.45
Choir LLM calls — `claude-opus-4-7` + `grok-4` + `gemini-3-pro-preview` arc fan-out, judging, chapter weave, beat ID, audiobook segmentation	~60 calls · ~400k tokens	~$5	~$6
Image generation (other providers) — `grok-imagine` + `imagen-4` comic, print, and decorative style swatches	14 images	~$1	~$1
Total this book		~$5	~$42

What absorbed the cost. OpenAI gives new accounts standing credit grants ($30 every year, $50–60 to startup accounts), and image generation is the only place this book leans on them heavily. The ElevenLabs Creator-tier subscription includes 30,000 character credits per month; the audiobook used 19k. Claude Code is a flat monthly subscription that covers all of the agent's planning, file editing, and tool calls, so the 7.7M tokens above never show up as a per-token charge. The only substantial line metered against my balance is the choir fan-out / weave / judge work. Split across this book, two other books I made in the same window, and 22 choir-reports comparison studies, that work came to about $30 total.

What a fresh account would pay. ~$42 is the at-full-rates number: what you'd spend if you opened brand-new accounts at every provider, with no credits and no monthly subscriptions, and rebuilt this exact book. That's the technology's real price; the $5 number just reflects how good a deal credit grants and subscriptions are.
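The at-full-rates column is simple arithmetic over the cost table. Below is a minimal Python sketch that adds it up again. It assumes the Claude Code figure is 7.7M tokens at the quoted ~$3/M effective (cache-heavy) rate. Every other row is copied as stated.

```python
# Re-add the "At-full-rates" column of the cost table.
# Assumption: the Claude Code line is 7.7M tokens at the ~$3/M effective rate
# quoted in the table; the other rows are taken exactly as stated.

CLAUDE_TOKENS_M = 7.7          # millions of tokens
CLAUDE_EFFECTIVE_RATE = 3.0    # dollars per million tokens, after prompt caching

full_rates = {
    "OpenAI image generation": 8.00,
    "Claude Code orchestration": CLAUDE_TOKENS_M * CLAUDE_EFFECTIVE_RATE,
    "ElevenLabs audiobook": 3.45,
    "Choir LLM calls": 6.00,
    "Image generation (other providers)": 1.00,
}

total = sum(full_rates.values())
for bucket, dollars in full_rates.items():
    print(f"{bucket:<38} ${dollars:6.2f}")
print(f"{'Total':<38} ${total:6.2f}")
```

The rows sum to about $41.55, which rounds to the ~$42 headline.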

Timing

Wall-clock time for each stage. Everything that could run in parallel did. Total time from "go" to finished audiobook: roughly half an hour.

Stage	What runs	Wall clock
0 · Interview	Capture the seed brief	~10 s
1+2 · Arcs + judge	18 parallel arc generations, then 12 parallel judges	~4 min
3 · Spread	20 parallel timeline / characters / factors / cover calls + 5 cover renders	~2 min
4 · Style gallery	22 style swatches across 11 vibrant styles, in batches of 12	~2 min
5 · Preview site	Static HTML generation (10 pages)	~30 s
6 · Lock-in	1 final cover + 7 portraits in chosen style, parallel	~90 s
7 · Chapter weave	9 chapters × 2 models = 18 parallel weaves	~2 min
8 · Beat illustration	9 parallel beat-ID calls, then 73 illustrations in batches of 12	~5 min
9 · Compile	novel.html assembly + Chrome-headless PDF	~3 min
10 · Audiobook	9 segmentations + 296 ElevenLabs TTS calls + ffmpeg stitch	~3 min
Total (everything end to end)		~25 – 35 min
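Stages 4 and 8 send their image jobs in batches of 12. Below is a minimal Python sketch of that pattern. It assumes a batch is a group of concurrent requests that must all finish before the next group starts; the source does not say exactly how batches are scheduled. `fake_render` is a hypothetical stand-in for a real image-generation call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def batched(items: List[T], size: int) -> List[List[T]]:
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def render_in_batches(jobs: List[T], render: Callable[[T], R], size: int = 12) -> List[R]:
    """Run `render` over jobs `size` at a time. Each batch runs in parallel,
    and the next batch starts only after the previous one has finished."""
    results: List[R] = []
    with ThreadPoolExecutor(max_workers=size) as pool:
        for batch in batched(jobs, size):
            # pool.map preserves input order, so results line up with jobs.
            results.extend(pool.map(render, batch))
    return results

# Hypothetical stand-in for an image-generation call.
def fake_render(beat_id: int) -> str:
    return f"beat-{beat_id:02d}.png"

beats = list(range(1, 74))                   # 73 chapter beats
print([len(b) for b in batched(beats, 12)])  # [12, 12, 12, 12, 12, 12, 1]
paths = render_in_batches(beats, fake_render)
```

Batching caps the number of in-flight requests, which helps stay under provider rate limits while keeping most of the parallel speedup.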

The process

Ten stages; every one except the audiobook is its own slash command. The early stages generate a menu; the late ones commit to picks and bake the cake.

0 · Capture the seed (/scriptorium-interview)

A short conversation captures the inspirer's intent in brief.md: working title, concept, voice references (Jon Klassen, William Steig, Mo Willems), POV, tone, length, what to avoid, visual aesthetic preferences. Every downstream prompt reads this brief.

1 + 2 · Wide arc fan-out + judging (/scriptorium-arcs)

Six structural shapes (tragic, comic, mystery, picaresque, ironic, wildcard) × three premium models (Opus, GPT-5, Grok) = up to 18 candidate arcs. Then two independent judges (Opus + Gemini) score each on inventiveness, coherence, stakes, illustrability, and voice match. The judges name each candidate with a two-word evocative name (Porch Rituals, Threadbare Cloth, Silent Axle, Apple Hill, Quiet Porch) and rank them.
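The fan-out is a grid, and the judging is a ranking. Below is a minimal Python sketch of both, under these assumptions:

* Each judge returns a numeric score for each of the five criteria.
* All judges and criteria are weighted equally. The source doesn't say how scores are combined.
* The model labels are placeholders, not choir's actual identifiers.

```python
from itertools import product
from statistics import mean

SHAPES = ["tragic", "comic", "mystery", "picaresque", "ironic", "wildcard"]
WRITERS = ["opus", "gpt-5", "grok"]  # placeholder labels
CRITERIA = ["inventiveness", "coherence", "stakes", "illustrability", "voice_match"]

# Every (shape, model) pair is one arc-generation job: 6 x 3 = 18 candidates.
jobs = list(product(SHAPES, WRITERS))

def rank_arcs(scores: dict[str, dict[str, dict[str, float]]], top_n: int = 5) -> list[str]:
    """scores[arc_name][judge][criterion] -> numeric score.

    Averages each arc over every judge and every criterion (equal weights are
    an assumption) and returns the names of the top_n arcs, best first."""
    def overall(arc: str) -> float:
        return mean(judge[c] for judge in scores[arc].values() for c in CRITERIA)
    return sorted(scores, key=overall, reverse=True)[:top_n]
```

The top five names from `rank_arcs` are what stage 3 spreads.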

3 · Spread the survivors (/scriptorium-spread)

For each of the top 5 arcs: a chapter-by-chapter timeline, named subplots, named pivotal events, themes / motifs ("factors"), a full character roster with portrait briefs, and a cover-art subject. All five arcs spread in parallel.

4 · Visual style gallery (/scriptorium-styles)

22 sample illustrations across 11 named styles in 5 families — comics, kid-book, print, decorative, and project-specific. Each style is rendered twice using the same representative scene so the comparison is purely about style. The catalog is filtered by the brief: here, no watercolor or sketch families.
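The brief-driven filter and the same-scene rendering are easy to picture as data. Below is a minimal Python sketch. It assumes each catalog style carries a single family tag. The family assignments, the two excluded example styles, and the scene text are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Style:
    name: str
    family: str

# Illustrative catalog. The family tags, and the watercolor and sketch
# entries, are assumptions made for this example.
CATALOG = [
    Style("newspaper-comic", "comics"),
    Style("crayon-wax", "kid-book"),
    Style("linocut", "print"),
    Style("stained glass", "decorative"),
    Style("Wagon Folk", "project-specific"),
    Style("loose watercolor", "watercolor"),
    Style("pencil sketch", "sketch"),
]

def gallery_jobs(catalog: list[Style], excluded_families: set[str],
                 scene: str, renders_per_style: int = 2) -> list[tuple[str, str, int]]:
    """Drop the families the brief rules out, then queue `renders_per_style`
    renders of the same scene for every surviving style."""
    return [
        (style.name, scene, take)
        for style in catalog
        if style.family not in excluded_families
        for take in range(1, renders_per_style + 1)
    ]

jobs = gallery_jobs(CATALOG, {"watercolor", "sketch"}, scene="the last wagon at dusk")
```

Holding the scene fixed is what lets the gallery compare styles rather than subjects.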

5 · Preview website (/scriptorium-preview)

A browsable static site assembles the 5 arcs (with Gantt charts of their subplots), the style gallery, and a six-step lock-in wizard with localStorage state. The inspirer reviews and picks.

6 · Lock-in (/scriptorium-lock)

The inspirer commits to one spine arc, optional named elements mixed in from other arcs, one visual style, the cover, and the final character roster. The final cover and character portraits are then re-rendered in the chosen style. For this book, Porch Rituals is the spine, with mix-ins from all four other arcs: the colored cloths (from Threadbare Cloth), the "scattering its bright bones" image (Apple Hill), the basket-of-pieces resolution (Silent Axle), and the "Mara died last winter" line (Quiet Porch). Style: Wagon Folk (project-specific folk-art on painted wood).
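The lock-in choices form a small record. Below is a minimal Python sketch of one way to hold and sanity-check them, filled in with this book's picks. Its assumptions:

* The class and field names are hypothetical.
* The cover and roster fields are left out for brevity.
* The validation rule (mix-ins must come from other candidate arcs) is inferred from the description above.

```python
from dataclasses import dataclass, field

@dataclass
class MixIn:
    element: str
    source_arc: str

@dataclass
class LockIn:
    spine_arc: str
    style: str
    mix_ins: list[MixIn] = field(default_factory=list)

    def validate(self, candidate_arcs: set[str]) -> None:
        """Reject a spine that wasn't on the menu, or a mix-in drawn from
        the spine itself or from an unknown arc."""
        if self.spine_arc not in candidate_arcs:
            raise ValueError(f"unknown spine arc: {self.spine_arc}")
        for m in self.mix_ins:
            if m.source_arc == self.spine_arc or m.source_arc not in candidate_arcs:
                raise ValueError(f"bad mix-in source for {m.element!r}: {m.source_arc}")

ARCS = {"Porch Rituals", "Threadbare Cloth", "Silent Axle", "Apple Hill", "Quiet Porch"}

# The picks for this book, as described above.
lock = LockIn(
    spine_arc="Porch Rituals",
    style="Wagon Folk",
    mix_ins=[
        MixIn("the colored cloths", "Threadbare Cloth"),
        MixIn('"scattering its bright bones" image', "Apple Hill"),
        MixIn("basket-of-pieces resolution", "Silent Axle"),
        MixIn('"Mara died last winter" line', "Quiet Porch"),
    ],
)
lock.validate(ARCS)
```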

7 · Weave the chapters (/scriptorium-chapter N)

Each chapter gets fanned out across two models (Opus + Grok), judged for narrative momentum / voice / continuity / prose / scene shape, and the winner promoted. All 9 chapters can be woven in parallel because each chapter prompt embeds the full timeline, cast, arc, and mix-ins.
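Each chapter prompt carries the full context, so the 18 weaves have no ordering dependency. Below is a minimal Python sketch of the fan-out and judge step. `generate` and `judge` are hypothetical callables standing in for the real model calls and for the momentum / voice / continuity / prose / scene-shape judging. The prompt wording is invented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Draft = tuple[str, str]  # (model name, chapter text)

def weave_all(chapters: int, context: str, models: list[str],
              generate: Callable[[str, str], str],
              judge: Callable[[list[Draft]], int]) -> dict[int, Draft]:
    """Fan every chapter out across every model at once, then judge each chapter.

    `generate(model, prompt)` returns a draft; `judge(drafts)` returns the
    index of the winning draft. Both are hypothetical stand-ins."""
    with ThreadPoolExecutor(max_workers=chapters * len(models)) as pool:
        # One job per (chapter, model) pair: 9 chapters x 2 models = 18 jobs.
        futures = {
            (n, m): pool.submit(generate, m, f"{context}\n\nWrite chapter {n}.")
            for n in range(1, chapters + 1)
            for m in models
        }
        winners: dict[int, Draft] = {}
        for n in range(1, chapters + 1):
            drafts = [(m, futures[(n, m)].result()) for m in models]
            winners[n] = drafts[judge(drafts)]
    return winners

# Stand-in model and judge for a dry run.
def fake_generate(model: str, prompt: str) -> str:
    return f"[{model}] {prompt.splitlines()[-1]}"

def fake_judge(drafts: list[Draft]) -> int:
    return max(range(len(drafts)), key=lambda i: len(drafts[i][1]))

result = weave_all(9, "timeline + cast + arc + mix-ins", ["opus", "grok"], fake_generate, fake_judge)
```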

8 · Illustrate the beats (/scriptorium-illustrate N)

For each chapter, the illustration director identifies 6+ distinct narrative beats (opening, early, midchapter, turning, climax, closing), each with a specific scene subject and hand-lettered annotations. Each beat is rendered once in the locked style. The beats are interleaved with the prose on the chapter page, giving a graphic-novel-meets-prose layout.
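Below is a minimal Python sketch of the interleaving. It assumes the illustration director tags each beat with the paragraph it follows. The `Beat` fields and the HTML markup are hypothetical, not the pipeline's actual format.

```python
import html
from dataclasses import dataclass

@dataclass
class Beat:
    after_paragraph: int   # 0-based index of the paragraph the image follows
    image: str
    caption: str

def interleave(paragraphs: list[str], beats: list[Beat]) -> str:
    """Render a chapter as HTML, placing each beat's illustration after its paragraph."""
    by_para: dict[int, list[Beat]] = {}
    for b in beats:
        by_para.setdefault(b.after_paragraph, []).append(b)
    parts = []
    for i, text in enumerate(paragraphs):
        parts.append(f"<p>{html.escape(text)}</p>")
        for b in by_para.get(i, []):
            parts.append(
                f'<figure><img src="{html.escape(b.image)}" alt="{html.escape(b.caption)}">'
                f"<figcaption>{html.escape(b.caption)}</figcaption></figure>"
            )
    return "\n".join(parts)
```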

9 · Compile the novel (/scriptorium-compile)

Cover + table of contents + 9 chapters with beats inline + back matter, all in one parchment-toned HTML file. A Chrome-headless print pass produces the PDF.
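The print pass is a single Chrome invocation. Below is a minimal Python sketch that only builds the argument list. It assumes the binary is on `PATH` as `google-chrome`; the name varies by platform, for example `chromium`, or a full app path on macOS. The file names are hypothetical.

```python
from pathlib import Path

def chrome_pdf_command(html_path: str, pdf_path: str, chrome: str = "google-chrome") -> list[str]:
    """Build the argv for a headless Chrome print-to-PDF pass."""
    return [
        chrome,
        "--headless",
        f"--print-to-pdf={Path(pdf_path).resolve()}",
        Path(html_path).resolve().as_uri(),   # file:// URL of the compiled novel
    ]

cmd = chrome_pdf_command("novel.html", "novel.pdf")
# Run with subprocess.run(cmd, check=True) on a machine where Chrome is installed.
```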

10 · Audiobook (this session)

Each chapter is segmented by Claude into voice-tagged dialogue / narration pairs. ElevenLabs multilingual_v2 renders 296 audio segments using 11 premade voices (George for the narrator, Bill for grandpa, Jessica for young Pip, etc.). ffmpeg concat stitches segments per chapter with 0.4 s pauses, and joins chapters with 1.5 s pauses.
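The stitching step boils down to writing ffmpeg concat-demuxer list files. Below is a minimal Python sketch. It assumes the pauses are pre-rendered silent mp3 clips spliced in between entries. All file names are hypothetical.

```python
def concat_list(paths: list[str], silence: str) -> str:
    """Build an ffmpeg concat-demuxer list that puts `silence` between clips."""
    def entry(p: str) -> str:
        # Paths are single-quoted; an embedded quote is written as '\''.
        return "file '" + p.replace("'", "'\\''") + "'"
    lines = []
    for i, p in enumerate(paths):
        if i:  # no pause before the first clip
            lines.append(entry(silence))
        lines.append(entry(p))
    return "\n".join(lines) + "\n"

# Per chapter: segments separated by a 0.4 s silence clip.
chapter_1 = concat_list(["ch01/seg001.mp3", "ch01/seg002.mp3"], "silence_0.4s.mp3")
# Whole book: finished chapters separated by a 1.5 s silence clip.
book = concat_list(["ch01.mp3", "ch02.mp3"], "silence_1.5s.mp3")
```

Each list is written to a text file and passed to ffmpeg with `-f concat`.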

The tooling

Every step is a shell-out, not a library binding: the pipeline is a stack of bash scripts and Python helpers. It is a sibling project of choir.

Fan-out + judging

choir

Sean's CLI for routing prompts to any model across providers. A single-model run prints plain text; a multi-model run prints JSON. `--save` persists a comparison run, and `choir runs compare` appends a judge summary later.
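Shelling out keeps every step behind a plain process boundary. Below is a minimal Python sketch of a helper such a pipeline might use. It assumes only what is described above: plain text from single-model runs and JSON from multi-model runs. The JSON shape and the stand-in commands are hypothetical, and choir's actual arguments are not shown.

```python
import json
import subprocess
import sys

def shell_out(argv: list[str], expect_json: bool = False):
    """Run one pipeline step as a subprocess and return its stdout.

    Raises CalledProcessError on a non-zero exit. With expect_json=True the
    output is parsed as JSON (the multi-model case); otherwise it is returned
    as stripped plain text (the single-model case)."""
    out = subprocess.run(argv, check=True, capture_output=True, text=True).stdout
    return json.loads(out) if expect_json else out.strip()

# Stand-in commands that mimic the two output modes.
text = shell_out([sys.executable, "-c", "print('a single answer')"])
data = shell_out(
    [sys.executable, "-c", "import json; print(json.dumps({'opus': 'x', 'grok': 'y'}))"],
    expect_json=True,
)
```

With `check=True`, a failed step stops the pipeline immediately instead of passing empty output downstream.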

Anthropic Opus 4.7 · OpenAI GPT-5 · xAI Grok 4 · Google Gemini 3 Pro

Text generation

Claude Opus 4.7

Primary writer. Won every chapter judge pass against Grok 4. Used everywhere story-craft matters — arcs, spread, chapter weave, beat identification, audiobook segmentation.

~$15 / 1M input · $75 / 1M output

Text generation (alt)

Grok 4 · Gemini 3 Pro

Used for diversity in fan-outs (Grok) and as a second judge (Gemini), since GPT-5 was unavailable in this run. Grok arcs scored consistently below Opus's; the judge quality from Gemini matched Opus's.

~$3 / 1M (Grok) · ~$1.25 / 1M (Gemini)

Image generation (primary)

OpenAI gpt-image-1

Used for all 73 chapter beats and all 7 character portraits. Of the three providers, it gave the cleanest rendering of "folk-art painted wood with stenciled borders." Outputs PNG at 1024×1024.

~$0.04 – $0.17 per image depending on quality tier

Image generation (style variety)

xAI grok-imagine · Google Imagen 4

Used for specific style families in the gallery: Grok for newspaper-comic and crayon-wax; Imagen for stained glass, linocut, and lantern-ink. Each style went to whichever provider renders that tradition best.
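The per-style provider choice amounts to a lookup table. Below is a minimal Python sketch that encodes the assignments listed above. The fallback to `gpt-image-1` for unlisted styles, such as the locked Wagon Folk style, is an assumption.

```python
# Style-to-provider routing as described above. Unlisted styles fall back
# to gpt-image-1 (that default is an assumption).
ROUTES = {
    "newspaper-comic": "grok-imagine",
    "crayon-wax": "grok-imagine",
    "stained glass": "imagen-4",
    "linocut": "imagen-4",
    "lantern-ink": "imagen-4",
}
DEFAULT_PROVIDER = "gpt-image-1"

def provider_for(style: str) -> str:
    """Return the image provider for a named style."""
    return ROUTES.get(style, DEFAULT_PROVIDER)
```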

~$0.07 (Grok) · ~$0.04 (Imagen) per image

Text-to-speech

ElevenLabs multilingual_v2

11 voices from the premade catalog: George (warm British storyteller, tagged for narrative_story) as narrator; Bill, Lily, Jessica, Sarah, Charlie, Laura, Will, Liam, Matilda, Roger for the cast. Voice settings: stability 0.55, similarity_boost 0.78.
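Each voice-tagged segment becomes one TTS request. Below is a minimal Python sketch that builds the request but does not send it. It assumes ElevenLabs' REST text-to-speech endpoint and uses the model and voice settings listed above. Only three of the 11 voices are shown, and the character keys and voice-ID placeholders are hypothetical.

```python
import json

API_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

# Character -> premade voice, per the casting above. The voice IDs are
# placeholders; look them up in your own ElevenLabs voice library.
CAST = {"narrator": "George", "grandpa": "Bill", "young Pip": "Jessica"}
VOICE_IDS = {"George": "<george-voice-id>", "Bill": "<bill-voice-id>", "Jessica": "<jessica-voice-id>"}

def tts_request(speaker: str, text: str, api_key: str) -> tuple[str, dict, bytes]:
    """Build (url, headers, body) for one voice-tagged segment."""
    voice_id = VOICE_IDS[CAST[speaker]]
    body = {
        "text": text,
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {"stability": 0.55, "similarity_boost": 0.78},
    }
    headers = {"xi-api-key": api_key, "Content-Type": "application/json", "Accept": "audio/mpeg"}
    return API_URL.format(voice_id=voice_id), headers, json.dumps(body).encode()
```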

~$0.18 – $0.30 per 1,000 characters

Stitching

ffmpeg

Per-chapter concat list with mp3 segments and a 0.4 s anullsrc silence between segments. Chapters are then concatenated with 1.5 s silences. Bitrate normalized to 128 kbps mono.
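Below is a minimal Python sketch of the two ffmpeg invocations: rendering a silence clip with `anullsrc`, and joining a concat list at 128 kbps mono. It assumes the silences are pre-rendered clips and that the final join re-encodes rather than stream-copies; the source doesn't say which. The 44.1 kHz sample rate is also an assumption.

```python
def silence_cmd(seconds: float, out_path: str, rate: int = 44100) -> list[str]:
    """ffmpeg argv that renders a mono silent mp3 of the given length via anullsrc."""
    return [
        "ffmpeg", "-y",
        "-f", "lavfi", "-i", f"anullsrc=r={rate}:cl=mono",
        "-t", str(seconds),
        "-c:a", "libmp3lame", "-b:a", "128k",
        out_path,
    ]

def concat_cmd(list_path: str, out_path: str) -> list[str]:
    """ffmpeg argv that joins a concat list and re-encodes to 128 kbps mono mp3."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", list_path,
        "-ac", "1", "-c:a", "libmp3lame", "-b:a", "128k",
        out_path,
    ]

# Pass each argv to subprocess.run(..., check=True) on a machine with ffmpeg.
print(" ".join(silence_cmd(0.4, "silence_0.4s.mp3")))
```

Re-encoding on the final join costs a little time but guarantees one consistent bitrate and channel layout across all 296 segments.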

Free

PDF

Chrome --headless --print-to-pdf

Print CSS hides the audio players and the PDF nav bar; chapter pages get `page-break-before`, and illustrations get `page-break-inside: avoid`. The output is 213 MB because the 73 illustrations are full-resolution PNGs.

Free

Read it / hear it

Read the illustrated novel → 🎧 Download audiobook (35 MB · 38 min)

Per-chapter mp3s are inside the novel page — each chapter has its own <audio> player above the text. The chapter audio files are also in /books/cant-have-nice-things/audio/ if you want to drop them into a podcast app or pull them down individually.