Scriptorium
A pipeline for building illustrated novels end-to-end. Brief in — story, art, layout, PDF, and audiobook out. The same multi-model fan-out and judging pattern as Choir, applied to making books.
Scriptorium takes a four-sentence brief and produces a finished illustrated novel: a readable HTML edition with art interleaved through the prose, a printable PDF, and a narrated audiobook with distinct voices for each character. The pipeline is ten short stages, each one a slash command. Most stages fan out across multiple premium models, run an internal judge to pick winners, and surface named elements (two-word evocative names like "Porch Rituals", "Threadbare Cloth") so the inspirer can mix and match across candidates instead of choosing between numbered options.
Everything stays on your Mac. The pipeline reads the brief from disk, calls models through Choir, the image-generation APIs, and ElevenLabs, and writes outputs to a per-book folder. There is no Scriptorium server, no upload, and no cloud step you didn't approve.
Books Made with Scriptorium
Four illustrated books produced end-to-end with this pipeline — a children's fable, a middle-grade mystery, a short cooking tutorial, and a Mézières-watercolor agentic-AI setup tutorial.
How It Works — 10 Stages
Each stage is a slash command that consumes the previous stage's outputs and produces named, structured artifacts on disk. Most stages fan out across multiple models and run an internal judge to pick winners. The first six produce a menu; the last four bake the cake.
Interview
A short conversation captures the inspirer's intent. Title, concept, voice references, POV, tone, themes, boundaries, visual aesthetic. Captured in brief.md.
Arc fan-out + judging
Six structural shapes (tragic, comic, mystery, picaresque, ironic, wildcard) × three premium models = up to 18 candidate arcs. Two judges score each on inventiveness, coherence, stakes, illustrability, voice match. Top 5 named survivors.
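The fan-out and ranking described above can be sketched roughly as follows. This is a minimal illustration, not Scriptorium's actual code: the model labels are placeholders, and averaging the five criteria equally across both judges is an assumption, since the real weighting is not stated.

```python
from itertools import product
from statistics import mean

SHAPES = ["tragic", "comic", "mystery", "picaresque", "ironic", "wildcard"]
MODELS = ["model-a", "model-b", "model-c"]  # hypothetical placeholder labels
CRITERIA = ["inventiveness", "coherence", "stakes", "illustrability", "voice_match"]


def candidate_grid():
    """One candidate slot per (shape, model) pair: 6 x 3 = 18."""
    return [{"shape": s, "model": m} for s, m in product(SHAPES, MODELS)]


def overall_score(judge_scores):
    """Average each judge's five criteria, then average across judges.

    judge_scores is a list with one dict per judge, mapping criterion -> number.
    Equal weighting is an assumption made for this sketch.
    """
    return mean(mean(scores[c] for c in CRITERIA) for scores in judge_scores)


def top_survivors(scored, k=5):
    """scored is a list of (candidate, judge_scores) pairs; keep the best k."""
    ranked = sorted(scored, key=lambda pair: overall_score(pair[1]), reverse=True)
    return [candidate for candidate, _ in ranked[:k]]
```

The grid is "up to 18" because a model call may fail or be filtered out; `top_survivors` simply ranks whatever candidates come back.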
Spread
Each survivor gets a chapter-by-chapter timeline, named characters, named subplots, named pivotal events, themes, and a cover subject. All five spread in parallel.
Style gallery
24 named visual styles in 6 families. Filtered by the brief (e.g. "no watercolor, no sketches" prunes 8 styles). Two sample illustrations per surviving style. The inspirer picks one in Stage 6.
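A brief-driven filter of this kind might look like the sketch below. The family names and the even split of four styles per family are assumptions for illustration; the source only states 24 styles in 6 families and that excluding two kinds of style prunes 8.

```python
# Hypothetical family names; the real gallery's families are not listed here.
FAMILIES = ["watercolor", "sketch", "folk-art", "print", "comic", "glass"]


def build_catalog(families, per_family=4):
    """Build a placeholder catalog: 6 families x 4 styles = 24 entries."""
    return [
        {"family": family, "name": f"{family}-{i + 1}"}
        for family in families
        for i in range(per_family)
    ]


def filter_styles(catalog, excluded_families):
    """Drop every style whose family the brief rules out."""
    excluded = {family.lower() for family in excluded_families}
    return [style for style in catalog if style["family"].lower() not in excluded]
```

With this placeholder catalog, excluding `"watercolor"` and `"sketch"` leaves 16 styles, each of which would then get its two sample illustrations.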
Preview site
A browsable static site assembles the 5 arcs with gantt charts, the style gallery, and a six-step lock-in wizard. The inspirer reviews and decides.
Lock-in
Commit to one spine arc, optional named elements mixed in from other arcs, one visual style, the cover, and the character roster. Cover and portraits get re-rendered in the chosen style.
Chapter weave
Each chapter is fanned out across two models and judged on momentum, voice, continuity, prose, and scene shape. The winner is promoted to chosen.md.
Illustrate the beats
For each chapter, identify 6+ distinct narrative beats at six page-positions (opening / early / midchapter / turning / climax / closing). Each beat is rendered once in the locked style, interleaved with the prose.
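One way to interleave beats with prose at proportional positions is sketched below. The slot formula is an assumption chosen for illustration (beat i goes before paragraph i * n // b), not the pipeline's documented placement rule.

```python
POSITIONS = ["opening", "early", "midchapter", "turning", "climax", "closing"]


def interleave(paragraphs, beats):
    """Return a flat list of ("beat", x) and ("prose", x) items.

    Beat i is placed before paragraph (i * n) // b, so the beats are spread
    evenly through the chapter and the first beat opens it.
    """
    n, b = len(paragraphs), len(beats)
    if n == 0:
        return [("beat", beat) for beat in beats]
    slots = {}
    for i, beat in enumerate(beats):
        slots.setdefault((i * n) // b, []).append(beat)
    out = []
    for idx, paragraph in enumerate(paragraphs):
        for beat in slots.get(idx, []):
            out.append(("beat", beat))
        out.append(("prose", paragraph))
    return out
```

For a 12-paragraph chapter with the six positions above, this yields one illustration followed by two paragraphs, repeated six times.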
Compile
Cover + table of contents + chapters with beats inline + back matter, in one parchment-toned HTML. Chrome-headless print pass produces the PDF.
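A headless print pass can be driven with Chrome's `--headless` and `--print-to-pdf` flags. The sketch below only builds the command line; the binary path is the usual macOS install location and the file names are hypothetical, so adjust both to the actual book folder.

```python
from pathlib import Path

# Assumption: default install location of Google Chrome on macOS.
CHROME = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


def print_to_pdf_argv(html_path, pdf_path, chrome=CHROME):
    """Build the argv for a headless Chrome print of a local HTML file."""
    html_uri = Path(html_path).resolve().as_uri()  # file:///... URI
    pdf_abs = Path(pdf_path).resolve()
    return [chrome, "--headless", f"--print-to-pdf={pdf_abs}", html_uri]
```

Passing the result to `subprocess.run(argv, check=True)` would run the print pass and raise if Chrome exits with an error.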
Audiobook
Each chapter is segmented by Claude into voice-tagged dialogue / narration pairs. ElevenLabs renders each segment with the per-character voice. ffmpeg stitches segments with pauses.
The Lock-in Site
Stage 5 builds a browsable static site the inspirer reviews before committing to a spine arc, mix-in elements, visual style, cover, and cast. The pages are real artifacts from This Is Why We Can't Have Nice Things.
localStorage; the final summary appears only after step 6.
Why It Works
Fan-out beats single-shot
A single Opus call gives you a good arc. Six structural shapes × three models × two judges gives you the best of 18, named and ranked. Same logic applies at every stage of the pipeline.
Names carry across stages
Every subplot, event, factor, and character gets a two-word evocative name ("Iron Letters", "Porch Rituals"). Named candidates are easy to discuss; numbered ones aren't. The chapter-weave prompt mentions mix-ins by name. The lineage doc reads like a recipe.
Beats, not banner art
Six or more distinct narrative moments per chapter, each rendered once and interleaved with the prose at proportional positions. The book reads as a graphic-novel-meets-prose hybrid, not a novel with a chapter heading image.
Inspirer-led picks
The pipeline produces a menu of named options at every decision point. The inspirer picks via a wizard with localStorage state. No model "decides" the arc, the cover, or the cast.
Style is reflexive
The visual style envelope wraps every chapter SUBJECT. For a book about a wagon, the style was folk-art painting on a wagon panel — the visual language echoes the narrative form.
Parallel where possible
Within each stage, work that can parallelize does. 73 illustrations don't render serially. 296 TTS segments don't render serially. End-to-end wall clock for a 9-chapter book: ~25–35 minutes.
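Because image and TTS renders are network-bound calls, a thread pool is one simple way to run them concurrently. The sketch below is an assumption about how such a stage could be structured; `render_one` stands in for whatever function performs a single render, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def render_all(jobs, render_one, max_workers=8):
    """Run render_one over every job concurrently.

    Results come back in the same order as jobs, which keeps beats and
    audio segments aligned with their place in the chapter.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(render_one, jobs))
```

With this shape, wall-clock time is bounded by the slowest batches rather than the sum of all renders, subject to each provider's rate limits.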
What's Under the Hood
Claude Opus 4.7
Primary writer. Arcs, chapter weave, beat identification, audiobook segmentation.
xAI Grok 4 · Google Gemini 3 Pro
Diversity in fan-outs and second-judge panel. Gemini judging matches Opus quality at a fraction of the cost.
OpenAI gpt-image-1
Primary illustrator. Renders 1024×1024 PNG. Cleanest output for folk-art / gouache / kid-book traditions.
xAI grok-imagine · Imagen 4
Style variety. Grok for newspaper-comic and crayon-wax; Imagen for stained glass, linocut, lantern-ink.
ElevenLabs multilingual_v2
TTS. 11 premade voices per book, mapped per character. Renders in seconds per segment.
Choir
The CLI that routes prompts to any model. Scriptorium shells out to choir ask at every LLM step.