Orchestration part 2: Show, don't tell: watching one request fall through the orchestration stack

Everyone wants to argue about which model is best. It's the wrong argument. The interesting question isn't "which model" — it's "what happens at the seams between them." So instead of telling you how the three-tier stack works, I'm going to show you. One real request, followed the whole way down.

Post 2 in the orchestration follow-up series. Post 1, "Fable talks to machines better than to people," set up the idea. This one makes it concrete.

Here's the stack I'll be talking about:

Opus Max sits at the top as the architect: it turns a fuzzy human intent into a plan. Opus is the orchestrator: it takes that plan and breaks it into independent pieces of work. Sonnet is the worker: it does the pieces. Three tiers, with a hand-off at every boundary between them. The hand-offs are where everything either holds together or quietly falls apart, so they're what I'll spend most of this post on.

A quick note on naming: this series started life around Fable. Fable's unavailable right now, so Opus Max is doing the architect job instead. The shape of the argument doesn't change — if anything, Opus Max makes the "expensive thinking at the top, cheap execution at the bottom" point even cleaner.

The request

I'll use something small and real, because small and real is where the seams actually show. The request:

"Add CSV export to the reports page. Same columns the table shows, respect the active filters, and don't let it time out on big accounts."

That's it. That's what a human says. Notice everything it doesn't say: which endpoint, streaming or buffered, how to handle a 200k-row account, what the filename should be, whether to paginate. A junior dev would start coding and ask three questions later. A worker model handed this cold would guess — and guess wrong on the timeout, every time.

So it doesn't go straight to a worker. It goes to the architect first.

Tier 1 — the architect turns intent into a plan

Opus Max's job is not to write code. Its job is to remove ambiguity. Given the request above, the plan it produces looks like this:

Goal: stream a CSV of the current report, filters applied, safe for large accounts.
Constraints: must not buffer the whole result set in memory; respect existing filter query params; reuse the table's column definitions so export and view never drift.
Approach: server-side streaming endpoint, cursor-paginated query, column list imported from the shared table config.
Definition of done: exports match the on-screen table for a filtered view; a 200k-row account completes without a timeout or memory spike; filename includes the report name and date.

That's the artifact that crosses the first seam. Not "go build CSV export" — a plan with the timeout decision already made and the drift trap already closed. The expensive thinking happens once, here, at the top.
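To make "a plan, not a prompt" concrete, here is a minimal sketch of how that artifact could be carried as data. The `Plan` class, its field names, and the `missing_sections` check are illustrative assumptions, not the format the real stack uses; only the text inside the fields comes from the plan above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Plan:
    """Hypothetical shape for the artifact that crosses the first seam."""
    goal: str
    constraints: Tuple[str, ...]
    approach: str
    definition_of_done: Tuple[str, ...]

    def missing_sections(self) -> Tuple[str, ...]:
        """Names of sections left empty; an empty result means nothing is missing."""
        sections = {
            "goal": self.goal.strip(),
            "constraints": self.constraints,
            "approach": self.approach.strip(),
            "definition_of_done": self.definition_of_done,
        }
        return tuple(name for name, value in sections.items() if not value)


csv_export_plan = Plan(
    goal="Stream a CSV of the current report, filters applied, safe for large accounts.",
    constraints=(
        "Must not buffer the whole result set in memory.",
        "Respect existing filter query params.",
        "Reuse the table's column definitions so export and view never drift.",
    ),
    approach=(
        "Server-side streaming endpoint, cursor-paginated query, "
        "column list imported from the shared table config."
    ),
    definition_of_done=(
        "Exports match the on-screen table for a filtered view.",
        "A 200k-row account completes without a timeout or memory spike.",
        "Filename includes the report name and date.",
    ),
)
```

The point of the check is modest: a plan with an empty "definition of done" should be bounced back to the architect rather than passed down.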

Tier 2 — the orchestrator decomposes the plan

Now Opus takes that plan and does something different: it breaks it into pieces that can each be done in isolation. The test for a good decomposition is brutal — could a worker who can't see any of the other tasks still get this one right? If not, the task isn't scoped well enough yet.

For this feature the decomposition came out as four leaf tasks:

Streaming endpoint — GET /reports/:id/export.csv, cursor-paginated, no full-buffer. Contract: accepts the same filter params as the report view; emits rows in chunks.
Shared column config — extract the table's column definitions into a module both the view and the export import. Contract: one source of truth for columns and order.
Filter pass-through — map the active UI filters onto the export request. Contract: a filtered view and its export return identical row sets.
Front-end trigger — the export button, filename, and progress/error states. Contract: filename is {report}-{YYYY-MM-DD}.csv; shows a failure state if the stream errors.

Each of those carries its own little contract. That's deliberate — the contract is what lets the next tier run without phoning home.
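One rough way to mechanise the "could a worker do this alone?" test is to check that everything a task depends on is published by a sibling as a contract. This is a sketch under assumptions: the `LeafTask` shape and the interface names (`column_config`, `export_endpoint`, `export_filter_params`) are hypothetical labels I've attached to the four tasks, and the check is only a proxy for good scoping.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class LeafTask:
    name: str
    contract: str
    provides: FrozenSet[str]  # interfaces this task publishes (hypothetical names)
    needs: FrozenSet[str]     # interfaces handed to the worker as context


def unmet_needs(task: LeafTask, tasks: List[LeafTask]) -> FrozenSet[str]:
    """Interfaces the task depends on that no sibling task publishes."""
    published = frozenset().union(*(t.provides for t in tasks if t is not task))
    return task.needs - published


TASKS = [
    LeafTask(
        name="streaming_endpoint",
        contract="Accepts the same filter params as the report view; emits rows in chunks.",
        provides=frozenset({"export_endpoint"}),
        needs=frozenset({"column_config"}),
    ),
    LeafTask(
        name="shared_column_config",
        contract="One source of truth for columns and order.",
        provides=frozenset({"column_config"}),
        needs=frozenset(),
    ),
    LeafTask(
        name="filter_pass_through",
        contract="A filtered view and its export return identical row sets.",
        provides=frozenset({"export_filter_params"}),
        needs=frozenset({"column_config", "export_endpoint"}),
    ),
    LeafTask(
        name="front_end_trigger",
        contract="Filename is {report}-{YYYY-MM-DD}.csv; shows a failure state if the stream errors.",
        provides=frozenset(),
        needs=frozenset({"export_endpoint", "export_filter_params"}),
    ),
]

# Any task with unmet needs is leaning on something nobody has promised to deliver.
badly_scoped = {t.name: unmet_needs(t, TASKS) for t in TASKS if unmet_needs(t, TASKS)}
```

If `badly_scoped` is non-empty, that task depends on something outside any contract, which usually means it still needs to see another task's internals.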

Tier 3 — the workers execute

Each leaf task goes to a Sonnet worker. Worker 3, for example, gets only what it needs: the filter param shape, the acceptance test ("filtered view and export return identical row sets"), and the column module's interface. It doesn't need to know how the streaming endpoint is implemented — only the contract it has to satisfy. It writes the mapping, writes a test against the acceptance criterion, and hands back an output plus a self-check.
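For a feel of what Worker 3's hand-back might look like, here is a small sketch. The filter shape, the function names, and the in-memory `query` stand-in are all assumptions for illustration; the post doesn't specify the real filter params or query layer.

```python
from typing import Dict, List, Optional

# Hypothetical UI filter shape: filter name -> selected value, or None when inactive.
UiFilters = Dict[str, Optional[str]]


def to_export_params(ui_filters: UiFilters) -> Dict[str, str]:
    """Map active UI filters onto export query params, dropping unset ones."""
    return {key: value for key, value in sorted(ui_filters.items()) if value not in (None, "")}


def query(rows: List[dict], params: Dict[str, str]) -> List[dict]:
    """Stand-in for the server-side query; keeps rows matching every param."""
    return [row for row in rows if all(str(row.get(k)) == v for k, v in params.items())]


def self_check(rows: List[dict], ui_filters: UiFilters) -> bool:
    """Acceptance criterion: filtered view and export return identical row sets."""
    view_rows = query(rows, {k: v for k, v in ui_filters.items() if v})
    export_rows = query(rows, to_export_params(ui_filters))
    return view_rows == export_rows


SAMPLE_ROWS = [
    {"region": "EU", "status": "paid"},
    {"region": "US", "status": "paid"},
    {"region": "EU", "status": "open"},
]
ACTIVE_FILTERS: UiFilters = {"region": "EU", "status": None}
```

Notice what the worker never touches: nothing here knows or cares whether the endpoint streams, buffers, or paginates.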

Four workers, four outputs, reassembled against the plan. The feature ships.
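As an illustration of what "cursor-paginated, no full-buffer" can mean in practice, here is a framework-free sketch of the core of Worker 1's task. Everything named here (`COLUMNS`, `stream_csv`, `make_fake_fetch`, the integer cursor) is an assumption for the example; a real endpoint would wrap a generator like this in whatever streaming response the web framework provides.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List, Optional, Sequence, Tuple

# Hypothetical shared column config: (row key, header) pairs in display order.
COLUMNS: Sequence[Tuple[str, str]] = (("id", "ID"), ("customer", "Customer"), ("total", "Total"))

# A page is (rows, next_cursor); next_cursor is None on the last page.
Page = Tuple[List[Dict[str, object]], Optional[int]]


def _drain(buf: io.StringIO) -> str:
    """Return the buffered text and reset the buffer so memory stays bounded."""
    text = buf.getvalue()
    buf.seek(0)
    buf.truncate(0)
    return text


def stream_csv(
    fetch_page: Callable[[Optional[int]], Page],
    columns: Sequence[Tuple[str, str]] = COLUMNS,
) -> Iterator[str]:
    """Yield CSV text one page at a time; at most one page is held in memory."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow([header for _, header in columns])
    yield _drain(buf)
    cursor: Optional[int] = None
    while True:
        rows, cursor = fetch_page(cursor)
        writer.writerows([row[key] for key, _ in columns] for row in rows)
        if rows:
            yield _drain(buf)
        if cursor is None:
            return


def make_fake_fetch(data: List[Dict[str, object]], page_size: int) -> Callable[[Optional[int]], Page]:
    """In-memory stand-in for a cursor-paginated database query."""
    def fetch_page(cursor: Optional[int]) -> Page:
        start = cursor or 0
        end = start + page_size
        return data[start:end], (end if end < len(data) else None)
    return fetch_page


DATA = [{"id": i, "customer": f"c{i}", "total": i * 10} for i in range(1, 6)]
chunks = list(stream_csv(make_fake_fetch(DATA, page_size=2)))
```

With five rows and a page size of two, that yields a header chunk followed by three data chunks. The 200k-row case is the same loop with more iterations, which is the whole point of the constraint.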

The hand-off is the product

Here's the part I actually care about. When this works, it isn't because Sonnet is a great coder or Opus Max is a great planner. It's because each hand-off carried the right things across the seam — and left the wrong things behind.

Three things travel across a good hand-off, and a fourth that's about restraint:

Context — enough to run alone, and no more. Flood a worker with the whole repo and it gets worse, not better.
Constraints — the guardrails. "Don't buffer the whole result set" is worth more than three paragraphs of explanation.
Acceptance criteria — the definition of done, written so the worker can check its own work.
What's withheld — the implementation details of other tasks. Withholding is a feature. It's what keeps tasks independent and stops one worker's assumptions leaking into another's.

Get those right and the tiers compose. Get them wrong — usually by over-sharing context or under-specifying done — and you get plausible output that fails at the seam. The model was never the problem.
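The two common failure modes, over-sharing and under-specifying done, are easy to lint for if the hand-off is a structured object rather than free text. A minimal sketch, assuming a hypothetical `HandOff` shape whose field names mirror the four items above:

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class HandOff:
    """Hypothetical packet passed from orchestrator to worker."""
    context: Dict[str, str]               # name -> content the worker may see
    constraints: Tuple[str, ...]
    acceptance_criteria: Tuple[str, ...]
    withheld: FrozenSet[str]              # names that must NOT appear in context

    def problems(self) -> Tuple[str, ...]:
        """Flag over-sharing and a missing definition of done."""
        issues = []
        leaked = sorted(self.withheld.intersection(self.context))
        if leaked:
            issues.append("over-shared: " + ", ".join(leaked))
        if not self.acceptance_criteria:
            issues.append("under-specified: no acceptance criteria")
        return tuple(issues)


worker_3_handoff = HandOff(
    context={
        "filter_param_shape": "(illustrative placeholder)",
        "column_module_interface": "(illustrative placeholder)",
    },
    constraints=("Respect existing filter query params.",),
    acceptance_criteria=("Filtered view and export return identical row sets.",),
    withheld=frozenset({"streaming_endpoint_impl", "front_end_impl"}),
)
```

A check like this can't tell you whether the context is sufficient, only that nothing on the withheld list slipped in and that "done" is written down.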

Where quality actually lives

Look at the return path, not just the send. Every worker hands back its output and a self-check against its acceptance criterion. That's not decoration — it's where reliability comes from when you're moving this fast.
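Here is a small sketch of that return path, using Worker 4's filename contract as the criterion. The `WorkerResult` shape and the idea that the orchestrator re-runs the criterion instead of trusting the worker's claim are my assumptions for the example, as is the exact regular expression.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class WorkerResult:
    """Hypothetical hand-back: the output plus the worker's own verdict."""
    output: str
    self_check_passed: bool


# Contract from the task: filename is {report}-{YYYY-MM-DD}.csv
FILENAME_PATTERN = re.compile(r"^.+-\d{4}-\d{2}-\d{2}\.csv$")


def export_filename(report: str, on: date) -> str:
    """Build the export filename; date.isoformat() yields YYYY-MM-DD."""
    return f"{report}-{on.isoformat()}.csv"


def filename_criterion(filename: str) -> bool:
    """Machine-checkable form of the filename contract."""
    return FILENAME_PATTERN.match(filename) is not None


def accept_at_seam(result: WorkerResult, criterion: Callable[[str], bool]) -> bool:
    """Accept only if the worker's self-check passed and a re-run of the criterion agrees."""
    return result.self_check_passed and criterion(result.output)


name = export_filename("revenue", date(2024, 3, 5))
good = WorkerResult(output=name, self_check_passed=filename_criterion(name))
bad = WorkerResult(output="revenue.csv", self_check_passed=True)  # worker claims success
```

The `bad` case is the one that matters: the worker says it passed, and the seam rejects it anyway because the criterion travelled with the task.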

This is the thread I'll keep pulling on through the series, and it's the heart of the book chapter I'm working toward — "Quality at Machine Speed." When a stack can produce work faster than any human can review it, the definition of done has to travel with the task, and the check has to happen at the seam, automatically, every time. More on that in a couple of posts.

What changed, honestly

Two things got better. Cost: the expensive model thinks once, at the top, and cheap workers do the volume — I'll put real numbers on that next week. Reliability: the timeout decision got made by the tier that should make it, not guessed by a worker mid-stream.

And one thing didn't change: a human still has to state the intent and own the definition of done. The architect can remove ambiguity, but it can't decide what's worth building. That's still the top of the stack, and it's still us.

Next week (Post 3, sharp): "Why I took the architect tier out of my subscription on purpose" — pricing as a forcing function, and the overengineering trap.

If you're testing Fable / Opus Max too, where did it land for you? Drop it in the comments — I'm collecting the failure modes as much as the wins.
