Chapter 6 — Fundamental Orchestration Strategies

Sixth post of the chapter-by-chapter walkthrough of LLM Primer IV: Designing AI Cognition with MCP. A multi-agent system is a distributed system — once that frame is accepted, most of the design choices in this chapter become familiar, and most of the expensive failures of 2024 and 2025 stop being mysterious.

Why this chapter exists

The marketing language around agentic systems suggests that more agents are inherently better: more cognitive horsepower, more specialization, more emergent capability. In most cases the engineering reality is the opposite. Every additional agent adds a round trip, a serialization point, a handoff where one agent's output becomes another's input, and a new opportunity for the conversation to drift off track. The right starting question is not "how do I distribute this across agents?" but "can a single model with the right tools do this in one call?"

This chapter walks through the two simplest orchestration shapes, sequential and concurrent, and the prior question that should precede either. Many of the most expensive production failures of the last two years were not failures of orchestration. They were systems built as multi-agent when a single well-tooled agent would have done the job with a tenth of the latency and none of the coordination bugs.

One line: sequential is a relay race, concurrent is a kitchen with many cooks — both buy capability at coordination cost, and neither is the right answer until a single well-tooled agent has been demonstrably ruled out.

6.1 When multiple agents actually help

The case for a single agent with tools is strongest when the task decomposes into a small number of well-defined operations against well-defined data sources. A code-review assistant that reads a diff, runs a linter, looks up conventions, and writes a comment can be built as one model invocation with four tools. Adding a second agent introduces latency without adding capability — the model is already doing the planning; a second model planning differently is a coordination cost, not an improvement.

Three properties make multiple agents pay off. Heterogeneity of context — when two phases need dramatically different system prompts, tools, or reference material, forcing both into one window dilutes the model's attention. Research-then-write is the canonical case: retrieval wants breadth and search tools, writing wants prose and no tools. Iterative refinement against an external check — if output needs to be reviewed and possibly rewritten, the maker and the checker each want their own context and prompt. Parallelism across independent subtasks — five document sources to summarize, three perspectives to gather, ten files to analyze — running them serially wastes wall-clock time on work that has no causal dependency.

Before committing to multi-agent, an engineer should be able to name the property that motivates it. A 2025 retrospective from a large logistics company describes replacing a seven-agent customer-support orchestration with a single Claude agent plus six MCP tools; the single-agent version was faster, cheaper, and scored higher on resolution quality. "Can we collapse this?" should be a standing question in any orchestration review.

6.2 Sequential orchestration: pipelines and progressive refinement

Sequential orchestration is the simplest multi-agent shape. The output of one agent becomes the input of the next. Most production "multi-agent" systems are sequential pipelines in disguise. The strength is legibility: the pipeline can be drawn on a whiteboard, tested stage by stage, and reasoned about as a series of input-output contracts. The contract between stages is the key artifact — each stage declares its input schema, the orchestrator enforces it with code rather than trust, and validation failures trigger retries or fallbacks rather than propagating silently.
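
A minimal sketch of contract enforcement between stages, under stated assumptions: the names Stage, Contract, validate, and run_pipeline are hypothetical and come from no particular framework, and a contract is reduced to "required field plus expected type". Each stage here declares the shape of its output, which is the same thing as the input schema the next stage expects. A production system would more likely use a real schema library.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical contract format: required field name -> expected Python type.
Contract = dict[str, type]


class ContractViolation(Exception):
    """Raised when a stage output does not match its declared contract."""


def validate(payload: dict[str, Any], contract: Contract) -> None:
    for key, expected in contract.items():
        if key not in payload:
            raise ContractViolation(f"missing field: {key}")
        if not isinstance(payload[key], expected):
            raise ContractViolation(f"{key} should be {expected.__name__}")


@dataclass
class Stage:
    name: str
    run: Callable[[dict[str, Any]], dict[str, Any]]  # stand-in for a model call
    output_contract: Contract
    max_attempts: int = 3  # bounded retries, never an open-ended loop


def run_pipeline(stages: list[Stage], payload: dict[str, Any]) -> dict[str, Any]:
    for stage in stages:
        last_error: Exception | None = None
        for _ in range(stage.max_attempts):
            candidate = stage.run(payload)
            try:
                validate(candidate, stage.output_contract)
            except ContractViolation as err:
                last_error = err  # invalid output: retry instead of passing it on
                continue
            payload = candidate  # only validated output reaches the next stage
            break
        else:
            # Retry budget exhausted: fail loudly rather than propagate silently.
            raise RuntimeError(f"stage {stage.name!r} exhausted retries: {last_error}")
    return payload
```

The design choice worth noting is that the orchestrator, not the downstream agent, decides whether an output is acceptable. A stage that cannot produce a valid output within its retry budget stops the pipeline with a named error instead of handing malformed input forward.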

The canonical case is research-then-write. A research agent with web search and retrieval tools produces a structured brief; a writing agent with no tools and a prose prompt turns the brief into an article. The writing agent does not see the false starts, the discarded sources, or the long reasoning chains. It sees the brief. Both stages can use different models — strong reasoning for research, strong prose for writing — and the costs accrue only where each is needed. Progressive refinement is a close cousin: draft, edit, fact-check, reformat. Specialized operators outperform a generalist trying to do everything in one pass.

The honest costs are three. Latency: an N-stage pipeline has a latency floor equal to the sum of its stage runtimes, so long pipelines rule themselves out of conversational use. Error amplification: if each stage succeeds 95% of the time and failures are independent, a four-stage pipeline succeeds about 81% of the time end-to-end (0.95^4) and an eight-stage one about 66% (0.95^8). Per-stage validation with bounded retries is what keeps the math operable. Information loss between stages: each output is necessarily narrower than its working context, and information the writing agent later realizes it needs is gone unless the brief schema was richer than strictly required.
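
The error-amplification figures can be reproduced in a few lines. This sketch assumes stage failures are independent, which is a simplification; the function name end_to_end_success is illustrative.

```python
def end_to_end_success(per_stage_success: float, stages: int) -> float:
    """Probability that every stage succeeds, assuming independent stages."""
    return per_stage_success ** stages


for n in (1, 4, 8):
    print(f"{n} stage(s): {end_to_end_success(0.95, n):.0%}")
# 1 stage(s): 95%
# 4 stage(s): 81%
# 8 stage(s): 66%
```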

6.3 Concurrent orchestration: scatter, gather, multi-perspective

Concurrent orchestration runs multiple agents in parallel and combines their outputs. The defining property is no causal dependency during the work — dependency only at the combining step. Sometimes called scatter-gather, sometimes map-reduce for agents; the topology is the same.

Three use cases motivate the pattern. Parallelism across independent subtasks — five sources read in parallel, then one synthesizer. Wall-clock time is the slowest reader plus the synthesizer, not the sum. Multi-perspective analysis — the same input given to a financial-analyst prompt, a legal-reviewer prompt, and a product-strategist prompt, with framings genuinely different enough that the outputs are not just cosmetic variants. Ensembling for reliability — the same prompt across several agents with outputs voted or averaged, defensible when wrong answers cost much more than a 3x token bill.
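
The wall-clock claim for the first use case can be sketched with asyncio from the Python standard library. The functions read_source and synthesize are hypothetical stand-ins for model calls, with asyncio.sleep simulating their latency; nothing here calls a real model.

```python
import asyncio
import time


async def read_source(name: str, seconds: float) -> str:
    # Stand-in for a reader agent; the sleep simulates model latency.
    await asyncio.sleep(seconds)
    return f"summary of {name}"


async def synthesize(summaries: list[str]) -> str:
    # Stand-in for the single synthesizer that runs after all readers finish.
    await asyncio.sleep(0.05)
    return " | ".join(summaries)


async def scatter_gather(sources: dict[str, float]) -> str:
    # Scatter: readers run concurrently. gather returns results in input order.
    summaries = await asyncio.gather(
        *(read_source(name, delay) for name, delay in sources.items())
    )
    # Gather: the only causal dependency is at the combining step.
    return await synthesize(list(summaries))


def timed_run(sources: dict[str, float]) -> tuple[str, float]:
    start = time.perf_counter()
    result = asyncio.run(scatter_gather(sources))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    result, elapsed = timed_run({"a": 0.1, "b": 0.2, "c": 0.3})
    # Elapsed is close to slowest reader + synthesizer (about 0.35 s),
    # not the serial sum of all calls (0.65 s).
    print(result, f"{elapsed:.2f}s")
```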

The combining step is where engineering effort pays off. Naive synthesizers given long, contradictory inputs become bottlenecks. Three patterns improve it: structured intermediate outputs so the synthesizer merges fields deterministically rather than re-reading prose; hierarchical reduction so each combining agent sees a bounded number of inputs as the fan-out grows; and conflict surfacing so the synthesizer labels disagreements rather than silently picking a side.
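
The three patterns can be combined in one small sketch. It assumes, for illustration only, that each branch returns a structured report mapping a field name to the list of values it proposes; Report, combine, hierarchical_reduce, and conflicts are hypothetical names. Merging is deterministic, each combine call sees at most fan_in inputs, and disagreements are kept and labeled instead of resolved.

```python
# Hypothetical report shape: field name -> distinct candidate values seen so far.
Report = dict[str, list[str]]


def combine(reports: list[Report]) -> Report:
    """Deterministic field-by-field merge that keeps every distinct value."""
    merged: Report = {}
    for report in reports:
        for field_name, values in report.items():
            bucket = merged.setdefault(field_name, [])
            for value in values:
                if value not in bucket:
                    bucket.append(value)
    return merged


def hierarchical_reduce(reports: list[Report], fan_in: int = 3) -> Report:
    """Reduce in levels so no single combine step sees more than fan_in inputs."""
    if not reports:
        raise ValueError("nothing to reduce")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = reports
    while len(level) > 1:
        level = [combine(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


def conflicts(report: Report) -> Report:
    """Fields where branches disagreed, surfaced for a human or a later stage."""
    return {name: values for name, values in report.items() if len(values) > 1}
```

In a real system the combine step might itself be a model call over a bounded number of inputs; the point of the sketch is only that the tree shape and the conflict labels live in ordinary code.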

The diagnostic question for whether scatter-gather is right: if I told one parallel agent what another is currently producing, would it change its output? If yes, the work was not independent and the pattern is wrong — you need either sequential dependencies or the dynamic patterns of Chapter 7.

6.4 The honest math of coordination

Every orchestration pattern is, at runtime, a distributed workflow over unreliable workers. Per-call failure rates of 1–5% are typical even on high-quality models: JSON parse failures, contract violations, hallucinated tool names, silent skips. These compound through a pipeline: a 2% per-stage failure rate across eight stages leaves about 85% end-to-end success (0.98^8), so roughly one request in seven fails somewhere. The mitigations are structural: per-stage validation that triggers bounded retries; per-stage observability that records inputs, outputs, latency, token spend, and which validation gates passed; and bounded fallback so an exhausted retry budget degrades gracefully rather than collapsing the whole flow.
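
One possible shape for those mitigations is a stage wrapper that records every attempt and returns a fallback when the retry budget runs out. StageRecord, Trace, and run_stage are hypothetical names. Token spend is left out because it would come from a provider's usage metadata, which this sketch does not model.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class StageRecord:
    stage: str
    attempt: int
    inputs: Any
    output: Any
    latency_s: float
    passed_validation: bool


@dataclass
class Trace:
    records: list[StageRecord] = field(default_factory=list)


def run_stage(
    name: str,
    call: Callable[[Any], Any],       # stand-in for a model or tool call
    inputs: Any,
    is_valid: Callable[[Any], bool],  # the validation gate for this stage
    fallback: Any,                    # degraded but safe result
    trace: Trace,
    max_attempts: int = 3,
) -> Any:
    for attempt in range(1, max_attempts + 1):
        start = time.perf_counter()
        try:
            output = call(inputs)
            passed = bool(is_valid(output))
        except Exception as err:  # e.g. a JSON parse failure inside the call
            output, passed = repr(err), False
        latency = time.perf_counter() - start
        trace.records.append(
            StageRecord(name, attempt, inputs, output, latency, passed)
        )
        if passed:
            return output
    # Retry budget exhausted: degrade gracefully instead of collapsing the flow.
    return fallback
```

Because failed attempts are recorded alongside successful ones, the trace shows how often each stage needed its retries, which is the number that tells you whether the per-stage failure rate assumed at design time was realistic.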

Latency budgets need a ceiling, not just a floor — users do not care about the average, they care about the long tail. Cost budgets need a model up front: a two-stage pipeline costs ~1.5x a one-stage equivalent, scatter-gather across five branches costs 5–8x, roundtables 10x or more. Some systems are not viable at scale because their per-interaction cost exceeds the value of the interaction. Do the arithmetic at design time, not after the bill arrives.
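
The design-time arithmetic can be as simple as the sketch below. The cost multipliers are the ones quoted above; the base cost per single-call interaction and the value per interaction are made-up numbers chosen only to illustrate the comparison, and is_viable is a hypothetical helper.

```python
def interaction_cost(base_cost: float, multiplier: float) -> float:
    """Cost of one interaction relative to a single-call baseline."""
    return base_cost * multiplier


def is_viable(base_cost: float, multiplier: float, value_per_interaction: float) -> bool:
    """A pattern is viable only if it costs less than the interaction is worth."""
    return interaction_cost(base_cost, multiplier) < value_per_interaction


# Multipliers from the text; a (low, high) pair where the text gives a range.
PATTERNS = {
    "two-stage pipeline": (1.5, 1.5),
    "scatter-gather, five branches": (5.0, 8.0),
    "roundtable": (10.0, 10.0),  # the text says 10x or more
}

BASE_COST = 0.02  # assumed dollars per single-call interaction (illustrative)
VALUE = 0.15      # assumed dollars of value per interaction (illustrative)

for name, (low, high) in PATTERNS.items():
    verdict = (
        "viable" if is_viable(BASE_COST, high, VALUE)
        else "marginal" if is_viable(BASE_COST, low, VALUE)
        else "not viable"
    )
    print(f"{name}: ${BASE_COST * low:.2f} to ${BASE_COST * high:.2f} -> {verdict}")
```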

Worth holding onto: the orchestration pattern should match the structure of the work, not the team's enthusiasm for agentic frameworks. Sequential is a relay race; concurrent is a kitchen. Both are distributed systems with unreliable workers, and the difference between a multi-agent system that works in demos and one that works in production is honest accounting of error rates, latency tails, and cost ratios.

What Chapter 6 sets up

Sequential and concurrent are the building blocks. They handle most multi-agent use cases when the task topology is known in advance and the workers' roles are fixed. They share an assumption: someone at design time decided what the agents are and how they connect. The orchestration is static; the pipeline is drawn before any user request arrives. Chapter 7 takes that assumption away.

Next — Chapter 7: Advanced Collaborative and Dynamic Patterns. Roundtables, handoff routing, and magentic orchestration — what happens when the topology has to be built per request rather than per design, and the failure modes (non-termination, mis-routing, runaway planning) the simpler patterns avoid.

Want the full picture? The book walks two field patterns in depth — the legal-tech contract-review pipeline that collapsed from five stages to two, and the consulting-firm scatter-gather research system that needed a triage step to avoid catastrophic decomposition failures — plus the full error-budget arithmetic for production multi-agent systems. View LLM Primer IV on Amazon →