Multi-agent patterns
Advanced AI Workflows & the Claude API
Chapter 7 · Multi-Agent Patterns
A single Claude call is powerful, but it has limits. The context window is finite, long responses can be inconsistent, and some tasks are genuinely too complex to solve in one pass. Multi-agent patterns solve this by splitting work across multiple Claude calls — each call focused on a specific subtask, the results combined by an orchestrator. This chapter covers the core patterns, how to implement them, and when they're worth the added complexity.
Why Use Multiple Agents?
- Tasks too large for one context window — analysing a large codebase, processing hundreds of documents, generating a long structured report section by section.
- Parallel work — subtasks that don't depend on each other can run simultaneously, cutting total wall-clock time.
- Specialisation — different agents can have different system prompts, making one a careful fact-checker, another a creative writer, another a data extractor.
- Verification — one agent generates an answer; another independently checks it. Disagreements surface errors that a single pass would miss.
Core Patterns
1 — Orchestrator + Workers
Diagram (Orchestrator / Worker): Input task → Orchestrator (decomposes task) → Worker A (subtask 1), Worker B (subtask 2), Worker C (subtask 3) → Orchestrator (combines results) → Final output
The orchestrator receives the full task, breaks it into subtasks, dispatches them to worker agents (in parallel or sequence), and synthesises the results. Workers are stateless — they don't know about each other or the broader task.
2 — Sequential Pipeline
Diagram (Sequential Pipeline; each agent sees the previous agent's output): Raw input → Agent 1 (extract) → Agent 2 (analyse) → Agent 3 (format) → Output
Each agent takes the previous agent's output as input. Good for multi-stage transformations where each step builds on the last — e.g. extract → analyse → summarise → format.
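A minimal sketch of the chaining itself, assuming each stage can be treated as a function from text to text. The stage functions here (`extract`, `analyse`, `format_output`) and the `run_pipeline` helper are hypothetical stand-ins that do simple string work; in a real pipeline each one would wrap a single model call with its own system prompt.

```python
from functools import reduce
from typing import Callable, Sequence

Stage = Callable[[str], str]


def run_pipeline(stages: Sequence[Stage], raw_input: str) -> str:
    """Feed raw_input through each stage in order; each stage sees the previous output."""
    return reduce(lambda text, stage: stage(text), stages, raw_input)


# Stand-in stages (assumptions for illustration, not real agents).
def extract(text: str) -> str:
    """Keep only the numeric tokens."""
    return " ".join(word for word in text.split() if word.isdigit())


def analyse(numbers: str) -> str:
    """Compute simple statistics over the extracted numbers."""
    values = [int(n) for n in numbers.split()]
    return f"count={len(values)} total={sum(values)}"


def format_output(analysis: str) -> str:
    """Wrap the analysis in a final presentation format."""
    return f"REPORT: {analysis}"


result = run_pipeline([extract, analyse, format_output], "sold 3 units then 4 units")
print(result)  # REPORT: count=2 total=7
```

Because every stage consumes whatever the previous one produced, a mistake in `extract` flows straight into `analyse`, which is why validating the output between stages is often worth the extra code.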
3 — Generator + Critic
Diagram (Generator / Critic, self-verification): Generator (produces answer) → Critic (checks answer) → Accept / Revise? If revise, loop back to Generator with critic feedback.
Two separate Claude instances with different system prompts. The generator writes freely; the critic is instructed to be sceptical and look for errors. Disagreement triggers a revision loop. Effective for factual accuracy, code correctness, or argument quality.
4 — Map / Reduce
Diagram (Map / Reduce; parallel processing of many items): 100 documents → Split → Extract(doc 1), Extract(doc 2), … (parallel) → Reduce (merge / aggregate) → Summary report
Each item is processed independently in parallel (map), then results are merged by a final agent (reduce). Ideal for tasks like summarising many documents, classifying a batch of items, or extracting structured data from a large file set.
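The shape of the pattern can be sketched with a thread pool, assuming the map step is an independent function per item and the reduce step takes the full list of results. `map_reduce`, `classify`, and `tally` are hypothetical names; `classify` uses keyword matching as a stand-in for a per-item model call.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
M = TypeVar("M")
R = TypeVar("R")


def map_reduce(items: Iterable[T], map_fn: Callable[[T], M],
               reduce_fn: Callable[[List[M]], R], max_workers: int = 4) -> R:
    """Apply map_fn to every item in parallel, then merge the results with reduce_fn."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        mapped = list(pool.map(map_fn, items))  # pool.map preserves input order
    return reduce_fn(mapped)


def classify(email: str) -> str:
    """Stand-in for a per-item model call (assumption for illustration)."""
    text = email.lower()
    if "invoice" in text:
        return "billing"
    if "error" in text:
        return "support"
    return "other"


def tally(labels: List[str]) -> dict:
    """Reduce step: count how many items received each label."""
    return dict(Counter(labels))


emails = ["Invoice overdue", "Error on login", "Hello there", "invoice attached"]
counts = map_reduce(emails, classify, tally)
print(counts)  # {'billing': 2, 'support': 1, 'other': 1}
```

`max_workers` doubles as a simple throttle: it caps how many map calls are in flight at once.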
Implementing an Orchestrator–Worker Pipeline
multi_agent.py: orchestrator dispatches parallel workers

```python
import asyncio

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def call_claude(system: str, prompt: str,
                model: str = "claude-haiku-4-5-20251001",
                max_tokens: int = 512) -> str:
    """Single synchronous Claude call, shared by the orchestrator and the workers."""
    response = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        system=system,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text


async def worker(loop: asyncio.AbstractEventLoop, section: str) -> str:
    """Run a synchronous Claude call in the default thread pool."""
    return await loop.run_in_executor(
        None, call_claude,
        "You are a precise summariser. Return 3 bullet points only.",
        f"Summarise this section:\n\n{section}",
    )


async def orchestrate(document: str) -> str:
    # Step 1: the orchestrator splits the document into sections.
    # The reply echoes the whole document, so it needs a larger output budget
    # than the 512-token default (8192 is an arbitrary choice; size it to your input).
    split_result = call_claude(
        "Split the text into logical sections. Return each section separated by '---'.",
        document,
        model="claude-sonnet-4-6",
        max_tokens=8192,
    )
    sections = [s.strip() for s in split_result.split("---") if s.strip()]

    # Step 2: workers summarise all sections in parallel
    loop = asyncio.get_running_loop()  # preferred over get_event_loop() inside a coroutine
    summaries = await asyncio.gather(*[worker(loop, s) for s in sections])

    # Step 3: the orchestrator merges all summaries into a final report
    combined = "\n\n".join(summaries)
    return call_claude(
        "You are a report writer. Combine these section summaries into a cohesive executive summary.",
        combined,
        model="claude-sonnet-4-6",
    )


if __name__ == "__main__":
    with open("report.txt") as f:
        doc = f.read()
    result = asyncio.run(orchestrate(doc))
    print(result)
```
Use Haiku for workers, Sonnet/Opus for the orchestrator
Workers do focused, repetitive subtasks: extract, classify, or summarise a single item. They rarely need the most capable model, so a fast, cheap model such as Haiku is a good fit. Save Sonnet or Opus for the orchestrator, which must reason about the overall task and synthesise results. Because worker calls usually make up most of the total, this mix can cut cost substantially with little effect on output quality, provided the subtasks really are simple.
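A rough way to see the effect is to compare total cost with and without the cheaper worker model. The sketch below uses a hypothetical `pipeline_cost` helper and placeholder prices in arbitrary units per 1,000 tokens (worker model = 1, orchestrator model = 3). These numbers are assumptions for illustration only, not Anthropic's pricing, and the token counts are invented.

```python
def pipeline_cost(n_sections: int, tokens_per_worker_call: int,
                  tokens_per_orchestrator_call: int,
                  worker_price: float, orchestrator_price: float,
                  orchestrator_calls: int = 2) -> float:
    """Total cost in arbitrary price units; prices are per 1,000 tokens."""
    worker_cost = n_sections * tokens_per_worker_call / 1000 * worker_price
    orchestrator_cost = (orchestrator_calls * tokens_per_orchestrator_call
                         / 1000 * orchestrator_price)
    return worker_cost + orchestrator_cost


# Placeholder prices (NOT real pricing): small model = 1 unit, large model = 3 units.
SMALL, LARGE = 1.0, 3.0

# 20 sections, 2,000 tokens per worker call, 2 orchestrator calls of 10,000 tokens each.
mixed = pipeline_cost(20, 2000, 10000, worker_price=SMALL, orchestrator_price=LARGE)
all_large = pipeline_cost(20, 2000, 10000, worker_price=LARGE, orchestrator_price=LARGE)

print(mixed, all_large)  # 100.0 180.0
```

Under these assumed numbers the mixed setup costs a little over half of the all-large setup. The saving grows with the number of worker calls and shrinks when the orchestrator calls dominate.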
Generator + Critic — Self-Verification Loop
generator_critic.py: two agents check each other's work

```python
from multi_agent import call_claude  # helper defined in multi_agent.py above

GENERATOR_SYSTEM = "You are a helpful assistant. Answer the question clearly and thoroughly."
CRITIC_SYSTEM = """You are a strict fact-checker.
Review the answer below and respond with exactly one of:
PASS: <brief reason>
FAIL: <specific issue> and <corrected answer>"""


def generate_and_verify(question: str, max_rounds: int = 3) -> str:
    answer = call_claude(GENERATOR_SYSTEM, question)
    for _ in range(max_rounds):
        verdict = call_claude(
            CRITIC_SYSTEM,
            f"Question: {question}\n\nAnswer: {answer}",
        )
        # Tolerate leading whitespace or lower case in the critic's reply
        if verdict.strip().upper().startswith("PASS"):
            break  # critic is satisfied
        # FAIL: pass the critique and the previous answer back to the generator,
        # which is stateless and would otherwise not see what it wrote
        answer = call_claude(
            GENERATOR_SYSTEM,
            f"Question: {question}\n\n"
            f"Your previous answer: {answer}\n\n"
            f"A reviewer criticised it: {verdict}\n\n"
            "Revise your answer to the question.",
        )
    # If every round failed, the last revision is returned without a final check
    return answer
```
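The loop above depends on the critic's reply starting with PASS or FAIL, and a model will not always follow a format exactly. One possible hardening, sketched here with a hypothetical `Verdict` type and `parse_verdict` helper (neither is part of the original listing), is to parse the reply explicitly and treat anything unrecognised as a failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    passed: bool
    detail: str


def parse_verdict(raw: str) -> Verdict:
    """Parse 'PASS: ...' or 'FAIL: ...'; any other shape counts as a failure."""
    text = raw.strip()
    label, sep, detail = text.partition(":")
    label = label.strip().upper()
    if sep and label == "PASS":
        return Verdict(True, detail.strip())
    if sep and label == "FAIL":
        return Verdict(False, detail.strip())
    # Unrecognised format: fail closed and keep the raw text for the revision prompt.
    return Verdict(False, text)


print(parse_verdict("  pass: dates check out"))
print(parse_verdict("FAIL: wrong year. Corrected answer: 1969"))
print(parse_verdict("Looks fine to me"))
```

Failing closed means a malformed reply costs one extra revision round rather than letting an unchecked answer through. Whether that trade is right depends on how expensive a wrong answer is.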
Choosing a Pattern
| Pattern | Best for | Main trade-off |
|---|---|---|
| Orchestrator + Workers | Tasks that decompose cleanly into parallel subtasks — research, batch processing, report generation | Orchestrator adds latency; decomposition quality determines output quality |
| Sequential Pipeline | Multi-stage transformations where each step builds on the last — extract → clean → analyse → format | Simple to implement; errors propagate downstream and compound |
| Generator + Critic | High-stakes answers, factual claims, code correctness, or anything else worth double-checking | At least 2× the API calls and cost (one generation plus one critique), and more with each revision round; every round adds latency |
| Map / Reduce | Large collections of similar items — classify 500 emails, extract data from 100 PDFs | Parallel calls hit rate limits; reduce step must handle inconsistent map outputs |
Tradeoffs vs a Single Call
When multi-agent adds value
- Task genuinely exceeds one context window
- Subtasks can run in parallel — total time matters
- Quality improves with a separate verification step
- You need specialised behaviour per subtask
- You're processing many items of the same type
When a single call is better
- Task fits comfortably in one context window
- Time to first output matters: the orchestrator must finish decomposing the task before any worker can start
- Costs are a concern — every agent call is billed separately
- Errors in decomposition or combination can make outputs worse than a single focused call
- The task requires deep reasoning across all parts simultaneously
Rate limits compound quickly
Running 20 workers in parallel means 20 simultaneous API calls. Anthropic's rate limits apply to your organisation as a whole, so every parallel call draws on the same budget and you can exhaust it quickly. Size your worker pool to stay within your tier's requests-per-minute and tokens-per-minute limits, and retry 429 errors with exponential backoff.
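Both ideas, a bounded worker pool and backoff on 429s, can be sketched with `asyncio` alone. `RateLimited` below is a hypothetical stand-in exception so the example runs without network access; with the Anthropic Python SDK the error to catch would be `anthropic.RateLimitError`. The helper names and the default delays are assumptions, not recommended values.

```python
import asyncio
import random


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 error from the API client."""


async def call_with_backoff(make_call, max_retries=5, base_delay=1.0):
    """Await make_call(); on RateLimited, sleep base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return await make_call()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))


async def run_bounded(jobs, max_concurrent=5, **backoff):
    """Run zero-argument coroutine functions with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job):
        # The slot stays held while a job backs off, which throttles further under load.
        async with semaphore:
            return await call_with_backoff(job, **backoff)

    return await asyncio.gather(*(guarded(job) for job in jobs))


def make_flaky_job(name, failures):
    """Build a demo job that raises RateLimited `failures` times, then succeeds."""
    remaining = {"n": failures}

    async def job():
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise RateLimited(name)
        return f"{name}: done"

    return job


jobs = [make_flaky_job(f"worker-{i}", failures=i % 2) for i in range(4)]
results = asyncio.run(run_bounded(jobs, max_concurrent=2, base_delay=0.01))
print(results)  # ['worker-0: done', 'worker-1: done', 'worker-2: done', 'worker-3: done']
```

The jitter spreads retries out so that workers which were throttled together do not all retry at the same instant.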
Keep agents stateless
Workers should receive everything they need in their prompt and return everything the orchestrator needs in their response. Don't have agents share mutable state — it creates ordering dependencies that break parallelism and make debugging much harder. Treat each agent call as a pure function: input in, output out.
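One way to enforce this in code, sketched under the assumption that a worker's input and output fit in small immutable records: `WorkerInput`, `WorkerOutput`, and `run_worker` are hypothetical names, and the string transformation stands in for a model call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerInput:
    task_id: int
    instructions: str
    payload: str


@dataclass(frozen=True)
class WorkerOutput:
    task_id: int
    result: str


def run_worker(job: WorkerInput) -> WorkerOutput:
    """Pure function: the result depends only on `job`, and no shared state is touched."""
    # Stand-in for a model call built from job.instructions and job.payload.
    return WorkerOutput(job.task_id, f"{job.instructions}: {job.payload.upper()}")


jobs = [WorkerInput(i, "shout", word) for i, word in enumerate(["alpha", "beta", "gamma"])]

in_order = [run_worker(j) for j in jobs]
reversed_then_sorted = sorted((run_worker(j) for j in reversed(jobs)),
                              key=lambda out: out.task_id)

# Execution order does not matter because no worker reads another worker's state.
print(in_order == reversed_then_sorted)  # True
```

Carrying `task_id` through the output lets the orchestrator reassemble results in the right order even when workers finish out of sequence.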