Who holds the plan?
Subagents, Skills, and Workflows tend to get lumped together. But when the official docs line all three up in one table, a single question tells them apart: who actually holds the plan that orchestrates the task? Is it Claude, deciding on the fly, turn by turn, or a piece of code you write once and can re-run?
Not "which is stronger," but "who holds the plan."
“With subagents and skills, Claude is the orchestrator: it decides turn by turn what to spawn next, and every result lands in Claude's context. A workflow script holds the loop, the branching, and the intermediate results itself, so Claude's context holds only the final answer.”
An easy way to remember it: a Skill saves the instructions; a Workflow saves the orchestration process itself. They aren't even the same kind of thing, so a Workflow doesn't replace a Skill. Each does its own job.
The plan changes hands, column by column
Don't read this table across the rows as a contest of strength. Read it column by column instead, and watch "who holds the plan" get handed from the leftmost column to the rightmost. The row to watch is "Who decides what runs next": Claude in the first two columns, the script in the last.
| | Subagents | Skills | Workflows |
|---|---|---|---|
| What it is | a worker Claude spawns | instructions Claude follows | a script the runtime executes |
| Who decides what runs next | Claude, turn by turn | Claude, following the prompt | the script |
| Where intermediate results live | Claude's context | Claude's context | script variables |
| What you can reproduce | the worker definition | the instructions | the orchestration itself |
| Scale | a few delegations per turn | same as subagent | tens to hundreds of agents per run |
| If interrupted | restart the turn | restart the turn | resume within the same session |
Every row comes straight from code.claude.com/docs/en/workflows [Official].
Three files: what each one looks like
Those distinctions sound abstract, but on disk they're just three different files. Put them side by side and the table's "what you can reproduce" row gets concrete fast.
Subagent: the worker definition

```markdown
---
name: security-scout
description: scan a single file for injection risk, in isolation
tools: Read, Grep
model: sonnet
---

You are a security reviewer. Read only the given file and find unvalidated input that flows into dangerous sinks. Return each as { file, line, risk }. Don't fix anything, don't wander off-topic.
```
Skill: the instructions

```markdown
---
name: pdf-fill
description: fill PDF form fields. Use when data needs to be written into a PDF form.
---

# Fill a PDF form

1. List every field with pdftk dump_data_fields
2. Map the user's data to field names
3. Build an FDF, then write it back with pdftk fill_form

→ field-name reference: references/field-map.md
```
Workflow: the orchestration itself

```js
export const meta = {
  name: 'bug-hunt',
  description: 'find bugs across the repo, verify each before reporting',
  phases: [{ title: 'Find' }, { title: 'Verify' }],
}

// assumed to be defined elsewhere in the script (not shown): DIMENSIONS, BUGS, VERDICT

// find in parallel by dimension; verify each finding the moment it appears
const found = await pipeline(
  DIMENSIONS,
  d => agent(d.prompt, { phase: 'Find', schema: BUGS }),
  rv => parallel(rv.bugs.map(b => () =>
    agent(`Try to refute this: ${b.title}`, { phase: 'Verify', schema: VERDICT })
      .then(v => ({ ...b, ok: v.isReal })))),
)

return found.flat().filter(b => b.ok) // keep only the ones verified true
```
Samples are illustrative: Subagent / Skill follow the official file formats; the Workflow script follows the official runtime API (agent() / pipeline() / parallel() / meta).
Keep results out of the context, and the context won't rot
This is exactly why a Workflow can run for hours, even days, on end. With a Skill or Subagent, every intermediate result gets stuffed back into the context window: the bigger the task, the fuller the window, and the more it rots. A Workflow keeps those results in script variables, so Claude only ever sees the final answer.
“Because the coordination happens outside the conversation, the plan stays on track no matter how big the task gets.”
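To make the difference concrete, here is a toy model. It is not how Claude Code actually manages context: a plain `context` array stands in for Claude's context window, and `runWorker` is a hypothetical placeholder for delegating a task to an agent. In the first style every intermediate result is appended to the context. In the second, the script keeps them in a local variable and appends only the final answer.

```javascript
// Toy model only: an array stands in for Claude's context window,
// and runWorker is a hypothetical placeholder for delegating to an agent.
function runWorker(task) {
  return `finding for ${task}`;
}

// Style 1: Claude orchestrates, so every intermediate result lands in its context.
function orchestrateInContext(tasks) {
  const context = [];
  for (const task of tasks) {
    context.push(runWorker(task)); // each result is appended to the context
  }
  context.push(`summary of ${tasks.length} findings`);
  return context;
}

// Style 2: a script orchestrates; intermediate results stay in a script variable,
// and only the final answer is handed back to the context.
function orchestrateInScript(tasks) {
  const results = tasks.map((task) => runWorker(task)); // lives in script memory only
  const context = [];
  context.push(`summary of ${results.length} findings`);
  return context;
}

const tasks = Array.from({ length: 200 }, (_, i) => `file-${i}`);
console.log(orchestrateInContext(tasks).length); // 201
console.log(orchestrateInScript(tasks).length);  // 1
```

The bigger `tasks` gets, the faster the first context grows. The second stays at a single entry no matter how many workers ran.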
How a workflow reaches results a single pass can't
The real trick isn't "running more agents." It's writing the quality patterns straight into the loop: let several conclusions poke holes in each other, draft a few versions from different angles and weigh them, and keep iterating until the answers settle.
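The "draft a few versions and weigh them" pattern can be sketched as a best-of-N loop. This is a minimal illustration, not the workflow runtime's API: `draft` and `score` are hypothetical stand-ins for a writer agent and a judge agent, and the toy judge below simply prefers shorter drafts.

```javascript
// Draft one candidate per angle in parallel, score each, keep the highest-scoring one.
// `draft` and `score` are hypothetical stand-ins for agents (a writer and a judge).
async function bestOfN(task, angles, draft, score) {
  const drafts = await Promise.all(angles.map((angle) => draft(task, angle)));
  const scores = await Promise.all(drafts.map((d) => score(task, d)));
  let best = 0;
  for (let i = 1; i < drafts.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return { winner: drafts[best], score: scores[best], considered: drafts.length };
}

// Demo with toy functions: the "judge" prefers the shortest draft.
const angles = ['performance', 'readability', 'safety'];
const draft = async (task, angle) => `${task}, optimized for ${angle}`;
const score = async (_task, d) => -d.length;

bestOfN('Refactor the parser', angles, draft, score)
  .then((r) => console.log(r.winner)); // Refactor the parser, optimized for safety
```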
Plan on the fly
You describe the task, and Claude writes a JS orchestration script on the spot that breaks it into subtasks. That's what "dynamic" means here: the script is generated for your task, not pulled from a template.
Fan the work out
The script uses parallel() / pipeline() to spread the work across tens or hundreds of subagents at once. The nice thing about pipeline() is that each item runs its own course and nothing blocks anything else: whichever item finishes a step first moves straight on, with no waiting for the slowest one.
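Here is the difference between a per-item pipeline and a stage-by-stage barrier, sketched in plain JavaScript. This is not the workflow runtime's `pipeline()`. `pipelineSketch`, `barrierSketch`, and the timings are hypothetical, and `setTimeout` stands in for agents that take different amounts of time.

```javascript
// Hypothetical stand-in for an agent call that takes `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Per-item pipeline: each item flows through every stage on its own,
// so a fast item can reach the last stage while a slow one is still in the first.
function pipelineSketch(items, ...stages) {
  return Promise.all(
    items.map((item) => stages.reduce((p, stage) => p.then(stage), Promise.resolve(item))),
  );
}

// Barrier style, for contrast: all items must finish a stage before any starts the next.
async function barrierSketch(items, ...stages) {
  let current = items;
  for (const stage of stages) {
    current = await Promise.all(current.map((item) => stage(item)));
  }
  return current;
}

const items = [
  { name: 'slow', findMs: 100 },
  { name: 'fast', findMs: 10 },
];

// Runs a "find" stage then a "verify" stage and records when each item finishes verifying.
async function demo(runner) {
  const start = Date.now();
  const finishedAt = {};
  const find = async (item) => { await sleep(item.findMs); return item; };
  const verify = async (item) => {
    await sleep(5);
    finishedAt[item.name] = Date.now() - start; // ms since this run started
    return item;
  };
  await runner(items, find, verify);
  return finishedAt;
}

(async () => {
  console.log('pipeline', await demo(pipelineSketch)); // 'fast' finishes well before 'slow'
  console.log('barrier ', await demo(barrierSketch));  // 'fast' waits until 'slow' has been found
})();
```

With the barrier, the fast item sits idle until the slow item's first stage is done. With the per-item pipeline, it is verified almost immediately.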
Check each other
Before a result is folded back in, it goes through a check: a few other agents each try to poke holes in it and refute it, and anything the majority rejects is thrown out. A schema also pins down each agent's return format, so nothing has to be parsed out of raw text afterward.
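Here is a minimal sketch of that check, under stated assumptions. The `verifiers` are hypothetical async functions standing in for independent agents, and each returns a verdict shaped like `{ isReal: boolean }`. A small hand-written shape check stands in for the runtime's schema option, which this sketch does not reproduce. A finding is dropped only when more than half of the verdicts reject it.

```javascript
// Hypothetical shape check standing in for a schema: a verdict must be { isReal: boolean }.
function assertVerdict(v) {
  if (typeof v !== 'object' || v === null || typeof v.isReal !== 'boolean') {
    throw new Error(`malformed verdict: ${JSON.stringify(v)}`);
  }
  return v;
}

// Every verifier tries to refute the finding; it survives unless a majority reject it.
async function survivesMajority(finding, verifiers) {
  const raw = await Promise.all(verifiers.map((verify) => verify(finding)));
  const verdicts = raw.map(assertVerdict);
  const rejections = verdicts.filter((v) => !v.isReal).length;
  return rejections <= verdicts.length / 2; // rejected only when more than half say "not real"
}

async function filterFindings(findings, verifiers) {
  const keep = await Promise.all(findings.map((f) => survivesMajority(f, verifiers)));
  return findings.filter((_, i) => keep[i]);
}

// Demo: three hypothetical verifiers that disagree about the second finding.
const verifiers = [
  async (f) => ({ isReal: f.severity !== 'low' }),
  async () => ({ isReal: true }),
  async (f) => ({ isReal: f.title !== 'false alarm' }),
];
const findings = [
  { title: 'SQL injection in /search', severity: 'high' },
  { title: 'false alarm', severity: 'low' },
];

filterFindings(findings, verifiers)
  .then((kept) => console.log(kept.map((f) => f.title))); // [ 'SQL injection in /search' ]
```

A malformed verdict throws instead of being guessed at. That mirrors the point of pinning the format: the script never has to interpret free text.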
Iterate until it settles
The official line is “the run keeps iterating until the answers converge”: it ends when the answers settle, not after a fixed number of rounds. Finally everything is gathered into one coherent answer and handed back to you.
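The stopping rule (end when the answers settle, not after a fixed count) can be sketched as a loop that compares each round's answer with the previous one. `refine` is a hypothetical stand-in for one round of agent work. The `maxRounds` cap is an assumption added so the sketch always terminates, and treating strict equality as "converged" is a simplification.

```javascript
// Repeat `refine` until two consecutive rounds agree, with a safety cap on rounds.
// `refine` is a hypothetical stand-in for one round of agent work.
async function iterateUntilConverged(refine, initial, { maxRounds = 10, same = (a, b) => a === b } = {}) {
  let previous = initial;
  for (let round = 1; round <= maxRounds; round++) {
    const next = await refine(previous, round);
    if (same(previous, next)) {
      return { answer: next, rounds: round, converged: true };
    }
    previous = next;
  }
  return { answer: previous, rounds: maxRounds, converged: false };
}

// Demo: each round closes half the remaining gap to 100 (rounded down),
// so the answer stops changing after a few rounds.
const refineEstimate = async (x) => x + Math.floor((100 - x) / 2);

iterateUntilConverged(refineEstimate, 0)
  .then((r) => console.log(r)); // { answer: 99, rounds: 8, converged: true }
```

The number of rounds falls out of the data (8 here) rather than being fixed up front. The cap only matters if the answers never settle.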
Each does its own job, and they stack
Skills and Workflows aren't even on the same level: every agent a workflow fans out can have a skill attached before it starts work. They're meant to be used together, not chosen between.
Skill: changes what the model knows and does
Feeding the model knowledge and instructions a bit at a time, on demand; in effect, turning "how to write the prompt" into a finished product. The docs put it plainly: a skill's result isn't guaranteed to be the same every time, because Claude is left to interpret the instructions, so how a run actually goes is still decided live, turn by turn.
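Here is a rough sketch of the "on demand" idea. Up front, only each skill's name and description sit in the context, and the body is added once a skill is picked for the task. Everything here is hypothetical: the in-memory `skills` registry, the second `changelog` skill, and above all `selectSkill`. Its naive keyword match stands in for Claude's own judgment, which is not keyword matching.

```javascript
// Hypothetical in-memory registry; real skills live in files with frontmatter.
const skills = [
  {
    name: 'pdf-fill',
    description: 'fill PDF form fields. Use when data needs to be written into a PDF form.',
    body: "1. List every field with pdftk dump_data_fields\n2. Map the user's data to field names\n3. Build an FDF, then write it back with pdftk fill_form",
  },
  {
    name: 'changelog', // hypothetical second skill
    description: 'draft release notes. Use when summarizing merged changes.',
    body: 'Group changes by type, link each PR, keep entries to one line.',
  },
];

// Up front, only the name and description of every skill enter the context.
function initialContext(registry) {
  return registry.map((s) => `${s.name}: ${s.description}`);
}

// Stand-in for Claude's judgment: a naive keyword match on the skill name.
function selectSkill(registry, task) {
  const text = task.toLowerCase();
  return registry.find((s) => text.includes(s.name.split('-')[0])) ?? null;
}

// The body is appended only when a skill is selected for the current task.
function contextFor(registry, task) {
  const context = initialContext(registry);
  const skill = selectSkill(registry, task);
  if (skill) context.push(skill.body);
  return context;
}

console.log(contextFor(skills, 'Write this data into the PDF form').length); // 3
console.log(contextFor(skills, 'What time is it?').length);                  // 2
```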
Workflow: changes how you orchestrate agents at scale, reliably
The orchestration logic moves out of Claude's context and becomes for loops, if branches, and pipeline() calls in a script. Once the script is written, "who holds the plan" passes from Claude's hands to that fixed code.
```js
await agent(prompt, {
  agentType: 'code-reviewer', // reuse a skill
  schema: FINDINGS
})
```
When to reach for which
Subagent: isolate a piece of grunt work
- You want a clean context to run exploration or search
- Keep the noise out of the main context
- Send a few out per turn, get results back, you keep deciding
Skill: lock down a way of doing things
- You have a reusable method, convention, or body of knowledge
- You want Claude to follow the same approach every time
- You can accept results varying a little with the model's judgment
Workflow: orchestration too big for one context
- A whole-repo bug hunt or security audit
- A migration or modernization touching thousands of files
- A key decision you want several independent agents to double-check
Bun: Zig → Rust, 11 days
This is the scale case the docs single out: four workflows chained together. The first maps out every memory lifetime in the Zig code; the second ports the code to Rust file by file, with two reviewers per file; the third runs builds and tests in a loop until both go green; and the last runs optimizations overnight, opening a separate PR for each change for a human to review.
Don't read this as "any migration can be done in 11 days." The conditions behind the result are demanding (drawing on the official caveats plus the first-hand rust-rewrite-plan.md in the Bun repo [Third-party]):
- Not in production yet: the official wording is literally "While not yet in production," and 99.8% still isn't 100%.
- It's a "strangler-fig" incremental migration, not a rewrite from scratch: Zig and Rust stay linked into the same binary the whole time, switched over one class at a time behind a flag.
- Every switch has to clear several gates: tests, a shadow-diff of old vs. new output, and a performance budget that can't slip more than 2%. The result comes from passing these checks, not from "write it and trust it."
- Test coverage was already extremely high, and it's a one-person project led by the tool's own author.
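The gating can be sketched as one check that must pass before a class's flag flips from the old implementation to the new one. This is an illustration, not Bun's tooling. `failedGates`, `trySwitch`, `implFor`, the report fields, and the class names are hypothetical. Only the three gates come from the text above: tests, a shadow-diff of old vs. new output, and a slowdown of at most 2%.

```javascript
const PERF_BUDGET = 0.02; // new code may be at most 2% slower than the old code

// Returns the list of failed gates; an empty list means the switch may go ahead.
function failedGates({ testsPassed, oldOutput, newOutput, oldMs, newMs }) {
  const failures = [];
  if (!testsPassed) failures.push('tests');
  if (JSON.stringify(oldOutput) !== JSON.stringify(newOutput)) failures.push('shadow-diff');
  if (newMs > oldMs * (1 + PERF_BUDGET)) failures.push('perf');
  return failures;
}

// Strangler-fig switch: both implementations stay in place, and a class moves
// to the new one only when its flag is set, which happens only if every gate passes.
const flags = new Set();

function trySwitch(className, report) {
  const failures = failedGates(report);
  if (failures.length === 0) flags.add(className);
  return failures;
}

function implFor(className, oldImpl, newImpl) {
  return flags.has(className) ? newImpl : oldImpl;
}

console.log(trySwitch('Lexer', { testsPassed: true, oldOutput: [1, 2], newOutput: [1, 2], oldMs: 100, newMs: 101.5 })); // []
console.log(trySwitch('Parser', { testsPassed: true, oldOutput: [1, 2], newOutput: [1, 2], oldMs: 100, newMs: 103 })); // [ 'perf' ]
console.log(implFor('Lexer', 'zig', 'rust'));  // rust
console.log(implFor('Parser', 'zig', 'rust')); // zig
```

A class that fails any gate keeps running the old code, so the binary stays shippable at every step.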
What's worth keeping: as long as the code matters enough, the tests are solid, and the gates are strict, writing the flow as fixed scripts and letting it compile-and-test on a loop until green really can compress work once estimated in quarters down to days. So the right move is to pilot on a module like that first, not to copy the "11 days" figure onto every project.
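The "compile-and-test on a loop until green" step can be sketched as a bounded retry loop. `build`, `test`, and `fix` are hypothetical async callbacks standing in for agents and tooling. The `maxAttempts` cap is an assumption added so the sketch always terminates.

```javascript
// Run build + tests; if either fails, hand the failing log to `fix` and try again.
async function loopUntilGreen({ build, test, fix, maxAttempts = 20 }) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const buildResult = await build();
    const testResult = buildResult.ok ? await test() : { ok: false, log: 'skipped: build failed' };
    if (buildResult.ok && testResult.ok) return { green: true, attempts: attempt };
    await fix(buildResult.ok ? testResult.log : buildResult.log);
  }
  return { green: false, attempts: maxAttempts };
}

// Hypothetical fake project: it builds once fewer than 2 issues remain,
// passes tests at 0, and each fix removes one issue.
function makeFakeProject(issues) {
  let remaining = issues;
  return {
    build: async () => ({ ok: remaining < 2, log: `${remaining} open issues` }),
    test: async () => ({ ok: remaining === 0, log: `${remaining} failing tests` }),
    fix: async () => { remaining -= 1; },
  };
}

loopUntilGreen(makeFakeProject(3))
  .then((r) => console.log(r)); // { green: true, attempts: 4 }
```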