Who Holds the Plan

The test

Not "which is stronger" —
but who holds the plan.

“With subagents and skills, Claude is the orchestrator: it decides turn by turn what to spawn next, and every result lands in Claude's context. A workflow script holds the loop, the branching, and the intermediate results itself, so Claude's context holds only the final answer.”

— Claude Code Docs · Orchestrate subagents at scale [Official]

Here's the way to remember it: a Skill saves the instructions; a Workflow saves the orchestration process itself. They aren't the same kind of thing, so a Workflow doesn't replace a Skill. Each does its own job.

Try it

Switch and watch: who gets the plan,
where the results pile up

Switch between the three and keep an eye on two things: which box the block marked PLAN slides into, and where each step's intermediate results land on the right.

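[Interactive diagram: a three-way switch (Subagent, Skill, Workflow) moves the PLAN block between "Claude's Context" (orchestrates turn by turn) and "Script · code" (the script holds it), and shows where results land, with readouts for who decides next, where results live, scale, and what happens if interrupted.]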

The official table

The plan changes hands, column by column

Don't read this table across the rows, comparing strength. Read it down the columns: watch how "who holds the plan" gets handed from the leftmost column all the way to the right. The highlighted row is the dividing line.

	Subagents	Skills	Workflows
What it is	a worker Claude spawns	instructions Claude follows	a script the runtime executes
Who decides what runs next	Claude, turn by turn	Claude, following the prompt	the script
Where intermediate results live	Claude's context	Claude's context	script variables
What you can reproduce	the worker definition	the instructions	the orchestration itself
Scale	a few delegations per turn	same as subagent	tens to hundreds of agents per run
If interrupted	restart the turn	restart the turn	resume within the same session

Every row comes straight from code.claude.com/docs/en/workflows [Official].

Three real samples

Three files, here's what each looks like

Those distinctions sound abstract, but on disk they're just three different files. Put them side by side and the table's "what you can reproduce" row gets concrete fast.

Subagent

An isolated worker

.claude/agents/security-scout.md

What you save and reuse is a worker definition: which tools it carries, what it reads, which model it runs on. When it actually gets called in is still Claude's call, in the moment.

---
name: security-scout
description: scan a single file for injection risk, in isolation
tools: Read, Grep
model: sonnet
---
You are a security reviewer. Read only the given file and find
unvalidated input that flows into dangerous sinks. Return each as
{ file, line, risk }. Don't fix anything, don't wander off-topic.

Skill

A reusable instruction

skills/pdf-fill/SKILL.md

What you save and reuse is an instruction: a set of steps, plus references that open up only when needed. Claude follows it, but how exactly it walks the path is still its own judgment.

---
name: pdf-fill
description: fill PDF form fields. Use when data needs
  to be written into a PDF form.
---
# Fill a PDF form

1. List every field with pdftk dump_data_fields
2. Map the user's data to field names
3. Build an FDF, then write it back with pdftk fill_form

→ field-name reference: references/field-map.md

Workflow

The orchestration itself, as code

workflows/bug-hunt.js

What you save and reuse is the orchestration itself: the loops, the branches, who verifies whom, all pinned down in the script. The next run follows the same flow because the script encodes it, not because Claude reconstructs it on the fly.

export const meta = {
  name: 'bug-hunt',
  description: 'find bugs across the repo, verify each before reporting',
  phases: [{ title: 'Find' }, { title: 'Verify' }],
}

// find in parallel by dimension; verify each finding the moment it appears
const found = await pipeline(DIMENSIONS,
  d => agent(d.prompt, { phase: 'Find', schema: BUGS }),
  rv => parallel(rv.bugs.map(b => () =>
    agent(`Try to refute this: ${b.title}`,
          { phase: 'Verify', schema: VERDICT })
      .then(v => ({ ...b, ok: v.isReal })))))

return found.flat().filter(b => b.ok)   // keep only the ones verified true

Samples are illustrative: Subagent / Skill follow the official file formats; the Workflow script follows the official runtime API (agent() / pipeline() / parallel() / meta).

Why it matters

Keep results out of the context,
and the context won't rot

This is exactly why a Workflow can run for hours, even days, on end. With a Skill or Subagent, every intermediate result gets stuffed back into the context window: the bigger the task, the fuller the window, and the more it rots. A Workflow keeps those results in script variables, so Claude only ever sees the final answer.

“Because the coordination happens outside the conversation, the plan stays on track no matter how big the task gets.”

— Claude Code Docs [Official]
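A toy model makes the difference concrete. This is only a sketch: `fakeStep`, `runInContext`, and `runInScript` are made-up names, and character counts stand in for tokens. It is not how the runtime accounts for context.

```javascript
// Toy model: where do intermediate results accumulate?
// All names are hypothetical; character counts stand in for tokens.

// Stand-in for one delegated step that returns a bulky intermediate result.
function fakeStep(i) {
  return { id: i, notes: 'x'.repeat(500), verdict: i % 2 === 0 ? 'ok' : 'skip' };
}

// Orchestrating in the conversation: every result is appended to the context.
function runInContext(steps) {
  const context = [];
  for (let i = 0; i < steps; i++) {
    context.push(JSON.stringify(fakeStep(i)));
  }
  context.push('final: done');
  return context.join('\n').length; // characters the model has to carry
}

// Orchestrating in a script: results stay in local variables,
// and only the summary is handed back to the conversation.
function runInScript(steps) {
  const results = [];
  for (let i = 0; i < steps; i++) {
    results.push(fakeStep(i)); // lives in a script variable
  }
  const kept = results.filter(r => r.verdict === 'ok').length;
  const summary = `final: ${kept} of ${steps} kept`;
  return summary.length; // only this reaches the context
}

console.log(runInContext(100), runInScript(100));
```

The first number grows with every step; the second stays the size of one summary line no matter how many steps run.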

How it runs

How a workflow reaches
results a single pass can't

The real trick isn't "running more agents." It's writing the quality patterns straight into the loop: let several conclusions poke holes in each other, draft a few versions from different angles and weigh them, and keep iterating until the answers settle.

1

Plan on the fly

You describe the task, and Claude writes a JS orchestration script on the spot that breaks it into subtasks. That's what dynamic means: generated for your task, not pulled from a template.

2

Fan the work out

The script uses parallel() / pipeline() to spread the work across tens or hundreds of subagents at once. The nice thing about pipeline: each item runs its own course, nothing blocks anything else — whoever finishes first moves on, no waiting for the slowest one.
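The "nothing blocks anything else" behavior can be sketched with plain Promises. `pipelineSketch` below is a hypothetical stand-in written for illustration, not the runtime's `pipeline()`; it only assumes the same call shape (items first, then one function per stage).

```javascript
// Hypothetical stand-in for pipeline(): each item walks through the stages on
// its own, so there is no barrier where fast items wait for slow ones.
function pipelineSketch(items, ...stages) {
  return Promise.all(
    items.map(async item => {
      let value = item;
      for (const stage of stages) {
        value = await stage(value); // this item advances as soon as it is ready
      }
      return value;
    })
  );
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function demo() {
  const log = [];
  const results = await pipelineSketch(
    [{ name: 'slow', ms: 60 }, { name: 'fast', ms: 5 }],
    async item => { await sleep(item.ms); log.push(`find:${item.name}`); return item; },
    async item => { await sleep(5); log.push(`verify:${item.name}`); return item.name; }
  );
  return { results, log };
}

demo().then(({ results, log }) => console.log(results, log));
```

In the log, the fast item is already verified before the slow item finishes its first stage, while the results array still comes back in input order.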

3

Check each other

Before a result folds back in, it goes through a check: a few other agents each try to poke holes and refute it, and whatever the majority rejects gets thrown out. A schema also pins down the return format, so there's no parsing raw text afterward.
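Here's a minimal sketch of that check, assuming the reviewers are plain async functions standing in for `agent()` calls and that a verdict carries a boolean `isReal` field as in the sample script. `isVerdict` and `survivesReview` are hypothetical helpers; treating a finding with no valid verdicts as dropped is an assumption too.

```javascript
// Hypothetical majority "refute" vote; nothing here is the runtime's API.

// Minimal shape check standing in for a schema: { isReal: boolean }.
function isVerdict(v) {
  return v !== null && typeof v === 'object' && typeof v.isReal === 'boolean';
}

// Keep a finding unless a majority of valid verdicts reject it.
async function survivesReview(finding, reviewers) {
  const verdicts = await Promise.all(reviewers.map(review => review(finding)));
  const valid = verdicts.filter(isVerdict); // drop malformed replies
  const rejections = valid.filter(v => !v.isReal).length;
  // Assumption: with no valid verdicts at all, the finding is dropped.
  return valid.length > 0 && rejections <= valid.length / 2;
}

const agrees = async () => ({ isReal: true });
const refutes = async () => ({ isReal: false });

survivesReview({ title: 'unescaped input' }, [agrees, agrees, refutes])
  .then(kept => console.log(kept)); // → true
survivesReview({ title: 'false alarm' }, [refutes, refutes, agrees])
  .then(kept => console.log(kept)); // → false
```

Because every verdict has a known shape, the vote is a simple count over a boolean field rather than a guess about free-form text.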

∞

Iterate until it settles

The official line is “the run keeps iterating until the answers converge”: it ends when the answers settle, not after a fixed number of rounds. Finally everything is gathered into one coherent answer and handed back to you.
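"Until the answers settle" is a loop with a stability test instead of a round counter. A rough sketch, assuming a made-up `refine` function stands in for one round of agent work; the `maxRounds` backstop is an added assumption, not something the docs describe for this loop.

```javascript
// Hypothetical convergence loop: stop when a round changes nothing.
async function iterateUntilStable(initial, refine, maxRounds = 20) {
  let current = initial;
  for (let round = 1; round <= maxRounds; round++) {
    const next = await refine(current);
    // Compare by serialized value; good enough for plain JSON-like answers.
    if (JSON.stringify(next) === JSON.stringify(current)) {
      return { answer: current, rounds: round };
    }
    current = next;
  }
  // Assumed backstop so a refine step that never settles cannot loop forever.
  throw new Error(`no convergence after ${maxRounds} rounds`);
}

// Toy refine step: de-duplicate and sort a list of findings.
const dedupe = async list => [...new Set(list)].sort();

iterateUntilStable(['b', 'a', 'b'], dedupe)
  .then(result => console.log(result)); // → { answer: [ 'a', 'b' ], rounds: 2 }
```

The first round changes the list, the second changes nothing, and that second round is what ends the run.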

16: max agents running at once; fewer if you have fewer CPU cores

1,000: total agents per run, a backstop against runaway loops

0: user inputs you can inject mid-run; only a permission prompt pauses it

1: resume works only within the same session; quit and the next run starts over
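The first two limits behave like a concurrency cap plus a total budget. The sketch below shows one way such a limiter could work; `createLimiter` is a hypothetical helper, and only the default figures of 16 and 1,000 come from the text above.

```javascript
// Hypothetical limiter: at most `maxConcurrent` tasks in flight and at most
// `maxTotal` tasks overall. Not the runtime's implementation.
function createLimiter(maxConcurrent = 16, maxTotal = 1000) {
  let active = 0;
  let started = 0;
  let peak = 0;
  const waiting = [];

  // Start the next queued task if there is a free slot.
  const next = () => {
    if (active >= maxConcurrent || waiting.length === 0) return;
    const { task, resolve, reject } = waiting.shift();
    active++;
    peak = Math.max(peak, active);
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => { active--; next(); });
  };

  return {
    run(task) {
      if (started >= maxTotal) {
        return Promise.reject(new Error(`agent budget of ${maxTotal} exhausted`));
      }
      started++;
      return new Promise((resolve, reject) => {
        waiting.push({ task, resolve, reject });
        next();
      });
    },
    stats: () => ({ started, peak }),
  };
}

const limiter = createLimiter(2, 3);
const work = id => () => new Promise(resolve => setTimeout(() => resolve(id), 5));
Promise.all([1, 2, 3].map(id => limiter.run(work(id))))
  .then(ids => console.log(ids, limiter.stats())); // → [ 1, 2, 3 ] { started: 3, peak: 2 }
```

The cap bounds how many run at once; the budget bounds how many can ever start, which is what stops a runaway loop.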

Not a replacement — a combination

Each does its own job, and they stack

The two aren't even on the same level. Every agent a workflow fans out can have a skill attached before it starts work. They're meant to be used together, not picked between.

Skill

Changes what the model knows and does

A skill feeds the model knowledge and instructions a bit at a time, on demand; it basically turns "how to write the prompt" into a finished product. The docs put it plainly: a skill's result isn't guaranteed to be the same every time, because the instructions are left for Claude to interpret; how it actually goes is still decided live, turn by turn.

Workflow

Changes how you orchestrate agents at scale, reliably

The orchestration logic moves out of Claude's context and becomes for loops, if branches, and pipeline() calls in a script. Once the script is written, "who holds the plan" passes from Claude's hands to that fixed code.

// inside a workflow script, run an agent with a skill
await agent(prompt, {
  agentType: 'code-reviewer', // reuse a skill
  schema: FINDINGS
})
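To make "for loops, if branches" concrete, here's a small self-contained sketch of a plan held in code. `fakeAgent` is a stub standing in for the runtime's `agent()`, and the file names and risk rule are invented for illustration.

```javascript
// Hypothetical: `fakeAgent` stands in for agent(); the risk rule is made up.
async function fakeAgent(prompt) {
  const risky = prompt.includes('auth');
  return { prompt, risk: risky ? 'high' : 'low' };
}

async function auditFiles(files) {
  const report = [];
  for (const file of files) {                    // the loop lives in code
    const scan = await fakeAgent(`scan ${file}`);
    if (scan.risk === 'high') {                  // so does the branch
      const second = await fakeAgent(`re-check ${file}`);
      report.push({ file, confirmed: second.risk === 'high' });
    }
  }
  return report;
}

auditFiles(['auth.js', 'util.js'])
  .then(report => console.log(report)); // → [ { file: 'auth.js', confirmed: true } ]
```

Which file gets a second look is decided by the `if`, the same way on every run, rather than by whatever the model happens to choose that turn.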

Which to use

When to reach for which

Subagent

Isolate a piece of grunt work

You want a clean context to run exploration or search
Keep the noise out of the main context
Send a few out per turn, get results back, you keep deciding

Skill

Lock down a way of doing things

You have a reusable method, convention, or body of knowledge
You want Claude to follow the same approach every time
You can accept results varying a little with the model's judgment

Workflow

Orchestration too big for one context

A whole-repo bug hunt or security audit
A migration or modernization touching thousands of files
A key decision you want several independent agents to double-check

Flagship case

Bun: Zig → Rust, 11 days

This is the scale case the docs single out. Four workflows chained together: the first maps out every memory lifetime in the Zig code, the second ports it into Rust file by file — two reviewers per file — the third runs builds and tests on a loop until both go green, and the last runs optimizations overnight, opening a separate PR for each spot for a human to review.

~750K lines of Rust

11 days, from first commit to merge

99.8% of the existing test suite passing

4 workflows chained end to end

⚠ A few conditions you can't skip

Don't read this as "any migration can be done in 11 days." The conditions behind the result are demanding (drawing on the official caveats plus the first-hand rust-rewrite-plan.md in the Bun repo [Third-party]):

Not in production yet: the official wording is literally "While not yet in production," and 99.8% still isn't 100%.
It's a "strangler-fig" incremental migration, not a rewrite from scratch: Zig and Rust stay linked into the same binary the whole time, switched over one class at a time behind a flag.
Every switch has to clear several gates: tests, a shadow-diff of old vs. new output, and a performance budget that can't slip more than 2%. The result is forced out by these checks, not "write it and trust it."
Test coverage was already extremely high, and it's a one-person project led by the tool's own author.
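The gates in the list above amount to a checklist a script can enforce. A minimal sketch, assuming invented field names (`testsPassed`, `oldOutput`, `newOutput`, `oldMs`, `newMs`); only the three gates and the 2% budget come from the text, and this is not the Bun project's actual tooling.

```javascript
// Hypothetical gate check for one switch-over.
function passesGates({ testsPassed, oldOutput, newOutput, oldMs, newMs }, maxSlowdown = 0.02) {
  const failures = [];
  if (!testsPassed) failures.push('tests');
  // Shadow-diff: old and new implementations must produce the same output.
  if (JSON.stringify(oldOutput) !== JSON.stringify(newOutput)) failures.push('shadow-diff');
  // Performance budget: the new path may be at most 2% slower.
  if (newMs > oldMs * (1 + maxSlowdown)) failures.push('performance');
  return { ok: failures.length === 0, failures };
}

console.log(passesGates({ testsPassed: true, oldOutput: [1], newOutput: [1], oldMs: 100, newMs: 101 }));
// → { ok: true, failures: [] }
console.log(passesGates({ testsPassed: true, oldOutput: [1], newOutput: [1], oldMs: 100, newMs: 103 }));
// → { ok: false, failures: [ 'performance' ] }
```

A switch only goes through when the failure list is empty, which is the sense in which the result is forced out by checks rather than trusted.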

What's worth keeping: as long as the code matters enough, the tests are solid, and the gates are strict, writing the flow as fixed scripts and letting it compile-and-test on a loop until green really can compress work once estimated in quarters down to days. So the right move is to pilot on a module like that first, not to copy the "11 days" figure onto every project.
