Claude Code · 2026-05-28 · Research Preview

Who holds
the plan?

Subagent, Skill, and Workflow get lumped together. But when the official docs line all three up in one table, only one question tells them apart: who actually holds the plan that orchestrates the task — is it Claude deciding on the fly, turn by turn, or a piece of code you write once and re-run?

Subagent
Skill
Workflow
GenAI Playbook · Claude Code Deep Dives
The test

Not "which is stronger" —
but who holds the plan.

“With subagents and skills, Claude is the orchestrator: it decides turn by turn what to spawn next, and every result lands in Claude's context. A workflow script holds the loop, the branching, and the intermediate results itself, so Claude's context holds only the final answer.”

— Claude Code Docs · Orchestrate subagents at scale [Official]

Here's the way to remember it: what a Skill saves is the instructions; what a Workflow saves is the orchestration process itself. They aren't even the same kind of thing, so a Workflow doesn't replace a Skill — they each do their own job.

Try it

Switch and watch: who gets the plan,
where the results pile up

Switch between the three and keep an eye on two things: which box the block marked PLAN slides into, and where each step's intermediate results land on the right.

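[Interactive diagram. Left panel, "Who holds the plan": a block marked PLAN sits either in Claude's context (orchestrates turn by turn) or in Script · code (the script holds it). Right panel, "Where results land": Claude's context window, with readouts for who decides next, where results live, scale, and what happens if interrupted.]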
The official table

The plan changes hands, column by column

Don't read this table across the rows, comparing strength. Read it down the columns: watch how "who holds the plan" gets handed from the leftmost column all the way to the right. The highlighted row is the dividing line.

|                                 | Subagents                  | Skills                      | Workflows                            |
|---------------------------------|----------------------------|-----------------------------|--------------------------------------|
| What it is                      | a worker Claude spawns     | instructions Claude follows | a script the runtime executes        |
| Who decides what runs next      | Claude, turn by turn       | Claude, following the prompt | the script                          |
| Where intermediate results live | Claude's context           | Claude's context            | script variables                     |
| What you can reproduce          | the worker definition      | the instructions            | the orchestration itself             |
| Scale                           | a few delegations per turn | same as subagents           | tens to hundreds of agents per run   |
| If interrupted                  | restart the turn           | restart the turn            | resume within the same session       |

Every row comes straight from code.claude.com/docs/en/workflows [Official].

Three real samples

Three files: here's what each looks like

Those distinctions sound abstract, but on disk they're just three different files. Put them side by side and the table's "what you can reproduce" row gets concrete fast.

Subagent

An isolated worker

.claude/agents/security-scout.md

What you save and reuse is a worker definition: which tools it carries, what it reads, which model it runs on. When it actually gets called in is still Claude's call, in the moment.

---
name: security-scout
description: scan a single file for injection risk, in isolation
tools: Read, Grep
model: sonnet
---
You are a security reviewer. Read only the given file and find
unvalidated input that flows into dangerous sinks. Return each as
{ file, line, risk }. Don't fix anything, don't wander off-topic.
Skill

A reusable instruction

skills/pdf-fill/SKILL.md

What you save and reuse is an instruction: a set of steps, plus references that open up only when needed. Claude follows it, but how exactly it walks the path is still its own judgment.

---
name: pdf-fill
description: fill PDF form fields. Use when data needs
  to be written into a PDF form.
---
# Fill a PDF form

1. List every field with pdftk dump_data_fields
2. Map the user's data to field names
3. Build an FDF, then write it back with pdftk fill_form

→ field-name reference: references/field-map.md
Workflow

The orchestration itself, as code

workflows/bug-hunt.js

What you save and reuse is the orchestration itself: the loops, the branches, who verifies whom, all pinned down in the script. The next run follows the same flow because the script dictates it, not because Claude happens to reconstruct it on the fly.

export const meta = {
  name: 'bug-hunt',
  description: 'find bugs across the repo, verify each before reporting',
  phases: [{ title: 'Find' }, { title: 'Verify' }],
}

// find in parallel by dimension; verify each finding the moment it appears
const found = await pipeline(DIMENSIONS,
  d => agent(d.prompt, { phase: 'Find', schema: BUGS }),
  rv => parallel(rv.bugs.map(b => () =>
    agent(`Try to refute this: ${b.title}`,
          { phase: 'Verify', schema: VERDICT })
      .then(v => ({ ...b, ok: v.isReal })))))

return found.flat().filter(b => b.ok)   // keep only the ones verified true

Samples are illustrative: Subagent / Skill follow the official file formats; the Workflow script follows the official runtime API (agent() / pipeline() / parallel() / meta).
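
The comment in the Workflow sample says each finding is verified "the moment it appears". A minimal sketch of that behavior, runnable in plain Node.js: `pipelineSketch`, `sleep`, and the fake `find` / `verify` stages are hypothetical stand-ins built on timers, not the runtime's `pipeline()` or `agent()`, whose real signatures may differ.

```javascript
// Hypothetical stand-in for illustration only; the real runtime's pipeline()
// may be called and implemented differently.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Each item runs stage1 then stage2 on its own track: an item enters stage2
// as soon as its own stage1 finishes, without waiting for its siblings.
function pipelineSketch(items, stage1, stage2) {
  return Promise.all(items.map(async (item) => stage2(await stage1(item))));
}

async function demo() {
  const log = [];
  // Fake "find" stage: takes as long as the item says it does.
  const find = async (d) => {
    await sleep(d.ms);
    log.push(`found:${d.name}`);
    return d.name;
  };
  // Fake "verify" stage: a short fixed delay.
  const verify = async (name) => {
    await sleep(5);
    log.push(`verified:${name}`);
    return name;
  };
  const results = await pipelineSketch(
    [{ name: 'slow', ms: 80 }, { name: 'fast', ms: 5 }],
    find,
    verify,
  );
  return { results, log };
}
```

In this sketch the fast item is already verified while the slow one is still in its find stage, yet the results come back in input order. A barrier-style design (find everything, then verify everything) would leave the fast item idle until the slowest find completed.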

Why it matters

Keep results out of the context,
and the context won't rot

This is exactly why a Workflow can run for hours, even days, on end. With a Skill or Subagent, every intermediate result gets stuffed back into the context window: the bigger the task, the fuller the window, and the more it rots. A Workflow keeps those results in script variables, so Claude only ever sees the final answer.

“Because the coordination happens outside the conversation, the plan stays on track no matter how big the task gets.”

— Claude Code Docs [Official]
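
To see why this matters for context size, here is a toy model in plain JavaScript. It is a sketch under loud assumptions: the "context" is just an array of strings, and `runStep` is a hypothetical stand-in for one delegated piece of work. Nothing here calls the real runtime.

```javascript
// Toy model: "context" is an array of strings Claude would have to read.
// runStep is a hypothetical stand-in for one delegated piece of work.
const runStep = (i) => `intermediate result #${i} `.repeat(20);

// Claude as orchestrator: every intermediate result lands in the context.
function orchestrateInContext(steps) {
  const context = [];
  for (let i = 0; i < steps; i++) context.push(runStep(i));
  context.push(`final: ${steps} steps done`);
  return context;
}

// Script as orchestrator: intermediates stay in a local variable;
// only the final answer is handed back to the context.
function orchestrateInScript(steps) {
  const intermediates = [];
  for (let i = 0; i < steps; i++) intermediates.push(runStep(i));
  return [`final: ${intermediates.length} steps done`];
}

// Total characters sitting in a context.
const size = (context) => context.join('').length;
```

Comparing `size(orchestrateInContext(n))` with `size(orchestrateInScript(n))` for growing `n` shows the first growing with the number of steps while the second stays at a single short line.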
How it runs

How a workflow reaches
results a single pass can't

The real trick isn't "running more agents." It's writing the quality patterns straight into the loop: let several conclusions poke holes in each other, draft a few versions from different angles and weigh them, and keep iterating until the answers settle.

1
Plan on the fly

You describe the task, and Claude writes a JS orchestration script on the spot (that's what dynamic means: generated for your task, not pulled from a template), then breaks the task into subtasks.

2
Fan the work out

The script uses parallel() / pipeline() to spread the work across tens or hundreds of subagents at once. The nice thing about pipeline: each item runs its own course, nothing blocks anything else — whoever finishes first moves on, no waiting for the slowest one.

3
Check each other

Before a result folds back in, it goes through a check: a few other agents each try to poke holes and refute it, and whatever the majority rejects gets thrown out. A schema also pins down the return format, so there's no parsing raw text afterward.

4
Iterate until it settles

The official line is “the run keeps iterating until the answers converge”: it ends when the answers settle, not after a fixed number of rounds. Finally everything is gathered into one coherent answer and handed back to you.
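
The majority check and the iterate-until-settled loop described above can be sketched in a few lines. This is an illustration under assumptions, not the runtime's mechanism: reviewers are plain functions rather than agents, "settled" is taken to mean two consecutive rounds producing the same answer, and the `maxRounds` backstop is this sketch's own addition.

```javascript
// Hypothetical sketch: each reviewer is a function returning true when it
// REJECTS (refutes) a finding. In a real workflow each would be an agent call.
function survivesMajority(finding, reviewers) {
  const rejections = reviewers.filter((refutes) => refutes(finding)).length;
  return rejections <= reviewers.length / 2; // dropped only if a majority rejects
}

// Repeat a round until two consecutive rounds give the same answer.
// maxRounds is a safety backstop assumed by this sketch.
function iterateUntilSettled(initial, round, maxRounds = 20) {
  let current = initial;
  for (let n = 1; n <= maxRounds; n++) {
    const next = round(current);
    if (JSON.stringify(next) === JSON.stringify(current)) {
      return { answer: next, rounds: n };
    }
    current = next;
  }
  throw new Error(`did not settle within ${maxRounds} rounds`);
}

const reviewers = [
  (f) => f.severity < 3,        // rejects low-severity noise
  (f) => !f.reproduced,         // rejects anything not reproduced
  (f) => f.title.length === 0,  // rejects empty reports
];
const findings = [
  { title: 'SQL injection', severity: 5, reproduced: true },
  { title: 'style nit', severity: 1, reproduced: false },
];
const settled = iterateUntilSettled(findings, (fs) =>
  fs.filter((f) => survivesMajority(f, reviewers)));
```

Here the first round drops the finding that two of three reviewers reject, and the second round changes nothing, so the loop stops after two rounds with one surviving finding.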

16
max agents running at once; fewer if you have fewer CPU cores
1,000
total agents per run, a backstop against runaway loops
0
user inputs you can inject mid-run — only a permission prompt pauses it
1
resume works only within the same session; quit and the next run starts over
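
The two numeric limits above (a cap on agents running at once, and a total budget per run) amount to a concurrency limiter with a counter. A minimal sketch, assuming hypothetical names (`createLimiter`, `run`, `stats`) and a simple queue; it says nothing about how the runtime actually enforces its limits and ignores the CPU-core adjustment.

```javascript
// Hypothetical limiter illustrating the two caps; names and mechanics are this
// sketch's assumptions, not the runtime's implementation.
function createLimiter({ maxConcurrent = 16, maxTotal = 1000 } = {}) {
  let running = 0; // slots currently in use
  let started = 0; // tasks accepted so far, counted against the total budget
  let peak = 0;    // highest concurrency observed
  const waiting = [];

  async function run(task) {
    if (started >= maxTotal) {
      throw new Error(`agent budget of ${maxTotal} exhausted`);
    }
    started++;
    if (running >= maxConcurrent) {
      // Wait until a finishing task hands its slot over to us.
      await new Promise((resolve) => waiting.push(resolve));
    } else {
      running++;
    }
    peak = Math.max(peak, running);
    try {
      return await task();
    } finally {
      const next = waiting.shift();
      if (next) next(); // pass our slot straight to the next waiter
      else running--;   // nobody waiting: free the slot
    }
  }

  return { run, stats: () => ({ started, peak }) };
}
```

Handing the slot directly to the next waiter, instead of decrementing and re-incrementing the counter, keeps the number of running tasks from ever exceeding `maxConcurrent`. The total budget acts as the backstop against a loop that keeps spawning work.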
Not a replacement — a combination

Each does its own job, and they stack

The two aren't even on the same level. Every agent a workflow fans out can have a skill attached before it starts work. They're meant to be used together, not picked between.

Skill

Changes what the model knows and does

A skill feeds the model knowledge and instructions a bit at a time, on demand, basically turning "how to write the prompt" into a finished product. The docs put it plainly: a skill's result isn't guaranteed to be the same every time, because the instructions are left for Claude to interpret; how it actually goes is still decided live, turn by turn.

Workflow

Changes how you orchestrate agents at scale, reliably

The orchestration logic moves out of Claude's context and becomes for loops, if branches, and pipeline() calls in a script. Once the script is written, "who holds the plan" passes from Claude's hands to that fixed code.

// inside a workflow script, run an agent with a skill
await agent(prompt, {
  agentType: 'code-reviewer', // reuse a skill
  schema: FINDINGS
})
Which to use

When to reach for which

Subagent

Isolate a piece of grunt work

  • You want a clean context to run exploration or search
  • Keep the noise out of the main context
  • Send a few out per turn, get results back, you keep deciding
Skill

Lock down a way of doing things

  • You have a reusable method, convention, or body of knowledge
  • You want Claude to follow the same approach every time
  • You can accept results varying a little with the model's judgment
Workflow

Orchestration too big for one context

  • A whole-repo bug hunt or security audit
  • A migration or modernization touching thousands of files
  • A key decision you want several independent agents to double-check
Flagship case

Bun: Zig → Rust, 11 days

This is the scale case the docs single out. Four workflows chained together: the first maps out every memory lifetime in the Zig code, the second ports it into Rust file by file — two reviewers per file — the third runs builds and tests on a loop until both go green, and the last runs optimizations overnight, opening a separate PR for each spot for a human to review.

~750K
lines of Rust
11
days, from first commit to merge
99.8%
of the existing test suite passing
4
workflows chained end to end
⚠ A few conditions you can't skip

Don't read this as "any migration can be done in 11 days." The conditions behind the result are demanding (drawing on the official caveats plus the first-hand rust-rewrite-plan.md in the Bun repo [Third-party]):

  • Not in production yet: the official wording is literally "While not yet in production," and 99.8% still isn't 100%.
  • It's a "strangler-fig" incremental migration, not a rewrite from scratch: Zig and Rust stay linked into the same binary the whole time, switched over one class at a time behind a flag.
  • Every switch has to clear several gates: tests, a shadow-diff of old vs. new output, and a performance budget that can't slip more than 2%. The result is forced out by these checks, not "write it and trust it."
  • Test coverage was already extremely high, and it's a one-person project led by the tool's own author.
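
The gates in the list above can be pictured as one check that must pass before a class is switched over. The sketch below is hypothetical: the function name, the report fields, and the rule that any failing test blocks a switch are assumptions for illustration. Only the three gates and the 2% performance budget come from the text.

```javascript
// Hypothetical gate check; field names and the report shape are assumptions.
// Only the three gates and the 2% budget come from the text above.
const PERF_BUDGET = 0.02;

function canSwitchOver({ testsPassed, testsTotal, shadowDiffs, baselineMs, candidateMs }) {
  const failures = [];
  // Gate 1: tests (this sketch treats any failing test as blocking).
  if (testsPassed !== testsTotal) failures.push('tests');
  // Gate 2: old and new implementations must produce identical output.
  if (shadowDiffs > 0) failures.push('shadow-diff');
  // Gate 3: the new path may not be more than 2% slower than the old one.
  const slowdown = (candidateMs - baselineMs) / baselineMs;
  if (slowdown > PERF_BUDGET) failures.push('performance');
  return { ok: failures.length === 0, failures };
}
```

For example, a report with all tests passing, zero shadow diffs, and a 1.5% slowdown passes, while the same report with a 3% slowdown is blocked on the performance gate alone.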

What's worth keeping: as long as the code matters enough, the tests are solid, and the gates are strict, writing the flow as fixed scripts and letting it compile-and-test on a loop until green really can compress work once estimated in quarters down to days. So the right move is to pilot on a module like that first, not to copy the "11 days" figure onto every project.