Claude Code · 2026-06-02 · How-to

A harness
for every task

The first piece covered what Dynamic Workflows are. This one covers how to use them. Hand a long task to a single Claude running end-to-end and it goes lazy, prefers its own work, and drifts off goal. Split the work into subagents with their own context windows, then orchestrate them — fan out, verify adversarially, run them off against each other. That's what workflows are actually for.

1Classify-and-act

2Fan-out-synthesize

3Adversarial verify

4Generate-and-filter

5Tournament

6Loop until done

GenAI Playbook · Claude Code deep dive · six patterns × three failure modes

Continuing the last piece

The first asked "what,"
this one asks "how."

A week after launch, Thariq Shihipar and Sid Bidasaria of the Claude Code team published a follow-up. It doesn't re-explain what a workflow is — it covers which patterns keep paying off, why long tasks need this kind of decomposition, which use cases are a surprisingly good fit, and when not to bother at all.

↗

Haven't read the "what" piece? The nature of a workflow (a JS orchestration script Claude writes on the fly), its dividing line from a Skill (who holds the plan), the runtime mechanics and hard limits, and the flagship Bun Zig→Rust port — all live in "Who Holds the Plan." This piece assumes you already know those and won't repeat them.

The authors lead with the caveat up front: workflows use meaningfully more tokens and are best suited to complex, high-value tasks, and best practices are still developing. It's not a universal hammer — a point the whole piece keeps coming back to.

“Keep in mind, best practices are still developing: dynamic workflows often use more tokens and are best suited for complex, high value tasks.”

— A harness for every task · claude.com/blog [official]

Why decompose

Three ways a single context
breaks on a long task

To understand the six patterns, you first have to see what they treat. The default harness plans and executes in the same context window — great for many coding tasks, but it breaks down over long-running, massively parallel, highly structured, or adversarial work. It breaks in three named ways.

Failure 01

Agentic laziness

Declares the job done before it's finished. On a 50-item security review, it stops at item 35 and calls it complete — the last 15 silently skipped.

Failure 02

Self-preferential bias

Prefers its own results. Especially when asked to verify or judge them against a rubric, it instinctively protects its own output — player and referee at once.

Failure 03

Goal drift

Loses fidelity to the original objective over many turns. Worst after compaction — "don't do X" constraints get lost in each lossy summarization step.

The fix is one sentence: “Creating a workflow helps combat these by orchestrating separate Claude subagents with their own context windows and focused, isolated goals.”

— A harness for every task [official]

Laziness — because the script holds the full checklist; the loop isn't done until the end, not when the model feels finished. Bias — because the verifier is a separate, independent agent, not the author. Drift — because the goal lives in the script, untouched by compaction. The root of all three patterns is right here.

Core · interactive

Six patterns Claude reaches for again and again

These six compose — Claude tends to combine them when it builds a workflow. Click through to see the orchestration shape of each.

Use cases

Half the use cases aren't coding at all

The authors push you to think creatively about where workflows fit — they've found them sometimes even more useful for non-technical work. The list below mixes the technical and the not.

Technical Non-technical / cross-functional

Migrations & refactors tech

Break the task into a series of things to operate on — callsites, failing tests, modules. Spin off an agent per fix in its own worktree, have another adversarially review, then merge. Bun's Zig→Rust rewrite was done this way (case nuance in the prior piece).

Deep research tech

The built-in /deep-research runs on a workflow: fan out searches, fetch sources, adversarially verify each claim, synthesize a cited report. Not just web search — it also compiles a status report from Slack context, or explores a codebase to learn how a feature works.

Deep verification tech

To check and source every factual claim in a report: one agent pulls out all the claims, then a subagent checks each in detail. You can even add an agent to verify that the checker's source is high quality.

Sorting tech

1,000+ rows in one prompt degrades and won't fit in context. Use a tournament, a pipeline of pairwise-comparison agents, or bucket-rank in parallel then merge. The key lesson: comparative judgment is more reliable than absolute scoring.

Root-cause investigation cross-fn

Have agents generate hypotheses from disjoint evidence — one on logs, one on files, one on data — then face a panel of verifiers and refuters. Not just for code: sales (why did sales drop in March?), data engineering (why did the pipeline fail?), any post-mortem.

Memory & rule adherence cross-fn

For rules Claude keeps missing even in CLAUDE.md: one verifier agent per rule. The reverse is sharper — mine recent sessions for corrections you keep making, adversarially verify each, distill survivors back into CLAUDE.md.

Exploration & taste cross-fn

For taste-based work like design or naming that benefits from a rubric: have Claude explore many options, give a review agent a rubric for what good looks like, and it's done when the reviewer is satisfied — or pick via tournament.

Model routing tech

A classifier agent decides which model to use. "Explain how the auth module works" depends on how many files it has and the shape of the codebase — the classifier researches first, then routes to Sonnet or Opus.

Triaging at scale Quarantine

For a backlog humans can't fully process: a triage workflow classifies each item, dedupes against what's already tracked, and acts (attempt a fix or escalate). The safety pattern here is quarantine — agents that read untrusted public content are barred from high-privilege actions, which are done instead by the agents in charge of acting on the information. Separating reading from doing is the structural defense against prompt injection. Pair it with /loop to triage continuously. (Lightweight evals follow the same shape: run agents in their own worktrees, then spawn comparison agents to grade against a rubric.)

When not to use

Not everything needs
a panel of 5 reviewers

The authors devote a section to not using workflows, and they're clear about it: workflows are new, they create outsized results for many use cases, but they aren't needed for every task and may use significantly more tokens.

⚠ For regular coding tasks, ask yourself first

Does it really need more compute? Workflows are for pushing Claude Code into places you couldn't reach before — not for piling onto every small task. As the authors put it —

“most traditional coding tasks do not need a panel of 5 reviewers.”

Tips for using them well

Beyond big tasks, a few more ways in

◇ Quick workflow

Not just for large tasks

You can prompt a "quick workflow" — for instance, a fast adversarial review of an assumption. A one-line ask.

◇ /loop + /goal

Pair with cadence and a hard target

For repeatable workflows (triage, research, verification), pair with /loop to run at regular intervals and /goal to set a hard completion requirement.

◇ Token budget

Cap it

Prompt it with a budget like "use 10k tokens" to set a ceiling on how many tokens the task uses.

◇ Save & share

Keep and distribute

Press s in the menu to save; check into ~/.claude/workflows or ship via a skill. Prompt Claude to treat it as a template, not a script to run verbatim.

Reading both together

The logic closes

The prior piece answers "what this is, and why the plan should move into code"; this one answers "how to build it, how to use it, when not to." Read together, the chain closes:

A single context goes lazy, biased, and drifting on long tasks (this piece)
→ so the plan moves out of the model's context into a deterministic script (prior thesis)
→ and once it's there, adversarial verify / tournament / loop-until-dry / quarantine go from "hope the model remembers" to "the code enforces it" (this piece)

That's exactly what separates a workflow from "just run more agents in parallel" — it writes how quality is guaranteed into the code too.