A harness
for every task
The first piece covered what Dynamic Workflows are. This one covers how to use them. Hand a long task to a single Claude running end-to-end and it goes lazy, prefers its own work, and drifts off goal. Split the work into subagents with their own context windows, then orchestrate them — fan out, verify adversarially, run them off against each other. That's what workflows are actually for.
The first asked "what,"
this one asks "how."
A week after launch, Thariq Shihipar and Sid Bidasaria of the Claude Code team published a follow-up. It doesn't re-explain what a workflow is — it covers which patterns keep paying off, why long tasks need this kind of decomposition, which use cases are a surprisingly good fit, and when not to bother at all.
Haven't read the "what" piece? The nature of a workflow (a JS orchestration script Claude writes on the fly), its dividing line from a Skill (who holds the plan), the runtime mechanics and hard limits, and the flagship Bun Zig→Rust port — all live in "Who Holds the Plan." This piece assumes you already know those and won't repeat them.
The authors lead with the caveat up front: workflows use meaningfully more tokens and are best suited to complex, high-value tasks, and best practices are still developing. It's not a universal hammer — a point the whole piece keeps coming back to.
“Keep in mind, best practices are still developing: dynamic workflows often use more tokens and are best suited for complex, high value tasks.”
Three ways a single context
breaks on a long task
To understand the six patterns, you first have to see what they treat. The default harness plans and executes in the same context window — great for many coding tasks, but it breaks down over long-running, massively parallel, highly structured, or adversarial work. It breaks in three named ways.
Agentic laziness
Declares the job done before it's finished. On a 50-item security review, it stops at item 35 and calls it complete — the last 15 silently skipped.
Self-preferential bias
Prefers its own results. Especially when asked to verify or judge them against a rubric, it instinctively protects its own output — player and referee at once.
Goal drift
Loses fidelity to the original objective over many turns. Worst after compaction — "don't do X" constraints get lost in each lossy summarization step.
The fix is one sentence: “Creating a workflow helps combat these by orchestrating separate Claude subagents with their own context windows and focused, isolated goals.”
Laziness — because the script holds the full checklist; the loop isn't done until the end, not when the model feels finished. Bias — because the verifier is a separate, independent agent, not the author. Drift — because the goal lives in the script, untouched by compaction. The root of all three patterns is right here.
Six patterns Claude reaches for again and again
These six compose — Claude tends to combine them when it builds a workflow. Click through to see the orchestration shape of each.
Half the use cases aren't coding at all
The authors push you to think creatively about where workflows fit — they've found them sometimes even more useful for non-technical work. The list below mixes the technical and the not.
Migrations & refactors tech
Break the task into a series of things to operate on — callsites, failing tests, modules. Spin off an agent per fix in its own worktree, have another adversarially review, then merge. Bun's Zig→Rust rewrite was done this way (case nuance in the prior piece).
Deep research tech
The built-in /deep-research runs on a workflow: fan out searches, fetch sources, adversarially verify each claim, synthesize a cited report. Not just web search — it also compiles a status report from Slack context, or explores a codebase to learn how a feature works.
Deep verification tech
To check and source every factual claim in a report: one agent pulls out all the claims, then a subagent checks each in detail. You can even add an agent to verify that the checker's source is high quality.
Sorting tech
1,000+ rows in one prompt degrades and won't fit in context. Use a tournament, a pipeline of pairwise-comparison agents, or bucket-rank in parallel then merge. The key lesson: comparative judgment is more reliable than absolute scoring.
Root-cause investigation cross-fn
Have agents generate hypotheses from disjoint evidence — one on logs, one on files, one on data — then face a panel of verifiers and refuters. Not just for code: sales (why did sales drop in March?), data engineering (why did the pipeline fail?), any post-mortem.
Memory & rule adherence cross-fn
For rules Claude keeps missing even in CLAUDE.md: one verifier agent per rule. The reverse is sharper — mine recent sessions for corrections you keep making, adversarially verify each, distill survivors back into CLAUDE.md.
Exploration & taste cross-fn
For taste-based work like design or naming that benefits from a rubric: have Claude explore many options, give a review agent a rubric for what good looks like, and it's done when the reviewer is satisfied — or pick via tournament.
Model routing tech
A classifier agent decides which model to use. "Explain how the auth module works" depends on how many files it has and the shape of the codebase — the classifier researches first, then routes to Sonnet or Opus.
Triaging at scale Quarantine
For a backlog humans can't fully process: a triage workflow classifies each item, dedupes against what's already tracked, and acts (attempt a fix or escalate). The safety pattern here is quarantine — agents that read untrusted public content are barred from high-privilege actions, which are done instead by the agents in charge of acting on the information. Separating reading from doing is the structural defense against prompt injection. Pair it with /loop to triage continuously. (Lightweight evals follow the same shape: run agents in their own worktrees, then spawn comparison agents to grade against a rubric.)
Not everything needs
a panel of 5 reviewers
The authors devote a section to not using workflows, and they're clear about it: workflows are new, they create outsized results for many use cases, but they aren't needed for every task and may use significantly more tokens.
Does it really need more compute? Workflows are for pushing Claude Code into places you couldn't reach before — not for piling onto every small task. As the authors put it —
“most traditional coding tasks do not need a panel of 5 reviewers.”
Beyond big tasks, a few more ways in
Not just for large tasks
You can prompt a "quick workflow" — for instance, a fast adversarial review of an assumption. A one-line ask.
Pair with cadence and a hard target
For repeatable workflows (triage, research, verification), pair with /loop to run at regular intervals and /goal to set a hard completion requirement.
Cap it
Prompt it with a budget like "use 10k tokens" to set a ceiling on how many tokens the task uses.
Keep and distribute
Press s in the menu to save; check into ~/.claude/workflows or ship via a skill. Prompt Claude to treat it as a template, not a script to run verbatim.
The logic closes
The prior piece answers "what this is, and why the plan should move into code"; this one answers "how to build it, how to use it, when not to." Read together, the chain closes:
→ so the plan moves out of the model's context into a deterministic script (prior thesis)
→ and once it's there, adversarial verify / tournament / loop-until-dry / quarantine go from "hope the model remembers" to "the code enforces it" (this piece)
That's exactly what separates a workflow from "just run more agents in parallel" — it writes how quality is guaranteed into the code too.