Architecture Quarterly
Vol. 04 · Issue 13 · May 2026 · Field Note № 08
Field Note № 08 · Verifiable Components

The DOM grows a machine-readable surface, at last.

data-verify-doc="agent-native-verification" data-verify-rounds="3" data-verify-status="ALL_PASS"

Contract testing is not new. Fixtures, invariants, schemas — the pattern has lived in textbooks for two decades, and only a handful of backend teams ever made it routine. What is new is the cost structure: with an agent willing to write, run and rewrite the verification, the maintenance bill that always killed it on the frontend collapses toward zero.

Speaker: Ara · Anthropic Applied AI
Recorded: 2026-05-23 · 31:43
Verified: fact-check-loop · 3 rounds
— The Argument in Brief

This is not a new technology. It is an economic inversion letting an old, obviously-correct pattern finally land in places it could never afford to live before.

01

DOM emits its own state

Each component prints data-verify-* attributes on its outermost element. The agent stops reverse-engineering rendered HTML and just reads attributes.

02

Spec lives with code

Fixtures, invariants and probes ship in .verify.ts alongside the component. The verification matrix becomes the component's living documentation.

03

Three surfaces, one path

Human dashboard, agent __verify.runAll(), CI vitest matrix — all call the same runFixture(). No room for drift between consumers.

§I
Diagnosis · The Reality of Frontend Testing

Open the ledger first. The state of frontend testing in 2026.

Before talking about "AI making verification feasible," it is worth admitting that on most frontend teams, strict verification was never really there in the first place. Treating contract testing as something teams already do is the most common misreading of this story.

Pull a few of the larger 2024–2025 surveys together and the picture is consistent: unit tests are sparse, end-to-end barely exists, and coverage sits well below what people will admit out loud. The teams that do test cite maintenance as their biggest tax.

34% · Reflect / YC
In a survey of 46 YC technical founders, 34% wrote no unit tests at all. "End-to-end testing: most respondents did none."
Reflect Survey · YC Companies

67% · mabl 2024
Among 500+ respondents, 67% report ≤ 60% coverage; 1-in-5 teams sit below 20%. Test maintenance is the most time-consuming task at 21%.
2024 State of Testing in DevOps

42% · PractiTest 2024
42% of testers don't feel confident writing automation scripts. Automation is growing, but the reskilling gap is widening.
PractiTest · State of Testing 2024

6,028 · State of Frontend
6,028 frontend engineers surveyed (about half European). The ideal testing pyramid "remains aspirational for many teams"; component and integration testing are where teams underinvest most.
State of Frontend 2024
§II
The Old Solution · And Why It Stalled

Contract testing was always theoretically right. On the frontend, no one could afford it.

Contract testing as a concept is old. Pact, schema-based contracts, consumer-driven contracts between microservices: all of these have been around for years. The problem was never the idea; it was the cost curve. Value lands after deployment (the bug that didn't ship), while cost lands during development (writing fixtures, evolving invariants, debugging flakiness).

That mismatch is harshest at the UI layer, where requirements churn weekly. Listen to the practitioners who tried:

"

The biggest issue seems to be setup and maintenance—it takes effort, and if teams on both sides of the contract aren't fully committed, it quickly becomes useless.

alonat.tech · 2025-01
"

Some folks found it so cumbersome that it never even caught meaningful bugs, making it feel like a wasted investment.

alonat.tech · 2025-01
"

At the last client I worked at, it took me almost a year of my "free time" at work to get one suite of flakey full stack Cucumber tests running reliably!

PactFlow blog · 2020
"

API contracts are formal, automated, and reassuring. When they pass, pipelines turn green and deployments feel justified. The problem is that contracts answer a very specific question, and it's not the one users care about.

neteye-blog.com · 2025-12

Read the four together and the conclusion is unflattering: in the human era, verification infrastructure was never about feasibility — it was about whether anyone could afford to keep it alive. Most teams couldn't, and many of those who tried got eaten by the maintenance bill.

The more capable the models get, the more you should try to resist constraining them.

Ara · Code with Claude London 2026 · Workshop "How We Claude Code"
§III
Idea 1 · The Surface, Not the Source

Hang component state on the HTML surface. The agent reads attributes, not pixels.

Traditionally, when an agent wants to know what a component is currently showing, it has to read the rendered HTML, CSS and visible text and reverse-engineer state from there. That is a two-step lossy translation: the renderer turns props into DOM, then the agent has to turn DOM back into "what the app thinks it's showing."

Anthropic's move is to skip the second step. Components emit semantic state directly to DOM attributes: data-verify-unit="TodoApp", data-verify-total="3", data-verify-done="1". These are not render byproducts — they are explicit contracts.

Specimen · TodoApp.tsx · data-verify-* attributes emitted on render
<section
  data-verify-unit="TodoApp"
  data-verify-total="3"
  data-verify-done="1"
  data-verify-active="2"
  data-verify-filter="all">

</section>
The same section that renders the to-do list also publishes — on its outermost element — the state it believes it is in. The agent just queries the DOM.
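A minimal sketch of what "the agent just queries the DOM" could look like, assuming the `data-verify-` prefix shown in the specimen. The names `readVerifyState` and `AttrSource` are hypothetical and do not come from the repo; the structural type stands in for a DOM `Element` so the snippet also runs outside a browser.

```typescript
// Hypothetical helper: collect data-verify-* attributes into a plain object.
// AttrSource mirrors the part of a DOM Element this code needs
// (Element.attributes holds items with `name` and `value`).
interface AttrSource {
  attributes: ArrayLike<{ name: string; value: string }>;
}

const VERIFY_PREFIX = "data-verify-";

function readVerifyState(el: AttrSource): Record<string, string> {
  const state: Record<string, string> = {};
  for (const attr of Array.from(el.attributes)) {
    // Ignore anything that is not part of the verify contract.
    if (attr.name.startsWith(VERIFY_PREFIX)) {
      state[attr.name.slice(VERIFY_PREFIX.length)] = attr.value;
    }
  }
  return state;
}

// Stand-in for the <section> in the specimen above.
const todoAppSection: AttrSource = {
  attributes: [
    { name: "class", value: "todo-app" },
    { name: "data-verify-unit", value: "TodoApp" },
    { name: "data-verify-total", value: "3" },
    { name: "data-verify-done", value: "1" },
    { name: "data-verify-active", value: "2" },
    { name: "data-verify-filter", value: "all" },
  ],
};

const todoAppState = readVerifyState(todoAppSection);
console.log(todoAppState);
```

In a browser, the same function should accept a non-null result of `document.querySelector('[data-verify-unit="TodoApp"]')`, since a real `Element` satisfies the `AttrSource` shape. Note that every value comes back as a string.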

"Isn't this just data-testid?" — they look similar; their intent is not. data-testid assigns an identity to an element ("who am I"). data-verify-* exposes a full state contract ("who am I, and what am I currently showing"). One is a locator, the other is a contract.

| Dimension | data-testid (traditional) | data-verify-* (agent-native) |
| --- | --- | --- |
| What it exposes | Element identity: "who am I" | Full state contract: "who I am + my state" |
| Verification logic | Lives in standalone test files | Lives next to the component (.verify.ts) |
| Source of truth | Hard-coded in test scripts by humans | Declarative fixtures + Zod schema |
| Consumer | Human-written test scripts | Agent API __verify.runAll() |
| Maintainer | Humans: drifts, decays, rots | Agent: marginal cost near zero |
| When it speaks | Only when tests run | State emitted live, on every render |
§IV
Idea 2 · The 6-line Mechanism

The whole contract sits on top of six lines of code.

Strip verifyAttrs() down and what it does is unremarkable: iterate the object, prepend data-verify- to each key, stringify each value. That's it. The cleverness isn't in the function — it's in the convention the function makes cheap to follow.

01 · SOURCE
// Assumed definition: the prose above gives the prefix as "data-verify-".
const VERIFY_PREFIX = "data-verify-";

export function verifyAttrs(
  attrs: Record<string, string | number | boolean | null | undefined>
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(attrs)) {
    // Unset values emit no attribute at all.
    if (value === null || value === undefined) continue;
    out[`${VERIFY_PREFIX}${key}`] = String(value);
  }
  return out;
}

That's the entire mechanism. The value isn't in the lines — it is in the convention every component in the codebase agrees to follow. Six lines replace what would otherwise be hundreds of lines of testing utility.

02 · USAGE IN A COMPONENT
// inside TodoApp.tsx
return (
  <section
    {...verifyAttrs({
      unit: "TodoApp",
      total: stats.total,
      done: stats.done,
      active: stats.active,
      filter: state.filter,
    })}
  >
    {/* render */}
  </section>
);

Each component pays one line — a spread onto the outermost element. After that, React does the work: every render, the latest state lands on data-verify-* automatically.
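To make the behaviour concrete, the sketch below restates `verifyAttrs()` so that it runs on its own, then calls it with plain values. `VERIFY_PREFIX` is assumed to be `"data-verify-"`, as the prose describes; the `archived` and `note` keys are made up for illustration and are not in the repo.

```typescript
const VERIFY_PREFIX = "data-verify-"; // assumed value, per the prose above

type VerifyValue = string | number | boolean | null | undefined;

function verifyAttrs(attrs: Record<string, VerifyValue>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(attrs)) {
    if (value === null || value === undefined) continue; // unset state emits nothing
    out[`${VERIFY_PREFIX}${key}`] = String(value);
  }
  return out;
}

const emitted = verifyAttrs({
  unit: "TodoApp",
  total: 3,
  done: 1,
  archived: false, // hypothetical key: booleans become "true" / "false"
  note: null,      // hypothetical key: null is dropped entirely
});

console.log(emitted);
// {
//   "data-verify-unit": "TodoApp",
//   "data-verify-total": "3",
//   "data-verify-done": "1",
//   "data-verify-archived": "false"
// }
```

Every value arrives as a string, so a consumer that wants numbers has to parse them back. That is one reason the pattern pairs the attributes with a schema.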

§V
Idea 3 · One Source, Three Surfaces

One verification result. Three consumers. One code path.

The single source of truth is a function called runFixture(). Given a unit and a fixture, it returns one of four verdicts: PASS | FAIL | BLOCKED | SKIP. Three different consumers call the same function and shape the result differently:

Dashboard · Human
TodoStats / mixed · PASS
TodoStats / none-done · PASS
TodoStats / all-done · PASS
🔍 inconsistent-counts · FAIL
TodoInput / typed · PASS
TodoList / populated · PASS
A "Run all" button in the browser: a verdict grid for human eyes.

Agent API · Machine
{
  "unit": "TodoStats",
  "fixture": "mixed",
  "verdict": "PASS",
  "checks": [
    {"id":"schema","ok":true},
    {"id":"invariants","ok":true}
  ]
}
await __verify.runAll(): structured JSON the agent can act on.

CI · Headless Pipeline
TodoStats > mixed
TodoStats > none-done
TodoStats > all-done
✓ 🔍 inconsistent-counts → FAIL (by design)
TodoInput > typed
TodoList > populated

21 fixtures · all expected verdicts
bun run verify: the vitest matrix, the CI traffic light.

Because all three surfaces call the same runFixture(), their verdict logic cannot drift: given the same unit and fixture, the dashboard, the agent API and CI report the same verdict. That is the anti-integration-hell move: you don't get to ship something that's "green here, red there."
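The shared code path described above can be sketched roughly as follows. This is an illustration under assumptions, not the repo's implementation: the `runFixture()` signature, the `fixtures` table, and the three formatter names are hypothetical, and SKIP is declared but not exercised here.

```typescript
type Verdict = "PASS" | "FAIL" | "BLOCKED" | "SKIP";

interface Check { id: string; ok: boolean }
interface FixtureResult { unit: string; fixture: string; verdict: Verdict; checks: Check[] }
interface Stats { total: number; done: number; active: number }

// Stand-in fixture data; in the pattern these would live in a .verify.ts file.
const fixtures: Record<string, Record<string, Stats>> = {
  TodoStats: {
    mixed: { total: 3, done: 1, active: 2 },
    "inconsistent-counts": { total: 10, done: 3, active: 4 },
  },
};

// The single code path: every consumer below goes through this function.
function runFixture(unit: string, fixture: string): FixtureResult {
  const data: Stats | undefined = fixtures[unit]?.[fixture];
  if (data === undefined) {
    // Nothing to observe, so report BLOCKED rather than FAIL.
    return { unit, fixture, verdict: "BLOCKED", checks: [] };
  }
  const checks: Check[] = [
    { id: "schema", ok: [data.total, data.done, data.active].every(Number.isInteger) },
    { id: "invariants", ok: data.done + data.active === data.total },
  ];
  const verdict: Verdict = checks.every((c) => c.ok) ? "PASS" : "FAIL";
  return { unit, fixture, verdict, checks };
}

// Consumer 1: a dashboard row for human eyes.
function dashboardRow(unit: string, fixture: string): string {
  const r = runFixture(unit, fixture);
  return `${r.unit} / ${r.fixture} · ${r.verdict}`;
}

// Consumer 2: the agent API, returning structured JSON.
function agentJson(unit: string, fixture: string): string {
  return JSON.stringify(runFixture(unit, fixture));
}

// Consumer 3: a CI report line.
function ciLine(unit: string, fixture: string): string {
  const r = runFixture(unit, fixture);
  return `${r.unit} > ${r.fixture} ${r.verdict}`;
}

console.log(dashboardRow("TodoStats", "mixed"));
console.log(agentJson("TodoStats", "inconsistent-counts"));
console.log(ciLine("TodoStats", "mixed"));
```

The formatters only shape the result; none of them decides a verdict. That separation is what keeps the three surfaces in agreement.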

§VI
Run · The Demo Repo, Live

Demo repo, real output. Watch for the deliberately failing probe.

The companion repo is cwc-workshops/how-we-claude-code — five verifiable units, twenty-one fixtures. Cloning it and running bun run verify produces:

~/cwc-workshops/how-we-claude-code/phase-3-verify · bun run verify
matrix · 27 expected verdicts
TodoItem > active ················································ PASS
TodoItem > done ·················································· PASS
TodoItem > 🔍 empty-text ········································· PASS
TodoItem > 🔍 long-text ·········································· PASS
TodoInput > empty ················································· PASS
TodoInput > typed ················································· PASS
TodoInput > 🔍 whitespace-only ····································· PASS
TodoList > empty ·················································· PASS
TodoList > populated ·············································· PASS
TodoList > all-done ··············································· PASS
TodoList > 🔍 long-text ·········································· PASS
TodoStats > mixed ················································· PASS
TodoStats > none-done ············································· PASS
TodoStats > all-done ·············································· PASS
TodoStats > 🔍 inconsistent-counts ······························· FAIL (by design)
todos.feature > fresh ············································· PASS
todos.feature > pre-seeded ········································ PASS
todos.feature > add-then-verify ··································· PASS
todos.feature > add-then-toggle ··································· PASS
todos.feature > 🔍 filter-active ································ PASS
todos.feature > 🔍 whitespace-submit ···························· PASS
Test Files 1 passed (1)
Tests 27 passed (27)
21 fixtures, 7 of them marked as probes (🔍) in the listing above. The vitest matrix runs 27 items in total, including structural checks.
About the FAIL on inconsistent-counts: this is a probe, a fixture deliberately constructed to violate an invariant (total = 10, done = 3, active = 4; 3 + 4 ≠ 10). The matrix's EXPECTED_FAIL set asserts that this fixture must fail, so the run is green only when the probe is red. The point of probes is to verify the verifier itself. A run where everything passes shouldn't reassure you; it should make you suspicious.
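The probe logic above can be sketched as follows. Everything here is illustrative: the invariant is inferred from the arithmetic in the text (done + active must equal total), the fixture values other than the probe's are made up, and the names `statsInvariant`, `verdictFor` and `matrixHolds` are hypothetical. Only `EXPECTED_FAIL` is named in the source, and its real shape may differ.

```typescript
interface Stats { total: number; done: number; active: number }
type Verdict = "PASS" | "FAIL";

// Invariant inferred from the text: every item is either done or active.
const statsInvariant = (s: Stats): boolean => s.done + s.active === s.total;

const statsFixtures: Record<string, Stats> = {
  mixed: { total: 3, done: 1, active: 2 },
  "none-done": { total: 2, done: 0, active: 2 },
  "all-done": { total: 2, done: 2, active: 0 },
  // Probe: deliberately violates the invariant (3 + 4 !== 10).
  "inconsistent-counts": { total: 10, done: 3, active: 4 },
};

// Fixtures that must fail. A PASS here means the verifier has gone blind.
const EXPECTED_FAIL = new Set<string>(["inconsistent-counts"]);

function verdictFor(s: Stats, invariant: (s: Stats) => boolean): Verdict {
  return invariant(s) ? "PASS" : "FAIL";
}

// The matrix holds only if every fixture lands on its expected verdict.
function matrixHolds(invariant: (s: Stats) => boolean): boolean {
  return Object.entries(statsFixtures).every(([name, stats]) => {
    const expected: Verdict = EXPECTED_FAIL.has(name) ? "FAIL" : "PASS";
    return verdictFor(stats, invariant) === expected;
  });
}

// With the real invariant, the probe fails as expected and the matrix is green.
const healthy = matrixHolds(statsInvariant);

// A verifier that accepts everything lets the probe pass, so the matrix goes red.
const blind = matrixHolds(() => true);

console.log({ healthy, blind }); // { healthy: true, blind: false }
```

The second call is the whole argument for probes: a broken verifier produces an all-PASS run, and only the expected failure exposes it.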
§VII
Why Now · The Cost Inversion

The pattern was always there. The agent re-priced it.

Lay each stage of the verification lifecycle next to its human cost and its agent cost, and the picture is stark. The right reading isn't "AI replaces humans" — it is that a budget item that was always written off as too expensive has just re-entered the budget.

| Stage | Human era: verification as cost center | Human cost | Agent era: verification as infrastructure | Agent cost |
| --- | --- | --- | --- | --- |
| Writing fixtures | Cover edge cases for every component. Repetitive labour, easy to miss. | High · grinding work | Derive from spec or code automatically. Agents are also better at exhaustive enumeration. | Near-zero |
| Maintaining invariants | Requirements change; invariants don't keep up; over time they quietly become lies. | High · long-term drift | When code changes, the agent updates .verify.ts in the same pass. The matrix becomes living docs. | Sync cost only |
| Running | Slow, flaky, requires a human to interpret the report. | Medium · CI time + triage | Seconds, deterministic, structured JSON output. | Effectively free |
| Fixing failures | Human switches context, reads the report, edits code, runs again. | High · context switching | Agent reads JSON, locates the issue, edits, re-verifies, closes the loop. | Loop time only |

The crucial shift is this: verification used to be a one-time investment (write it once, rarely revisit) and now becomes continuously maintained living documentation. The first was a rational compromise in the human era — continuous maintenance didn't pencil out. The second is the natural shape in the agent era, because continuous maintenance no longer requires a continuous human bill.

§VIII
The Whole Pattern · Six Ideas, in Brief

Six component-level pieces. Together they make the architecture.

We've now covered three: the surface, the mechanism, the three consumers. Here are all six in one place — they interlock, and you lose a capability if you drop any one.

1. DOM as the machine-readable surface
   Components attach data-verify-* to their outermost element, exposing internal state as a contract on the HTML surface.
2. Components declare fixtures + invariants
   A sibling .verify.ts file: Zod schema, named fixtures, invariant predicates, deliberately-failing probes.
3. Isolated render targets
   Each unit × fixture has its own URL: /verify/:unit/:fixture. No app shell, drivable on its own.
4. Pluggable verifiers
   Four families: schema, invariants, dom-contract and a11y. Add a new file to register a new verifier; components don't change.
5. window.__verify, the agent handle
   manifest(), current(), runAll(): a structured surface for agents, sharing a code path with the dashboard.
6. Unified verdict taxonomy
   PASS / FAIL / BLOCKED / SKIP, with BLOCKED ≠ FAIL, distinguishing "could not observe" from "observed something wrong."
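As a rough sketch of how items 3, 5 and 6 might fit together, the snippet below builds a `__verify`-style handle over a tiny fixture registry. The method names `manifest()`, `current()` and `runAll()` and the `/verify/:unit/:fixture` route come from the list above; their signatures, the `registry`, the `observe` callback and the placeholder check are assumptions made for illustration. The source awaits `runAll()`; this sketch keeps it synchronous for brevity.

```typescript
type Verdict = "PASS" | "FAIL" | "BLOCKED" | "SKIP";

interface ManifestEntry { unit: string; fixture: string; url: string }
interface RunResult { unit: string; fixture: string; verdict: Verdict }

// Hypothetical registry: unit name -> fixture names.
const registry: Record<string, string[]> = {
  TodoStats: ["mixed", "all-done"],
  TodoList: ["populated"],
  TodoInput: ["typed"],
};

// Item 3: every unit x fixture pair gets its own isolated URL.
function routeFor(unit: string, fixture: string): string {
  return `/verify/${encodeURIComponent(unit)}/${encodeURIComponent(fixture)}`;
}

// Inverse of routeFor(); null for any path that is not a verify route.
function parseRoute(path: string): { unit: string; fixture: string } | null {
  const match = /^\/verify\/([^/]+)\/([^/]+)$/.exec(path);
  if (match === null) return null;
  return { unit: decodeURIComponent(match[1]), fixture: decodeURIComponent(match[2]) };
}

// Returns the emitted data-verify-* state, or null when nothing could be observed.
type Observe = (unit: string, fixture: string) => Record<string, string> | null;

// Item 5: the handle an agent would find on window.__verify.
function createVerifyHandle(path: string, observe: Observe) {
  const manifest = (): ManifestEntry[] => {
    const entries: ManifestEntry[] = [];
    for (const [unit, names] of Object.entries(registry)) {
      for (const fixture of names) {
        entries.push({ unit, fixture, url: routeFor(unit, fixture) });
      }
    }
    return entries;
  };
  const current = () => parseRoute(path);
  const runAll = (): RunResult[] =>
    manifest().map(({ unit, fixture }): RunResult => {
      const state = observe(unit, fixture);
      // Item 6: "could not observe" is BLOCKED, which is not the same as FAIL.
      if (state === null) return { unit, fixture, verdict: "BLOCKED" };
      // Placeholder check: the emitted unit name must match the requested one.
      return { unit, fixture, verdict: state.unit === unit ? "PASS" : "FAIL" };
    });
  return { manifest, current, runAll };
}

// Fake observer: TodoInput never mounts, and TodoList reports the wrong unit.
const handle = createVerifyHandle("/verify/TodoStats/mixed", (unit) => {
  if (unit === "TodoInput") return null;
  return { unit: unit === "TodoList" ? "TodoStats" : unit };
});

const verdicts = handle.runAll().map((r) => r.verdict);
console.log(verdicts); // [ "PASS", "PASS", "FAIL", "BLOCKED" ]
```

Keeping BLOCKED separate matters for an agent in a loop: FAIL suggests editing the component, while BLOCKED suggests fixing the harness or the route first.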

The agents are going to be doing more and more of this natively, and how can you set the artifacts that you produce up to natively be testable and verifiable in the way that you need.

Ara · Code with Claude London 2026
§IX
Evidence · Replay-as-PR-Artifact

Once verification passes, record a clip. Attach it to the PR.

The repo also ships scripts/record.ts: launch headed Chromium, navigate to /verify/replay, wait for window.__verify_replay.done === true, and save the whole verification run to a .webm file in recordings/.

The script itself isn't remarkable. The workflow it enables is: developer ships code → agent runs verification → agent records video → PR includes the recording. Reviewers don't open a browser; they watch thirty seconds of replay and decide whether the behaviour is right.

The Claude Code team records basically all the code changes that they do like this, all the front end changes at least, especially at the pace of shipping that we have at the moment.

Ara · describing Anthropic's internal workflow
§X
Boundaries · Where the Pattern Doesn't Apply

This is not a silver bullet.

The report itself doesn't make universal claims. The visible limits — the ones the demo wears on its sleeve — are these:

i. Frontend components only. Backend logic, data pipelines, service orchestration: none of these has a rendered surface to attach data-verify-* to. They need a different contract.
ii. Developers must opt in. Each component takes one explicit verifyAttrs() spread. The pattern is not invisible; it is intrusive but cheap.
iii. State consistency, not visual correctness. Misaligned CSS, wrong colours, broken layouts: none of that is what data-verify-* covers. You still need humans or visual regression for that.
iv. Probe quality is human-judged. The repo requires at least one probe per unit, but whether that probe actually exercises the boundary is still a design call no one can automate.
v. No standardisation yet. So far this is Anthropic's internal practice plus one demo repo. There is no RFC, no third-party adoption data, no community consensus.
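The mechanical half of item iv, the rule that every unit ships at least one probe, is the kind of structural check a matrix can enforce. A sketch under assumptions: the `UnitSpec` shape, the `probe` flag, and `unitsMissingProbes` are hypothetical, and `TodoFooter` is an invented unit; the other unit and fixture names are borrowed from the run output earlier in this note.

```typescript
interface FixtureSpec { name: string; probe: boolean }
interface UnitSpec { unit: string; fixtures: FixtureSpec[] }

// Structural check: names of units that declare no probe at all.
function unitsMissingProbes(units: UnitSpec[]): string[] {
  return units
    .filter((u) => !u.fixtures.some((f) => f.probe))
    .map((u) => u.unit);
}

const declared: UnitSpec[] = [
  {
    unit: "TodoInput",
    fixtures: [
      { name: "empty", probe: false },
      { name: "typed", probe: false },
      { name: "whitespace-only", probe: true },
    ],
  },
  {
    unit: "TodoStats",
    fixtures: [
      { name: "mixed", probe: false },
      { name: "inconsistent-counts", probe: true },
    ],
  },
  // Invented unit with no probe; the check should flag it.
  { unit: "TodoFooter", fixtures: [{ name: "default", probe: false }] },
];

const missing = unitsMissingProbes(declared);
console.log(missing); // [ "TodoFooter" ]
```

The check can confirm that a probe exists. It cannot tell whether the probe exercises a meaningful boundary, which is the part item iv leaves to human judgment.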
§XI
Closing · What Anthropic Actually Contributed

Not a new technology. A protocol layer, made concrete.

Contract testing was already there. Data attributes were already there. React has been turning state into DOM for more than a decade. Anthropic invented none of it. What they did was thread these familiar pieces into a single agent-friendly protocol, and then ship a working reference implementation.

The point of the demo isn't "the to-do app works." The point is that, when agents start writing code routinely, "can the agent verify what it just wrote" stops being optional and becomes infrastructure. This repo draws the first line of that contract — between an agent and a frontend artifact. It probably isn't the final line, but it opens the room.

What's left is whether the industry catches up: whether data-verify-* and __verify become defaults at the framework or design-system layer, instead of every team rolling their own from scratch.

What to take away
Three things
  • Verification is no longer "humans write tests." It is "agents self-attest." Components emit state to DOM at render-time; the agent reads attributes.
  • One verification logic feeds three surfaces. Dashboard, agent API, and CI vitest all share runFixture() — there is no room to drift apart.
  • Probes are the verifier's immune system. Deliberately-failing fixtures prove the system isn't lying. A run where everything passes should make you suspicious.
What not to take away
Three things
  • "AI solved testing." It didn't. Visual correctness, performance regressions, cross-browser parity — none of those live inside data-verify-*.
  • "This is now an industry standard." It is not. It is one company's internal practice plus a reference repo, with no community validation behind it yet.
  • "Zero-cost / zero-touch." Each component spreads verifyAttrs() by hand; each unit gets its own .verify.ts. The cost is low — but it is not nothing.