The DOM grows a machine-readable surface, at last.
Contract testing is not new. Fixtures, invariants, schemas — the pattern has lived in textbooks for two decades, and only a handful of backend teams ever made it routine. What is new is the cost structure: with an agent willing to write, run and rewrite the verification, the maintenance bill that always killed it on the frontend collapses toward zero.
This is not a new technology. It is an economic inversion letting an old, obviously-correct pattern finally land in places it could never afford to live before.
DOM emits its own state
Each component prints data-verify-* attributes on its outermost element. The agent stops reverse-engineering rendered HTML and just reads attributes.
Spec lives with code
Fixtures, invariants and probes ship in .verify.ts alongside the component. The verification matrix becomes the component's living documentation.
Three surfaces, one path
Human dashboard, agent __verify.runAll(), CI vitest matrix — all call the same runFixture(). No room for drift between consumers.
Before talking about "AI making verification feasible," it is worth admitting that on most frontend teams, strict verification was never really there in the first place. Treating contract testing as something teams already do is the most common misreading of this story.
Pull a few of the larger 2024–2025 surveys together and the picture is consistent: unit tests are sparse, end-to-end barely exists, and coverage sits well below what people will admit out loud. The teams that do test cite maintenance as their biggest tax.
Contract testing as a concept is old. Pact, schema-based contracts, consumer-driven contracts between microservices — these have been around for years. The problem was never the idea; it was the cost-curve. Value lands after deployment (the bug that didn't ship), cost lands during development (writing fixtures, evolving invariants, debugging flakiness).
That mismatch is harshest at the UI layer, where requirements churn weekly. Listen to the practitioners who tried:
The biggest issue seems to be setup and maintenance—it takes effort, and if teams on both sides of the contract aren't fully committed, it quickly becomes useless.
Some folks found it so cumbersome that it never even caught meaningful bugs, making it feel like a wasted investment.
At the last client I worked at, it took me almost a year of my "free time" at work to get one suite of flakey full stack Cucumber tests running reliably!
API contracts are formal, automated, and reassuring. When they pass, pipelines turn green and deployments feel justified. The problem is that contracts answer a very specific question, and it's not the one users care about.
Read the four together and the conclusion is unflattering: in the human era, verification infrastructure was never about feasibility — it was about whether anyone could afford to keep it alive. Most teams couldn't, and many of those who tried got eaten by the maintenance bill.
The more capable the models get, the more you should try to resist constraining them.
Ara · Code with Claude London 2026 · Workshop "How We Claude Code"
Traditionally, when an agent wants to know what a component is currently showing, it has to read the rendered HTML, CSS and visible text and reverse-engineer state from there. That is a two-step lossy translation: the renderer turns props into DOM, then the agent has to turn DOM back into "what the app thinks it's showing."
Anthropic's move is to skip the second step. Components emit semantic state directly to DOM attributes: data-verify-unit="TodoApp", data-verify-total="3", data-verify-done="1". These are not render byproducts — they are explicit contracts.
<section
  data-verify-unit="TodoApp"
  data-verify-total="3"
  data-verify-done="1"
  data-verify-active="2"
  data-verify-filter="all">
  …
</section>
The <section> that renders the to-do list also publishes, on its outermost element, the state it believes it is in. The agent just queries the DOM.

"Isn't this just data-testid?" They look similar; their intent is not. data-testid assigns an identity to an element ("who am I"). data-verify-* exposes a full state contract ("who am I, and what am I currently showing"). One is a locator, the other is a contract.
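To make "the agent just queries the DOM" concrete, here is a minimal sketch of the reading side. The helper name readVerifyState and the AttrLike shape are hypothetical and not taken from the companion repo. The function accepts anything shaped like a DOM attribute list, so it also runs outside a browser; in a page, something like Array.from(element.attributes) should fit that shape.

```typescript
// Hypothetical helper (not from the repo): collect data-verify-* attributes
// into a plain object keyed by the part after the prefix.
type AttrLike = { name: string; value: string };

const PREFIX = "data-verify-";

function readVerifyState(attrs: Iterable<AttrLike>): Record<string, string> {
  const state: Record<string, string> = {};
  for (const { name, value } of attrs) {
    // Ignore ordinary attributes such as class or id.
    if (name.startsWith(PREFIX)) {
      state[name.slice(PREFIX.length)] = value;
    }
  }
  return state;
}

// Stand-in for the attributes of the <section> shown above.
const sectionAttrs: AttrLike[] = [
  { name: "class", value: "todo-app" },
  { name: "data-verify-unit", value: "TodoApp" },
  { name: "data-verify-total", value: "3" },
  { name: "data-verify-done", value: "1" },
  { name: "data-verify-active", value: "2" },
  { name: "data-verify-filter", value: "all" },
];

const todoAppState = readVerifyState(sectionAttrs);
console.log(todoAppState);
```

Attribute values are always strings, so a consumer that wants numbers has to convert them (for example Number(todoAppState.total)) before checking anything arithmetic.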
Strip verifyAttrs() down and what it does is unremarkable: iterate the object, prepend data-verify- to each key, stringify each value. That's it. The cleverness isn't in the function; it's in the convention the function makes cheap to follow.
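A sketch of what such a function might look like, written only from the description above. The signature and the VerifyValue type are assumptions; the repo's actual implementation may differ.

```typescript
// Assumed shape: flat state in, data-verify-* string attributes out.
type VerifyValue = string | number | boolean;

function verifyAttrs(state: Record<string, VerifyValue>): Record<string, string> {
  const attrs: Record<string, string> = {};
  for (const [key, value] of Object.entries(state)) {
    // Prepend the prefix to each key and stringify each value.
    attrs[`data-verify-${key}`] = String(value);
  }
  return attrs;
}

const attrs = verifyAttrs({ unit: "TodoApp", total: 3, done: 1, active: 2, filter: "all" });
console.log(attrs);
```

In a React component the returned object would be spread onto the outermost element, along the lines of `<section {...verifyAttrs({ unit: "TodoApp", total, done })}>`.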
That's the entire mechanism. The value isn't in the lines — it is in the convention every component in the codebase agrees to follow. Six lines replace what would otherwise be hundreds of lines of testing utility.
Each component pays one line — a spread onto the outermost element. After that, React does the work: every render, the latest state lands on data-verify-* automatically.
The single source of truth is a function called runFixture(). Given a unit and a fixture, it returns one of four verdicts: PASS | FAIL | BLOCKED | SKIP. Three different consumers call the same function and shape the result differently:
{
  "unit": "TodoStats",
  "fixture": "mixed",
  "verdict": "PASS",
  "checks": [
    {"id":"schema","ok":true},
    {"id":"invariants","ok":true}
  ]
}
await __verify.runAll(): structured JSON the agent can act on.

✓ TodoStats > none-done
✓ TodoStats > all-done
✓ 🔍 inconsistent-counts
→ FAIL (by design)
✓ TodoInput > typed
✓ TodoList > populated
21 fixtures · all expected verdicts
bun run verify: vitest matrix, the CI traffic-light.

Because all three surfaces share one code path, they cannot drift. The dashboard cannot disagree with CI. There is no scenario where one passes and another fails on the same fixture. That is the anti-integration-hell move: you don't get to ship something that's "green here, red there."
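The shared-code-path idea can be sketched in a few dozen lines. Everything here is illustrative rather than the repo's code: the Unit and Fixture shapes, the hand-rolled schema guard (standing in for a schema library), the fixture values, and the reading of BLOCKED as "the fixture could not be evaluated at all" are assumptions. Only the function name runFixture, the four verdicts, and the check ids come from the text.

```typescript
type Verdict = "PASS" | "FAIL" | "BLOCKED" | "SKIP";
type Check = { id: string; ok: boolean };
type Result = { unit: string; fixture: string; verdict: Verdict; checks: Check[] };

interface Fixture { name: string; state: unknown; skip?: boolean }
interface Unit<S> {
  name: string;
  schema: (s: unknown) => s is S; // hand-rolled stand-in for a schema library
  invariants: Array<(s: S) => boolean>;
  fixtures: Fixture[];
}

// The one code path every consumer goes through.
function runFixture<S>(unit: Unit<S>, fixture: Fixture): Result {
  const result = (verdict: Verdict, checks: Check[]): Result =>
    ({ unit: unit.name, fixture: fixture.name, verdict, checks });
  if (fixture.skip) return result("SKIP", []);
  try {
    const state = fixture.state;
    if (!unit.schema(state)) return result("FAIL", [{ id: "schema", ok: false }]);
    const invariantsOk = unit.invariants.every((holds) => holds(state));
    return result(invariantsOk ? "PASS" : "FAIL", [
      { id: "schema", ok: true },
      { id: "invariants", ok: invariantsOk },
    ]);
  } catch {
    // Assumption: BLOCKED means a check threw before a verdict could be reached.
    return result("BLOCKED", []);
  }
}

type Stats = { total: number; done: number; active: number };
const isStats = (s: unknown): s is Stats => {
  if (typeof s !== "object" || s === null) return false;
  const r = s as Record<string, unknown>;
  return ["total", "done", "active"].every((k) => typeof r[k] === "number");
};

const todoStats: Unit<Stats> = {
  name: "TodoStats",
  schema: isStats,
  invariants: [(s) => s.done + s.active === s.total],
  fixtures: [
    { name: "mixed", state: { total: 3, done: 1, active: 2 } }, // values assumed
    { name: "none-done", state: { total: 3, done: 0, active: 3 } }, // values assumed
  ],
};

// Agent surface: structured results.
const runAll = (): Result[] => todoStats.fixtures.map((f) => runFixture(todoStats, f));
// CI surface: the same results, shaped as one line per fixture.
const ciLine = (r: Result): string =>
  `${r.verdict === "PASS" ? "✓" : "✗"} ${r.unit} > ${r.fixture}`;

console.log(JSON.stringify(runAll(), null, 2));
runAll().forEach((r) => console.log(ciLine(r)));
```

Because runAll and ciLine only reshape what runFixture returns, neither can reach a verdict the other would not.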
The companion repo is cwc-workshops/how-we-claude-code — five verifiable units, twenty-one fixtures. Cloning it and running bun run verify produces:
inconsistent-counts's FAIL:
this is a probe — a fixture deliberately constructed to violate an invariant
(total = 10, done = 3, active = 4; 3 + 4 ≠ 10).
The matrix's EXPECTED_FAIL set asserts that this fixture must fail.
The point of probes is to verify the verifier itself. A run where
everything passes shouldn't reassure you — it should make you suspicious.
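A small sketch of why the probe matters. The probe values (total = 10, done = 3, active = 4) and the EXPECTED_FAIL name come from the text; the Checker type, the fixture values for none-done and all-done, and the deliberately broken checker are illustrative assumptions.

```typescript
type Stats = { total: number; done: number; active: number };
type Verdict = "PASS" | "FAIL";
type Checker = (s: Stats) => Verdict;

// An honest checker enforces the invariant done + active === total.
const honestCheck: Checker = (s) => (s.done + s.active === s.total ? "PASS" : "FAIL");
// A broken checker that approves everything, the failure mode probes exist to catch.
const brokenCheck: Checker = () => "PASS";

const fixtures: Array<[string, Stats]> = [
  ["none-done", { total: 3, done: 0, active: 3 }], // values assumed
  ["all-done", { total: 3, done: 3, active: 0 }], // values assumed
  ["inconsistent-counts", { total: 10, done: 3, active: 4 }], // the probe: 3 + 4 ≠ 10
];

// Fixtures that must fail for the matrix to count as healthy.
const EXPECTED_FAIL = new Set<string>(["inconsistent-counts"]);

// The matrix is green only when every fixture lands on its expected verdict.
function matrixIsGreen(check: Checker): boolean {
  return fixtures.every(([name, state]) => {
    const expected: Verdict = EXPECTED_FAIL.has(name) ? "FAIL" : "PASS";
    return check(state) === expected;
  });
}

console.log(matrixIsGreen(honestCheck)); // true
console.log(matrixIsGreen(brokenCheck)); // false: the probe passed when it must fail
```

With the probe in the matrix, a verifier that silently stopped checking anything turns the run red instead of green.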
Lay each stage of the verification lifecycle next to its human cost and its agent cost, and the picture is stark. The right reading isn't "AI replaces humans" — it is that a budget item that was always written off as too expensive has just re-entered the budget.
… .verify.ts in the same pass. The matrix becomes living docs.

The crucial shift is this: verification used to be a one-time investment (write it once, rarely revisit) and now becomes continuously maintained living documentation. The first was a rational compromise in the human era, when continuous maintenance didn't pencil out. The second is the natural shape in the agent era, because continuous maintenance no longer requires a continuous human bill.
We've now covered three: the surface, the mechanism, the three consumers. Here are all six in one place — they interlock, and you lose a capability if you drop any one.
- … data-verify-* to their outermost element, exposing internal state as a contract on the HTML surface.
- … .verify.ts file: Zod schema, named fixtures, invariant predicates, deliberately-failing probes.
- … /verify/:unit/:fixture. No app shell, drivable on its own.
- … manifest(), current(), runAll(): a structured surface for agents, sharing a code path with the dashboard.

The agents are going to be doing more and more of this natively, and how can you set the artifacts that you produce up to natively be testable and verifiable in the way that you need.
Ara · Code with Claude London 2026
The repo also ships scripts/record.ts: launch headed Chromium, navigate to /verify/replay, wait for window.__verify_replay.done === true, and save the whole verification run to a .webm file in recordings/.
The script itself isn't remarkable. The workflow it enables is: developer ships code → agent runs verification → agent records video → PR includes the recording. Reviewers don't open a browser; they watch thirty seconds of replay and decide whether the behaviour is right.
The Claude Code team records basically all the code changes that they do like this, all the front end changes at least, especially at the pace of shipping that we have at the moment.
Ara · describing Anthropic's internal workflow
The report itself doesn't make universal claims. The visible limits — the ones the demo wears on its sleeve — are these:
- … data-verify-* to. They need a different contract.
- … verifyAttrs() spread. The pattern is not invisible; it is intrusive but cheap.
- … data-verify-* covers. You still need humans or visual regression for that.

Contract testing was already there. Data attributes were already there. React turning state into DOM is more than ten years old. Anthropic invented none of it. What they did was thread these familiar pieces into a single agent-friendly protocol, and then ship a working reference implementation.
The point of the demo isn't "the to-do app works." The point is that, when agents start writing code routinely, "can the agent verify what it just wrote" stops being optional and becomes infrastructure. This repo draws the first line of that contract — between an agent and a frontend artifact. It probably isn't the final line, but it opens the room.
What's left is whether the industry catches up: whether data-verify-* and __verify become defaults at the framework or design-system layer, instead of every team rolling their own from scratch.
- Verification is no longer "humans write tests." It is "agents self-attest." Components emit state to DOM at render-time; the agent reads attributes.
- One verification logic feeds three surfaces. Dashboard, agent API, and CI vitest all share runFixture(), so there is no room to drift apart.
- Probes are the verifier's immune system. Deliberately-failing fixtures prove the system isn't lying. A run where everything passes should make you suspicious.
- "AI solved testing." It didn't. Visual correctness, performance regressions, cross-browser parity: none of those live inside data-verify-*.
- "This is now an industry standard." It is not. It is one company's internal practice plus a reference repo, with no community validation behind it yet.
- "Zero-cost / zero-touch." Each component spreads verifyAttrs() by hand; each unit gets its own .verify.ts. The cost is low, but it is not nothing.