Field Note № 08 · Verifiable Components

The DOM grows a machine-readable surface, at last.

data-verify-doc="agent-native-verification" data-verify-rounds="3" data-verify-status="ALL_PASS"

Contract testing is not new. Fixtures, invariants, schemas — the pattern has lived in textbooks for two decades, and only a handful of backend teams ever made it routine. What is new is the cost structure: with an agent willing to write, run and rewrite the verification, the maintenance bill that always killed it on the frontend collapses toward zero.

Source: Code with Claude London 2026
Speaker: Ara · Anthropic Applied AI
Recorded: 2026-05-23 · 31:43
Verified: fact-check-loop · 3 rounds

— The Argument in Brief

This is not a new technology. It is an economic inversion letting an old, obviously-correct pattern finally land in places it could never afford to live before.

DOM emits its own state

Each component prints data-verify-* attributes on its outermost element. The agent stops reverse-engineering rendered HTML and just reads attributes.

Spec lives with code

Fixtures, invariants and probes ship in .verify.ts alongside the component. The verification matrix becomes the component's living documentation.

Three surfaces, one path

Human dashboard, agent __verify.runAll(), CI vitest matrix — all call the same runFixture(). No room for drift between consumers.

§I

Diagnosis · The Reality of Frontend Testing

Open the ledger first. The state of frontend testing in 2026.

Before talking about "AI making verification feasible," it is worth admitting that on most frontend teams, strict verification was never really there in the first place. Treating contract testing as something teams already do is the most common misreading of this story.

Pull a few of the larger 2024–2025 surveys together and the picture is consistent: unit tests are sparse, end-to-end barely exists, and coverage sits well below what people will admit out loud. The teams that do test cite maintenance as their biggest tax.

34% · Reflect / YC

In a survey of 46 YC technical founders, 34% wrote no unit tests at all. "End-to-end testing: most respondents did none."

Reflect Survey · YC Companies

67% · mabl 2024

Among 500+ respondents, 67% report ≤ 60% coverage; 1-in-5 teams sit below 20%. Test maintenance is the most time-consuming task at 21%.

2024 State of Testing in DevOps

42% · PractiTest 2024

42% of testers don't feel confident writing automation scripts. Automation is growing, but the reskilling gap is widening.

PractiTest · State of Testing 2024

6,028 · State of Frontend

6,028 frontend engineers surveyed (about half European). The ideal testing pyramid "remains aspirational for many teams"; component and integration testing are where teams underinvest most.

State of Frontend 2024

§II

The Old Solution · And Why It Stalled

Contract testing was always theoretically right. On the frontend, no one could afford it.

Contract testing as a concept is old. Pact, schema-based contracts, consumer-driven contracts between microservices: these have been around for years. The problem was never the idea; it was the cost curve. Value lands after deployment (the bug that didn't ship), while cost lands during development (writing fixtures, evolving invariants, debugging flakiness).

That mismatch is harshest at the UI layer, where requirements churn weekly. Listen to the practitioners who tried:

The biggest issue seems to be setup and maintenance—it takes effort, and if teams on both sides of the contract aren't fully committed, it quickly becomes useless.

alonat.tech · 2025-01

Some folks found it so cumbersome that it never even caught meaningful bugs, making it feel like a wasted investment.

alonat.tech · 2025-01

At the last client I worked at, it took me almost a year of my "free time" at work to get one suite of flakey full stack Cucumber tests running reliably!

PactFlow blog · 2020

API contracts are formal, automated, and reassuring. When they pass, pipelines turn green and deployments feel justified. The problem is that contracts answer a very specific question, and it's not the one users care about.

neteye-blog.com · 2025-12

Read the four together and the conclusion is unflattering: in the human era, verification infrastructure was never about feasibility — it was about whether anyone could afford to keep it alive. Most teams couldn't, and many of those who tried got eaten by the maintenance bill.

The more capable the models get, the more you should try to resist constraining them.

Ara · Code with Claude London 2026 · Workshop "How We Claude Code"

§III

Idea 1 · The Surface, Not the Source

Hang component state on the HTML surface. The agent reads attributes, not pixels.

Traditionally, when an agent wants to know what a component is currently showing, it has to read the rendered HTML, CSS and visible text and reverse-engineer state from there. That is a two-step lossy translation: the renderer turns props into DOM, then the agent has to turn DOM back into "what the app thinks it's showing."

Anthropic's move is to skip the second step. Components emit semantic state directly to DOM attributes: data-verify-unit="TodoApp", data-verify-total="3", data-verify-done="1". These are not render byproducts — they are explicit contracts.

Specimen · TodoApp.tsx data-verify-* attributes emitted on render

The same section that renders the to-do list also publishes — on its outermost element — the state it believes it is in. The agent just queries the DOM.
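As a rough illustration of what "the agent just queries the DOM" could look like, the sketch below collects every data-verify-* attribute from an element into a plain object. The names readVerifyState, AttributeSource and fakeElement are hypothetical and do not come from the talk or the repo; the only assumption about the browser is that an element offers getAttributeNames() and getAttribute(), which a standard DOM Element does.

```typescript
// Minimal structural view of a DOM element: just enough to read attributes.
// A real browser Element satisfies this shape, and so does a test double.
interface AttributeSource {
  getAttributeNames(): string[];
  getAttribute(name: string): string | null;
}

const PREFIX = "data-verify-";

// Collect every data-verify-* attribute into an object keyed by the part of
// the name after the prefix. Values stay strings, exactly as in the DOM.
function readVerifyState(el: AttributeSource): Record<string, string> {
  const state: Record<string, string> = {};
  for (const name of el.getAttributeNames()) {
    if (!name.startsWith(PREFIX)) continue; // ignore class, id, and so on
    const value = el.getAttribute(name);
    if (value !== null) state[name.slice(PREFIX.length)] = value;
  }
  return state;
}

// Hypothetical test double standing in for
// <section class="todo-app" data-verify-unit="TodoApp" ...>.
function fakeElement(attrs: Record<string, string>): AttributeSource {
  return {
    getAttributeNames: () => Object.keys(attrs),
    getAttribute: (name) => attrs[name] ?? null,
  };
}

const section = fakeElement({
  class: "todo-app",
  "data-verify-unit": "TodoApp",
  "data-verify-total": "3",
  "data-verify-done": "1",
});

console.log(readVerifyState(section));
```

For the element above, the result holds three keys (unit, total, done) and drops the class attribute. In a browser, the same function could be handed the result of a querySelector call on the outermost element of a component.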

"Isn't this just data-testid?" — they look similar; their intent is not. data-testid assigns an identity to an element ("who am I"). data-verify-* exposes a full state contract ("who am I, and what am I currently showing"). One is a locator, the other is a contract.

| Dimension | data-testid (traditional) | data-verify-* (agent-native) |
| --- | --- | --- |
| What it exposes | Element identity: "who am I" | Full state contract: "who I am + my state" |
| Verification logic | Lives in standalone test files | Lives next to the component (.verify.ts) |
| Source of truth | Hard-coded in test scripts by humans | Declarative fixtures + Zod schema |
| Consumer | Human-written test scripts | Agent API __verify.runAll() |
| Maintainer | Humans: drifts, decays, rots | Agent: marginal cost near zero |
| When it speaks | Only when tests run | State emitted live, on every render |

§IV

Idea 2 · The 6-line Mechanism

The whole contract sits on top of six lines of code.

Strip verifyAttrs() down and what it does is unremarkable: iterate the object, prepend data-verify- to each key, stringify each value. That's it. The cleverness isn't in the function — it's in the convention the function makes cheap to follow.

01 · SOURCE

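// Prefix shared by every emitted attribute. It is not shown in the original
// listing; the value here is inferred from the data-verify-* names used above.
const VERIFY_PREFIX = "data-verify-";

// Turn { total: 3 } into { "data-verify-total": "3" }.
export function verifyAttrs(
  attrs: Record<string, string | number | boolean | null | undefined>
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(attrs)) {
    // Skip absent values so that no attribute is emitted for them.
    if (value === null || value === undefined) continue;
    out[`${VERIFY_PREFIX}${key}`] = String(value);
  }
  return out;
}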

The value isn't in the lines themselves; it is in the convention every component in the codebase agrees to follow. Six lines replace what would otherwise be hundreds of lines of testing utility.

02 · USAGE IN A COMPONENT

// inside TodoApp.tsx
return (
  <section
    {...verifyAttrs({
      unit: "TodoApp",
      total: stats.total,
      done: stats.done,
      active: stats.active,
      filter: state.filter,
    })}
  >
    {/* render */}
  </section>
);

Each component pays one line — a spread onto the outermost element. After that, React does the work: every render, the latest state lands on data-verify-* automatically.
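To make the spread concrete, here is a small standalone sketch of what verifyAttrs() would hand to React for one render. The function body repeats the article's logic so the snippet runs on its own; the VERIFY_PREFIX value, the stats numbers and the extra empty key are assumptions chosen for illustration.

```typescript
// Assumption: the prefix constant is not shown in the source; this value is
// inferred from the attribute names used throughout the article.
const VERIFY_PREFIX = "data-verify-";

type VerifyValue = string | number | boolean | null | undefined;

// Same logic as the article's verifyAttrs(), repeated so the sketch runs alone.
function verifyAttrs(attrs: Record<string, VerifyValue>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(attrs)) {
    if (value === null || value === undefined) continue; // absent, not "null"
    out[`${VERIFY_PREFIX}${key}`] = String(value);
  }
  return out;
}

// Hypothetical state for one render of TodoApp.
const stats = { total: 3, done: 1, active: 2 };

const emitted = verifyAttrs({
  unit: "TodoApp",
  total: stats.total,
  done: stats.done,
  active: stats.active,
  filter: undefined, // skipped: no data-verify-filter attribute is emitted
  empty: false,      // booleans are stringified to "false", not dropped
});

console.log(emitted);
```

The result carries five keys: data-verify-unit, data-verify-total, data-verify-done, data-verify-active and data-verify-empty, all with string values. Note the asymmetry: undefined and null remove the attribute, while false survives as the string "false", so a consumer has to compare against strings rather than rely on truthiness.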

§V

Idea 3 · One Source, Three Surfaces

One verification result. Three consumers. One code path.

The single source of truth is a function called runFixture(). Given a unit and a fixture, it returns one of four verdicts: PASS | FAIL | BLOCKED | SKIP. Three different consumers call the same function and shape the result differently:
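The four verdicts are easier to reason about with a concrete shape in hand. The sketch below is one possible reading of runFixture(), not the repo's implementation: the Unit and Fixture interfaces, the observe() callback, and the rule that a skip flag yields SKIP are all assumptions made for illustration. Only the function name, the four verdict labels and the check ids (schema, invariants) come from the article.

```typescript
type Verdict = "PASS" | "FAIL" | "BLOCKED" | "SKIP";

interface Check { id: string; ok: boolean }
interface FixtureResult {
  unit: string;
  fixture: string;
  verdict: Verdict;
  checks: Check[];
}

// Hypothetical shapes: the article does not show how units or fixtures are typed.
interface Unit<S> {
  name: string;
  checks: Array<{ id: string; test: (state: S) => boolean }>;
}
interface Fixture<S> {
  name: string;
  skip?: boolean;          // assumed opt-out flag
  observe: () => S | null; // null stands for "could not observe"
}

function runFixture<S>(unit: Unit<S>, fixture: Fixture<S>): FixtureResult {
  const base = { unit: unit.name, fixture: fixture.name };
  if (fixture.skip) return { ...base, verdict: "SKIP", checks: [] };

  const state = fixture.observe();
  if (state === null) return { ...base, verdict: "BLOCKED", checks: [] };

  // Every check runs against the same observed state.
  const checks = unit.checks.map((c) => ({ id: c.id, ok: c.test(state) }));
  const verdict: Verdict = checks.every((c) => c.ok) ? "PASS" : "FAIL";
  return { ...base, verdict, checks };
}

interface Counts { total: number; done: number; active: number }

const todoStats: Unit<Counts> = {
  name: "TodoStats",
  checks: [
    // Stand-in for a schema check: every count is a non-negative integer.
    {
      id: "schema",
      test: (s) => [s.total, s.done, s.active].every((n) => Number.isInteger(n) && n >= 0),
    },
    { id: "invariants", test: (s) => s.done + s.active === s.total },
  ],
};

const mixed: Fixture<Counts> = {
  name: "mixed",
  observe: () => ({ total: 3, done: 1, active: 2 }),
};
const unmounted: Fixture<Counts> = { name: "unmounted", observe: () => null };

console.log(JSON.stringify(runFixture(todoStats, mixed), null, 2));
console.log(runFixture(todoStats, unmounted).verdict);
```

The first call yields a PASS with both checks marked ok; the second yields BLOCKED with no checks, because nothing could be observed. Keeping BLOCKED separate from FAIL means an agent can tell "the component is wrong" apart from "the harness never saw the component".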

Dashboard · Human

TodoStats / mixed · PASS
TodoStats / none-done · PASS
TodoStats / all-done · PASS
🔍 inconsistent-counts · FAIL
TodoInput / typed · PASS
TodoList / populated · PASS

A "Run all" button in the browser — verdict grid for human eyes.

Agent API · Machine

{
  "unit": "TodoStats",
  "fixture": "mixed",
  "verdict": "PASS",
  "checks": [
    { "id": "schema", "ok": true },
    { "id": "invariants", "ok": true }
  ]
}

await __verify.runAll() — structured JSON the agent can act on.

CI · Headless Pipeline

✓ TodoStats > mixed
✓ TodoStats > none-done
✓ TodoStats > all-done
✓ 🔍 inconsistent-counts → FAIL (by design)
✓ TodoInput > typed
✓ TodoList > populated

21 fixtures · all expected verdicts

bun run verify — vitest matrix, the CI traffic-light.

Because all three surfaces share one code path, they cannot drift. The dashboard cannot disagree with CI. There is no scenario where one passes and another fails on the same fixture. That is the anti-integration-hell move: you don't get to ship something that's "green here, red there."
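One way to picture the no-drift property is a single result list that is computed once and then only formatted per consumer. In the sketch below, runAllFixtures() is a hard-coded stand-in for the shared path, and the expected field and the three formatter names are hypothetical; nothing here is taken from the repo.

```typescript
type Verdict = "PASS" | "FAIL" | "BLOCKED" | "SKIP";

interface FixtureResult {
  unit: string;
  fixture: string;
  verdict: Verdict;
  expected: Verdict; // hypothetical field: what the matrix expects for this fixture
}

// Stand-in for the shared path. In the article this role is played by runFixture().
function runAllFixtures(): FixtureResult[] {
  return [
    { unit: "TodoStats", fixture: "mixed", verdict: "PASS", expected: "PASS" },
    { unit: "TodoStats", fixture: "inconsistent-counts", verdict: "FAIL", expected: "FAIL" },
  ];
}

// Surface 1: a row for the human dashboard.
const dashboardRow = (r: FixtureResult): string =>
  `${r.unit} / ${r.fixture}: ${r.verdict}`;

// Surface 2: structured JSON for the agent.
const agentPayload = (r: FixtureResult): string =>
  JSON.stringify({ unit: r.unit, fixture: r.fixture, verdict: r.verdict });

// Surface 3: a CI line that is green when the verdict matches the expectation.
const ciLine = (r: FixtureResult): string =>
  `${r.verdict === r.expected ? "✓" : "✗"} ${r.unit} > ${r.fixture}`;

const results = runAllFixtures(); // computed once, shaped three ways
for (const r of results) {
  console.log(dashboardRow(r));
  console.log(agentPayload(r));
  console.log(ciLine(r));
}
```

The formatters never recompute a verdict; they only present one. That is the design choice doing the work: as long as no surface owns its own verification logic, there is nothing that can disagree.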

§VI

Run · The Demo Repo, Live

Demo repo, real output. Watch for the deliberately failing probe.

The companion repo is cwc-workshops/how-we-claude-code — five verifiable units, twenty-one fixtures. Cloning it and running bun run verify produces:

~/cwc-workshops/how-we-claude-code/phase-3-verify · bun run verify

✔ matrix · 27 expected verdicts

✓ TodoItem > active ················································ PASS

✓ TodoItem > done ·················································· PASS

✓ TodoItem > 🔍 empty-text ········································· PASS

✓ TodoItem > 🔍 long-text ·········································· PASS

✓ TodoInput > empty ················································· PASS

✓ TodoInput > typed ················································· PASS

✓ TodoInput > 🔍 whitespace-only ····································· PASS

✓ TodoList > empty ·················································· PASS

✓ TodoList > populated ·············································· PASS

✓ TodoList > all-done ··············································· PASS

✓ TodoList > 🔍 long-text ·········································· PASS

✓ TodoStats > mixed ················································· PASS

✓ TodoStats > none-done ············································· PASS

✓ TodoStats > all-done ·············································· PASS

✓ TodoStats > 🔍 inconsistent-counts ······························· FAIL (by design)

✓ todos.feature > fresh ············································· PASS

✓ todos.feature > pre-seeded ········································ PASS

✓ todos.feature > add-then-verify ··································· PASS

✓ todos.feature > add-then-toggle ··································· PASS

✓ todos.feature > 🔍 filter-active ································ PASS

✓ todos.feature > 🔍 whitespace-submit ···························· PASS

Test Files 1 passed (1)

Tests 27 passed (27)

21 fixtures, 7 of them marked as probes (🔍) in the listing above. The vitest matrix runs 27 items in total, including structural checks.

About inconsistent-counts's FAIL: this is a probe — a fixture deliberately constructed to violate an invariant (total = 10, done = 3, active = 4; 3 + 4 ≠ 10). The matrix's EXPECTED_FAIL set asserts that this fixture must fail. The point of probes is to verify the verifier itself. A run where everything passes shouldn't reassure you — it should make you suspicious.
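The probe logic can be sketched in a few lines. The example below uses the numbers from the article (total = 10, done = 3, active = 4) and an EXPECTED_FAIL set; the set's shape, the helper names and the deliberately broken invariant are assumptions added to show why a probe is useful, not code from the repo.

```typescript
interface Counts { total: number; done: number; active: number }

// The invariant from the article: finished plus unfinished must equal the total.
const countsAddUp = (c: Counts): boolean => c.done + c.active === c.total;

type Verdict = "PASS" | "FAIL";
const verdictOf = (c: Counts, invariant: (c: Counts) => boolean): Verdict =>
  invariant(c) ? "PASS" : "FAIL";

const fixtures: Record<string, Counts> = {
  "mixed": { total: 3, done: 1, active: 2 },
  "inconsistent-counts": { total: 10, done: 3, active: 4 }, // probe: 3 + 4 is not 10
};

// Fixtures that must FAIL for the run to count as healthy.
const EXPECTED_FAIL = new Set<string>(["inconsistent-counts"]);

// The matrix is green only when every actual verdict matches its expected one.
function matrixIsGreen(invariant: (c: Counts) => boolean): boolean {
  return Object.entries(fixtures).every(([name, counts]) => {
    const expected: Verdict = EXPECTED_FAIL.has(name) ? "FAIL" : "PASS";
    return verdictOf(counts, invariant) === expected;
  });
}

// A verifier that has silently rotted: it approves everything.
const brokenInvariant = (_c: Counts): boolean => true;

console.log(matrixIsGreen(countsAddUp));     // true: the probe fails as designed
console.log(matrixIsGreen(brokenInvariant)); // false: the probe passed, so the matrix goes red
```

The second call is the case probes exist for. Without the probe, a verifier that always returns true would produce an all-green run; with it, the same rot turns the matrix red.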

§VII

Why Now · The Cost Inversion

The pattern was always there. The agent re-priced it.

Lay each stage of the verification lifecycle next to its human cost and its agent cost, and the picture is stark. The right reading isn't "AI replaces humans" — it is that a budget item that was always written off as too expensive has just re-entered the budget.

| Stage | Human era (verification as cost center) | Human cost | Agent era (verification as infrastructure) | Agent cost |
| --- | --- | --- | --- | --- |
| Writing fixtures | Cover edge cases for every component. Repetitive labour, easy to miss. | High · grinding work | Derive from spec or code automatically. Agents are also better at exhaustive enumeration. | Near-zero |
| Maintaining invariants | Requirements change; invariants don't keep up; over time they quietly become lies. | High · long-term drift | When code changes, the agent updates .verify.ts in the same pass. The matrix becomes living docs. | Sync cost only |
| Running | Slow, flaky, requires a human to interpret the report. | Medium · CI time + triage | Seconds, deterministic, structured JSON output. | Effectively free |
| Fixing failures | Human switches context, reads the report, edits code, runs again. | High · context switching | Agent reads JSON, locates the issue, edits, re-verifies, closes the loop. | Loop time only |

The crucial shift is this: verification used to be a one-time investment (write it once, rarely revisit) and now becomes continuously maintained living documentation. The first was a rational compromise in the human era — continuous maintenance didn't pencil out. The second is the natural shape in the agent era, because continuous maintenance no longer requires a continuous human bill.

§VIII

The Whole Pattern · Six Ideas, in Brief

Six component-level pieces. Together they make the architecture.

We've now covered three: the surface, the mechanism, the three consumers. Here are all six in one place — they interlock, and you lose a capability if you drop any one.

DOM as the machine-readable surface

Components attach data-verify-* to their outermost element, exposing internal state as a contract on the HTML surface.

Components declare fixtures + invariants

A sibling .verify.ts file: Zod schema, named fixtures, invariant predicates, deliberately-failing probes.

Isolated render targets

Each unit × fixture has its own URL: /verify/:unit/:fixture. No app shell, drivable on its own.

Pluggable verifiers

Schema, invariants, dom-contract and a11y — four families. Add a new file to register a new verifier; components don't change.

window.__verify — agent handle

manifest(), current(), runAll() — a structured surface for agents, sharing a code path with the dashboard.

Unified verdict taxonomy

PASS / FAIL / BLOCKED / SKIP — with BLOCKED ≠ FAIL, distinguishing "could not observe" from "observed something wrong."
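Two of the six pieces, pluggable verifiers and the verdict taxonomy, interact in a way a short sketch can show. Everything below is hypothetical: the registry, the Observation shape, the placeholder a11y verifier and the precedence rule in combine() are assumptions for illustration, since the article names the verifier families and the four verdicts but not how they are combined.

```typescript
type Verdict = "PASS" | "FAIL" | "BLOCKED" | "SKIP";

// What a verifier gets to look at. attrs is null when nothing was rendered.
interface Observation {
  attrs: Record<string, string> | null;
}

type Verifier = (obs: Observation) => Verdict;

// Hypothetical registry: adding a verifier means adding an entry,
// not touching any component.
const verifiers = new Map<string, Verifier>();
function registerVerifier(id: string, verifier: Verifier): void {
  verifiers.set(id, verifier);
}

registerVerifier("dom-contract", (obs) => {
  if (obs.attrs === null) return "BLOCKED"; // could not observe
  return "data-verify-unit" in obs.attrs ? "PASS" : "FAIL"; // observed something wrong
});

// Placeholder only: this sketch does not implement accessibility checks.
registerVerifier("a11y", () => "SKIP");

// Assumed precedence: FAIL beats BLOCKED, BLOCKED beats PASS, all-SKIP stays SKIP.
function combine(verdicts: Verdict[]): Verdict {
  if (verdicts.some((v) => v === "FAIL")) return "FAIL";
  if (verdicts.some((v) => v === "BLOCKED")) return "BLOCKED";
  if (verdicts.every((v) => v === "SKIP")) return "SKIP";
  return "PASS";
}

function verify(obs: Observation): Verdict {
  return combine(Array.from(verifiers.values(), (v) => v(obs)));
}

console.log(verify({ attrs: { "data-verify-unit": "TodoApp" } }));
console.log(verify({ attrs: { class: "todo-app" } }));
console.log(verify({ attrs: null }));
```

The three calls print PASS, FAIL and BLOCKED in that order. The third case is the one the taxonomy protects: an unrendered component is reported as unobservable rather than as broken, which points an agent at the harness instead of the component.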

The agents are going to be doing more and more of this natively, and how can you set the artifacts that you produce up to natively be testable and verifiable in the way that you need.

Ara · Code with Claude London 2026

§IX

Evidence · Replay-as-PR-Artifact

Once verification passes, record a clip. Attach it to the PR.

The repo also ships scripts/record.ts: launch headed Chromium, navigate to /verify/replay, wait for window.__verify_replay.done === true, and save the whole verification run to a .webm file in recordings/.
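The navigate-then-wait part of that flow can be sketched without committing to a specific tool. The article does not say which automation library scripts/record.ts uses, so the PageLike interface below is an assumption: it mirrors the goto and waitForFunction methods that a Playwright Page provides, and with Playwright the video itself would typically come from creating the browser context with the recordVideo option. The function name driveReplay and the base URL are hypothetical.

```typescript
// Structural subset of a browser-automation page. Assumption: the real script
// drives something with methods of this shape; the source does not name the tool.
interface PageLike {
  goto(url: string): Promise<unknown>;
  waitForFunction(expression: string): Promise<unknown>;
}

// Expression evaluated inside the page until it becomes truthy.
const REPLAY_DONE = "window.__verify_replay && window.__verify_replay.done === true";

// Open the replay route and block until the page reports that it is finished.
async function driveReplay(page: PageLike, baseUrl: string): Promise<void> {
  await page.goto(`${baseUrl}/verify/replay`);
  await page.waitForFunction(REPLAY_DONE);
}

// Hypothetical test double that records what it was asked to do.
const calls: string[] = [];
const fakePage: PageLike = {
  goto: async (url) => { calls.push(`goto ${url}`); },
  waitForFunction: async (expression) => { calls.push(`wait ${expression}`); },
};

driveReplay(fakePage, "http://localhost:3000").then(() => {
  console.log(calls);
});
```

Waiting on an explicit done flag, rather than sleeping for a fixed time, is what keeps the clip as short as the run itself and avoids cutting off a slow verification.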

The script itself isn't remarkable. The workflow it enables is: developer ships code → agent runs verification → agent records video → PR includes the recording. Reviewers don't open a browser; they watch thirty seconds of replay and decide whether the behaviour is right.

The Claude Code team records basically all the code changes that they do like this, all the front end changes at least, especially at the pace of shipping that we have at the moment.

Ara · describing Anthropic's internal workflow

§X

Boundaries · Where the Pattern Doesn't Apply

This is not a silver bullet.

The report itself doesn't make universal claims. The visible limits — the ones the demo wears on its sleeve — are these:

Frontend components only. Backend logic, data pipelines, service orchestration — they have no rendered surface to attach data-verify-* to. They need a different contract.

Developers must opt in. Each component takes one explicit verifyAttrs() spread. The pattern is not invisible — it is intrusive but cheap.

State consistency, not visual correctness. Misaligned CSS, wrong colours, broken layouts — none of that is what data-verify-* covers. You still need humans or visual regression for that.

Probe quality is human-judged. The repo requires at least one probe per unit, but whether that probe actually exercises the boundary is still a design call no one can automate.

No standardisation yet. So far this is Anthropic's internal practice plus one demo repo. There is no RFC, no third-party adoption data, no community consensus.

§XI

Closing · What Anthropic Actually Contributed

Not a new technology. A protocol layer, made concrete.

Contract testing was already there. Data attributes were already there. React has been turning state into DOM for more than ten years. Anthropic invented none of it. What they did was thread these familiar pieces into a single agent-friendly protocol, and then ship a working reference implementation.

The point of the demo isn't "the to-do app works." The point is that, when agents start writing code routinely, "can the agent verify what it just wrote" stops being optional and becomes infrastructure. This repo draws the first line of that contract — between an agent and a frontend artifact. It probably isn't the final line, but it opens the room.

What's left is whether the industry catches up: whether data-verify-* and __verify become defaults at the framework or design-system layer, instead of every team rolling their own from scratch.

What to take away

Three things

Verification is no longer "humans write tests." It is "agents self-attest." Components emit state to DOM at render-time; the agent reads attributes.
One verification logic feeds three surfaces. Dashboard, agent API, and CI vitest all share runFixture() — there is no room to drift apart.
Probes are the verifier's immune system. Deliberately-failing fixtures prove the system isn't lying. A run where everything passes should make you suspicious.

What not to take away