Over the past two months I ran the same fact-check loop on 11 reports: a fresh subagent re-reads every claim against the source in a clean context, I fix whatever it flags, then another fresh subagent re-checks, until zero issues remain. Every round got written into the project's STATUS file as it happened. These aren't benchmark numbers. They're a ledger that piled up inside real work.
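The loop described above can be sketched in a few lines. This is a minimal illustration, not the actual tooling: `check` and `fix` are hypothetical stand-ins for "spawn a fresh subagent" and "repair what it flagged", `RoundRecord` is an invented shape rather than the real STATUS format, and `max_rounds` is a safety cap added for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RoundRecord:
    """One ledger entry: which round it was and what the checker flagged."""
    round_no: int
    issues: List[str]


def fact_check_loop(
    report: List[str],
    source: set,
    check: Callable[[List[str], set], List[str]],
    fix: Callable[[List[str], List[str]], List[str]],
    max_rounds: int = 10,  # assumption: a cap so the sketch always terminates
) -> Tuple[List[str], List[RoundRecord]]:
    """Check, fix, re-check until a round comes back with zero issues."""
    ledger: List[RoundRecord] = []
    for round_no in range(1, max_rounds + 1):
        # Each call stands in for a fresh checker with no memory of prior rounds.
        issues = check(report, source)
        # Record the round as it happens, not after the fact.
        ledger.append(RoundRecord(round_no, list(issues)))
        if not issues:
            return report, ledger
        report = fix(report, issues)
    raise RuntimeError(f"still failing after {max_rounds} rounds")


# Toy stand-ins: a claim passes only if it appears verbatim in the source.
def toy_check(report: List[str], source: set) -> List[str]:
    return [claim for claim in report if claim not in source]


def toy_fix(report: List[str], issues: List[str]) -> List[str]:
    return [claim for claim in report if claim not in issues]
```

With this counting, the round total is `len(ledger)` and includes the final clean pass, so a report that passes on its first check counts as one round.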
The chart lays the 11 reports out by date on the x-axis; the y-axis is how many rounds each one took to reach "all pass." The first six take 3 or 4 rounds each. Past the 05-29 line, four of the five pass in one round and the fifth takes two.
Every round count comes from that report's own STATUS record; none was reconstructed after the fact. How many issues each round caught, and whether each was a factual error or imprecise wording, was written down at the time.
| Report | Rounds |
|---|---|
| Failure modes of AI-native startups 05-15 | 3 |
| Claude Code on large codebases 05-21 | 4 |
| Anthropic rebuilds its sales org 05-22 | 3 |
| Agent-friendly CLI design 05-26 | 3 |
| Agent-native verification 05-26 | 3 |
| Zero Trust for AI agents 05-28 | 3 |
| ━━━━ SWITCHED TO OPUS 4.8 · 2026-05-29 ━━━━ | |
| Securing source code with LLMs 05-29 | 1 |
| Dynamic workflows 05-30 | 1 |
| How we contain Claude 06-01 | 1 |
| Dynamic workflows · patterns 06-03 | 2 |
| LLM ATT&CK Navigator 06-04 | 1 |
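
For reference, the averages on either side of the 05-29 line can be worked out directly from the table. The round counts below are copied from the rows above; the variable names are only illustrative.

```python
from statistics import mean

# Rounds to "all pass", in table order.
rounds_before = [3, 4, 3, 3, 3, 3]  # six reports, 05-15 through 05-28
rounds_after = [1, 1, 1, 2, 1]      # five reports, 05-29 through 06-04

avg_before = mean(rounds_before)  # 19 / 6
avg_after = mean(rounds_after)    # 6 / 5

print(f"before: {avg_before:.2f} rounds, after: {avg_after:.2f} rounds")
# prints "before: 3.17 rounds, after: 1.20 rounds"
```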
Averages alone aren't convincing. So take the report with the scariest density of numbers, the one most likely to derail. It passed in a single round.
It's wall-to-wall precise readings: account sample size, observation counts, the before/after change in attack success rate, a correlation coefficient, two sets of scores side by side, three-axis ARiES values, and more. In the past, the moment numbers got dense, a subagent could always find a few where a row got transposed or a decimal got copied wrong during transcription. This time the independent check focused on tracing each number back to the source, and not one needed correcting.
832 accounts
13,873 observations
33% → 56%
r = 0.28
56.4 vs 46.8
3-axis ARiES
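Tracing each number back to the source can be crudely approximated in code. The sketch below is not what the subagent does (it reads claims in context); it only pulls numeric tokens out of a report and lists those that never appear verbatim in the source. The function names, the regex, and the sample strings are all assumptions made for illustration.

```python
import re
from typing import List

# Matches integers, thousands-separated integers, decimals, and a trailing "%".
NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?%?")


def numbers_in(text: str) -> List[str]:
    """Return every numeric token in the text, in order of appearance."""
    return NUMBER.findall(text)


def untraced_numbers(report_text: str, source_text: str) -> List[str]:
    """Numbers that appear in the report but nowhere in the source."""
    source_numbers = set(numbers_in(source_text))
    return [n for n in numbers_in(report_text) if n not in source_numbers]


# Hypothetical strings: the report transposes two digits of 13,873.
source_text = "832 accounts and 13,873 observations"
report_text = "832 accounts, 13,783 observations"
print(untraced_numbers(report_text, source_text))  # prints ['13,783']
```

A string match like this catches transposed digits and dropped decimals, the transcription slips mentioned above, but it cannot tell whether a correctly copied number is attached to the right claim. That part still needs a reader.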