This is a privacy-preserving analysis of about 400,000 Claude Code sessions from about 235,000 people (October 2025 to April 2026). People decide what to build; the agent decides how to build it. Whether a session succeeds turns more on how well you understand the problem than on whether you can code. Two numbers frame the study.
Each session is classified into the single work mode that best describes what it is trying to accomplish. About 56% of sessions involve writing, fixing, or testing code directly; 17% operate software; 14% plan or explore; and 13% produce analysis or prose.
A decision-attribution classifier separates every meaningful decision in a session into planning (what to do, which approach, what counts as done) and execution (which files to change, what code to write, which commands to run), then attributes each to the user or to Claude.
At the median, users make about 70% of planning decisions but only about 20% of execution decisions. The "how" is largely delegated to the agent.
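The attribution described above can be sketched in a few lines. This is a minimal illustration, not the study's classifier: the `Decision` record, the label strings, and the choice to take a median over per-session shares are all assumptions, and the toy sessions are made up purely to exercise the functions.

```python
from dataclasses import dataclass
from statistics import median

@dataclass(frozen=True)
class Decision:
    kind: str   # "planning" or "execution" (hypothetical label strings)
    actor: str  # "user" or "claude" (hypothetical label strings)

def user_share(decisions, kind):
    """Fraction of decisions of one kind attributed to the user; None if there are none."""
    relevant = [d for d in decisions if d.kind == kind]
    if not relevant:
        return None
    return sum(d.actor == "user" for d in relevant) / len(relevant)

def median_user_share(sessions, kind):
    """Median of the per-session user shares, skipping sessions with no such decisions."""
    shares = [s for s in (user_share(sess, kind) for sess in sessions) if s is not None]
    return median(shares) if shares else None

# Made-up sessions for illustration only; these are not data from the study.
toy_sessions = [
    [Decision("planning", "user"), Decision("planning", "claude"), Decision("execution", "claude")],
    [Decision("planning", "user"), Decision("execution", "user"), Decision("execution", "claude")],
    [Decision("planning", "user"), Decision("planning", "user"), Decision("execution", "claude")],
]

planning_median = median_user_share(toy_sessions, "planning")
execution_median = median_user_share(toy_sessions, "execution")
```

Computing the share per session first, and only then taking the median, keeps one very long session from dominating the summary. Whether the study aggregates this way is not stated here.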
A typical session runs about four turns. Each prompt sets off a chain of around 10 Claude actions on average (sometimes over 100), producing about 2,400 words per turn. When the user keeps execution control (making more than 80% of execution decisions), Claude takes about 8 actions per turn; when Claude takes over planning (making more than 80% of planning decisions), it takes about 16.
"People decide what to build, and the agent decides how to build it." (Anthropic, "Agentic coding and persistent returns to expertise")
A classifier rates each user's apparent expertise at the task on a five-point scale, from novice to expert. It looks at three signals: how precisely the user frames their directions, what they ask Claude to verify, and whether the user tends to correct Claude or Claude tends to correct the user. This expertise is task-specific, distinct from job title or general ability.
A senior engineer asking their first Rust question is a beginner at Rust.
An accountant who has never used Python, but tells Claude exactly which reconciliation rules a script must enforce and catches the edge case it mishandles at month-end close, is an expert at that task.
Users with more expertise get more work out of each prompt, and the trend remains significant after controlling for work mode, task value, month, occupation, and model family: about 9% more actions and 13% more output per expertise level (p < 0.001). The gap appears within every kind of work and every band of task value.
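To get a feel for what a per-level effect of this size implies across the five-point scale, the short sketch below compounds the reported +9% and +13% figures. Two things here are assumptions rather than facts from the study: that the effect compounds multiplicatively from one level to the next, and the names and order of the intermediate levels. The results are relative to a novice baseline of 1.0, not absolute counts.

```python
# Assumed ordering of the five-point scale; the study names it only as "novice to expert".
LEVELS = ["novice", "beginner", "intermediate", "advanced", "expert"]

ACTIONS_GAIN_PER_LEVEL = 0.09  # reported: about +9% actions per expertise level
OUTPUT_GAIN_PER_LEVEL = 0.13   # reported: about +13% output per expertise level

def relative_multiplier(per_level_gain, level):
    """Multiplier relative to a novice, assuming the gain compounds once per level."""
    steps_above_novice = LEVELS.index(level)
    return (1 + per_level_gain) ** steps_above_novice

# Relative actions and output at each level, with novice fixed at 1.0.
relative_actions = {lvl: relative_multiplier(ACTIONS_GAIN_PER_LEVEL, lvl) for lvl in LEVELS}
relative_output = {lvl: relative_multiplier(OUTPUT_GAIN_PER_LEVEL, lvl) for lvl in LEVELS}
```

Under these assumptions an expert, four levels above a novice, would see roughly 1.4 times the actions and 1.6 times the output per prompt. An additive reading of the coefficients would give somewhat smaller figures.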
Success is measured as judged success (classifier reads the full transcript) and the stricter verified success (judged successful plus at least one hard signal: matching git commits, passing tests, or explicit user affirmation). Across all measures, more expertise means more success. Most of the gain comes from novice to intermediate; the slope flattens between intermediate and expert.
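The relationship between the two success measures can be written as a small predicate. This is a sketch of the definition as summarized above, not the study's pipeline; the `SessionOutcome` record and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SessionOutcome:
    # All field names are hypothetical stand-ins for the signals the study describes.
    judged_success: bool               # classifier's verdict after reading the full transcript
    has_matching_commit: bool = False  # hard signal: matching git commits
    tests_passed: bool = False         # hard signal: passing tests
    user_affirmed: bool = False        # hard signal: explicit user affirmation

def verified_success(outcome):
    """Judged successful AND backed by at least one hard signal."""
    hard_signal = outcome.has_matching_commit or outcome.tests_passed or outcome.user_affirmed
    return outcome.judged_success and hard_signal

judged_only = SessionOutcome(judged_success=True)
judged_and_tested = SessionOutcome(judged_success=True, tests_passed=True)
tested_but_failed = SessionOutcome(judged_success=False, tests_passed=True)
```

Because verified success adds a condition on top of judged success, every verified session is also a judged success, so the verified rate can never exceed the judged rate.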
The abandonment rate (sessions judged failed with zero lines written) is 19% for novices versus 5-7% for everyone else. When struggling, the least experienced users give up at several times the rate of everyone else.
The study infers occupation from transcripts using the SOC taxonomy and explicitly instructs the classifier not to treat coding as evidence of a coding profession. A lawyer who builds a script to flag missing clauses is mapped to Legal Occupations.
The composition of work changed substantially over the seven months. The share of sessions spent fixing broken code fell by nearly half; the freed-up share went to operating software, writing, and data analysis.
Tasks also grew more valuable. Approximated by what the work would cost on a freelance marketplace, the average session value rose about 27% (the Key Findings summarize this as "about 25% on average"). By type of work, the increases were about 43% for building, 34% for operating, and 32% for fixing. The study notes these are coarse estimates meant for relative comparison over time.
The study frames the overall picture as agentic coding amplifying some kinds of knowledge while substituting for others. The gains come mostly from competence, not mastery: "proficiency in a domain is enough to use the tool almost as effectively as those with deep mastery."
The study states three limits: it cannot measure real-world outcomes (such as whether the code is actually used); it excludes non-interactive usage, which is a substantial share; and all classifications rely on a model reading the transcript, although the appendix shows alignment with independent telemetry.
anthropic.com · 2026-06-16 · Economic Research (Zoe Hitzig, Maxim Massenkoff, Eva Lyubich, Ryan Heller, Peter McCrory). All claims, numbers, definitions, and quotes here come from this source. All figures are Anthropic's own and cannot be independently verified. Classifiers use Claude Sonnet 4.6; data excludes third-party IDEs, SDKs, and non-interactive claude -p usage. Figure 5's "Beginner" and "Advanced" rows are interpolated from the reported +9%/+13% per-level regression coefficients (the study reports specific numbers for novice, intermediate+, and expert only).