Anthropic Platform Engineering · Katelyn Lesse · Deep read

When code is no longer scarce,
the org reshapes, not shrinks

If one person now does the work of three, the intuitive move is to cut headcount. Anthropic does the opposite: the team stays the same size, and what gets reshaped is each person's role, plus the whole decision-and-verification layer above the team.

Team size
7–8, unchanged
The old “1 TL + 1 PM + 4-6 ICs” is reshaped into “1 TL + a roughly even split of System Designers and Implementers.”
Concurrent projects, same people
1-2 → 4-5 in parallel
The number of concurrent projects multiplied; pressure piled up downstream.
Katelyn Lesse · Head of Platform Engineering, Anthropic · AI-Native Eng Org · Part 3

When writing code is no longer a scarce resource, the intuitive conclusion is that if one person does the work of three, the team should shrink. Anthropic does the opposite. The team stays the same size; what gets reshaped is each person's role, plus the entire decision-and-verification layer above the team.

The person describing this is Katelyn Lesse. She led retail engineering at Stripe, a group of 70-plus engineers, left in mid-2025 to join Anthropic, and now runs engineering for the Claude Developer Platform. She is also the person behind Claude Managed Agents, the hosted service. Her vantage point is unusual: she brings experience building engineering orgs at a mature payments company, and she now runs an AI-native team inside a frontier lab.

This is the third facet of "the AI-native engineering org." The first two covered the rewrite of how the work is done, and the platform foundation that makes Spotify fast. This piece is about how the org and its roles get reshaped, and the platform architecture that supports it. All three share the same base condition: code is essentially all AI-generated, and the bottleneck has moved from writing code to reviewing it and making decisions. What this piece adds is what a team's shape, headcount, and responsibilities should become once that base condition holds.

On sources · The core source is Gregor Ojstersek's Engineering Leadership newsletter interview with Katelyn Lesse (2026-07-01, paywalled). A few sections behind the paywall could not be retrieved; this article fills those gaps with Katelyn's own public material (her blog, a Sydney keynote writeup, LinkedIn, podcasts), with sources and dates noted. Her blog was published in 2026-02 and is used for her consistent framing; wherever specific figures are involved, the more recent public statements take precedence.
01 · Reshape

Same headcount, different composition

A line Katelyn keeps coming back to: much about software development hasn't changed, and doesn't need to. A team still needs enough people to build, operate, and maintain systems over time. The one thing that changed is that AI gives everyone far more leverage.

On team structure, Anthropic still uses the traditional two-pizza cross-functional team: five to eight engineers, plus an engineering manager, a PM, and a designer. At her Sydney keynote she put it more plainly: teams are still around seven or eight people, a number that isn't the result of any scientific reasoning but of culture and history, and it simply happens to be the size that works.

Size holds; what gets reshaped is what those people actually do. The old model was one team lead, one PM, and four to six engineers who only implemented. In the new shape, nearly every engineer has to be able to step into the tech-lead role when needed. On stage in Sydney, Katelyn and head of product Angela Jiang described the new shape more precisely: one Team Lead, plus a roughly even split between "System Designers" and "Implementers."

Before
One person sets direction, the rest implement
TL · 1 Team Lead
PM · 1 Product Manager
IC · 4–6 implement-only engineers
Code was expensive, so most people were assigned to write it.
Now
An even split: who defines the system, who implements it
TL · 1 Team Lead
SD · ~half System Designers
IM · ~half Implementers
Code got cheap; the center of gravity shifts to defining systems and architecture.

Same people, a multiple of the output. A team of eight that used to run one or two projects at a time can now run four or five in parallel. The team still owns and maintains the same systems; AI multiplied how much work each person can execute concurrently.

Before
1–2 projects at a time
Now (same 7-8 people)
4–5 projects at a time
What this section means

Once capacity is amplified, pressure piles up downstream, onto the stages AI can't take over yet. The next few sections look at how each of those stages gets squeezed.

02 · Bottleneck shift

From keyboard to whiteboard, so more PMs are needed

The first stage to become a bottleneck once capacity is amplified is "deciding what to build." Anthropic's response runs against what many companies do.

Plenty of teams cut PMs after AI took hold. OpenAI's engineer-to-PM ratio is thirty to one; companies like Telnyx and Portkey run with almost no PMs at all. Anthropic's read is the opposite: AI did not reduce the need for PMs, and the technically fluent TPM in particular matters more than before.

The logic is direct. Once engineers ship faster, the bottleneck moves from "can we build it" to "should we build it, and are we building the right thing." In Katelyn's view, amplified engineering capacity needs someone to make sure it is focused on the highest-impact problems, and that is exactly the value of product management.

OpenAI (for contrast)
30 : 1
Engineers to PMs. Telnyx and Portkey run with almost no PMs. The direction is fewer PMs.
Anthropic
Wants more PMs
Still 1 PM per team today, but she expects the ratio to change soon; a dedicated Prod Ops role helps teams decide faster.

This read is backed by an independent set of hard numbers. Anthropic's Head of Growth, Amol Avasare, has said publicly that after rolling out Claude Code internally, engineering teams run at two to three times their effective headcount, and a five-person team ships like fifteen to twenty. Those numbers also explain why the company is actively posting roles to add PMs rather than engineers.

2–3×
Effective engineering capacity
After Claude Code, a 5-person team ships like 15–20
>80%
Merged production code
In some periods since the research preview, over 80% of merged code came from the AI coding system
$305–460K
Claude Code PM salary
Requires both PM discipline and engineering fluency, on par with senior engineering pay
On the figures · The 2-3x capacity, $305K–$460K, and >80% merged code all come from a third-party relay of Amol Avasare's public statements (Ranzware, 2026-06-27); the 80% figure is consistent with Anthropic's other public statements of "70% to 90% of code AI-generated."
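The capacity figures quoted above can be cross-checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers in this section; the function name is illustrative. A 2–3× multiplier on a five-person team gives 10–15 engineer-equivalents, while "ships like 15–20" implies 3–4×, so the two statements meet only at 3× and are best read as rough ranges rather than precise measurements.

```python
def implied_multiplier(team_size: int, effective_size: int) -> float:
    """Ratio of effective headcount to actual headcount."""
    return effective_size / team_size


team = 5

# Effective headcount implied by "2-3x effective capacity".
from_multiplier = (team * 2, team * 3)

# Multipliers implied by "a 5-person team ships like 15-20".
from_headcount = (implied_multiplier(team, 15), implied_multiplier(team, 20))

print(from_multiplier)  # (10, 15)
print(from_headcount)   # (3.0, 4.0)
```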

Hiring PMs with engineering backgrounds at pay that rivals senior engineers is, in effect, a statement that product decision-making is what's genuinely scarce and genuinely valuable right now. Role boundaries are also shifting toward "PMs need to be more technical." Angela made it explicit on stage in Sydney: PMs now have to be more technical, not engineers exactly, but technical enough to ship simple features themselves.

The constraint moved downstream, from the keyboard to the whiteboard.
On Anthropic's situation, per Amol Avasare (Ranzware)
03 · Quality

No dedicated QA; testing is everyone's job

The second stage to become a bottleneck once capacity is amplified is quality assurance. Here Anthropic's choice is to have no dedicated QA engineer; testing is a shared responsibility across the team.

E2E · Fewest, capped on purpose
Integration · Fewer
Unit tests · Most, the base

Why the answer is more care, not less

Because AI makes code generation faster, how tests are written matters more. The team holds to the traditional testing pyramid: most are unit tests, then integration tests, and the fewest are end-to-end tests.

Capping the number of end-to-end tests is deliberate, so CI doesn't become a bottleneck and slow tests don't drag development down.

The other heavy piece is evaluating the AI models and products themselves, i.e. AI Evals. The team invests a lot in building and maintaining evaluation systems to measure quality and make sure new models, new products, and platform updates meet the bar before they reach customers. This work belongs to no single team; engineers and PMs are both deeply involved, and everyone is responsible for meeting a high standard before shipping.

Katelyn is candid that roles inside the team have become more fluid, with no clear division of responsibility; they are learning as they go and don't claim to have found the perfect formula.

04 · Judgment stays human

Code is all AI-generated, but the architecture is human

Asked whether her team, like Claude Code lead Boris Cherny, generates essentially all of its code with AI, she says yes. But she immediately adds the crucial qualifier: code being all AI-generated does not mean AI is building the software on its own.

In her account, engineers play the most important role in system design and architecture decisions. AI is an excellent collaborator, but how a large, complex system should be built is decided by engineers. Once the design is clear, implementation is largely handed to AI agents. Her operating tip is not to switch back and forth between "the agent generates code" and "a human writes code," but to keep steering the agent until it produces work that meets the bar.

The effective use of AI is not about asking an LLM to generate code and accepting the first answer. You should continuously steer the agent, provide feedback, and let it test and improve its own work until it meets the required standard.
Katelyn Lesse (Engineering Leadership interview)

Because of this, she considers reviewing AI-generated work one of the most valuable engineering skills. On what humans remain irreplaceable for, she laid out a full framework in an earlier blog post, grouped into three things.

Motivation
The drive to pick problems
There is no loss function for what a company should care about. The best product decisions aren't the highest-expected-value play, but bets on a future not yet in the data. You can't optimize for the objective, because the objective is the thing you're choosing.
Taste
Originating a standard of good
Not just recognizing what's good, but originating a standard of good that doesn't yet exist. When every company consults the same models, they converge on the same designs; what differentiates is a human saying "I believe this is better."
Trust
Trust that drives growth
Trust is built on having something at stake and keeping commitments. An agent has nothing at stake, so it can't build that trust. The companies with the deepest trust get market signal no one else has.
On timing · The motivation / taste / trust framework comes from Katelyn's own blog post, "The human + agent software team wins" (2026-02-16), about four and a half months before the July interview. It explains why responsibility and judgment stay on the human side, the same thread as the July interview's "reviewing AI output is the most valuable skill" and "architecture decisions are the engineer's job."
There is no loss function for "what should this company care about."
Katelyn Lesse, blog (2026-02)
05 · Working with agents

Give the agent an outcome, not a task

The earlier sections cover how the human side is arranged; this one covers how humans should work with agents. Katelyn's top piece of advice: define the outcome you want, not just assign a task.

Concretely, rather than saying "build me a dashboard," spell out what the finished thing should accomplish, what success looks like, and any important requirements or examples. Her team turned this into a product capability. Claude Managed Agents has a feature called Outcomes that makes "define the result, judge it automatically" a built-in loop.

Human · Describe the desired result: what it should accomplish, what success looks like.
Agent A · Coding agent works: produces a version.
Agent B · Judges against the bar: checks against the desired result.
Deliver · Only if it meets the bar: the final result.
↻ If it falls short: the coding agent iterates on its own and goes back to the judge, until it meets the bar

This is the productization of a practice she wrote about earlier. She calls it self-verifying loops: don't expect an LLM to one-shot a clean answer to a hard problem; have the coding agent make a plan, write the code, run the tests, check the output, and keep iterating until everything passes. Her plain analogy: it's why a person reviews a document before sharing it. Set the bar for done, give the agent tools to verify against it, and don't let it stop until it gets there.
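The loop described here (define the bar, let one agent produce, let another judge, repeat until it passes) can be sketched in a few lines. This is not the Claude Managed Agents Outcomes API, which the article does not detail; `Outcome`, `judge`, `run_until_done`, and the stub worker are hypothetical names, and the "agents" are plain functions standing in for model calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Outcome:
    """What 'done' means: a description plus checks a judge can run."""
    description: str
    checks: list[Callable[[str], bool]]


def judge(outcome: Outcome, draft: str) -> list[int]:
    """Return indexes of failed checks; an empty list means the bar is met."""
    return [i for i, check in enumerate(outcome.checks) if not check(draft)]


def run_until_done(
    outcome: Outcome,
    worker: Callable[[str, list[int]], str],
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Iterate worker -> judge until every check passes or rounds run out."""
    feedback: list[int] = []
    for round_no in range(1, max_rounds + 1):
        draft = worker(outcome.description, feedback)
        feedback = judge(outcome, draft)
        if not feedback:
            return draft, round_no
    raise RuntimeError(f"bar not met after {max_rounds} rounds; failed: {feedback}")


TERMS = ["revenue", "churn"]


def stub_worker(description: str, failed: list[int]) -> str:
    """Stand-in for a coding agent: the first draft is incomplete,
    later drafts add whatever the judge flagged."""
    covered = {"revenue"} | {TERMS[i] for i in failed}
    return "dashboard showing " + ", ".join(sorted(covered))


outcome = Outcome(
    description="A dashboard that shows revenue and churn",
    checks=[lambda d: "revenue" in d, lambda d: "churn" in d],
)
draft, rounds = run_until_done(outcome, stub_worker)
print(rounds, draft)  # 2 dashboard showing churn, revenue
```

The design point is that the human writes the `Outcome`, not the steps: the worker never decides on its own that it is finished, and the round limit keeps a stuck loop from running forever.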

She points to two other practices from the same root. First, have agents check each other: one writes code and another reviews it with a different lens; during an incident, multiple agents investigate in parallel with different hypotheses, the way on-call engineers catch what each other missed. Second, give agents memory and context: a senior engineer outperforms a new hire of equal raw ability thanks to accumulated organizational knowledge like "we tried this six months ago and it failed for a subtle reason," and agents need the same.
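Both practices, cross-checking with different lenses and carrying memory into a task, can be illustrated with a toy sketch. Everything here is hypothetical: the reviewer functions stand in for separate review agents, and `MEMORY` stands in for whatever store holds organizational knowledge; the article does not describe how Anthropic implements either.

```python
from typing import Callable

# A reviewer takes code and returns a list of findings.
Reviewer = Callable[[str], list[str]]


def security_lens(code: str) -> list[str]:
    """Stand-in for an agent that reviews only for security issues."""
    return ["uses eval()"] if "eval(" in code else []


def reliability_lens(code: str) -> list[str]:
    """Stand-in for an agent that reviews only for error handling."""
    return ["bare except hides errors"] if "except:" in code else []


def cross_check(code: str, reviewers: list[Reviewer]) -> list[str]:
    """Collect findings from every lens; each catches what the others miss."""
    findings: list[str] = []
    for review in reviewers:
        findings.extend(review(code))
    return findings


# Hypothetical memory store: keyword -> note from past work.
MEMORY = {
    "retry": "We tried this six months ago and it failed for a subtle reason.",
}


def recall(task: str) -> list[str]:
    """Return stored notes whose keyword appears in the task description."""
    return [note for key, note in MEMORY.items() if key in task.lower()]


snippet = "try:\n    eval(user_input)\nexcept:\n    pass"
print(cross_check(snippet, [security_lens, reliability_lens]))
print(recall("Add retry logic to the client"))
```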

Good prompts are less about telling the AI what to do step by step and more about clearly describing what success looks like.
Katelyn Lesse
06 · Architecture

The architecture behind it: split the brain from the hands

Roles can be arranged this way only if a platform architecture supports it. When Anthropic shipped self-hosted sandboxes and MCP tunnels at Code with Claude London, Katelyn laid out the principle clearly.

She names the two biggest blockers for enterprises building agents: code execution has to run on their own infrastructure, and tool access has to stay behind their firewall, or security won't sign off. Anthropic's answer is to split the agent into a brain and hands.

Anti-pattern · the agent-in-a-box trap
Harness, session, and sandbox all in one container.
Doesn't scale; the security boundary ends up in the wrong place.
The fix · brain / hands split
Brain (Claude + harness) decoupled from hands (sandbox + tools); the hands can live anywhere, including inside the enterprise's own VPC.
The brain (Claude + harness) is decoupled from the hands (sandboxes + tools), so the hands can live anywhere, including inside your VPC. Don't fall into the agent-in-a-box trap where harness, session, and sandbox all share a container: it doesn't scale, and the security boundary ends up in the wrong place.
Katelyn Lesse (LinkedIn, 2026-05)
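In code terms, the split amounts to the brain depending on an interface for execution rather than on a particular sandbox. The sketch below assumes nothing about Anthropic's actual implementation: `Hands`, `LocalSandbox`, `VpcSandbox`, and `Brain` are hypothetical names, and the sandboxes are stubs that return strings instead of running anything.

```python
from typing import Protocol


class Hands(Protocol):
    """Anything that can execute a tool call: the 'hands'."""

    def run(self, tool: str, arg: str) -> str: ...


class LocalSandbox:
    """Hands co-located with the brain (stub: returns a string, runs nothing)."""

    def run(self, tool: str, arg: str) -> str:
        return f"[local] {tool}({arg})"


class VpcSandbox:
    """Hands inside the customer's network (stub). A real version would
    execute there and send back only the result; that part is assumed."""

    def __init__(self, vpc_name: str) -> None:
        self.vpc_name = vpc_name

    def run(self, tool: str, arg: str) -> str:
        return f"[{self.vpc_name}] {tool}({arg})"


class Brain:
    """Model + harness: decides what to do and delegates all execution."""

    def __init__(self, hands: Hands) -> None:
        self.hands = hands

    def act(self, plan: list[tuple[str, str]]) -> list[str]:
        return [self.hands.run(tool, arg) for tool, arg in plan]


plan = [("run_tests", "payments/"), ("read_file", "README.md")]
print(Brain(LocalSandbox()).act(plan))
print(Brain(VpcSandbox("acme-vpc")).act(plan))
```

Because `Brain` only knows the `Hands` interface, moving execution behind a firewall means swapping one object, and the security boundary sits at the `run` call rather than around the whole agent.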

Self-hosted sandboxes let Claude Managed Agents run code on infrastructure the enterprise fully controls, using its own runtime image, network rules, and security tooling; MCP tunnels let agents reach internal MCP services behind the firewall without opening a single inbound port. In the same post, she raised the architecture into a statement aimed at every company.

Her words

“Every company should be standing up an internal agent platform right now, or risk falling behind. And there's no longer a reason to build that platform from scratch.”

07 · Her take

Her own conclusion

Put the sections together and Anthropic's approach traces a shape opposite to "AI boosts output, so cut headcount": the team stays the same size, and human effort moves from writing code to architecture judgment, reviewing AI output, making product decisions, and maintaining customer relationships. Agents take on the rest, and are expected to run in self-verifying loops, check each other, and work with memory.

A point she keeps making: the path there isn't waiting for smarter models, but building the agent setup right now, while holding on to the people with the best judgment, taste, and relationships. Once AI amplifies capacity, those are exactly what remain scarce.

Treat agents like teammates, not tools. Give them memory, give them domain expertise, and make them check each other's work. Then find the humans with the best judgment, taste, and relationships. Hold on to them.
Katelyn Lesse, blog
Same series

Two other facets of the AI-native engineering org

All three pieces share the same premise: code is essentially all AI-generated, and the bottleneck shifts to review and decisions. They differ in which facet they cut into — this piece covers org and role reshaping, the other two cover how the work is done and the infrastructure.

References

Sources

Gap note: the paywalled sections "engineers becoming tech-agnostic" and "how onboarding has changed" have no first-party Katelyn source, so this article omits them rather than fill them with someone else's words.