When writing code is no longer the scarce resource, the intuitive conclusion is that a team whose members each do the work of three should shrink. Anthropic does the opposite. The team stays the same size; what gets reshaped is each person's role, plus the entire decision-and-verification layer above the team.
The person describing this is Katelyn Lesse. She led retail engineering at Stripe with 70-plus engineers, left Stripe in mid-2025 to join Anthropic, and now runs engineering for the Claude Developer Platform. She is also the person behind Claude Managed Agents, the hosted service. Her vantage point is unusual: she brings the experience of building engineering orgs at a mature payments company, and she is running an AI-native team inside a frontier lab.
This is the third facet of "the AI-native engineering org." The first two covered the rewrite of how the work is done, and the platform foundation that makes Spotify fast. This piece is about how the org and its roles get reshaped, and the platform architecture that supports it. All three share the same base condition: code is essentially all AI-generated, and the bottleneck has moved from writing code to reviewing it and making decisions. The real addition here is what a team's shape, headcount, and responsibilities should become once that base condition holds.
A line Katelyn keeps coming back to: much about software development hasn't changed, and doesn't need to. A team still needs enough people to build, operate, and maintain systems over time. The one thing that changed is that AI gives everyone far more leverage.
On team structure, Anthropic still uses the traditional two-pizza cross-functional team: five to eight engineers, plus an engineering manager, a PM, and a designer. At her Sydney keynote she put it more plainly: teams are still around seven or eight people, a number that isn't the result of any scientific reasoning but of culture and history, and it simply happens to be the size that works.
Size holds; what gets reshaped is what those people actually do. The old model was one team lead, one PM, and four to six engineers who only implemented. In the new shape, nearly every engineer has to be able to step into the tech-lead role when needed. On stage in Sydney, Katelyn and head of product Angela Jiang described the new shape more precisely: one Team Lead, plus a roughly even split between "System Designers" and "Implementers."
Same people, different order of magnitude in output. A team of eight that used to run one or two projects at a time can now run four or five in parallel. The team still owns and maintains the same systems; AI multiplied how much work each person can execute concurrently.
Once capacity is amplified, pressure piles up downstream, on the stages AI can't take over yet. The next few sections look at how each of those stages is being squeezed.
The first stage to become a bottleneck is "deciding what to build." Anthropic's response here runs against what many companies do.
Plenty of teams cut PMs after AI took hold. OpenAI's engineer-to-PM ratio is thirty to one; companies like Telnyx and Portkey run with almost no PMs at all. Anthropic's read is the opposite: AI did not reduce the need for PMs, and the technically fluent TPM in particular matters more than before.
The logic is direct. Once engineers ship faster, the bottleneck moves from "can we build it" to "should we build it, and are we building the right thing." In Katelyn's view, amplified engineering capacity needs someone to make sure it is focused on the highest-impact problems, and that is exactly the value of product management.
This read is backed by an independent set of hard numbers. Anthropic's Head of Growth, Amol Avasare, has said publicly that after rolling out Claude Code internally, engineering teams run at two to three times their effective headcount, and a five-person team ships like fifteen to twenty. Those numbers also explain why the company is actively posting roles to add PMs rather than engineers.
Hiring PMs with engineering backgrounds at pay that rivals senior engineers is, in effect, a statement that product decision-making is what's genuinely scarce and genuinely valuable right now. Role boundaries are also shifting toward "PMs need to be more technical." Angela made it explicit on stage in Sydney: PMs now have to be more technical, not engineers exactly, but technical enough to ship simple features themselves.
“The constraint moved downstream, from the keyboard to the whiteboard.” On Anthropic's situation, per Amol Avasare (Ranzware)
The second stage to become a bottleneck once capacity is amplified is quality assurance. Here Anthropic's choice is to have no dedicated QA engineer; testing is a shared responsibility across the team.
Because AI makes code generation faster, how tests are written matters more. The team holds to the traditional testing pyramid: mostly unit tests, fewer integration tests, and the fewest end-to-end tests.
Capping the number of end-to-end tests is deliberate, so CI doesn't become a bottleneck and slow tests don't drag development down.
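The pyramid and the end-to-end cap can be expressed as a simple check over test counts. The sketch below is illustrative only: the function name `pyramid_violations`, the layer keys, and the cap of 20 are assumptions made for this example, not anything Anthropic has described using.

```python
def pyramid_violations(counts: dict[str, int], max_e2e: int) -> list[str]:
    """Return the ways a test suite departs from the testing pyramid.

    `counts` maps the (assumed) layer names "unit", "integration", "e2e"
    to the number of tests in each layer. `max_e2e` is a hypothetical cap
    chosen by the team to keep CI fast.
    """
    unit = counts["unit"]
    integration = counts["integration"]
    e2e = counts["e2e"]

    problems = []
    if unit <= integration:
        problems.append("unit tests should outnumber integration tests")
    if integration <= e2e:
        problems.append("integration tests should outnumber end-to-end tests")
    if e2e > max_e2e:
        problems.append(f"end-to-end tests exceed the cap of {max_e2e}")
    return problems


# A suite shaped like a pyramid produces no violations.
healthy = pyramid_violations({"unit": 400, "integration": 60, "e2e": 12}, max_e2e=20)
print(healthy)  # → []
```

A check like this could run in CI so that a burst of AI-generated end-to-end tests shows up as a failed build rather than as a gradually slower pipeline.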
The other heavy piece is evaluating the AI models and products themselves, i.e. AI Evals. The team invests a lot in building and maintaining evaluation systems to measure quality and make sure new models, new products, and platform updates meet the bar before they reach customers. This work belongs to no single team; engineers and PMs are both deeply involved, and everyone is responsible for meeting a high standard before shipping.
Katelyn is candid that roles inside the team have become more fluid, with no clear division of responsibility; they are learning as they go and don't claim to have found the perfect formula.
Asked whether her team, like Claude Code lead Boris Cherny, generates essentially all of its code with AI, she says yes. But she immediately adds the crucial qualifier: code being all AI-generated does not mean AI is building the software on its own.
In her account, engineers play the most important role in system design and architecture decisions. AI is an excellent collaborator, but how a large, complex system should be built is decided by engineers. Once the design is clear, implementation is largely handed to AI agents. Her operating tip is not to switch back and forth between "the agent generates code" and "a human writes code," but to keep steering the agent until it produces work that meets the bar.
Because of this, she considers reviewing AI-generated work one of the most valuable engineering skills. In an earlier blog post she laid out a full framework for what humans remain irreplaceable for, grouping it into three things.
“There is no loss function for ‘what should this company care about.’” Katelyn Lesse, blog (2026-02)
The earlier sections cover how the human side is arranged; this one covers how humans should work with agents. Katelyn's top piece of advice: define the outcome you want rather than just assigning a task.
Concretely, rather than saying "build me a dashboard," spell out what the finished thing should accomplish, what success looks like, and any important requirements or examples. Her team turned this into a product capability. Claude Managed Agents has a feature called Outcomes that makes "define the result, judge it automatically" a built-in loop.
This is the productization of a practice she wrote about earlier. She calls it self-verifying loops: don't expect an LLM to one-shot a clean answer to a hard problem; have the coding agent make a plan, write the code, run the tests, check the output, and keep iterating until everything passes. Her plain analogy: it's why a person reviews a document before sharing it. Set the bar for done, give the agent tools to verify against it, and don't let it stop until it gets there.
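The loop described above (produce, verify, feed failures back, repeat until the bar is met) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Outcomes feature or any Anthropic API: `self_verifying_loop`, `LoopResult`, `toy_agent`, and `toy_checks` are hypothetical names, and the "agent" is a stub that fixes its code once it receives feedback.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    passed: bool
    attempts: int
    output: str


def self_verifying_loop(
    produce: Callable[[str], str],
    verify: Callable[[str], list[str]],
    max_attempts: int = 5,
) -> LoopResult:
    """Ask `produce` for work, check it with `verify`, and feed failures back.

    `produce` receives the previous round's failure messages (empty on the
    first round). `verify` returns a list of failures; an empty list means
    the bar for done has been met.
    """
    feedback = ""
    output = ""
    attempts = 0
    for attempts in range(1, max_attempts + 1):
        output = produce(feedback)
        failures = verify(output)
        if not failures:
            return LoopResult(passed=True, attempts=attempts, output=output)
        feedback = "; ".join(failures)
    return LoopResult(passed=False, attempts=attempts, output=output)


def toy_agent(feedback: str) -> str:
    """Stand-in for a coding agent: the first draft forgets the return."""
    if feedback:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    a + b\n"


def toy_checks(code: str) -> list[str]:
    """Stand-in for a test suite: run the code and check one behavior."""
    namespace: dict = {}
    exec(code, namespace)
    if namespace["add"](2, 3) != 5:
        return ["add(2, 3) should equal 5"]
    return []


result = self_verifying_loop(toy_agent, toy_checks)
print(result.passed, result.attempts)  # → True 2
```

The design choice worth noting is that the caller supplies `verify`: the definition of done lives outside the agent, so the agent cannot declare success on its own.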
She points to two other practices from the same root. The first is having agents check each other: one writes code and another reviews it with a different lens; during an incident, multiple agents investigate in parallel with different hypotheses, the way on-call engineers catch what each other missed. The second is giving agents memory and context: a senior engineer outperforms a new hire of equal raw ability thanks to accumulated organizational knowledge like "we tried this six months ago and it failed for a subtle reason," and agents need the same.
Roles can be arranged this way only if a platform architecture supports it. When Anthropic shipped self-hosted sandboxes and MCP tunnels at Code with Claude London, Katelyn laid out the principle clearly.
She names the two biggest blockers for enterprises building agents: code execution has to run on their own infrastructure, and tool access has to stay behind their firewall, or security won't sign off. Anthropic's answer is to split the agent into a brain and hands.
Self-hosted sandboxes let Claude Managed Agents run code on infrastructure the enterprise fully controls, using its own runtime image, network rules, and security tooling. MCP tunnels let agents reach internal MCP services behind the firewall without opening a single inbound port. In the same post, she turned the architecture into a statement aimed at every company:
“Every company should be standing up an internal agent platform right now, or risk falling behind. And there's no longer a reason to build that platform from scratch.”
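The brain-and-hands split described above can be pictured as an interface boundary: the reasoning side emits tool calls, and an execution side that the enterprise controls decides whether and how to run them. The sketch below is a toy model of that idea only. `ToolCall`, `Hands`, `SelfHostedHands`, and `run_agent` are hypothetical names invented for this example; they are not the Claude Managed Agents API, and the "brain" is reduced to a fixed list of calls.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class ToolCall:
    """A request from the brain: which tool to run, with what argument."""
    tool: str
    argument: str


class Hands(Protocol):
    """Execution side: runs tool calls inside an environment the company owns."""

    def execute(self, call: ToolCall) -> str: ...


class SelfHostedHands:
    """Hypothetical stand-in for a sandbox on company infrastructure."""

    def __init__(self, tools: dict[str, Callable[[str], str]]) -> None:
        # Only the tools registered here are reachable from the brain.
        self._tools = tools

    def execute(self, call: ToolCall) -> str:
        if call.tool not in self._tools:
            return f"denied: {call.tool} is not an allowed tool"
        return self._tools[call.tool](call.argument)


def run_agent(plan: list[ToolCall], hands: Hands) -> list[str]:
    """Toy brain: a fixed plan. A real brain would be a model choosing calls."""
    return [hands.execute(call) for call in plan]


hands = SelfHostedHands({"word_count": lambda text: str(len(text.split()))})
results = run_agent(
    [
        ToolCall("word_count", "split the brain from the hands"),
        ToolCall("delete_db", "prod"),
    ],
    hands,
)
print(results)  # → ['6', 'denied: delete_db is not an allowed tool']
```

Because the brain only ever sees the strings that `execute` returns, the allow-list, network rules, and runtime stay entirely on the execution side, which is the property a security review would care about.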
Put the sections together and Anthropic's approach traces a shape opposite to "AI boosts output, so cut headcount": the team stays the same size, and human effort moves from writing code to architecture judgment, reviewing AI output, making product decisions, and maintaining customer relationships. Agents take on the rest, and are expected to run in self-verifying loops, check each other, and work with memory.
A point she keeps making: the path there isn't waiting for smarter models, but building the agent setup right now, while holding on to the people with the best judgment, taste, and relationships. Once AI amplifies capacity, those are exactly what remain scarce.
“Treat agents like teammates, not tools. Give them memory, give them domain expertise, and make them check each other's work. Then find the humans with the best judgment, taste, and relationships. Hold on to them.” Katelyn Lesse, blog
All three pieces share the same premise: code is essentially all AI-generated, and the bottleneck shifts to review and decisions. They differ in which facet they cut into — this piece covers org and role reshaping, the other two cover how the work is done and the infrastructure.
Gap note: the paywalled sections "engineers becoming tech-agnostic" and "how onboarding has changed" have no first-party Katelyn source, so this article omits them rather than fill them with someone else's words.