Code w/ Claude 2026 · Spotify first-party case

Coding is no longer the constraint. But what makes Spotify fast isn't Claude

Same Claude — so why such different leverage? Chief Architect Niklas Gustavsson's answer is counterintuitive: the secret isn't the model, it's the platform foundation they built for humans years ago.

weeks-to-months → 3 days

a recent backend Java migration, then vs now

Fleet Management · Honk · Backstage

A reversal

The results first — then the turn

Spotify's numbers are striking. By the time of the talk (mid-2026), 96% of engineers code with AI every week, PR frequency is up 60%, and the vast majority of PRs are authored by a developer working alongside an AI agent. The adoption curve steepened sharply after the Opus 4.5 release in late 2025. As Niklas put it: "We roll out tools internally all the time to make our developers more productive, but we have never seen the rate of adoption that we've seen rolling out AI coding tools."

Chart legend: typical internal-tool adoption · Claude Code (steep after Opus 4.5)

So far it reads like another "we adopted AI and got faster" story. But Niklas turned it on its head: what makes Spotify this fast is not Claude itself. Follow that through and a corollary appears: buying the same tools won't reproduce Spotify's results, because the difference lies outside the tool.

What actually makes them fast is the platform foundation they built years ago — one built entirely for humans. And the standardization done to make humans effective turns out to help agents even more.

Many assume "adopt the AI tool" is the starting line, and leverage follows. Spotify's experience is the opposite: the foundation sets the ceiling. Without that layer of standardization, even a great agent dropped into a fragmented codebase won't run well.

📐

"If Claude has a lot of other code to look at, and that code looks roughly consistent, Claude will do a better job. That's what we're seeing."

They measured it: in more fragmented codebases, agent performance is measurably worse · Niklas Gustavsson

The foundation

So what does it actually look like?

The origin predates agents. Years ago Spotify hit a problem: the production codebase was growing seven times faster than the number of engineers. Developers spent more and more time on maintenance — upgrading dependencies, migrating APIs, patching vulnerabilities — and less on features. Migrations were the number one source of developer frustration.

Figure: engineers (1×) vs. production codebase (growing 7× faster). Headcount barely moved while the codebase multiplied, so maintenance had to go to automation.

To absorb that, they built two foundations. Both were built for humans — only later did they turn out to be exactly what agents needed too.

Foundation 1

Fleet Management

Instead of asking hundreds of teams to edit components by hand, engineers write small snippets that modify source code (the system is called Fleetshift), run them across thousands of components, and auto-open PRs.

2.5M+ automated PRs · ~half of all PRs since mid-2024
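
As a rough illustration of the idea, the sketch below applies one small transformation to many components and records a pull request for each component that actually changed. It is not Fleetshift's API: `Component`, `PullRequest`, `run_fleet_change`, the service names, and the in-memory file maps are all hypothetical stand-ins, and a real system would clone repositories and call a code host instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical stand-ins: a component is just a name plus its files held in memory.
@dataclass
class Component:
    name: str
    files: Dict[str, str]


@dataclass
class PullRequest:
    component: str
    title: str
    changed_files: Dict[str, str]


# A transform takes a file path and its contents and returns the new contents.
Transform = Callable[[str, str], str]


def run_fleet_change(
    components: List[Component], transform: Transform, title: str
) -> List[PullRequest]:
    """Apply one transform to every component; open a PR only where something changed."""
    prs = []
    for component in components:
        changed = {}
        for path, content in component.files.items():
            updated = transform(path, content)
            if updated != content:
                changed[path] = updated
        if changed:
            prs.append(PullRequest(component.name, title, changed))
    return prs


def rename_logger(path: str, content: str) -> str:
    """Example snippet: swap a deprecated import, in Java sources only."""
    if not path.endswith(".java"):
        return content
    return content.replace(
        "import com.example.OldLogger;", "import com.example.NewLogger;"
    )


fleet = [
    Component("playlist-service", {"src/App.java": "import com.example.OldLogger;\nclass App {}"}),
    Component("search-service", {"src/Main.java": "class Main {}"}),
    Component("docs-site", {"README.md": "import com.example.OldLogger;"}),
]

prs = run_fleet_change(fleet, rename_logger, "Migrate OldLogger to NewLogger")
for pr in prs:
    print(pr.component, sorted(pr.changed_files))
```

Only the component whose Java source contained the old import gets a PR. The author of the snippet writes the change once, and the targeting and PR bookkeeping live in the shared loop.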

Foundation 2

Backstage

Their open-source internal developer portal. Before it, ~100 internal tools each did their own thing — fragmented and confusing. Backstage consolidated them into a single pane of glass around a component catalog.

~100 tools → one portal
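
To make "a component catalog" concrete, here is a minimal sketch of a catalog as a data structure: one registry that answers "what is this component and who owns it" in one place. The field names follow Backstage's entity descriptor (kind, name, type, owner, lifecycle), but the `Catalog` class and the sample entries are assumptions for illustration, not Backstage's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CatalogEntity:
    # Field names mirror Backstage's entity descriptor; values below are made up.
    kind: str
    name: str
    type: str
    owner: str
    lifecycle: str


class Catalog:
    """Hypothetical in-memory catalog: a single lookup point for every component."""

    def __init__(self) -> None:
        self._entities: Dict[str, CatalogEntity] = {}

    def register(self, entity: CatalogEntity) -> None:
        if entity.name in self._entities:
            raise ValueError(f"duplicate component: {entity.name}")
        self._entities[entity.name] = entity

    def get(self, name: str) -> CatalogEntity:
        return self._entities[name]

    def owned_by(self, owner: str) -> List[str]:
        return sorted(e.name for e in self._entities.values() if e.owner == owner)


catalog = Catalog()
catalog.register(CatalogEntity("Component", "playlist-service", "service", "team-playlists", "production"))
catalog.register(CatalogEntity("Component", "playlist-web", "website", "team-playlists", "production"))
catalog.register(CatalogEntity("Component", "search-service", "service", "team-search", "experimental"))

print(catalog.get("search-service").owner)
print(catalog.owned_by("team-playlists"))
```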

"The fewer technologies we are world-leading in, the faster we go."

One of Spotify's oldest engineering principles — predating AI by years, yet it paved the way for agents

Into leverage

How the foundation became a runway for agents

Start with the ceiling of deterministic scripts. Early Fleet Management ran on "write a script to change code" — great for simple, repeatable tasks, hard for complex ones, because defining transformations by manipulating an abstract syntax tree or writing regexes demands a lot of specialized expertise. The clearest example is the Maven dependency updater: its core job is just to find pom.xml and update Java dependencies, but to handle every corner case it grew to over 20,000 lines. Complex changes were beyond what anyone could write.
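
To show why such a script grows, here is a minimal sketch of a deterministic version bump. It is not Spotify's updater: `bump_dependency` and the sample POM are assumptions for illustration. It handles only two cases (a literal version and a `${property}` reference) and explicitly gives up on inherited versions and version ranges, and every one of those skipped branches is a corner case a production script would have to hand-write.

```python
import re
import xml.etree.ElementTree as ET

POM_NS = "http://maven.apache.org/POM/4.0.0"
NS = {"m": POM_NS}
PROPERTY_REF = re.compile(r"^\$\{(.+)\}$")


def bump_dependency(pom_xml: str, group_id: str, artifact_id: str, new_version: str) -> str:
    """Set one dependency's version, following a ${property} reference if present."""
    ET.register_namespace("", POM_NS)  # keep the default namespace on output
    root = ET.fromstring(pom_xml)
    # ".//m:dependency" also matches entries under dependencyManagement and profiles.
    for dep in root.findall(".//m:dependency", NS):
        if dep.findtext("m:groupId", namespaces=NS) != group_id:
            continue
        if dep.findtext("m:artifactId", namespaces=NS) != artifact_id:
            continue
        version = dep.find("m:version", NS)
        text = (version.text or "").strip() if version is not None else ""
        if not text:
            continue  # version inherited from a parent or BOM: not handled here
        match = PROPERTY_REF.match(text)
        if match:
            # Update the property the version points at, not the reference itself.
            props = root.find("m:properties", NS)
            for prop in props if props is not None else []:
                if prop.tag == f"{{{POM_NS}}}{match.group(1)}":
                    prop.text = new_version
        elif text[0] in "[(":
            continue  # version range: not handled here
        else:
            version.text = new_version
    return ET.tostring(root, encoding="unicode")


# Sample POM with made-up content: one property-based version, one literal version.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties>
    <guava.version>32.0.0-jre</guava.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>${guava.version}</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>2.0.9</version>
    </dependency>
  </dependencies>
</project>"""

via_property = bump_dependency(POM, "com.google.guava", "guava", "33.0.0-jre")
literal = bump_dependency(POM, "org.slf4j", "slf4j-api", "2.0.13")
print(via_property)
```

Even this toy version needs a branch per way a version can be expressed. Add profiles, classifiers, exclusions, and inheritance chains and the branch count climbs quickly.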

maven-bump.transform · deterministic script

// handle every corner case in pom.xml
if (node.type === 'dependencyManagement') {
  if (hasProperty(version)) resolveProp(...)
  else if (isRange(version)) parseRange(...)
  else if (isBOM(parent)) walkImportScope(...)
  // ...also profile / classifier
  // ...exclusion / relativePath / inheritance
}
if (node.type === 'plugin') { /* another pile */ }
// ...every corner hand-written, one by one

20,000+ lines of code

honk.prompt · hand off to agent

"Upgrade the Java dependencies in these repos to the new version. Handle BOMs, profiles, and inheritance chains as needed. Make the build pass, then open a PR."

one sentence → the agent handles the corner cases itself

In February 2025, Spotify began using AI agents inside Fleet Management. After many iterations came Honk. As Niklas put it: it has a silly name and a silly icon, but it turns out to be very useful.

Here's the crux: Honk isn't powerful out of thin air — it runs because it stands on that foundation. These four pillars map directly onto it:

🎯

Fleetshift orchestrates it

Targeting, scheduling, and progress tracking stay with Fleet Management; Honk only does the code edits in the middle

✅

Standardization lets it self-verify

Honk combines Claude (Agent SDK), Spotify's own harness, and Kubernetes, and runs CI builds across operating systems; if a build fails, it fixes the problem and retries

📐

Consistent code makes it sharper

The measured gap described earlier: with consistent templates everywhere to mirror, the agent does noticeably better

🛡️

Active guardrails self-correct it

Backstage is exposed over MCP; golden state and lint checks flag a wrong pattern the moment it appears, and the agent corrects itself
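
The self-verify and self-correct pillars can be sketched as a loop: the agent proposes an edit, standardized checks run, and any failures go back to the agent as feedback for the next attempt. This is a minimal sketch under stated assumptions, not Honk's harness: `edit_until_verified`, the two checks, the class names in the sample source, and `stub_agent` (which stands in for a model call) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CheckResult:
    ok: bool
    message: str = ""


# Hypothetical signatures: an agent maps (source, feedback) to new source;
# a check inspects source and reports pass or fail with a message.
Agent = Callable[[str, List[str]], str]
Check = Callable[[str], CheckResult]


def edit_until_verified(
    source: str, agent: Agent, checks: List[Check], max_attempts: int = 3
) -> Optional[str]:
    """Ask the agent for an edit, run every check, and feed failures back until clean."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        candidate = agent(source, feedback)
        results = [check(candidate) for check in checks]
        failures = [r.message for r in results if not r.ok]
        if not failures:
            return candidate
        feedback = failures
    return None  # give up; a human would take over from here


def build_passes(source: str) -> CheckResult:
    # Stand-in for a CI build: only checks that braces are balanced.
    if source.count("{") != source.count("}"):
        return CheckResult(False, "build: unbalanced braces")
    return CheckResult(True)


def lint_no_legacy_client(source: str) -> CheckResult:
    # Stand-in for a guardrail that flags a non-standard pattern.
    if "LegacyHttpClient" in source:
        return CheckResult(False, "lint: use StandardHttpClient instead of LegacyHttpClient")
    return CheckResult(True)


def stub_agent(source: str, feedback: List[str]) -> str:
    # Stand-in for a model call: the first try picks the wrong client,
    # later tries react to the lint message.
    candidate = source.replace("OldClient", "LegacyHttpClient")
    if any("LegacyHttpClient" in message for message in feedback):
        candidate = candidate.replace("LegacyHttpClient", "StandardHttpClient")
    return candidate


checks = [build_passes, lint_no_legacy_client]
result = edit_until_verified("class Api { OldClient c; }", stub_agent, checks)
print(result)
```

The loop only works if "correct" is defined the same way everywhere. Where builds and lint rules are uniform across components, one set of checks can verify an edit in any of them.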

The results are concrete. By November 2025, Honk had generated more than 1,500 PRs that were merged into production, and not trivial ones: replacing Java value types with records, migrating data pipelines to a new version of Scio with breaking changes, and moving to the new frontend system in Backstage. These migrations took 60–90% less time than doing them by hand. Among all agents, Claude Code is their top performer, applied to about 50 migrations and accounting for the majority of merged agent PRs. That 3-day Java migration from the opening is exactly this at work.

1,500+ PRs

generated by Honk and merged to production (by Nov 2025)

60–90%

time saved on complex migrations vs by hand

3 days

most recent backend Java migration, once weeks-to-months

Developers found new uses on their own. Honk lives in Slack, where engineers mention it mid-conversation — a natural source of context — and it flies off, works on the problem, and returns with a PR. Their internal real-time dashboard is called Goose Farm, where each goose is an active Honk session. Honk v2 added multiplayer collaboration, so agents work with multiple developers and teams, not just one person at a terminal.

🪿🪿🪿🪿🪿🪿🪿🪿🪿🪿🪿🪿🪿🪿🪿🪿

Goose Farm · each goose = one running background coding session

"Firmer guardrails are accelerators, not constraints."

Niklas Gustavsson · Chief Architect, Spotify · Code w/ Claude 2026

Bottleneck shift

Coding is no longer the bottleneck — humans are

As coding velocity rises, the constraint shifts to human decisions. Spotify has always had more ideas than capacity to build them, but now anyone can open Claude in the client monorepo and prototype a feature idea in minutes instead of days. Even the CEO is building prototypes this way.

The flip side: there are now 60% more PRs to review. Spotify is learning where to apply human judgment — auto-merging what's safe, focusing review where it matters most. As the bottleneck moves from coding to decision-making, the bets they made years ago on Fleet Management, Backstage, and standardization are exactly what caught the handoff.
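
One way to picture "auto-merging what's safe, focusing review where it matters most" is a routing rule over PR metadata. The talk does not describe Spotify's actual criteria, so everything in this sketch is an assumption for illustration: the `PullRequestInfo` fields, the sensitive path prefixes, and the size threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PullRequestInfo:
    title: str
    automated: bool  # opened by fleet automation or an agent
    ci_passed: bool
    lines_changed: int
    touched_paths: List[str]


# Hypothetical policy knobs, chosen only for illustration.
SENSITIVE_PREFIXES = ("auth/", "payments/")
MAX_AUTO_MERGE_LINES = 50


def route(pr: PullRequestInfo) -> str:
    """Return 'block', 'auto-merge', or 'human-review' for a pull request."""
    if not pr.ci_passed:
        return "block"
    touches_sensitive = any(path.startswith(SENSITIVE_PREFIXES) for path in pr.touched_paths)
    small = pr.lines_changed <= MAX_AUTO_MERGE_LINES
    if pr.automated and small and not touches_sensitive:
        return "auto-merge"
    return "human-review"


queue = [
    PullRequestInfo("Bump logging library", True, True, 4, ["pom.xml"]),
    PullRequestInfo("Rotate token handling", True, True, 12, ["auth/tokens.java"]),
    PullRequestInfo("New playlist feature", False, True, 300, ["playlists/Feature.java"]),
    PullRequestInfo("Broken migration", True, False, 8, ["pom.xml"]),
]

for pr in queue:
    print(f"{route(pr):13} {pr.title}")
```

The point of a rule like this is where it sends attention: routine, verified changes pass through, and human judgment is reserved for the changes that are large, sensitive, or authored outside the automated path.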

The old bottleneck

Writing code

prototyping went from days to minutes; anyone can open Claude and try

→

The new bottleneck

Review & decisions

60% more PRs to review; judgment is the new scarce resource

↔

Same so-what, a different lens

Fiona Fung: Running an AI-Native Engineering Org

That piece tackles the bottleneck shift through organizational norms — rewriting six team practices (JIT planning, layered human-agent review, hiring criteria, and more). This one comes at it through the platform foundation — how standardization became the runway for agents. Two views of the same shift: the org side and the infrastructure side.

Primary sources

Coding is no longer the constraint: Scaling devex to teams and agents at Spotify

Niklas Gustavsson (Chief Architect & VP of Engineering, Spotify) @ Code w/ Claude 2026 · official recording

Spotify Engineering blog · the Honk series

Part 1: 1,500+ PRs Later (origin, Fleet Management, the 20K-line script)
Part 2: Context Engineering (Claude Code as top performer, ~50 migrations)
Part 3: Feedback Loops (verifier + Judge loop)
Part 4: Dataset Migrations (standardization as the prerequisite for agents at scale)

Figures follow the official talk (96% / +60% PRs). Faithful to the talk and blogs; key numbers verified item by item.