How Claude Code Works in Large Codebases

Core Argument

The Harness Matters as Much as the Model

One of the most common misconceptions about Claude Code is that its capabilities are defined solely by the model it runs on. Teams study benchmarks and compare SWE-Bench scores, but in multi-million-line monorepos the ecosystem built around the model (the harness) does more to determine how Claude Code performs in practice than the model alone. The model is the engine; the harness is the rest of the car: chassis, drivetrain, suspension, tires. Without the harness, the model's capabilities have nowhere to go; with it, every generational improvement in the model is amplified.

This article breaks down the deployment patterns Anthropic has observed: search strategies, extension architecture, navigable codebases, configuration half-life, and organizational rollout paths.

§I

Agentic Search vs RAG

Claude Code navigates a codebase the way a software engineer would: it traverses the file system, reads files, uses grep to find exactly what it needs, and follows references across the codebase. Files get renamed, functions get deleted — agentic search sees the real state at runtime.

RAG-powered AI coding tools work by embedding the entire codebase and retrieving relevant chunks at query time. At large scale, those systems can fail because embedding pipelines can't keep up with active engineering teams, so the index goes stale. Agentic search has its own tradeoff: it works best when Claude has enough starting context to know where to look. This is why teams that invest in codebase setup see better results.

[Figure: RAG (pre-built index) compared with agentic search (live traversal)]
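As a rough illustration of the difference, the sketch below searches files on disk at call time instead of consulting a pre-built index, so a rename is reflected on the very next search. It is a minimal stand-in for the idea, not Claude Code's implementation; `live_grep`, `demo`, and the sample file names are hypothetical.

```python
import re
import tempfile
from pathlib import Path


def live_grep(root: Path, pattern: str) -> list[tuple[str, int, str]]:
    """Scan files under root right now; return (relative path, line number, line) hits."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits


def demo() -> tuple[list, list]:
    """Search, rename a file, search again: the second result reflects the rename."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "billing.py").write_text("def charge_card(amount):\n    return amount\n")
        before = live_grep(root, r"def charge_card")
        (root / "billing.py").rename(root / "payments.py")
        after = live_grep(root, r"def charge_card")
    return before, after
```

Because nothing is cached between calls, there is no index to rebuild after the rename; the cost is that every search pays for a fresh traversal.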

§II

The Harness — 7 Extension Layers

The harness is built from five extension points (CLAUDE.md files, hooks, skills, plugins, and MCP servers), each serving a different function, plus two further capabilities covered below: LSP integrations and subagents. The order in which teams build them matters, as each layer builds on what came before.

1. CLAUDE.md

Context files that Claude reads automatically at the start of every session: root file for the big picture, subdirectory files for local conventions. Only broadly applicable content — not a README, not a wiki, but behavioral guidance for the agent.

# acme-monorepo
TypeScript everywhere. Rust for services/edge-*.

## Test commands
- root: pnpm test --filter <pkg>
- per-service: see subdirectory CLAUDE.md

## Critical gotchas
- Never import from internal/ outside parent
- Migrations in db/migrations/ — no inline SQL
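To make the root-plus-subdirectory layering concrete, here is a minimal sketch that collects the CLAUDE.md files lying on the path from the repository root down to a working directory. It only illustrates the layering idea; how Claude Code actually discovers and loads these files is not described here, and `claude_md_chain` is a hypothetical helper.

```python
from pathlib import Path


def claude_md_chain(repo_root: Path, workdir: Path) -> list[Path]:
    """Return existing CLAUDE.md files from repo_root down to workdir, root first."""
    repo_root = repo_root.resolve()
    workdir = workdir.resolve()
    # relative_to raises ValueError if workdir is outside the repository
    relative = workdir.relative_to(repo_root)

    # Build the list of directories on the path: root, root/a, root/a/b, ...
    directories = [repo_root]
    for part in relative.parts:
        directories.append(directories[-1] / part)

    return [d / "CLAUDE.md" for d in directories if (d / "CLAUDE.md").is_file()]
```

For a session started in services/payments, the chain would hold the root file first and the payments file last, so the most local conventions come after the global ones.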

2. Hooks

Most teams think of hooks as scripts that prevent Claude from doing something wrong, but their more valuable use is continuous improvement. A stop hook proposes CLAUDE.md updates; a start hook dynamically loads context. Lifecycle-event-driven self-evolution.

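#!/bin/bash
# .claude/hooks/on-stop
# Illustrative only: the subcommand and flags below are schematic and
# have not been verified against the Claude Code CLI.
claude reflect \
  --propose-updates-to CLAUDE.md \
  --while-context-fresh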

3. Skills

Not all expertise needs to be present in every session. Skills solve this through progressive disclosure — bound by path, activated only when working in the relevant directory. Avoids context bloat.

# .claude/skills/security-review.yaml
name: security-review
trigger: "auditing code for vulns"
scope: ["services/**"]
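Path-bound activation can be sketched as a glob match between the file being worked on and each skill's scope. The field names mirror the illustrative definition above and are not a documented schema; the second skill and the `active_skills` helper are hypothetical.

```python
from fnmatch import fnmatchcase

# Mirrors the illustrative skill definition; field names are assumptions.
SKILLS = [
    {"name": "security-review", "scope": ["services/**"]},
    {"name": "ui-conventions", "scope": ["apps/web/**"]},  # hypothetical second skill
]


def active_skills(path: str, skills: list[dict]) -> list[str]:
    """Return names of skills whose scope globs match a repo-relative path.

    fnmatchcase treats '*' as matching any characters, including '/',
    so 'services/**' matches everything under services/.
    """
    return [
        skill["name"]
        for skill in skills
        if any(fnmatchcase(path, pattern) for pattern in skill["scope"])
    ]
```

Only the matching skills would be loaded into the session; everything else stays out of context until the work moves into its directory.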

4. Plugins

Good setups can stay tribal. A plugin bundles skills, hooks, and MCP configurations into a single installable package. New engineers are productive on day 1 — no assembly required.

{
  "name": "retail-analytics",
  "skills": ["pull-perf-data"],
  "hooks": ["log-query-shape"],
  "mcp": ["analytics-warehouse"]
}

5. MCP Servers

The most sophisticated teams built MCP servers exposing structured search as a tool Claude can call directly. Claude interacts with MCP servers through tool calling — search engines, databases, CI systems can all be exposed as tools.

{
  "servers": {
    "code-search": {
      "cmd": "internal-search-mcp",
      "tool": "structured_search"
    }
  }
}
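The tool-calling pattern can be sketched as a registry that maps a tool name to a function and dispatches incoming calls to it. This is not the MCP wire protocol or any MCP SDK; the request shape, the in-memory index, and all names below are assumptions made for illustration.

```python
from typing import Callable

# Hypothetical in-memory index; a real server would query an internal search service.
SYMBOL_INDEX = {
    "charge_card": ["services/payments/billing.ts:42"],
    "verify_token": ["services/auth/token.ts:10", "apps/web/src/session.ts:88"],
}


def structured_search(symbol: str) -> list[str]:
    """Look up known locations for a symbol name."""
    return SYMBOL_INDEX.get(symbol, [])


# Registry of callable tools, keyed by the name the caller uses.
TOOLS: dict[str, Callable[..., object]] = {"structured_search": structured_search}


def handle_tool_call(call: dict) -> dict:
    """Dispatch a {'tool': name, 'arguments': {...}} request to the registered function."""
    tool = TOOLS.get(call.get("tool"))
    if tool is None:
        return {"ok": False, "error": f"unknown tool: {call.get('tool')}"}
    return {"ok": True, "result": tool(**call.get("arguments", {}))}
```

The same registry could expose a database query or a CI status check; from the caller's side each is just another named tool with arguments.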

6. LSP Integration

Symbol-level precision with go-to-def and find-refs. One enterprise software company deployed LSP integrations org-wide before their Claude Code rollout, specifically to make C and C++ navigation reliable at scale.

{
  "plugins": ["code-intelligence"],
  "languageServers": {
    "cpp": "clangd",
    "rust": "rust-analyzer"
  }
}
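What "symbol-level precision" buys over text search can be shown with a small stand-in. The sketch uses Python's built-in `ast` module on a Python snippet; a language server such as clangd provides the equivalent for C and C++ over LSP. The sample source and helper names are hypothetical.

```python
import ast

SOURCE = '''\
def parse(text):
    """parse the input"""
    return text.split()

# TODO: parse errors should be logged
result = parse("a b")
'''


def text_matches(source: str, name: str) -> list[int]:
    """Line numbers where the name appears anywhere, as a plain text search reports."""
    return [i for i, line in enumerate(source.splitlines(), start=1) if name in line]


def definitions(source: str, name: str) -> list[int]:
    """Line numbers where a function with this name is defined."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name
    ]


def references(source: str, name: str) -> list[int]:
    """Line numbers where the name is used as an identifier, ignoring comments and strings."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and node.id == name
    )
```

Text search returns four lines for `parse` here, two of which are a docstring and a comment; the syntax-aware lookups return one definition and one reference.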

7. Subagents

A subagent is an isolated Claude instance with its own context window that takes a task, does the work, and returns only the final result to the parent. Exploration and editing are separated — the subagent scans broadly, the main agent edits precisely with the full picture.

> spawn read-only subagent
> task: map auth subsystem → docs/auth-map.md
← 247 refs across 18 files
  main agent edits with full picture
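The context isolation described above can be modeled in a few lines: the subagent reads broadly into a private list and hands back only a summary string, so the parent's context grows by one entry. This is a toy model of the pattern, not how Claude Code schedules subagents; all names are hypothetical.

```python
def run_subagent(task: str, files: dict[str, str], needle: str) -> str:
    """Scan many files in a private context and return only a one-line summary."""
    private_context = []  # everything read here stays local to the subagent
    hits = 0
    touched = 0
    for path, text in files.items():
        private_context.append(text)
        count = text.count(needle)
        if count:
            hits += count
            touched += 1
    return f"{task}: {hits} refs across {touched} files"


def main_agent(files: dict[str, str]) -> list[str]:
    """The parent keeps only the subagent's summary in its own context."""
    context = ["user: refactor the auth subsystem"]
    context.append(run_subagent("map auth subsystem", files, "auth"))
    return context
```

However many files the subagent reads, the parent sees a single line, which leaves its context window free for the edit itself.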

§III · Pattern 1

Making the Codebase Navigable

Claude's ability to help in a large codebase is bounded by its ability to find the right context. Too much context loaded into every session degrades performance, while too little context leaves Claude to navigate blind. The most effective deployments invest upfront in making the codebase legible to Claude.

This isn't about making Claude smarter—it's about making the codebase easier to explore. A few patterns appear consistently:

acme-monorepo/
├── CLAUDE.md                  ⊕ root: pointers + gotchas only
├── .claude/
│   └── settings.json          ⊕ permissions.deny rules
├── apps/
│   ├── web/
│   │   ├── CLAUDE.md          ⊕ per-subdir test/lint
│   │   └── src/
│   └── mobile/
├── services/
│   ├── payments/
│   │   ├── CLAUDE.md          ⊕ init here, not root
│   │   └── ...
│   └── auth/
├── codebase-map.md            ⊕ directory index
└── (LSP: clangd running)      ⊕ symbol search

⊕ Layered CLAUDE.md: root holds global pointers + gotchas only
⊕ Init from subdirectory: cd services/payments && claude
⊕ Per-subdir scoped test/lint commands
⊕ .ignore to exclude noise: build artifacts / node_modules
⊕ Build a codebase map: directory-level index file
⊕ Run LSP server: symbol-level search
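A directory-level codebase map can be generated rather than written by hand. The sketch below walks the top two levels of a repository, skips common noise directories, and notes which directories carry their own CLAUDE.md. The output format, the noise list, and the depth limit are assumptions, not a prescribed layout.

```python
from pathlib import Path

NOISE = {"node_modules", ".git", "dist", "build", "target"}  # assumed noise directories


def build_codebase_map(root: Path, max_depth: int = 2) -> str:
    """Return a directory-level index, marking directories that have a CLAUDE.md."""
    lines = ["# Codebase map", ""]

    def visit(directory: Path, depth: int) -> None:
        if depth > max_depth:
            return
        for child in sorted(p for p in directory.iterdir() if p.is_dir()):
            if child.name in NOISE:
                continue
            marker = " (has CLAUDE.md)" if (child / "CLAUDE.md").is_file() else ""
            indent = "  " * (depth - 1)
            lines.append(f"{indent}- {child.relative_to(root).as_posix()}/{marker}")
            visit(child, depth + 1)

    visit(root, 1)
    return "\n".join(lines)
```

Writing the result to a file such as codebase-map.md gives a session a cheap starting point for deciding which subdirectory to explore.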

Edge case: in codebases with hundreds of thousands of folders and millions of files, or in legacy systems on non-Git version control, even the hierarchical CLAUDE.md approach breaks down. Anthropic plans to address those challenges in future installments of this series.

§IV · Pattern 2

Configuration Has a Half-Life

As models evolve, instructions written for your current model can work against a future one. Models iterate, but configuration doesn't automatically evolve with them—constraints and compensations written for one generation may become obstacles for the next.

Two typical failure modes:

Old rules actively constrain new capabilities—a CLAUDE.md rule that tells Claude to break every refactor into single-file changes may have helped an earlier model stay on track, but prevents a newer one from making coordinated cross-file edits it handles well
Compensation tooling becomes overhead—skills and hooks built to compensate for specific model limitations become dead weight once those limitations no longer exist

The chart below shows how two real configuration items decay in value across model generations:

The Perforce hook case: Early on, Claude Code didn't understand Perforce workflows, so a team built a hook that intercepted file writes to enforce p4 edit—a necessary compensation at the time. Once Claude Code added native Perforce mode, the hook became redundant. If the team didn't review their config, it would keep running: adding complexity, no longer adding value.
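A compensation hook of this kind might have looked roughly like the sketch below: given a description of the pending tool action, it decides whether a `p4 edit` must run first. The event field names (`tool_name`, `tool_input`, `file_path`) and the tool names are assumptions for illustration, and the function only builds the command; it does not run it.

```python
import shlex
from typing import Optional

WRITE_TOOLS = {"Write", "Edit"}  # assumed names for file-modifying tools


def p4_command_for(event: dict) -> Optional[list[str]]:
    """Return the `p4 edit` command to run before a file write, or None for other events."""
    if event.get("tool_name") not in WRITE_TOOLS:
        return None
    path = event.get("tool_input", {}).get("file_path")
    if not path:
        return None
    return ["p4", "edit", path]


def describe(event: dict) -> str:
    """Human-readable form of the action, for logging or a dry run."""
    command = p4_command_for(event)
    return "no-op" if command is None else shlex.join(command)
```

Once the tool opens files for edit on its own, every call through this code path becomes redundant work, which is the decay the Perforce case describes.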

Recommended cadence: Teams should expect to do a meaningful configuration review every three to six months, but it's also worth doing one whenever performance feels like it has plateaued after a major model release.
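One lightweight way to keep that cadence is to record, for each configuration item, why it exists and when it was last reviewed, then flag anything past the threshold. The record layout and the six-month default are assumptions; the sketch only shows the bookkeeping.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ConfigItem:
    name: str
    reason: str          # the limitation this item was written to compensate for
    last_reviewed: date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from earlier to later."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months


def due_for_review(items: list[ConfigItem], today: date, max_age_months: int = 6) -> list[str]:
    """Names of items not reviewed within max_age_months."""
    return [
        item.name
        for item in items
        if months_between(item.last_reviewed, today) >= max_age_months
    ]
```

Keeping the `reason` next to each item makes the review itself quicker: if the stated limitation no longer applies to the current model, the item is a candidate for removal.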

§V · Pattern 3

Infrastructure Before Access

Technical configuration alone doesn't drive adoption. Patterns 1 and 2 solve "can a single task complete" and "will the config still work in 6 months." But without organizational investment, good setups stay tribal—one person figures it out, everyone else doesn't know.

The rollouts that spread fastest had a dedicated infrastructure investment before broad access. A small team—sometimes even just one person—wired up the tooling so Claude already fit developer workflows when they first touched it. Plugins installed, MCP connected, CLAUDE.md conventions in place. The first experience is productive rather than frustrating, and adoption spreads naturally.

At one company, a couple of engineers built a suite of plugins and MCPs that were available on day one.
At another, an entire team focused on managing AI coding tools had the infrastructure in place before the rollout began.

This contrasts sharply with the "open access first, let people figure it out" approach.

Who owns this?

The teams doing this work today tend to sit under developer experience or developer productivity. An emerging role in several organizations is an agent manager: a hybrid PM/engineer function dedicated to managing the Claude Code ecosystem.

For organizations without a dedicated team, the minimum viable version is a DRI: one person with ownership over the Claude Code configuration, the authority to make calls on settings, permissions policy, the plugin marketplace, and CLAUDE.md conventions, and the responsibility to keep them current.

Governance questions (especially in regulated industries):

Who controls which skills and plugins are available
How to prevent thousands of engineers from independently rebuilding the same thing
How to ensure AI-generated code goes through the same review process as human-generated code

Suggested starting point: begin with a defined set of approved skills, required code review processes, and limited initial access—and expand as confidence builds. The smoothest deployments come from organizations that establish cross-functional working groups early, bringing engineering, information security, and governance representatives together to define requirements and build a rollout roadmap.

Boundaries

Scope

Designed For

Engineers + Git + standard directory structures
Monorepo or multi-repo
C, C++, C#, Java, PHP (results exceed expectations)
Conventional software engineering environments

Requires Extra Work

Game engine binary assets
Non-standard VCS (Perforce, etc.)
Non-engineer contributors
Very large files (auto-generated code)

Future Coverage

Million-file scale
Non-Git legacy systems
Fully autonomous operation
Cross-organization collaboration

Closing

Key Quote

"The harness matters as much as the model. The smoothest rollouts had a dedicated infrastructure investment before broad access."

— Anthropic Applied AI Team, 2026-05-14

Sources

References

Primary: Anthropic Blog, "How Claude Code works in large codebases"
Method: Fact-check, four passes, all clear
Credits: Special thanks to Alon Krifcher, Charmaine Lee, Chris Concannon, Harsh Patel, Henrique Savelli, Jason Schwartz, Jonah Dueck and Kirby Kohlmorgen from Anthropic's Applied AI team, and to Amit Navindgi at Zoox
