KIRO · IDE + CLI · 2026 · THE COMPLETE GUIDE
Based on kiro.dev official docs · Distinguishing the two product lines

The model is the fulcrum;
the harness is the lever

Kiro writes the agent's main loop for you; the ring around that loop is yours to configure. With the same model, how well you configure that ring decides whether it's a tool that edits a few lines or a developer that can carry a large system. But one thing has to be said up front: Kiro is two product lines, and the IDE and CLI are not feature-aligned. Every capability in this guide is labeled with which line it belongs to.

Topic: Using Kiro's harness well
Product lines: Kiro IDE + Kiro CLI
Source: kiro.dev official docs, 2026-06
Labels: [Official] / inferred / version-dependent, marked inline
[Figure: lever diagram. Fulcrum = model (given); lever = harness (you configure); output ceiling ↑. Lever arm = Floors 0–4, lengthening floor by floor.]
GenAI Playbook · Porting "Using Claude Code's harness well" to Kiro, for teams who work mainly in Kiro · Distinguishing IDE / CLI throughout
First, get one thing straight

Same model;
the environment decides its ceiling

Treat the model like a developer who just joined the team and onboard it properly, and the things that have been sitting on the backlog can start moving again; meanwhile the model's own capability really is the ceiling. Put the two together: the harness is the adjustable lever, the model is the fixed fulcrum. You can't swap the fulcrum; the only thing you can move is the lever. Kiro makes "configuring the lever" mature and ready to use. Beyond configurable primitives like steering, hooks, and skills, it ships a strong default paradigm of its own: write a spec first, have a human approve it, then let the agent implement against it.

4× fewer
Kiro officially singles out Opus 4.8: four times less likely than the previous generation to let a defect through in generated code, and listed as "highest reliability"
1M ↔ 200K
The context window is model-dependent, not a fixed value; the model you pick determines this session's window
2 lines
IDE and CLI share some configuration, but are not feature-aligned; every capability is tagged [IDE], [CLI], or [SHARED]

A discipline that runs through the whole guide: a harness decays over time. Once the model improves, the scaffolding you built earlier can become dead weight. So for each layer below, we note when it should be retired — picked up again in the closing section. One more reminder that holds specifically for Kiro: it iterates extremely fast, and behavior changes between versions. Wherever a specific behavior is noted (where a button lives, what a command's default is), confirm it against your current version number before you rely on it.

First, define the scope

This guide only covers "configuring Kiro"

What Kiro gives you is an already-implemented agent main loop; you don't need to write the loop yourself. This guide discusses one thing only: how to configure that ready-made framework. Keep three categories apart: ① what the model gives you (can't change, Floor 0), ② what you configure (the body of this guide), and ③ what you build from scratch (out of scope here).

▸ ② Configuring Kiro (the body)

Model selection and billing, context management, the build order for knowledge & tools, orchestrating with Spec and Subagents, and breaking verification out on its own. All of it is configuration inside Kiro, not building from scratch.

Plus ① model capability: this is given, can't be changed, can only be worked around, and counts as the foundation of the harness (Floor 0).

▸ Out of scope: building an agent from scratch

Building an agent from scratch with an API or SDK is a different game: writing your own main loop, designing your own prompt caching, implementing your own MCP server.

Where such topics come up, this guide only touches on them, marks them "out of scope," and moves on.

The full blueprint

Five floors, built bottom-up

0 · Foundation: model capability (given · accept, can't change)
1 · Context: the only resource you truly manage (clear only clears the display)
2 · Knowledge & tools: a 7-layer blueprint (Steering → MCP)
3 · Orchestration: who holds the plan (Spec / Subagent)
4 · Verification: generation ≠ evaluation (independent reviewer)
In practice · large codebase + Closing · build then cut
Build order · bottom-up
Each floor rests on the one below: get the foundation straight before managing context, manage context well before adding knowledge & tools, and only then talk about orchestration and verification.
0 · Sheet 00 · Foundation · ① Model capability · given, not built by you

Floor 0: the foundation you can't get around

This floor isn't built; it's given. But you have to understand it first, because it determines how lean the harness can be. [SHARED] This floor is essentially the same for IDE and CLI; the only difference is where you switch settings.

0.1 The context window and context rot

Kiro's context window is model-dependent, not a fixed value. The model you pick determines how large a window this session can use.

| Model | Context window | Suited for |
| --- | --- | --- |
| Opus 4.8 / 4.7 / 4.6 | 1M tokens | Critical coding, complex reasoning, large-codebase navigation |
| Sonnet 4.6 | 1M tokens | The everyday agentic workhorse: interactive dev, iteration, exploration |
| Opus 4.5 / Sonnet 4.5 / 4.0 / Haiku 4.5 | 200K tokens | Single tasks, cost-sensitive scenarios |
One precondition worth knowing

The 1M window on Opus 4.6 / Sonnet 4.6 was only raised from 200K on 2026-03-24, is available only on the Pro / Pro+ / Power paid tiers, and requires an IDE / CLI restart to take effect once the capability is rolled out. So "4.6 = 1M" is correct in the docs, but in your environment there are still two gates — the paid tier and the client version. If you picked 4.6 but don't see 1M, check these two first.

A large window doesn't mean you can stuff anything into it. Context rot is the key concept on this floor: as the context grows, model performance degrades, because attention is spread across more tokens and old, irrelevant content starts interfering with the current task. This is an inherent property of the attention mechanism, not a bug in any particular tool.

[Figure: context rot. Across the window, from 0 tokens to the window limit, the noise share (old, irrelevant content) rises while usable attention / signal falls.]
Kiro's docs don't use the term "context rot," but its context-management advice follows the same logic: don't drop large files straight into the conversation, compact as you approach the limit, and keep an eye on the usage breakdown. 1M isn't there to fit everything; it's there to let a single task finish more reliably.

Check usage: CLI uses /context show to see how much of the window context files / tools / responses / prompts each take up; IDE shows it directly in a panel. The CLI also has an off-by-default experimental toggle that keeps a color-coded usage percentage in the prompt line, so you can see it at a glance without re-running /context show:

kiro — acme-monorepo
› Switch the payments service to the new ledger client
Read services/payments/ledger.ts (+2.1K tokens)
Edit services/payments/charge.ts
~/acme-monorepo  ⎇ main  opus-4.8 · Max  context 72%
That percentage is color-coded by usage — green (under 50%), yellow (50–89%), red (90%+) — so once it hits red you know it's time to /compact or spin up a subagent. Separately, the cap on context files is 75% of the model's window; anything over that is dropped automatically.
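The color bands and the 75% cap above amount to simple arithmetic. The following is a minimal sketch of that arithmetic only; the function names are hypothetical and this is not how Kiro itself computes or renders the indicator.

```python
def usage_color(percent_used: float) -> str:
    """Map a context-usage percentage to the color bands described above."""
    if not 0 <= percent_used <= 100:
        raise ValueError("percent_used must be between 0 and 100")
    if percent_used < 50:
        return "green"
    if percent_used < 90:
        return "yellow"
    return "red"


def max_context_file_tokens(window_tokens: int) -> int:
    """Token budget for context files under the 75% cap."""
    return window_tokens * 75 // 100


# Hypothetical readings, for illustration only.
for pct in (12, 72, 93):
    print(pct, usage_color(pct))
print(max_context_file_tokens(200_000))
```

On a 200K-token window, the 75% cap leaves at most 150,000 tokens for context files; on a 1M window, 750,000.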
0.2 Picking the right model is itself harness design

Kiro offers many models to choose from, so model selection involves a real trade-off.

A. It's not only Claude. Beyond the Claude family (Opus 4.8 / 4.7 / 4.6 / 4.5, Sonnet 4.6 / 4.5 / 4.0, Haiku 4.5), Kiro also integrates a set of open-weight models and a model router named Auto (the officially recommended default, which picks a model per task at a baseline credit multiplier of 1.0x). For reliability-sensitive work like coding, prefer the strongest Claude Opus; for exploratory, cost-sensitive work, Auto or a more economical model will do.
B. Reasoning Effort is an explicit five-level switch. It turns "let the model think a bit longer" into selectable levels: Low / Medium / High / XHigh / Max (the exact levels available are model-dependent). IDE adjusts it in the Effort panel to the right of the model selector, shown as something like "Claude Opus 4.6 · Max"; CLI adjusts it with /effort high, which persists, and you can also set a default effort per model in the config. The docs are explicit: the higher the effort, the more tokens, and the more credits it costs. Rule of thumb: turn effort up for coding and long-running tasks (XHigh is the sweet spot), and save Max for genuinely hard problems rather than blindly maxing it out.
C. From Opus 4.7 on there's also adaptive thinking. The model scales its own reasoning depth by complexity. This coexists with the effort you set manually; the two don't conflict.
D. Default behavior changes with each generation. This is especially worth noting for Kiro users, since its models update fairly often. When upgrading, don't assume "the new version beats the old one across the board"; every generation has trade-offs. For scenarios involving long documents and long-context retrieval, run a comparison on real tasks before deciding, rather than going by the version number alone.
E. Delegate to the model like an engineer, not a pair-programming partner. Every round of interaction adds reasoning overhead and credit consumption, so authorize it fully: in the first turn, state the intent, constraints, acceptance criteria, and relevant file locations all at once, and batch the problem so it can keep making progress. The cost-benefit of this is even more pronounced in Kiro because it bills by credit (see 0.4).
0.3 The more reliable the model, the thinner the verification harness can be

For configuring the harness specifically, the most notable thing about the 4.8 generation is that hallucination is markedly lower, which bears directly on how much effort verification takes. Kiro itself corroborates this: in its model notes it singles out Claude Opus 4.8, saying it is "four times less likely than the previous generation to let a defect through in generated code," and lists it as the option with "highest reliability."

On whether you still verify after 4.8

What's trimmed is the number of rework cycles, not verification itself. Generation is not evaluation, and acceptance should come from an independent agent rather than the executor judging itself — this division of labor and check-and-balance cannot be removed (see Floor 4). A more reliable model saves the cost of "fixing it over and over," not the judgment of "whether to check."

0.4 The credit multiplier decides your cost

Kiro runs on a credit multiplier system: each model carries a multiplier, and the more expensive the model and the deeper the reasoning effort, the more credits it consumes. Understanding this mechanism is how you spend cost where it matters.

| Model | Credit multiplier (approx.) | Meaning |
| --- | --- | --- |
| Auto router | baseline 1.0x | When unsure, let it choose; it handles the expensive/cheap allocation for you |
| Claude Haiku | ≈ 0.4x | Simple queries, running scripts, reading logs |
| Claude Sonnet family | ≈ 1.3x | The workhorse for everyday agentic work: interactive development, iteration, queries, tool calls; the best value balance |
| Claude Opus family | ≈ 2.2x | The primary model for coding: critical code, complex reasoning, changes where regression is costly |
| Some open-weight models | as low as 0.05x | Extremely cost-sensitive, low-reliability batch work |

Three things follow directly from this mechanism: ① Assign models by role — the Opus family as the primary model for coding (critical code, complex reasoning), Sonnet as the workhorse for everyday agentic work (most interactive development), Haiku and open-weight models for the cheapest lightweight tasks, and Auto when you're unsure; ② effort is another cost variable — Max is noticeably more expensive than Low on the same model, so don't reflexively set Max for simple tasks; ③ split large tasks — use Sonnet for research and switch to Opus for critical code, rather than running on the strongest model the whole way.
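To make the "split large tasks" point concrete, here is a rough sketch of the arithmetic. It assumes cost scales linearly with the approximate multipliers quoted in this guide, ignores effort level, and uses made-up "baseline units" (the credits a step would cost at 1.0x); none of these are Kiro's actual billing formula.

```python
# Approximate multipliers quoted in this guide; real values may differ by version.
MULTIPLIER = {"auto": 1.0, "haiku": 0.4, "sonnet": 1.3, "opus": 2.2}


def relative_cost(plan: list[tuple[str, float]]) -> float:
    """Sum of (baseline units x model multiplier) over the steps of a plan.

    "Baseline units" is a hypothetical measure: what a step would cost at a
    1.0x multiplier. Reasoning effort is deliberately left out of this sketch.
    """
    return sum(units * MULTIPLIER[model] for model, units in plan)


# Hypothetical workload: 6 units of research, then 4 units of critical code.
all_opus = [("opus", 6.0), ("opus", 4.0)]
split = [("sonnet", 6.0), ("opus", 4.0)]

print(round(relative_cost(all_opus), 2))
print(round(relative_cost(split), 2))
```

Under these invented unit counts the split plan comes out roughly a quarter cheaper than running Opus throughout; the real saving depends on how much of the work is research versus critical code.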

On the inference backend: the docs only confirm that Kiro's inference runs in AWS regions (us-east-1, eu-central-1) using cross-region inference; they don't state "Amazon Bedrock" verbatim. Externally, say "AWS-hosted, cross-region inference," and don't present a reasonable inference as something the docs state explicitly.

1 · Sheet 01 · Context · ② Configuring Kiro

Context: the only resource you truly manage

The loop is ready-made; what you manage is what content goes in and when to clean it up — and this floor largely determines how far Kiro can go. Note that the three ways of using it — IDE, CLI, ACP — diverge sharply on this floor: the CLI has a full set of slash commands (/compact, /rewind, /tangent), the IDE uses panel buttons and # references, and ACP carries the CLI's capabilities into third-party editors over a protocol (detailed in the next section).

1.1 Every step is a branch point: a full action table across the three modes

After the model completes a step, you have several choices. The easiest is "continue," but the other options exist precisely to help you manage context quality.

First, clear up something easy to confuse. Kiro has three ways to use it: IDE, CLI, ACP. ACP is not a third standalone product but a mode of the CLI: you run kiro-cli acp, and underneath it's still the CLI's agent core — it just communicates over an open protocol (JSON-RPC), so you can call Kiro from the AI panels of third-party editors like JetBrains, Zed, and Neovim. So ACP's capabilities are essentially a subset of the CLI's: terminal-only interactive commands (rewind's turn selector, tangent's toggle) aren't available, and the IDE's GUI-only features (the checkpoint panel, the Spec panel) certainly aren't.
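Since ACP rides on JSON-RPC, each request a host editor sends is a small JSON envelope. The sketch below builds only the generic JSON-RPC 2.0 envelope; the method names are the ones this guide lists for ACP, while the params are placeholders and not the real ACP schema, which should be taken from the protocol documentation.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry an id so replies can be matched


def rpc_request(method: str, params: dict) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": method,
        "params": params,
    }
    return json.dumps(message)


# Params here are placeholders only, NOT the actual ACP parameter shapes.
print(rpc_request("session/new", {"placeholder": "see the ACP schema"}))
print(rpc_request("session/prompt", {"placeholder": "see the ACP schema"}))
```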

The table below lays out, for each context action, whether it exists in each of the three modes, how to use it, and when. A few names are especially easy to misread (rewind doesn't revert files, clear only clears the display), and those are marked:

| Action | IDE | CLI | ACP | When to use / note |
| --- | --- | --- | --- | --- |
| Continue same session | ✓ panel | ✓ send message | session/prompt | Continue while the context is still relevant; everything in the window is in play |
| Compact context | ✓ Summarization | /compact | ✓* compaction | When the session is full of stale debugging info, compact manually with instructions; don't wait for auto-compact to fire when the model is at its dullest |
| Open a brand-new conversation (start fresh) | ✓ new chat | /chat new | session/new | Use this to start a brand-new task. On CLI don't use /clear: it only clears the display, not the context, an easy point of confusion |
| Roll back the "conversation" (fork) | ✗ no rewind | /rewind | | When you want to drop a failed attempt but keep the file-reading results. Reverts only the conversation, not files; disk changes remain |
| Roll back "code + context" | ✓ checkpoint Restore | △ experimental (shadow git) | ✗ no GUI | When files are in a mess and you need to revert the code too; rewind can't touch files. checkpoint is not a replacement for git |
| Tangent exploration | | /tangent | | When exploring a side branch without polluting the main line; Ctrl+T to toggle |
| Switch model | ✓ selector | /model | session/set_model | Choose by task difficulty; save the expensive models for critical coding (see 0.4) |
| Switch agent config | | /agent swap | session/set_mode | Swap in a different agent's tool set and default model |
| Delegate to a Subagent | | ✓ up to 4 in parallel | _session/terminate | When the next step produces a lot of output you only need the conclusion of, keeping the intermediate noise in an isolated context (see Floor 3) |
| Add a file to context | # reference | @path / /context add | ✓* via command extension | When this session needs to reference a particular file temporarily |
| Large codebase / mass-document retrieval | | /knowledge (experimental) | | When content exceeds ~10MB or thousands of files; retrieved on demand, doesn't occupy resident context (see 1.7) |

Image input (✓ image capability): paste a screenshot or design mockup for the model to look at.

✓* = a Kiro extension to ACP: experimental, and only effective if the host editor implements the _kiro.dev/ extensions; without it, it degrades to plain standard ACP (only sessions, models, streaming). ✗ = the mode lacks this action. Blank = not listed as supported by the docs. △ = experimental, must be enabled manually.

The IDE side has no slash commands overall; the equivalents are panel buttons: Summarization (the counterpart to /compact), the checkpoint panel's Restore (roll back code + context), Revert (undo only the most recent turn's changes), and # context references.

1.2 The compaction mechanism: automatic vs. manual

Compaction is fundamentally lossy compression: as you near the window limit, the whole conversation is summarized into a more concise description, and the model continues from the summary in a new context. CLI triggers it manually with /compact, and it also fires automatically when context overflows; the two tunable settings are keeping the most recent 2 message pairs by default and keeping 2% of the context window by default, taking whichever is more conservative. A Kiro-specific detail: after a compact, a new session is created rather than the original being compressed in place, and you can recover the original with /chat resume. The IDE equivalent is the Summarization feature, with the same logic.
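The interplay of the two retention settings can be sketched as follows. This is an illustration under a stated assumption, not Kiro's implementation: "more conservative" is read here as "whichever rule keeps more of the recent conversation", and the function and parameter names are hypothetical.

```python
def pairs_to_keep(pair_tokens: list[int], window_tokens: int,
                  min_pairs: int = 2, min_percent: float = 2.0) -> int:
    """Return how many of the most recent message pairs to keep verbatim.

    pair_tokens lists the token size of each message pair, oldest first.
    ASSUMPTION: "more conservative" means the rule that retains more.
    """
    budget = window_tokens * min_percent / 100
    kept_tokens = 0
    pairs_by_percent = 0
    # Walk backwards from the newest pair until the token budget is covered.
    for tokens in reversed(pair_tokens):
        if kept_tokens >= budget:
            break
        kept_tokens += tokens
        pairs_by_percent += 1
    # Never report more pairs than the conversation actually has.
    return min(len(pair_tokens), max(min_pairs, pairs_by_percent))


# Hypothetical session on a 200K window: 2% is 4,000 tokens.
print(pairs_to_keep([5000, 3000, 1500, 2000, 1800], 200_000))
```

In the hypothetical session above, the two newest pairs total 3,800 tokens, short of the 4,000-token budget, so the percentage rule reaches back to a third pair and wins over the two-pair minimum.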

1.3 What to watch with auto-compact, plus a Kiro-specific detail

The first thing to watch: at the moment auto-compact fires, the model happens to be at its least intelligent because of context rot, yet it has to make the most critical "keep what, discard what" decision. A typical failure chain: after a long debug session, auto-compact fires and the summary focuses on the debugging; then you say "now go fix that warning in the other file," but the warning's details have already been dropped. The right approach is proactive management: compact manually with /compact ahead of time, while you know your next step and the model is still in good shape.

An easy point of confusion: /clear is not a new conversation

CLI /clear only clears the screen display, not the saved conversation context; the docs are explicit on this. A common misuse is assuming /clear starts a new conversation, when the context is in fact still there. In Kiro, the real way to start a new conversation is /chat new.

1.4 rewind and checkpoint are two different things

Kiro has an easily-confused design around "rolling back": it splits "roll back the conversation" and "roll back files" into two independent mechanisms, and one command can't do both at once.

CLI · /rewind

Rolls back the "conversation"

Forks the session into a new branch at some earlier turn; the selector shows each turn's prompt preview and the context usage percentage at that point.

But the docs are clear that it does not roll back file changes: files changed by later turns stay as-is on disk, and reverting the files too requires version control.

IDE · checkpoint Restore

Rolls back "code + context"

A checkpoint is created automatically with each prompt; clicking Restore rolls both the codebase and Kiro's context back to that point.

CLI also has checkpoints, but it's an experimental feature, implemented with a shadow git repo, must be enabled manually (chat.enableCheckpoint true), and is cleared when the session ends — it doesn't persist across sessions.

What checkpoint actually tracks

It only tracks files the Kiro agent itself changed with built-in tools. Files you edited manually, that a formatter touched, or that an MCP tool or bash command changed are not tracked, and may be overwritten and lost on Restore. So the docs repeatedly stress: checkpoint is not a replacement for git; always use it alongside version control.

Two needs, two tools: to drop just the conversational noise of "trying approach A" while keeping the file-reading results, use /rewind; if approach A has already left the files in a mess, use checkpoint Restore (or git directly) to revert the files too.

1.5 Four ways to supply context

The CLI docs explicitly split "supplying context to the conversation" into four categories, chosen by "whether it occupies resident tokens":

| Method | Occupies resident context? | Suited for |
| --- | --- | --- |
| Agent Resources | Yes, occupies tokens on every request | Project rules that stay resident across sessions |
| Skills (skill://) | Metadata resident, body loaded on demand | Reusable specialized workflows |
| Session Context (/context add) | Yes, current session only | Files this session needs to reference temporarily |
| Knowledge Bases (/knowledge) | No, occupies context only when searched | Large codebases, mass documents |
1.6 New task means new session

A rule of thumb: new task = new session. If you've just finished implementing a feature and are about to write its docs, consider starting a new session (on the CLI, /chat new; again, not /clear). There's a trade-off between related tasks: reusing context is faster and cheaper, since the model doesn't have to re-read the files you just changed; a new session is cleaner but has to re-read the relevant files, which is slower and costs more credits. The decision rule: if the next task is highly related to the current context, continue; otherwise start a new session.

1.7 Knowledge Base: native RAG for large codebases

CLI /knowledge (experimental) can turn a large file, an entire document library, or even a whole codebase into a semantic-search knowledge base: it occupies context only when a search hits it, and consumes no tokens otherwise. The docs' threshold is clear: once content exceeds about 10MB or thousands of files, it shouldn't go into resident context — use a Knowledge Base instead. It's an optional persistent semantic index layer that, in large-codebase scenarios, saves a lot of "have the agent read files one by one" overhead. The trade-offs: it's experimental with limited stability, plus the indexing performance bottleneck on very large codebases (see 2.3).
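The threshold above can be checked before deciding where content should live. The following is a rough sketch under assumptions: the 2,000-file cutoff is an arbitrary stand-in for "thousands of files", and the function names are hypothetical rather than anything Kiro provides.

```python
import os

SIZE_LIMIT_BYTES = 10 * 1024 * 1024  # "about 10MB" from the text
FILE_LIMIT = 2000  # ASSUMPTION: a stand-in for "thousands of files"


def measure(root: str) -> tuple[int, int]:
    """Return (total_bytes, file_count) for everything under root."""
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skips broken symlinks
                total_bytes += os.path.getsize(path)
                file_count += 1
    return total_bytes, file_count


def suggest_context_method(total_bytes: int, file_count: int) -> str:
    """Suggest a Knowledge Base once either threshold is crossed."""
    if total_bytes > SIZE_LIMIT_BYTES or file_count >= FILE_LIMIT:
        return "knowledge base"
    return "resident or session context"
```

A call such as `suggest_context_method(*measure("docs"))` would give a first-pass answer for a hypothetical `docs` directory; generated files and vendored dependencies would inflate the count, so it is worth measuring only what you would actually index.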

2 · Sheet 02 · Knowledge & tools · ② Configuring Kiro

Knowledge & tools: the blueprint

This floor is the guide's core framework. An often-overlooked premise: the harness matters as much as, or more than, the model itself. Teams fixate on benchmarks, but in actual use, results depend more on the ecosystem you build around the model. This floor has one thing going for it: most of Kiro's Steering / Skills / Hooks / MCP is shared between IDE and CLI — configure it once and both sides can use it.

2.1 Understand the mechanism first: Kiro uses hybrid retrieval

A common misconception is that an agent finds code purely by traversing files live, with no index. Kiro doesn't work that way — it's a hybrid of three coexisting mechanisms, which is the easiest thing to get wrong by assumption:

① Persistent vector index: #codebase semantic recall, built automatically on open
② Agentic tools: grep / glob / read, browsing like an engineer
③ tree-sitter / AST: symbol search and structural edits, no extra LSP install
Together, these three are how Kiro understands the codebase.
This brings two things: first, in large-codebase scenarios Kiro has an index as a fallback and doesn't traverse from scratch every time; second, the index has to be maintained, and indexing performance on very large codebases (tens of thousands of files) is a known area for improvement in Kiro (see 2.3). The underlying embedding model is not officially disclosed; treat it as third-party information.

Worth emphasizing: symbol-level and structure-level code understanding (which in many tools requires manually installing an LSP) is something Kiro has built into the product (the CLI's tree-sitter code tool, the IDE's index and AST editing). So in Kiro this isn't scaffolding "you have to build" but an existing capability you just need to "understand is already there."

2.2 Build order: seven layers, bottom-up

Building the harness has an order; each layer rests on the previous one, and you shouldn't skip. Kiro's version is Steering → Hooks → Skills → Powers → code intelligence → MCP → Subagents. Each layer comes with its "common mistake" and "which line it belongs to":

1. Steering [SHARED]: Project context files Kiro reads automatically on every interaction, placed in .kiro/steering/, supporting several inclusion modes. Principle: only put "broadly applicable" content here; too much drags down performance. Common mistake: writing reusable specialized knowledge into it (that kind of content belongs in a skill). See 2.4.
2. Hooks [IDE strong] [CLI weak]: Scripts or agent actions that run automatically at key moments. The most valuable use isn't "preventing errors" but deterministically automating repetitive checks, syncs, and scaffolding. Common mistake: using a prompt to implement what should run automatically. See 2.6.
3. Skills [SHARED]: Follow the open Agent Skills standard: the SKILL.md + scripts/references/assets folder structure, progressive disclosure, semantic triggering by description, importable from the community and exportable too. Common mistake: writing everything into steering instead of using a skill. See 2.5.
4. Powers [SHARED]: A Kiro-specific packaging concept: bundle MCP tools, related knowledge, and workflows into one package, activated dynamically by context; suited to integration scenarios where you want to "give the agent both tools and guidance on how to use them." See 2.7.
5. Code intelligence (index / AST) [IDE index] [CLI tree-sitter]: Gives the model symbol-level and structure-level code understanding. Kiro mostly builds this in; what you do is understand it's there, manually trigger an index rebuild when needed, and set expectations about its performance limits on very large codebases. Common mistake: assuming it can handle an arbitrarily large codebase without limit (see 2.3).
6. MCP Servers [SHARED]: The channel to connect internal tools, data sources, and APIs, configured in .kiro/settings/mcp.json, with enterprise governance (MCP Registry) and one-click install links. Common mistake: wiring up MCP before the basic configuration is ready. See 2.7.
7. Subagents [SHARED]: Subagents with isolated context that return only the result after executing a task. Both the IDE and the CLI have them (the IDE ships two built-ins plus custom ones; the CLI adds finer control such as up to 4 in parallel), see 3.5. Common mistake: exploring and editing in the same session.
2.3 Make the codebase navigable

The core constraint: a model's capability ceiling in a large codebase equals its ability to find the right context. A few recurring practices: keep Steering lean and mode-segmented (don't set everything to always-load); scope test and lint by subdirectory; exclude generated files with .gitignore and protected paths; when the structure is unclear, have Kiro generate a codebase map (IDE uses #repository, and for an existing codebase you can first generate structure.md / tech.md / product.md); have the model use tree-sitter symbol search rather than pure string grep; and use a Knowledge Base for large codebases rather than loading everything.

How indexing behaves on very large codebases

Kiro's embedding index currently has a performance bottleneck on very large codebases. The community has reported that on codebases at the fifty-thousand-file scale with years of git history, indexing struggles to complete and CPU usage runs high; this feedback is tracked as an open improvement request on Kiro's GitHub. Small-to-medium codebases (a few thousand files) index fine, while on very large monorepos you can expect "indexing is slower, and pairs best with a Knowledge Base and per-module scoping". Pick the strategy to match the scale.

2.4 Steering, in depth: four inclusion modes [SHARED]

Steering files live in .kiro/steering/ (workspace level) or ~/.kiro/steering/ (global level); on conflict, the workspace wins. Each .md loads automatically, split by domain. In the IDE, clicking "Generate Steering Docs" produces three base files — product / tech / structure — on which Kiro's basic understanding of the project is built. Steering is more flexible than a single project doc loaded in full from start to finish, and that flexibility comes from these four inclusion modes (written in the YAML front-matter at the very top of the file):

| Mode | Syntax | Behavior |
| --- | --- | --- |
| always (default) | this is the default if omitted | Loaded on every interaction |
| fileMatch | inclusion: fileMatch + fileMatchPattern | Loaded only when working on files matching that glob |
| auto | inclusion: auto + required name/description | Loaded automatically by semantic match of the description to the request, same mechanism as Skills |
| manual | inclusion: manual | Not loaded automatically; reference it explicitly in chat with #filename |

fileMatch is the key: Kiro's steering files are flat (all in one directory), and when they load is decided by the inclusion mode, not by which directory the file sits in. To get "rules specific to a certain subdirectory," write a fileMatch glob that matches those files, rather than placing the rule file inside that subdirectory. Additional notes: #[[file:relative-path]] links a live workspace file into steering (some reports say this syntax occasionally doesn't work on the CLI but is fine in the IDE); AGENTS.md is supported but doesn't support inclusion modes and always loads in full; and on the CLI, when using a custom agent, steering doesn't load automatically — you must add it explicitly in the agent config's resources field.
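The four inclusion modes can be modeled as one small decision function. This is a sketch of the documented behavior, not Kiro's code: `fnmatchcase` is a stand-in whose glob semantics may differ from Kiro's matcher, the semantic match for `auto` is reduced to a boolean the caller supplies, and every name other than the mode names is hypothetical.

```python
from fnmatch import fnmatchcase


def steering_loads(inclusion: str, *, active_file: str = "",
                   file_match_pattern: str = "",
                   description_matches: bool = False,
                   referenced_in_chat: bool = False) -> bool:
    """Model whether a steering file is loaded under each inclusion mode."""
    if inclusion == "always":
        return True
    if inclusion == "fileMatch":
        # Loaded only while working on a file that matches the glob.
        return bool(active_file) and fnmatchcase(active_file, file_match_pattern)
    if inclusion == "auto":
        # The semantic match itself is Kiro's job; here it is just an input.
        return description_matches
    if inclusion == "manual":
        return referenced_in_chat
    raise ValueError(f"unknown inclusion mode: {inclusion}")


# A rule scoped to one subdirectory via its glob, not via its location.
print(steering_loads("fileMatch",
                     active_file="services/payments/charge.ts",
                     file_match_pattern="services/payments/*.ts"))
print(steering_loads("fileMatch",
                     active_file="web/app.tsx",
                     file_match_pattern="services/payments/*.ts"))
```

The point the sketch makes is the one in the text: the decision never looks at where the steering file lives, only at its mode and pattern.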

2.5 Skills, in depth: an open standard [SHARED]

Kiro's Skills use exactly the open Agent Skills standard, so mature community skill know-how and ready-made skill packages are largely reusable here as-is. The structure is identical:

my-skill/
├── SKILL.md      # required: main instructions + front-matter
├── scripts/      # optional: executable scripts
├── references/   # optional: reference docs, loaded on demand
└── assets/       # optional: template assets

Locations: workspace .kiro/skills/, global ~/.kiro/skills/, with the workspace winning on name conflict. Both name (must match the folder name) and description (used by Kiro for semantic matching) are required. Progressive disclosure works the same way: at startup only name+description load, and the full SKILL.md loads only when a request matches. The familiar best practices apply directly: don't pile on filler, add a Gotchas section to record failure points, use the file system for progressive disclosure, and write the description as trigger conditions rather than a summary.
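The two required fields lend themselves to a quick pre-commit check. Below is a minimal sketch, assuming the front-matter is a flat block of `key: value` lines between two `---` delimiters; it is a deliberately simplified parser rather than a full YAML reader, and the function names are hypothetical.

```python
from pathlib import Path


def read_front_matter(text: str) -> dict[str, str]:
    """Parse simple `key: value` front-matter between two `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter found


def check_skill(skill_dir: Path) -> list[str]:
    """Return the problems found in a skill folder (empty list means OK)."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        return ["SKILL.md is missing"]
    fields = read_front_matter(skill_md.read_text(encoding="utf-8"))
    problems = []
    if fields.get("name") != skill_dir.name:
        problems.append("name must match the folder name")
    if not fields.get("description"):
        problems.append("description is required")
    return problems
```

Running `check_skill(Path(".kiro/skills/my-skill"))` over each folder would catch the two most common slips before Kiro silently fails to match the skill; multi-line or quoted YAML values would need a real YAML parser.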

A current limitation for global skills

When Kiro activates a skill, it injects the SKILL.md body as raw text, without the skill's directory path. The result: for a skill placed in the global directory (~/.kiro/skills/), the relative references written in its SKILL.md (references/xxx.md) may fail to resolve, while skills inside the workspace are fine (this is an open issue the community tracks on Kiro's GitHub). Until it's addressed, the "thin SKILL.md + references loaded on demand" pattern is less reliable for global skills, so the simplest path is to put the skill in the workspace, or write as much content as possible into the SKILL.md body.

2.6 Hooks, in depth: rich events in the IDE, weaker in the CLI

This is the layer where IDE and CLI diverge the most, so they have to be covered separately.

IDE · Agent Hooks · ten events

Rich events, including file system & spec

Placed in .kiro/hooks/, each a *.kiro.hook JSON file, committable to git for team sharing.

  • Prompt Submit / Agent Stop
  • Pre / Post Tool Use (before/after a tool call, can block)
  • File Create / Save / Delete (file-system events)
  • Pre / Post Task Execution (before/after a spec task)
  • Manual Trigger

Two action types: Ask Kiro (sends a prompt, starts a new loop, consumes credits) and Run Command (runs a shell command, no credit cost, faster, and a non-zero exit code can block at the Pre Tool Use stage).
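A Run Command hook is ultimately just a script whose exit code carries the verdict. The sketch below shows that shape for a Pre Tool Use guard. How Kiro hands the affected file path to the command is not covered in this guide, so passing paths as command-line arguments is an assumption, and the protected patterns are invented for illustration.

```python
import sys
from fnmatch import fnmatchcase

# Hypothetical protected paths, for illustration only.
PROTECTED = ["migrations/*", "*.lock", ".env*"]


def blocked_paths(paths: list[str]) -> list[str]:
    """Return the paths that match a protected pattern."""
    return [p for p in paths if any(fnmatchcase(p, pat) for pat in PROTECTED)]


def main(argv: list[str]) -> int:
    """Return 0 to let the tool call proceed, non-zero to ask for a block."""
    hits = blocked_paths(argv)
    for path in hits:
        print(f"protected path: {path}", file=sys.stderr)
    return 1 if hits else 0


# As a hook command, the script would end with:
#     raise SystemExit(main(sys.argv[1:]))
# ASSUMPTION: the paths arrive as arguments; check your version's hook docs.
```

Because this runs as a plain shell command, it costs no credits, which is the reason to prefer it over an Ask Kiro action for deterministic checks like this one.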

CLI · only hooks the agent lifecycle

A different, leaner set

Written in the hooks field of the agent's JSON config, with events limited to:

  • agentSpawn (agent initialization)
  • userPromptSubmit
  • preToolUse / postToolUse
  • stop

No file events and no spec-task events (those are IDE-only).
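Since the CLI accepts only the five events listed above, a small lint can catch IDE-style events that were copied into a CLI agent config by mistake. This is a sketch: the event names come from the list above, but the shape of each hook entry (a list of objects with a `command` key) and the `fileSave` key are assumptions used purely for illustration.

```python
import json

# The five CLI events named in this section.
CLI_HOOK_EVENTS = {"agentSpawn", "userPromptSubmit", "preToolUse", "postToolUse", "stop"}


def unknown_hook_events(agent_config_json: str) -> list[str]:
    """Return hook event names in an agent config that the CLI list does not include."""
    config = json.loads(agent_config_json)
    hooks = config.get("hooks", {})
    return sorted(event for event in hooks if event not in CLI_HOOK_EVENTS)


# Hypothetical agent config; the entry shape is an assumption.
example_config = json.dumps(
    {
        "name": "backend",
        "hooks": {
            "agentSpawn": [{"command": "git status"}],
            "fileSave": [{"command": "npm test"}],  # file events are IDE-only
        },
    }
)
print(unknown_hook_events(example_config))  # → ['fileSave']
```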

One gap: lifecycle events like session end, before compaction, and subagent completion have no direct counterpart on either line right now.

2.7 MCP and Powers: how to use the integration side SHARED

MCP integration is configured in .kiro/settings/mcp.json (workspace level) or at the user level; the two merge, with the workspace winning. It's the standard mcpServers format: local servers use command/args/env, remote ones use url/headers. Kiro adds a few things here: finer-grained auto-approval (autoApprove / disabledTools / disabled); the MCP Registry (CLI), where an organization centrally pushes servers and registry defaults can be overridden locally key by key, which suits enterprise control; and one-click install links (an "Add to Kiro" link placed in a README). One thing to note: the Kiro Web sandbox currently supports only local stdio servers, not remote ones.
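The "two levels merge, workspace wins" rule can be pictured with a short sketch. It assumes the merge happens per server name, so a workspace entry replaces a same-named user entry wholesale; the actual merge granularity is not stated here. The server names, command, and URL are hypothetical.

```python
def merge_mcp_configs(user: dict, workspace: dict) -> dict:
    """Merge two mcpServers maps; on a name conflict the workspace entry wins."""
    merged = dict(user.get("mcpServers", {}))
    merged.update(workspace.get("mcpServers", {}))  # workspace overrides user
    return {"mcpServers": merged}


# Hypothetical user-level config: one remote server, one local server.
user_level = {
    "mcpServers": {
        "docs": {"url": "https://example.com/mcp", "headers": {}},
        "tickets": {"command": "tickets-mcp", "args": [], "env": {}},
    }
}
# Hypothetical workspace config: redefines "docs" as a local stdio server.
workspace_level = {
    "mcpServers": {
        "docs": {"command": "docs-mcp", "args": ["--root", "."], "env": {}},
    }
}

effective = merge_mcp_configs(user_level, workspace_level)
print(sorted(effective["mcpServers"]))            # → ['docs', 'tickets']
print(effective["mcpServers"]["docs"]["command"])  # → docs-mcp
```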

Cost-reduction mechanism: Kiro has Tool Search. When there are many tools, it doesn't preload every tool definition; it searches for and pulls definitions at runtime, on demand. It's off by default on the CLI; the trigger threshold is 5% of context or 50,000 tokens, and turning it on is recommended once you have more than 5 MCP servers. Powers, meanwhile, bundle "MCP tools + knowledge + workflows" into one package that is activated dynamically by context: MCP provides the tools, Skills provide the operational knowledge, and Powers package the two together.

2.8 The undercurrent of this floor: less is more

One principle runs through this floor: more tools and more instructions are not better. Tool schemas and overly long instructions both crowd out the space the model uses for reasoning. A short steering doc and a focused skill often beat piling on redundant configuration. Because Kiro bills by credit, this also has a direct cost implication: every extra piece of resident context is something you pay for on every request. The default leaning should be "verify with the minimum configuration whether the model can handle it, then add incrementally based on the problems you actually hit." This undercurrent is tied off in the closing section, "build it up, then cut it back."

3 · Sheet 03 · Orchestration · ② Configuring Kiro · the most distinctive floor

Orchestration: who should hold the plan

At this floor, orchestration comes down to one question: who holds the plan. For a given task, do you let the model keep the whole plan in its head turn by turn, or hand the plan off to be held elsewhere? The most naive approach is to have the model keep the plan in context and proceed turn by turn, but once a task grows large, the plan tends to drift and distort under compaction.

3.1 "Who holds the plan": Kiro's answer is to persist it to disk as a spec

Kiro's answer is distinctive: persist the plan to disk as spec files, have a human review them, then let the agent implement against them. The plan is no longer held in the model's context turn by turn but becomes three markdown files on disk that can be version-controlled and reviewed by the team. This is Kiro's signature capability, Spec-driven development, and it is IDE-exclusive (the CLI has no built-in spec, only a lightweight Plan agent, see 3.7). What this buys you is that the plan is readable, reviewable, and traceable: think it through at the document level and have a human confirm it before writing any code — especially suited to complex, high-regression-cost development that needs team alignment.

3.2 The three Spec files IDE

Each spec generates three files, placed under .kiro/specs/<feature>/, one directory per feature:

requirements.md · What to build
User stories + acceptance criteria, written in EARS syntax (WHEN [condition] THE SYSTEM SHALL [behavior]). Clear, testable, traceable.
design.md · How to build it
Technical architecture, sequence diagrams, data flow, interfaces, error handling, test strategy. Derived from "analyzing your codebase + the approved requirements."
tasks.md · Broken into trackable steps
A discrete, trackable list of implementation tasks (markdown checkboxes), ordered by dependency, with each task linked back to a specific requirement.
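The EARS shape quoted for requirements.md is regular enough to check by pattern. Below is a minimal sketch that covers only the WHEN … THE SYSTEM SHALL … form shown above and is not a parser for EARS in general; `EARS_PATTERN` and `parse_ears` are hypothetical names.

```python
import re

# Matches only the single form quoted in this section.
EARS_PATTERN = re.compile(
    r"^WHEN\s+(?P<condition>.+?)\s+THE SYSTEM SHALL\s+(?P<behavior>.+)$",
    re.IGNORECASE,
)


def parse_ears(line: str):
    """Return (condition, behavior) if the line follows the pattern, else None."""
    match = EARS_PATTERN.match(line.strip())
    if match is None:
        return None
    return match["condition"], match["behavior"]


print(parse_ears("WHEN a user submits an empty form THE SYSTEM SHALL show a validation error"))
# → ('a user submits an empty form', 'show a validation error')
print(parse_ears("The form should validate input"))
# → None
```

The second line fails the check because it names no triggering condition, which is exactly the kind of vagueness the format is meant to force out.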

Two workflow variants plus a quick mode: Requirements-First (variant; behavior is clear, architecture is flexible) · Design-First (variant; an architecture already exists, or you must meet hard non-functional requirements like latency/compliance) · Quick Plan (a separate session mode, not strictly a variant: auto-generate all three files at once, skipping the inter-stage approvals — suited to mature features where you trust Kiro's output). The default has approval gates: Requirements → (human review) → Design → (human review) → Tasks, and before entering design there's an Analyze Requirements step that helps you catch contradictions, ambiguities, and gaps in the requirements. Note the division of labor: a spec covers only a single feature's plan; project-level resident context goes through steering, and the two are kept separate.

3.3 Task execution: dependency graph + wave-based parallelism IDE

tasks.md has a dedicated execution view that shows each task's status in real time. Run all Tasks: Kiro automatically builds a task dependency graph and groups the dependency-free tasks into "waves" — the first wave runs all dependency-free tasks concurrently, the second runs those whose dependencies the first wave satisfied, and so on — serial between waves, concurrent within a wave. Sync Files: when triggered, has Kiro scan the codebase and automatically mark completed tasks, handling the case where "you or a teammate already did part of it in another session." Beyond running everything at once, you can also execute tasks one at a time (the Start Task link above a task), reviewing each before moving to the next — handy if you like "one task, one review."
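The wave grouping described above is a level-by-level topological sort. The sketch below illustrates the idea only; it is not Kiro's implementation, and the task names are hypothetical.

```python
def plan_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves: each wave holds tasks whose dependencies are all done."""
    remaining = {task: set(deps) for task, deps in dependencies.items()}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        # A task is ready when every one of its dependencies has already run.
        ready = sorted(task for task, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cycle or unknown dependency among: " + ", ".join(sorted(remaining)))
        waves.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return waves


# Hypothetical task list: each task maps to the tasks it depends on.
tasks = {
    "schema": set(),
    "docs": set(),
    "api": {"schema"},
    "ui": {"schema"},
    "e2e": {"api", "ui"},
}
print(plan_waves(tasks))
# → [['docs', 'schema'], ['api', 'ui'], ['e2e']]
```

Tasks inside one inner list can run concurrently; the outer list is strictly sequential, which is the "serial between waves, concurrent within a wave" rule.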

3.4 Spec mode vs. Vibe mode IDE
Spec mode

The structured three-document flow

Use it for complex features, team collaboration, and high-regression-cost changes. Choose it when you've thought it through and want to ship, leave a record, and align the team.

Vibe mode

Improvised, conversational development

Use it for quick exploration, prototyping, and when the goal isn't yet clear. Spec is Kiro's "strong default" leaning, but not a hard constraint — Vibe is available anytime.

3.5 Subagents: child agents with isolated context SHARED

Subagents exist in both the IDE and the CLI; only the granularity of control differs. They are a different thing from the wave-based parallelism of spec tasks (3.3): that is parallelism at the task-orchestration layer, while a subagent spins up a child agent with its own isolated context window to run a stretch on its own. In common: the subagent gets its own fresh window, the main agent decides whether to delegate, and when the subagent finishes, its result is returned to the main agent while the intermediate noise stays in the isolated context. The two lines differ as follows:

IDE
Two built-in subagents, plus custom ones. The IDE has had two built-in subagents since v0.8: one for context gathering (explore the project, collect context) and one general purpose (run the rest in parallel), which Kiro launches automatically as needed. Since v0.9 (2026-02), you can also define custom subagents by dropping .md files in ~/.kiro/agents (global) or .kiro/agents (project) — YAML frontmatter sets name / description / tools / model — and they also surface as slash commands. Note that IDE subagents cannot access Spec, and Hooks do not fire inside them.
CLI
A built-in subagent tool with finer control. The CLI ships a built-in subagent tool (the early experimental delegate is deprecated and superseded by it), exposing a few more controls than the IDE: up to 4 in parallel (a hard limit to keep in mind when planning parallelism); calling a custom agent by name ("Use the backend agent to refactor…"; the orchestrating agent's tools must include subagent to delegate, and it uses the built-in summary tool to roll results up); Ctrl+G opens the execution monitor to see each subagent's live status; and permissions via availableAgents (restrict which can be dispatched, glob-supported) and trustedAgents (run without approval).
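The two CLI permission lists answer different questions: availableAgents decides whether an agent can be dispatched at all, and trustedAgents decides whether dispatch needs approval. The sketch below models that split with Python's fnmatch as a stand-in for the glob matching; Kiro's actual glob dialect may differ, trustedAgents is treated as exact names because glob support is only stated for availableAgents, and the agent names are hypothetical.

```python
from fnmatch import fnmatchcase


def can_dispatch(agent_name: str, available_agents: list[str]) -> bool:
    """True if the agent name matches any glob pattern in availableAgents."""
    return any(fnmatchcase(agent_name, pattern) for pattern in available_agents)


def needs_approval(agent_name: str, trusted_agents: list[str]) -> bool:
    """True unless the agent is listed (by exact name, an assumption) in trustedAgents."""
    return agent_name not in trusted_agents


available = ["review-*", "backend"]  # hypothetical patterns
trusted = ["backend"]                # hypothetical trusted list

for name in ["backend", "review-security", "ops"]:
    print(name, can_dispatch(name, available), needs_approval(name, trusted))
# → backend True False
# → review-security True True
# → ops False True
```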

Where it fits (both lines): research-intensive tasks (read dozens of files and return a summary), multiple mutually independent subtasks (run in parallel), needing a fresh perspective (not inheriting the main session's assumptions), and an independent review before committing. The mental test: "Do I need these tool outputs themselves, or only the conclusion?" If only the conclusion, delegate.

3.6 The boundary of orchestration: the plan is always held by the LLM main agent

One sentence frames the boundary of Kiro's orchestration and heads off mismatched expectations: Kiro's orchestration is agent-driven and described in natural language; the holder of the plan is always the LLM main agent, not a deterministic script. Two concrete manifestations:

  • Subagents support a dependency graph (DAG), with independent tasks running in parallel and dependent ones waiting for their prerequisites. But this graph is planned by the main agent itself, not declared by a script you write; the docs are explicit that "the task graph is planned up front and can't change during execution," and there's no config file or schema for you to hand-write a DAG.
  • The review loop (implement → review → on failure, send back for rework, with a configurable max_iterations capped at 10 rounds) is likewise driven by you describing intent in natural language and the agent deciding how; there's no hand-writable script or schema.
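The control flow of the review loop can be modeled in a few lines, which helps when reasoning about how many rounds to allow. This is a conceptual sketch only: in Kiro the loop is driven by natural-language intent, not by a script like this. `review_loop`, `implement`, and `review` are hypothetical stand-ins for the executing agent and the independent reviewer.

```python
from typing import Callable

MAX_ITERATIONS_CAP = 10  # the cap stated in the text


def review_loop(
    implement: Callable[[str], str],
    review: Callable[[str], tuple[bool, str]],
    task: str,
    max_iterations: int = 3,
) -> tuple[bool, str, int]:
    """Run implement → review until approved or the round budget runs out.

    Returns (approved, last_output, rounds_used).
    """
    rounds = min(max_iterations, MAX_ITERATIONS_CAP)
    feedback = ""
    output = ""
    for round_number in range(1, rounds + 1):
        # Rework rounds carry the reviewer's feedback back to the implementer.
        prompt = task if not feedback else f"{task}\nReviewer feedback: {feedback}"
        output = implement(prompt)
        approved, feedback = review(output)
        if approved:
            return True, output, round_number
    return False, output, rounds
```

Note that the reviewer only ever sees the output, never the implementer's prompt history, which mirrors the isolated-context property the loop depends on.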

So Kiro's strength is this: use a spec to persist a single feature's plan and get it reviewed clearly, then use subagents for limited parallel delegation and adversarial review. With this boundary in mind, you won't configure Kiro expecting script-level, line-by-line replayable, large-scale deterministic orchestration.

3.7 Three other pieces
CLI · Plan agent

A lightweight substitute when there's no spec

Built into the CLI from 1.23 (Shift+Tab or /plan): structured questioning → read-only codebase exploration → produce a task breakdown → hand the approved plan off to an executing agent. But the output is a temporary plan, not the three persisted Spec files — don't conflate the two.

Multi-agent teams

An example pattern, not an out-of-the-box feature

Kiro has no built-in product called "Agent Teams." The official sample repo demonstrates assembling a multi-agent team out of subagent + steering + prompts + hooks (an architect dispatches tasks to coder/ops, then hands off to a reviewer and a security-reviewer), but that's an example pattern you assemble yourself.

Web · Kiro autonomous agent (preview)

Pro/Pro+/Power subscriptions only, off by default. This is a different form closer to a "frontier agent," not the CLI subagent: it clones the repo in an isolated sandbox, runs up to 10 tasks concurrently, internally coordinates a few specialized subagents (research and planning / writing code / verification), has persistent memory across tasks and repos, can learn team conventions from PR review feedback, and can be dispatched a task from a GitHub issue via a /kiro comment. In this mode you can't choose the model (the agent chooses automatically).

4 · Sheet 04 · Verification · ② Configuring Kiro · the most overlooked

Verification: the executor should not judge its own output

Writing code is now cheap, and the bottleneck has moved to verification accordingly. This is the most overlooked yet most critical layer when building the scaffolding.

4.1 Generation is not evaluation, and they must be separated

The core is one sentence: have the agent that executed a task evaluate its own output, and it will almost always confidently give it a good grade even when the quality is mediocre — especially on tasks without a binary pass/fail standard. This is no different from having a developer review their own code. The fix is to separate generation from evaluation, the way development and QA are different people on a traditional team. A key engineering reality: tuning an independent reviewing agent to be strict and exacting is far easier than getting the executing agent to learn self-criticism. Code review, independent QA, separation of audit — the human world has long used division of labor as a check and balance.

4.2 How to do it in Kiro

Kiro builds this division of labor into the product, so you don't have to build it from scratch:

  • CLI review loop: subagents support an "implement → independent reviewer reviews → on NEEDS_CHANGES, send back for rework" loop, with the reviewer reviewing in an isolated context and no motive to protect its own output. This is a ready-made implementation of "the executor doesn't judge itself."
  • CLI independent reviewing subagent: use a subagent to spin up a reviewer with a fresh context; it doesn't know the trade-offs and assumptions made during implementation, so its perspective is more objective. The official multi-agent example specifically has two independent reviews, a reviewer + a security-reviewer.
  • Web autonomous agent's built-in verification agent: checks the output before moving on to the next step.

4.3 Autopilot vs. Supervised: two things to get straight first IDE
The default mode is Autopilot (hands-off execution)

Kiro IDE defaults to Autopilot, where the agent works autonomously end to end, changing code across files, running commands, and making architectural decisions without pausing to ask at each step. This is a deliberate choice on Kiro's part: favor letting the agent keep momentum, and put the control point after the fact, where you review all the diffs, roll back the whole thing, or interrupt mid-execution. If you prefer "step-by-step confirmation," just switch to Supervised (it stops after each turn that contains file edits to wait for your approval, with changes accepted/rejected hunk by hunk). Neither cadence is better in the abstract; pick by task and personal habit.

Supervised is a review cadence, not a security control

This is a point the Kiro docs call out specifically: Supervised is not a sandbox, not an isolation boundary, not access control — the agent's underlying permissions are exactly the same in both modes. It only determines "whether to show you the diff before changes hit disk." The real security boundary is a different set of mechanisms: protected paths, trusted commands, workspace isolation, credential scoping, where "writing a protected path requires approval" is enforced in both modes. The CLI side has a different permission model: use /tools trust / untrust / trust-all to control which tools skip confirmation.

So when configuring verification, think of two things separately: Autopilot / Supervised governs the review cadence (whether to look hunk by hunk); protected paths / trusted commands are what govern security (whether it can touch something). Don't treat Supervised as a security boundary.
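One way to keep the two axes apart is to model them as independent outputs of one function: the mode affects only the review cadence, and the protected-path check affects only whether approval is required. This is a conceptual sketch, not Kiro's API; `write_decision`, the mode strings, and the glob pattern are all hypothetical.

```python
from fnmatch import fnmatchcase


def write_decision(path: str, mode: str, protected_globs: list[str]) -> dict[str, bool]:
    """Separate the security question from the cadence question for one file write."""
    is_protected = any(fnmatchcase(path, pattern) for pattern in protected_globs)
    return {
        # Security axis: enforced identically in both modes.
        "approval_required": is_protected,
        # Cadence axis: only Supervised pauses for hunk-by-hunk review.
        "hunk_review": mode == "supervised",
    }


protected = ["infra/*"]  # hypothetical protected-path pattern
print(write_decision("infra/prod.tf", "autopilot", protected))
# → {'approval_required': True, 'hunk_review': False}
print(write_decision("src/app.py", "supervised", protected))
# → {'approval_required': False, 'hunk_review': True}
```

The first result is the case that matters: even in Autopilot, the protected path still requires approval, while switching to Supervised alone would not have protected it.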

4.4 Property-Based Testing: Kiro-specific verification IDE

Kiro also has a fairly distinctive built-in verification capability: Property-Based Testing. It automatically extracts "properties" from EARS-format requirements, generates hundreds or thousands of random test cases, and uses shrinking to find the minimal counterexample (akin to a built-in red team). When a test fails, Kiro gives you three options: "change the implementation / change the test / change the requirement." This connects neatly to the spec's traceability: requirements → tasks → property tests form one chain. With this step, Kiro wires "formalized requirements" into "automated verification"; it's off by default and optionally enabled.
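For readers new to the technique, here is a self-contained illustration of "random cases plus shrinking" using only the standard library. It is not Kiro's engine, and real property-testing libraries shrink far more cleverly; `buggy_max`, the property, and the generator ranges are hypothetical. The property encodes a requirement of the form "WHEN the list is non-empty THE SYSTEM SHALL return one of its elements."

```python
import random


def buggy_max(values):
    best = 0  # bug: wrong starting point for all-negative lists
    for v in values:
        if v > best:
            best = v
    return best


def prop_max_is_member(values):
    """Property: for a non-empty list, the result must be one of its elements."""
    return not values or buggy_max(values) in values


def shrink(case, prop):
    """Greedily drop elements, then step values toward zero, while prop still fails."""
    improved = True
    while improved:
        improved = False
        for i in range(len(case)):
            candidate = case[:i] + case[i + 1:]
            if not prop(candidate):
                case, improved = candidate, True
                break
        else:  # no element could be removed: try making one value smaller
            for i, v in enumerate(case):
                if v != 0:
                    toward_zero = v - 1 if v > 0 else v + 1
                    candidate = case[:i] + [toward_zero] + case[i + 1:]
                    if not prop(candidate):
                        case, improved = candidate, True
                        break
    return case


def find_counterexample(prop, trials=500, seed=0):
    """Generate random integer lists; return a shrunk failing case, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        case = [rng.randint(-50, 50) for _ in range(rng.randint(0, 8))]
        if not prop(case):
            return shrink(case, prop)
    return None


print(find_counterexample(prop_max_is_member))  # → [-1]
```

Whatever failing list the random search hits first, shrinking reduces it to `[-1]`, the smallest input that exposes the bug. That minimal case is what makes the "change the implementation / change the test / change the requirement" decision easy to reason about.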

Capstone · applying the four floors to a large codebase

Onboard the agent like a new hire

Combine the preceding four floors in a real campaign, using only Kiro's ready-made capabilities throughout, with no self-built agent. What's covered here is the methodology; the methodology is independent of which tool you use, and below it's mapped point by point onto Kiro's mechanisms, with the real-world constraints stated honestly. A large legacy codebase can't be handed to the agent all at once; the right approach is a five-step loop:

Step 1 · Limit the scope: find a bounded, describable task
Step 2 · Provide context: explain enough background
Step 3 · Complete the task: get it working
Step 4 · Persist the context: write what you learned back
Step 5 · Expand the scope: move to the next, building on what's accumulated

Each time a task finishes, write the lessons back, and the next round of the agent starts from a higher point. You wouldn't put a new hire in front of hundreds of thousands of lines of code on day one; the same goes for the agent.

Mapping · How this methodology lands on Kiro's four floors
Context layer · Floor 1

Persist the project's common knowledge, conventions, and gotchas into steering, and maintain it as an engineering artifact independent of code branches as much as possible (if context is isolated by branch, the agents on different branches become "different people"). Index masses of test logs, docs, and historical code semantically with a Knowledge Base, retrieved on demand rather than loaded in full.

Knowledge layer · Floor 2

Carry specialized workflows in Skills, keeping them lightweight by "pointing to a central knowledge base rather than copying content"; in particular, build a debugging skill that sets "investigating a bug must start with root-cause analysis, no blind trial-and-error" as a hard constraint.

Orchestration layer · Floor 3

For large changes, first use a spec to think the plan through and get a human review before acting (the approval gates are precisely the structural safeguard for "don't dive in blindly across hundreds of thousands of lines"); use subagents when you need parallel exploration (remember the limit of 4).

Verification layer · Floor 4

Use a review loop / independent reviewer as the gate, lock down critical modules with protected paths, and don't let Autopilot touch core code without protection in place.

Large-codebase indexing: one practical consideration

Kiro's embedding index currently has a performance bottleneck on very large codebases (tens of thousands of files, with years of git history), where indexing may be slow or struggle to complete. So on a very large monorepo, don't assume #codebase can cover the whole codebase indiscriminately — pairing it with a Knowledge Base + per-module scoping is the steadier route. This is actually consistent with the "onboard the agent like a new hire, feed it in slices" methodology: you shouldn't load the entire codebase at once in the first place.

Four key points (a summary tying the four floors together): scope isolation is the prerequisite — a large codebase must be fed in slices and expanded gradually; the debugging skill must be set as a hard constraint, since blindly changing one line can trigger a cascade of failures; context should be kept independent of code branches, or the agents on different branches amount to different individuals; and don't put faith in full indexing on a large codebase — enable a Knowledge Base when needed. This is not an out-of-the-box, do-everything solution; it needs someone to continuously maintain the context layer and advance incrementally.

Closing · finishing is where the subtraction starts

Build it up, then cut it back: the harness's half-life

A harness isn't built once and done for good; it has a half-life. Once the model gets stronger, you should proactively cut scaffolding. How much you can cut depends on how strong the model is. This principle holds up: every piece of the harness bets on an assumption about "what the model can do," and assumptions expire. A patch made for some flaw in an old model becomes pure dead weight once the model upgrades and that flaw disappears, actually slowing things down. So the right mindset isn't "how do I make Kiro stronger" but "what can I stop doing."

Cuttable (scaffolding)
  • The verification harness: as the model grows more reliable, review-loop rounds and Supervised's hunk-by-hunk review can both be relaxed appropriately
  • Density of spec approval gates: use Quick Plan to skip gates for simple, mature features
  • Specific configuration tuned for a particular version: Kiro iterates fast, and these expire
Not cuttable (core)
  • Separating generation from evaluation: without an independent evaluator, bugs on edge-case tasks still slip through
  • Requirements analysis and quality gating: complex, high-risk work still goes through human review
  • Context discipline: lean steering, context independent of branches

Three concrete takeaways: ① as the model grows more reliable, the verification scaffolding should thin out — but the division-of-labor check itself can't be removed; ② tune the density of spec approval gates to the model's capability, but don't turn them off entirely; ③ Kiro itself iterates extremely fast, so configuration you tuned for a particular version will also expire — treat configuration as something to maintain continuously and review periodically.

It comes down to one line: clearly distinguish the cuttable part (scaffolding) from the un-cuttable part (context discipline, division-of-labor verification). The space of valuable harness combinations doesn't shrink as the model advances; it just moves. The best agent framework is the one you can delete. The advice: every 3 to 6 months, or whenever you feel performance stagnating after a major model release, run a full harness audit and proactively delete assumptions that are no longer necessary.

Iterates fast
Kiro's behavior changes between versions, and configuration tuned for one version expires; treat it as something to maintain
Division of labor stays
Generation ≠ evaluation, plus requirements analysis and quality gating — the core that doesn't disappear as the model gets stronger
It only moves
The harness space worth caring about doesn't shrink as the model advances; it just migrates