The model is the fulcrum; the harness is the lever
Kiro writes the agent's main loop for you; the ring around that loop is yours to configure. With the same model, how well you configure that ring decides whether it's a tool that edits a few lines or a developer that can carry a large system. But one thing has to be said up front: Kiro is two product lines, and the IDE and CLI are not feature-aligned. Every capability in this guide is labeled with which line it belongs to.
Same model; the environment decides its ceiling
Treat the model like a developer who just joined the team and onboard it properly, and the things that have been sitting on the backlog can start moving again; meanwhile the model's own capability really is the ceiling. Put the two together: the harness is the adjustable lever, the model is the fixed fulcrum. You can't swap the fulcrum; the only thing you can move is the lever. Kiro makes "configuring the lever" mature and ready to use. Beyond configurable primitives like steering, hooks, and skills, it ships a strong default paradigm of its own: write a spec first, have a human approve it, then let the agent implement against it.
A discipline that runs through the whole guide: a harness decays over time. Once the model improves, the scaffolding you built earlier can become dead weight. So for each layer below, we note when it should be retired — picked up again in the closing section. One more reminder that holds specifically for Kiro: it iterates extremely fast, and behavior changes between versions. Wherever a specific behavior is noted (where a button lives, what a command's default is), confirm it against your current version number before you rely on it.
This guide only covers "configuring Kiro"
What Kiro gives you is an already-implemented agent main loop; you don't need to write the loop yourself. This guide discusses one thing only: how to configure that ready-made framework. Keep three categories apart: ① what the model gives you (can't change, Floor 0), ② what you configure (the body of this guide), and ③ what you build from scratch (out of scope here).
② What you configure: model selection and billing, context management, the build order for knowledge & tools, orchestration with Spec and Subagents, and verification broken out on its own. All of it is configured inside Kiro, not built.
Plus ① model capability: this is given. It can't be changed, only worked around, and it counts as the foundation of the harness (Floor 0).
③ What you build from scratch: building an agent with an API or SDK is a different game (writing your own main loop, designing your own prompt caching, implementing your own MCP server). Where such topics come up, this guide only touches on them, marks them "out of scope," and moves on.
Five floors, built bottom-up
Floor 0: the foundation you can't get around
This floor isn't built; it's given. But you have to understand it first, because it determines how lean the harness can be. SHARED: this floor is essentially the same for IDE and CLI; the only difference is where you switch settings.
Kiro's context window is model-dependent, not a fixed value. The model you pick determines how large a window this session can use.
| Model | Context window | Suited for |
|---|---|---|
| Opus 4.8 / 4.7 / 4.6 | 1M tokens | Critical coding, complex reasoning, large-codebase navigation |
| Sonnet 4.6 | 1M tokens | The everyday agentic workhorse: interactive dev, iteration, exploration |
| Opus 4.5 / Sonnet 4.5 / 4.0 / Haiku 4.5 | 200K tokens | Single tasks, cost-sensitive scenarios |
The 1M window on Opus 4.6 / Sonnet 4.6 was only raised from 200K on 2026-03-24, is available only on the Pro / Pro+ / Power paid tiers, and requires an IDE / CLI restart to take effect once the capability is rolled out. So "4.6 = 1M" is correct in the docs, but in your environment there are still two gates — the paid tier and the client version. If you picked 4.6 but don't see 1M, check these two first.
A large window doesn't mean you can stuff anything into it. Context rot is the key concept on this floor: as the context grows, model performance degrades, because attention is spread across more tokens and old, irrelevant content starts interfering with the current task. This is an inherent property of the attention mechanism, not a bug in any particular tool.
Check usage: the CLI uses /context show to see how much of the window context files, tools, responses, and prompts each take up; the IDE shows it directly in a panel. The CLI also has an off-by-default experimental toggle that keeps a color-coded usage percentage in the prompt line, so you can see it at a glance without re-running /context show.
/compact or spin up a subagent. Separately, the cap on context files is 75% of the model's window; anything over that is dropped automatically.
Kiro offers many models to choose from, so model selection involves a real trade-off.
Low / Medium / High / XHigh / Max (the exact levels available are model-dependent). IDE adjusts it in the Effort panel to the right of the model selector, shown as something like "Claude Opus 4.6 · Max"; CLI adjusts it with /effort high, which persists, and you can also set a default effort per model in the config. The docs are explicit: the higher the effort, the more tokens, and the more credits it costs. Rule of thumb: turn effort up for coding and long-running tasks (XHigh is the sweet spot), and save Max for genuinely hard problems rather than blindly maxing it out.
For configuring the harness specifically, the most notable thing about the 4.8 generation is that hallucination is markedly lower, which bears directly on how much effort verification takes. Kiro itself corroborates this: in its model notes it singles out Claude Opus 4.8, saying it is "four times less likely than the previous generation to let a defect through in generated code," and lists it as the option with "highest reliability."
What's trimmed is the number of rework cycles, not verification itself. Generation is not evaluation, and acceptance should come from an independent agent rather than the executor judging itself — this division of labor and check-and-balance cannot be removed (see Floor 4). A more reliable model saves the cost of "fixing it over and over," not the judgment of "whether to check."
Kiro runs on a credit multiplier system: each model carries a multiplier, and the more expensive the model and the deeper the reasoning effort, the more credits it consumes. Understanding this mechanism is how you spend cost where it matters.
| Model | Credit multiplier (approx.) | Meaning |
|---|---|---|
| Auto router | baseline 1.0x | When unsure, let it choose; it handles the expensive/cheap allocation for you |
| Claude Haiku | ≈ 0.4x | Simple queries, running scripts, reading logs |
| Claude Sonnet family | ≈ 1.3x | The workhorse for everyday agentic work: interactive development, iteration, queries, tool calls — the best value balance |
| Claude Opus family | ≈ 2.2x | The primary model for coding: critical code, complex reasoning, changes where regression is costly |
| Some open-weight models | as low as 0.05x | Extremely cost-sensitive, low-reliability batch work |
Three things follow directly from this mechanism: ① Assign models by role — the Opus family as the primary model for coding (critical code, complex reasoning), Sonnet as the workhorse for everyday agentic work (most interactive development), Haiku and open-weight models for the cheapest lightweight tasks, and Auto when you're unsure; ② effort is another cost variable — Max is noticeably more expensive than Low on the same model, so don't reflexively set Max for simple tasks; ③ split large tasks — use Sonnet for research and switch to Opus for critical code, rather than running on the strongest model the whole way.
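The third point (split large tasks) can be made concrete with a small calculation. The sketch below is only an illustration: the multipliers are the approximate values from the table above, while the idea of "baseline units" per step and the function name relative_credits are assumptions made for this example, not Kiro's billing formula. Effort level is left out because the text gives no numbers for it.

```python
# Approximate multipliers from the table above; real values depend on the
# model and the Kiro version you are running.
MULTIPLIER = {"auto": 1.0, "haiku": 0.4, "sonnet": 1.3, "opus": 2.2}


def relative_credits(plan):
    """Return the multiplier-weighted cost of a plan.

    plan is a list of (model, baseline_units) pairs, where baseline_units is a
    hypothetical measure of what a step would cost at 1.0x.
    """
    return sum(MULTIPLIER[model] * units for model, units in plan)


# Hypothetical task: 10 units of research followed by 5 units of critical coding.
all_opus = [("opus", 10), ("opus", 5)]
split = [("sonnet", 10), ("opus", 5)]

print(f"all on Opus: {relative_credits(all_opus):.1f}")
print(f"Sonnet research + Opus coding: {relative_credits(split):.1f}")
```

Under these assumed numbers the split plan is cheaper, because the research share of the work runs at the lower multiplier while the critical code still gets the stronger model.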
On the inference backend: the docs only confirm that Kiro's inference runs in AWS regions (us-east-1, eu-central-1) using cross-region inference; they don't state "Amazon Bedrock" verbatim. Externally, say "AWS-hosted, cross-region inference," and don't present a reasonable inference as something the docs state explicitly.
Context: the only resource you truly manage
The loop is ready-made; what you manage is what content goes in and when to clean it up — and this floor largely determines how far Kiro can go. Note that the three ways of using it — IDE, CLI, ACP — diverge sharply on this floor: the CLI has a full set of slash commands (/compact, /rewind, /tangent), the IDE uses panel buttons and # references, and ACP carries the CLI's capabilities into third-party editors over a protocol (detailed in the next section).
After the model completes a step, you have several choices. The easiest is "continue," but the other options exist precisely to help you manage context quality.
First, clear up something easy to confuse. Kiro has three ways to use it: IDE, CLI, ACP. ACP is not a third standalone product but a mode of the CLI: you run kiro-cli acp, and underneath it's still the CLI's agent core — it just communicates over an open protocol (JSON-RPC), so you can call Kiro from the AI panels of third-party editors like JetBrains, Zed, and Neovim. So ACP's capabilities are essentially a subset of the CLI's: terminal-only interactive commands (rewind's turn selector, tangent's toggle) aren't available, and the IDE's GUI-only features (the checkpoint panel, the Spec panel) certainly aren't.
The table below lays out, for each context action, whether it exists in each of the three modes, how to use it, and when. A few names are especially easy to misread (rewind doesn't revert files, clear only clears the display), and those are marked:
| Action | IDE | CLI | ACP | When to use / note |
|---|---|---|---|---|
| Continue same session | ✓ panel | ✓ send message | ✓ session/prompt | Continue while the context is still relevant; everything in the window is in play |
| Compact context | ✓ Summarization | ✓ /compact | ✓* compaction | When the session is full of stale debugging info, compact manually with instructions; don't wait for auto-compact to fire when the model is at its dullest |
| Open a brand-new conversation (start fresh) | ✓ new chat | ✓ /chat new | ✓ session/new | Use this to start a brand-new task. On CLI don't use /clear — it only clears the display, not the context, an easy point of confusion |
| Roll back the "conversation" (fork) | ✗ no rewind | ✓ /rewind | ✗ | When you want to drop a failed attempt but keep the file-reading results. Reverts only the conversation, not files; disk changes remain |
| Roll back "code + context" | ✓ checkpoint Restore | △ experimental (shadow git) | ✗ no GUI | When files are in a mess and you need to revert the code too; rewind can't touch files. checkpoint is not a replacement for git |
| Tangent exploration | ✗ | ✓ /tangent | ✗ | When exploring a side branch without polluting the main line; Ctrl+T to toggle |
| Switch model | ✓ selector | ✓ /model | ✓ session/set_model | Choose by task difficulty; save the expensive models for critical coding (see 0.4) |
| Switch agent config | ✓ | ✓ /agent swap | ✓ session/set_mode | Swap in a different agent's tool set and default model |
| Delegate to a Subagent | ✓ | ✓ up to 4 in parallel | ✓ _session/terminate | When the next step produces a lot of output you only need the conclusion of, keeping the intermediate noise in an isolated context (see Floor 3) |
| Add a file to context | ✓ # reference | ✓ @path / /context add | ✓* via command extension | When this session needs to reference a particular file temporarily |
| Large codebase / mass-document retrieval | — | ✓ /knowledge (experimental) | — | When content exceeds ~10MB or thousands of files; retrieved on demand, doesn't occupy resident context (see 1.7) |
| Image input | ✓ | ✓ | ✓ image capability | Paste a screenshot or design mockup for the model to look at |
✓* = a Kiro extension to ACP, experimental, and only effective if the host editor implements the _kiro.dev/ extensions; without it, it degrades to plain standard ACP (only sessions, models, streaming). ✗ the mode lacks this action, — not listed as supported by the docs, △ experimental, must be enabled manually.
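Because ACP speaks JSON-RPC, each entry in the ACP column above is ultimately a method name inside a request envelope. The following is a minimal sketch of that envelope only. The method names are taken from the table; the params shown are placeholders invented for illustration, and how messages are framed on the wire is not covered, so check the protocol documentation before relying on any field.

```python
import json
from itertools import count

_next_id = count(1)


def rpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object for the given method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_next_id),
        "method": method,
        "params": params if params is not None else {},
    }


# Method names come from the ACP column above; the params are placeholders,
# not the real parameter schema.
new_session = rpc_request("session/new", {"example_param": "placeholder"})
prompt = rpc_request("session/prompt", {"example_param": "placeholder"})

print(json.dumps(new_session))
print(json.dumps(prompt))
```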
The IDE side has no slash commands overall; the equivalents are panel buttons: Summarization (the counterpart to /compact), the checkpoint panel's Restore (roll back code + context), Revert (undo only the most recent turn's changes), and # context references.
Compaction is fundamentally lossy compression: as you near the window limit, the whole conversation is summarized into a more concise description, and the model continues from the summary in a new context. The CLI triggers it manually with /compact, and it also fires automatically when context overflows. Two tunable settings control what is retained: the most recent 2 message pairs (default) and 2% of the context window (default), with whichever is more conservative taking effect. A Kiro-specific detail: a compact creates a new session rather than compressing the original in place, and you can recover the original with /chat resume. The IDE equivalent is the Summarization feature, with the same logic.
The first thing to watch: at the moment auto-compact fires, the model happens to be at its least intelligent because of context rot, yet it has to make the most critical "keep what, discard what" decision. A typical failure chain: after a long debug session, auto-compact fires and the summary focuses on the debugging; then you say "now go fix that warning in the other file," but the warning's details have already been dropped. The right approach is proactive management: compact manually with /compact ahead of time, while you know your next step and the model is still in good shape.
CLI /clear only clears the screen display, not the saved conversation context; the docs are explicit on this. A common misuse is assuming /clear starts a new conversation, when the context hasn't actually been cleared. In Kiro, the real way to start a new conversation is /chat new.
Kiro has an easily-confused design around "rolling back": it splits "roll back the conversation" and "roll back files" into two independent mechanisms, and one command can't do both at once.
/rewind (CLI): rolls back the "conversation"
Forks the session into a new branch at some earlier turn; the selector shows each turn's prompt preview and the context usage percentage at that point.
But the docs are clear that it does not roll back file changes: files changed by later turns stay as-is on disk, and reverting the files too requires version control.
Checkpoint (IDE): rolls back "code + context"
A checkpoint is created automatically with each prompt; clicking Restore rolls both the codebase and Kiro's context back to that point.
CLI also has checkpoints, but it's an experimental feature, implemented with a shadow git repo, must be enabled manually (chat.enableCheckpoint true), and is cleared when the session ends — it doesn't persist across sessions.
It only tracks files the Kiro agent itself changed with built-in tools. Files you edited manually, that a formatter touched, or that an MCP tool or bash command changed are not tracked, and may be overwritten and lost on Restore. So the docs repeatedly stress: checkpoint is not a replacement for git; always use it alongside version control.
Two needs, two tools: to drop just the conversational noise of "trying approach A" while keeping the file-reading results, use /rewind; if approach A has already left the files in a mess, use checkpoint Restore (or git directly) to revert the files too.
The CLI docs explicitly split "supplying context to the conversation" into four categories, chosen by "whether it occupies resident tokens":
| Method | Occupies resident context? | Suited for |
|---|---|---|
| Agent Resources | Yes, occupies tokens on every request | Project rules that stay resident across sessions |
| Skills (skill://) | Metadata resident, body loaded on demand | Reusable specialized workflows |
| Session Context (/context add) | Yes, current session only | Files this session needs to reference temporarily |
| Knowledge Bases (/knowledge) | No, occupies context only when searched | Large codebases, mass documents |
A rule of thumb: new task = new session. Having just finished implementing a feature and about to write its docs, you should consider starting a new session (CLI uses /chat new — again, not /clear). There's a trade-off between related tasks: reusing context is faster and cheaper, since the model doesn't have to re-read the files you just changed; a new session is cleaner but has to re-read the relevant files, which is slower and costs more credits. The decision rule: if the next task is highly related to the current context, continue; otherwise start a new session.
CLI /knowledge (experimental) can turn a large file, an entire document library, or even a whole codebase into a semantic-search knowledge base: it occupies context only when a search hits it, and consumes no tokens otherwise. The docs' threshold is clear: once content exceeds about 10MB or thousands of files, it shouldn't go into resident context — use a Knowledge Base instead. It's an optional persistent semantic index layer that, in large-codebase scenarios, saves a lot of "have the agent read files one by one" overhead. The trade-offs: it's experimental with limited stability, plus the indexing performance bottleneck on very large codebases (see 2.3).
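The size threshold above is easy to check before deciding where content should live. This is a rough sketch, not a Kiro feature: the 10MB limit comes from the text, the file-count limit of 2000 is an assumed stand-in for "thousands of files", and the function names are made up for this example.

```python
import os

SIZE_LIMIT_BYTES = 10 * 1024 * 1024  # "about 10MB" from the text above
FILE_LIMIT = 2000  # assumption: the text only says "thousands of files"


def measure(root):
    """Return (total_bytes, file_count) for every regular file under root."""
    total, files = 0, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
                files += 1
    return total, files


def suggest_method(total_bytes, file_count):
    """Suggest where content belongs, based on the thresholds above."""
    if total_bytes > SIZE_LIMIT_BYTES or file_count >= FILE_LIMIT:
        return "knowledge base"
    return "session context"


# Example with made-up numbers: a 3MB docs folder with 120 files.
print(suggest_method(3 * 1024 * 1024, 120))
```

To apply it to a real folder, pass the result of measure("path/to/docs") into suggest_method.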
Knowledge & tools: the blueprint
This floor is the guide's core framework. An often-overlooked premise: the harness matters as much as, or more than, the model itself. Teams fixate on benchmarks, but in actual use, results depend more on the ecosystem you build around the model. This floor has one thing going for it: most of Kiro's Steering / Skills / Hooks / MCP is shared between IDE and CLI — configure it once and both sides can use it.
A common misconception is that an agent finds code purely by traversing files live, with no index. Kiro doesn't work that way: it's a hybrid of three coexisting mechanisms, and this is the easiest thing to get wrong by assumption.
Worth emphasizing: symbol-level and structure-level code understanding (which in many tools requires manually installing an LSP) is something Kiro has built into the product (the CLI's tree-sitter code tool, the IDE's index and AST editing). So in Kiro this isn't scaffolding "you have to build" but an existing capability you just need to "understand is already there."
Building the harness has an order; each layer rests on the previous one, and you shouldn't skip. Kiro's version is Steering → Hooks → Skills → Powers → code intelligence → MCP → Subagents. Each layer comes with its "common mistake" and "which line it belongs to":
- Steering: .kiro/steering/, supporting several inclusion modes. Principle: only put "broadly applicable" content here; too much drags down performance. Common mistake: writing reusable specialized knowledge into it (that kind of content belongs in a skill). See 2.4.
- MCP: .kiro/settings/mcp.json, with enterprise governance (MCP Registry) and one-click install links. Common mistake: wiring up MCP before the basic configuration is ready. See 2.7.

The core constraint: a model's capability ceiling in a large codebase equals its ability to find the right context. A few recurring practices:
- Keep Steering lean and mode-segmented (don't set everything to always-load).
- Scope test and lint by subdirectory.
- Exclude generated files with .gitignore and protected paths.
- When the structure is unclear, have Kiro generate a codebase map (IDE uses #repository, and for an existing codebase you can first generate structure.md / tech.md / product.md).
- Have the model use tree-sitter symbol search rather than pure string grep.
- Use a Knowledge Base for large codebases rather than loading everything.
Kiro's embedding index currently has a performance bottleneck on very large codebases. The community has reported that on codebases at the fifty-thousand-file scale with years of git history, indexing struggles to complete and CPU usage runs high; this feedback is tracked as an open improvement request on Kiro's GitHub. Small-to-medium codebases (a few thousand files) index fine, while on very large monorepos you can expect "indexing is slower, and pairs best with a Knowledge Base and per-module scoping". Pick the strategy to match the scale.
Steering files live in .kiro/steering/ (workspace level) or ~/.kiro/steering/ (global level); on conflict, the workspace wins. Each .md loads automatically, split by domain. In the IDE, clicking "Generate Steering Docs" produces three base files — product / tech / structure — on which Kiro's basic understanding of the project is built. Steering is more flexible than a single project doc loaded in full from start to finish, and that flexibility comes from these four inclusion modes (written in the YAML front-matter at the very top of the file):
| Mode | Syntax | Behavior |
|---|---|---|
| always (default) | this is the default if omitted | Loaded on every interaction |
| fileMatch | inclusion: fileMatch + fileMatchPattern | Loaded only when working on files matching that glob |
| auto | inclusion: auto + required name/description | Loaded automatically by semantic match of the description to the request, same mechanism as Skills |
| manual | inclusion: manual | Not loaded automatically; reference it explicitly in chat with #filename |
fileMatch is the key: Kiro's steering files are flat (all in one directory), and when they load is decided by the inclusion mode, not by which directory the file sits in. To get "rules specific to a certain subdirectory," write a fileMatch glob that matches those files, rather than placing the rule file inside that subdirectory. Additional notes: #[[file:relative-path]] links a live workspace file into steering (some reports say this syntax occasionally doesn't work on the CLI but is fine in the IDE); AGENTS.md is supported but doesn't support inclusion modes and always loads in full; and on the CLI, when using a custom agent, steering doesn't load automatically — you must add it explicitly in the agent config's resources field.
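The point that loading is decided by inclusion mode rather than by directory can be modeled in a few lines. This is a simplified sketch, not Kiro's implementation: the SteeringFile class, the loaded_for function, and the sample file names are hypothetical, Python's fnmatch stands in for whatever glob dialect Kiro uses, and auto mode is skipped because it depends on semantic matching.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class SteeringFile:
    name: str
    inclusion: str = "always"  # always | fileMatch | auto | manual
    pattern: Optional[str] = None  # stands in for fileMatchPattern


def loaded_for(files, active_path, referenced=()):
    """Return the names of steering files that would load for active_path.

    referenced holds names the user pulled in explicitly (the #filename case).
    auto mode depends on semantic matching, so this sketch never selects it.
    """
    selected = []
    for f in files:
        if f.inclusion == "always":
            selected.append(f.name)
        elif f.inclusion == "fileMatch" and f.pattern:
            if fnmatchcase(active_path, f.pattern):
                selected.append(f.name)
        elif f.inclusion == "manual" and f.name in referenced:
            selected.append(f.name)
    return selected


# All three files sit in the same flat directory; only the mode differs.
files = [
    SteeringFile("tech.md"),
    SteeringFile("api-rules.md", "fileMatch", "src/api/*.ts"),
    SteeringFile("release.md", "manual"),
]
print(loaded_for(files, "src/api/users.ts"))
print(loaded_for(files, "src/web/app.ts", referenced={"release.md"}))
```

In the first call the API rules load because the active file matches the glob; in the second they don't, and release.md loads only because it was referenced explicitly.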
Kiro's Skills follow the open Agent Skills standard exactly, so mature community skill know-how and ready-made skill packages are largely reusable here as-is. The structure is identical.
Locations: workspace .kiro/skills/, global ~/.kiro/skills/, with the workspace winning on name conflict. Both name (must match the folder name) and description (used by Kiro for semantic matching) are required. Progressive disclosure works the same way: at startup only name+description load, and the full SKILL.md loads only when a request matches. The familiar best practices apply directly: don't pile on filler, add a Gotchas section to record failure points, use the file system for progressive disclosure, and write the description as trigger conditions rather than a summary.
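The two required fields, and the rule that name must match the folder name, are easy to lint before a skill is committed. The sketch below assumes the metadata sits in a simple "key: value" front-matter block fenced by --- at the top of SKILL.md; it is not a full YAML parser, and check_skill and the sample skill are hypothetical names for this example.

```python
def parse_front_matter(text):
    """Parse simple 'key: value' lines between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_skill(folder_name, skill_md_text):
    """Return a list of problems found in a SKILL.md front matter."""
    fields = parse_front_matter(skill_md_text)
    problems = []
    if not fields.get("name"):
        problems.append("missing name")
    elif fields["name"] != folder_name:
        problems.append("name does not match folder name")
    if not fields.get("description"):
        problems.append("missing description")
    return problems


# Hypothetical skill; the description is written as a trigger condition.
sample = """---
name: db-migration
description: Use when the user asks to add or change a database migration.
---
Steps go here.
"""
print(check_skill("db-migration", sample))
print(check_skill("migrations", sample))
```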
When Kiro activates a skill, it injects the SKILL.md body as raw text, without the skill's directory path. The result: for a skill placed in the global directory (~/.kiro/skills/), the relative references written in its SKILL.md (references/xxx.md) may fail to resolve, while skills inside the workspace are fine (this is an open issue the community tracks on Kiro's GitHub). Until it's addressed, the "thin SKILL.md + references loaded on demand" pattern is less reliable for global skills, so the simplest path is to put the skill in the workspace, or write as much content as possible into the SKILL.md body.
This is the layer where IDE and CLI diverge the most, so they have to be covered separately.
IDE hooks: rich events, including file system & spec
Placed in .kiro/hooks/, each a *.kiro.hook JSON file, committable to git for team sharing.
- Prompt Submit / Agent Stop
- Pre / Post Tool Use (before/after a tool call, can block)
- File Create / Save / Delete (file-system events)
- Pre / Post Task Execution (before/after a spec task)
- Manual Trigger
Two action types: Ask Kiro (sends a prompt, starts a new loop, consumes credits) and Run Command (runs a shell command, no credit cost, faster, and a non-zero exit code can block at the Pre Tool Use stage).
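A Run Command hook that blocks on a non-zero exit code could be backed by a small guard script like the one sketched here. Everything about the interface is an assumption: the text does not say how a hook passes the target path to the command, so this sketch takes it as the first command-line argument, and the protected patterns are made up.

```python
import sys
from fnmatch import fnmatchcase

# Hypothetical protected paths; adjust to your project.
PROTECTED = ["migrations/*", "*.lock"]


def exit_code_for(path, protected=PROTECTED):
    """Return 1 (block) if path matches a protected pattern, else 0 (allow)."""
    return 1 if any(fnmatchcase(path, p) for p in protected) else 0


def main(argv):
    # Assumption: the target path arrives as the first argument. How Kiro
    # actually hands data to a hook command is not stated in the text.
    if len(argv) < 2:
        return 0
    code = exit_code_for(argv[1])
    if code:
        print(f"blocked: {argv[1]} is protected", file=sys.stderr)
    return code


# As a standalone script you would finish with: sys.exit(main(sys.argv))
print(exit_code_for("migrations/001_init.sql"))
print(exit_code_for("src/app.py"))
```

The design choice worth noting is that the decision lives in a plain function returning an exit code, so the same check costs no credits and can be unit-tested outside Kiro.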
CLI hooks: a different, leaner set
Written in the hooks field of the agent's JSON config, with events limited to:
- agentSpawn (agent initialization)
- userPromptSubmit
- preToolUse / postToolUse
- stop
No file events and no spec-task events (those are IDE-only).
One gap: lifecycle events like session end, before compaction, and subagent completion have no direct counterpart on either line right now.
MCP integration is configured in .kiro/settings/mcp.json (workspace level) or at the user level; the two merge, with the workspace winning. It's the standard mcpServers format: local servers use command/args/env, remote ones use url/headers. A few things Kiro adds here: finer-grained auto-approval (autoApprove / disabledTools / disabled); the MCP Registry (CLI, where an organization centrally pushes servers, and registry defaults can be overridden locally key by key, which suits enterprise control); and one-click install links ("Add to Kiro" placed in a README). One thing to note: the Kiro Web sandbox currently supports only local stdio servers, not remote ones.
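The "two files merge, workspace wins" rule can be pictured as a dictionary merge. This is a sketch under a stated assumption: a server defined in both files is replaced wholesale by the workspace definition, since the text does not give the real merge granularity. The server names, the example-server command, the URL, and the read_file tool name are all placeholders.

```python
import copy


def merge_mcp(user_cfg, workspace_cfg):
    """Merge two mcp.json-style dicts; workspace entries win on conflict.

    Assumption: a server defined in both files is replaced wholesale by the
    workspace definition. The real merge granularity is not stated in the text.
    """
    merged = copy.deepcopy(user_cfg.get("mcpServers", {}))
    merged.update(copy.deepcopy(workspace_cfg.get("mcpServers", {})))
    return {"mcpServers": merged}


# Placeholder configs: one remote server (url/headers), one local (command/args).
user_cfg = {
    "mcpServers": {
        "docs": {"url": "https://example.invalid/mcp", "headers": {}},
        "files": {"command": "example-server", "args": [], "disabled": True},
    }
}
workspace_cfg = {
    "mcpServers": {
        "files": {
            "command": "example-server",
            "args": ["--root", "."],
            "autoApprove": ["read_file"],
        },
    }
}

merged = merge_mcp(user_cfg, workspace_cfg)
print(sorted(merged["mcpServers"]))
print(merged["mcpServers"]["files"])
```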
Cost-reduction mechanism: Kiro has Tool Search, which, when there are many tools, doesn't preload all tool definitions but searches and pulls them at runtime on demand. It's off by default on the CLI; the trigger threshold is 5% of context or 50,000 tokens, and turning it on is recommended once you have more than 5 MCP servers. Powers, meanwhile, bundle "MCP tools + knowledge + workflows" into one package activated dynamically by context: MCP provides the tools, Skills provide the operational knowledge, and Powers provide the combined "tools plus knowledge" bundle.
One principle runs through this floor: more tools and more instructions are not better. Tool schemas and overly long instructions both crowd out the space the model uses for reasoning. A short steering doc and a focused skill often beat piling on redundant configuration. Because Kiro bills by credit, this also has a direct cost implication: every extra piece of resident context is something you pay for on every request. The default leaning should be "verify with the minimum configuration whether the model can handle it, then add incrementally based on the problems you actually hit." This undercurrent is tied off in the closing section, "build it up, then cut it back."
Orchestration: who should hold the plan
At this floor the question becomes: for a given task, do you let the model keep the whole plan in its head turn by turn, or hand the plan off to be held elsewhere? Orchestration comes down to one question: who holds the plan. The most naive approach is to have the model keep the plan in context and proceed turn by turn, but once a task grows large, the plan tends to drift and distort under compaction.
Kiro's answer is distinctive: persist the plan to disk as spec files, have a human review them, then let the agent implement against them. The plan is no longer held in the model's context turn by turn but becomes three markdown files on disk that can be version-controlled and reviewed by the team. This is Kiro's signature capability, Spec-driven development, and it is IDE-exclusive (the CLI has no built-in spec, only a lightweight Plan agent, see 3.7). What this buys you is that the plan is readable, reviewable, and traceable: think it through at the document level and have a human confirm it before writing any code — especially suited to complex, high-regression-cost development that needs team alignment.
Each spec generates three files, placed under .kiro/specs/<feature>/, one directory per feature:
WHEN [condition] THE SYSTEM SHALL [behavior]). Clear, testable, traceable.
Two workflow variants plus a quick mode:
- Requirements-First (variant): behavior is clear, architecture is flexible.
- Design-First (variant): an architecture already exists, or you must meet hard non-functional requirements like latency or compliance.
- Quick Plan (a separate session mode, not strictly a variant): auto-generates all three files at once, skipping the inter-stage approvals; suited to mature features where you trust Kiro's output.

The default has approval gates: Requirements → (human review) → Design → (human review) → Tasks, and before entering design there's an Analyze Requirements step that helps you catch contradictions, ambiguities, and gaps in the requirements. Note the division of labor: a spec covers only a single feature's plan; project-level resident context goes through steering, and the two are kept separate.
tasks.md has a dedicated execution view that shows each task's status in real time. Run all Tasks: Kiro automatically builds a task dependency graph and groups the dependency-free tasks into "waves" — the first wave runs all dependency-free tasks concurrently, the second runs those whose dependencies the first wave satisfied, and so on — serial between waves, concurrent within a wave. Sync Files: when triggered, has Kiro scan the codebase and automatically mark completed tasks, handling the case where "you or a teammate already did part of it in another session." Beyond running everything at once, you can also execute tasks one at a time (the Start Task link above a task), reviewing each before moving to the next — handy if you like "one task, one review."
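The wave grouping described above is a level-by-level topological sort. The sketch below shows the general technique on a made-up task list; it illustrates the idea and is not Kiro's scheduler, and plan_waves and the task names are hypothetical.

```python
def plan_waves(deps):
    """Group tasks into waves.

    deps maps each task to the set of tasks it depends on. Every task in a
    wave has all of its dependencies satisfied by earlier waves, so tasks
    within a wave could run concurrently while waves run one after another.
    """
    remaining = {task: set(d) for task, d in deps.items()}
    done, waves = set(), []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        waves.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return waves


# Hypothetical tasks.md content expressed as a dependency map.
deps = {
    "1 schema": set(),
    "2 api client": set(),
    "3 endpoints": {"1 schema"},
    "4 ui": {"2 api client", "3 endpoints"},
}
for i, wave in enumerate(plan_waves(deps), start=1):
    print(f"wave {i}: {wave}")
```

Here tasks 1 and 2 form the first wave, task 3 the second, and task 4 the third, which matches the "serial between waves, concurrent within a wave" rule.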
Spec: the structured three-document flow
Use it for complex features, team collaboration, and high-regression-cost changes. Choose it when you've thought it through and want to ship, leave a record, and align the team.
Vibe: improvised, conversational development
Use it for quick exploration, prototyping, and when the goal isn't yet clear. Spec is Kiro's "strong default" leaning, but not a hard constraint — Vibe is available anytime.
Subagents exist in both the IDE and the CLI; only the granularity of control differs. They are a different thing from the wave-based parallelism of spec tasks (3.3): that is parallelism at the task-orchestration layer, while a subagent spins up a child agent with its own isolated context window to run a stretch on its own. In common: the subagent gets its own fresh window, the main agent decides whether to delegate, and when it finishes the result rolls up to the main agent, keeping the intermediate noise in the isolated context. The two lines differ as follows:
- IDE: .md files in ~/.kiro/agents (global) or .kiro/agents (project), where YAML frontmatter sets name / description / tools / model; they also surface as slash commands. Note that IDE subagents cannot access Spec, and Hooks do not fire inside them.
- CLI: subagent tool with finer control. The CLI ships a built-in subagent tool (the early experimental delegate is deprecated and superseded by it), exposing a few more controls than the IDE: up to 4 in parallel (a hard limit to keep in mind when planning parallelism); calling a custom agent by name ("Use the backend agent to refactor…"; the orchestrating agent's tools must include subagent to delegate, and it uses the built-in summary tool to roll results up); Ctrl+G opens the execution monitor to see each subagent's live status; and permissions via availableAgents (restrict which can be dispatched, glob-supported) and trustedAgents (run without approval).

Where it fits (both lines): research-intensive tasks (read dozens of files and return a summary), multiple mutually independent subtasks (run in parallel), needing a fresh perspective (not inheriting the main session's assumptions), and an independent review before committing. The mental test: "Do I need these tool outputs themselves, or only the conclusion?" If only the conclusion, delegate.
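The 4-way cap and the "only the conclusion" test can be pictured with an ordinary worker pool. This is an analogy in plain Python, not how Kiro dispatches subagents: delegate and summarize_module are hypothetical, and a thread simply stands in for an isolated context that returns one line instead of its intermediate output.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 4  # the CLI limit quoted above


def delegate(tasks, run_subtask):
    """Run tasks with at most MAX_PARALLEL in flight; return results in order.

    run_subtask is a stand-in for a subagent: it does the noisy work itself
    and returns only a short conclusion.
    """
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        return list(pool.map(run_subtask, tasks))


def summarize_module(name):
    # Hypothetical subtask: pretend to read many files, return one line.
    intermediate = [f"{name}/file{i}" for i in range(50)]
    return f"{name}: scanned {len(intermediate)} files"


# Five tasks against a cap of four: the fifth waits for a free slot.
print(delegate(["auth", "billing", "search", "ui", "jobs"], summarize_module))
```

When planning parallel work, the practical consequence is the same as here: anything beyond four independent subtasks queues rather than running at once.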
One sentence to frame the boundary of Kiro's orchestration and avoid mismatched expectations. Kiro's orchestration is agent-driven and described in natural language; the holder of the plan is always the LLM main agent, not a deterministic script. Two concrete manifestations:
max_iterations capped at 10 rounds) is likewise driven by you describing intent in natural language and the agent deciding how; there's no hand-writable script or schema.
So Kiro's strength is: use a spec to persist a single feature's plan and get it reviewed clearly, then use subagents for limited parallel delegation and adversarial review. Understand this boundary and you won't configure Kiro expecting script-level, line-by-line replayable, large-scale deterministic orchestration.
A lightweight compensation when there's no spec
Built into the CLI from 1.23 (Shift+Tab or /plan): structured questioning → read-only codebase exploration → produce a task breakdown → hand the approved plan off to an executing agent. But the output is a temporary plan, not the three persisted Spec files — don't conflate the two.
An example pattern, not an out-of-the-box feature
Kiro has no built-in product called "Agent Teams." The official sample repo demonstrates assembling a multi-agent team out of subagent + steering + prompts + hooks (an architect dispatches tasks to coder/ops, then hands off to a reviewer and a security-reviewer), but that's an example pattern you assemble yourself.
Pro/Pro+/Power subscriptions only, off by default. This is a different form closer to a "frontier agent," not the CLI subagent: it clones the repo in an isolated sandbox, runs up to 10 tasks concurrently, internally coordinates a few specialized subagents (research and planning, writing code, verification), has persistent memory across tasks and repos, can learn team conventions from PR review feedback, and can be assigned a task from a GitHub issue via a /kiro comment. In this mode you can't choose the model (the agent chooses automatically).
Verification: the executor should not judge its own output
Writing code is now cheap, and the bottleneck has moved to verification accordingly. This is the most overlooked yet most critical layer when building the scaffolding.
The core is one sentence: have the agent that executed a task evaluate its own output, and it will almost always confidently give it a good grade even when the quality is mediocre, especially on tasks without a binary pass/fail standard. This is no different from having a developer review their own code. The fix is to separate generation from evaluation, the way development and QA are different people on a traditional team. A key engineering reality: tuning an independent reviewing agent to be strict and exacting is far easier than getting the executing agent to learn self-criticism. Code review, independent QA, and independent audit: human organizations have long used division of labor as a check and balance.
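The separation of generation from evaluation can be sketched as a loop in which the two roles never share state beyond the draft and the findings. This is a minimal illustration under stated assumptions, not how Kiro implements its review loop: `review_loop`, `Review`, `max_rounds`, and the toy executor and reviewer are all hypothetical stand-ins for real agents.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Review:
    approved: bool
    findings: List[str]


def review_loop(
    generate: Callable[[List[str]], str],
    evaluate: Callable[[str], Review],
    max_rounds: int = 3,
) -> Tuple[str, Review, int]:
    """Alternate generation and independent evaluation until approval or the round cap."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    findings: List[str] = []
    for round_no in range(1, max_rounds + 1):
        draft = generate(findings)   # the executor sees only the reviewer's findings
        review = evaluate(draft)     # the reviewer sees only the draft, never the executor's reasoning
        if review.approved:
            return draft, review, round_no
        findings = review.findings
    return draft, review, max_rounds


def toy_executor(findings: List[str]) -> str:
    """Hypothetical executor: adds a guard only after a reviewer complains."""
    code = "def divide(a, b):\n"
    if findings:
        code += "    if b == 0:\n        raise ValueError('b must be non-zero')\n"
    return code + "    return a / b\n"


def toy_reviewer(draft: str) -> Review:
    """Hypothetical strict reviewer: rejects any draft without a zero check."""
    if "b == 0" in draft:
        return Review(approved=True, findings=[])
    return Review(approved=False, findings=["division by zero is not handled"])


if __name__ == "__main__":
    draft, review, rounds = review_loop(toy_executor, toy_reviewer)
    print(f"approved={review.approved} after {rounds} round(s)")
```

The design point is that strictness lives entirely in the reviewer: tightening `toy_reviewer` changes the gate without touching the executor, which mirrors the observation that a strict reviewer is easier to tune than a self-critical executor.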
Kiro builds this division of labor into the product, so you don't have to build it from scratch:
Kiro IDE defaults to Autopilot, where the agent works autonomously end to end, changing code across files, running commands, and making architectural decisions without pausing to ask at each step. This is a deliberate choice on Kiro's part: favor letting the agent keep momentum, and put the control point after the fact, where you review all the diffs, roll back the whole thing, or interrupt mid-execution. If you prefer "step-by-step confirmation," just switch to Supervised (it stops after each turn that contains file edits to wait for your approval, with changes accepted/rejected hunk by hunk). Neither cadence is better in the abstract; pick by task and personal habit.
This is a point the Kiro docs call out specifically: Supervised is not a sandbox, not an isolation boundary, not access control — the agent's underlying permissions are exactly the same in both modes. It only determines "whether to show you the diff before changes hit disk." The real security boundary is a different set of mechanisms: protected paths, trusted commands, workspace isolation, credential scoping, where "writing a protected path requires approval" is enforced in both modes. The CLI side has a different permission model: use /tools trust / untrust / trust-all to control which tools skip confirmation.
So when configuring verification, think of two things separately: Autopilot / Supervised governs the review cadence (whether to look hunk by hunk); protected paths / trusted commands are what govern security (whether it can touch something). Don't treat Supervised as a security boundary.
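One way to keep the two concerns apart is to notice that they take different inputs. The sketch below is a mental model only, not Kiro's implementation: the function names, the mode strings, and the glob-style patterns are hypothetical, and how Kiro actually matches protected paths is not specified here.

```python
from fnmatch import fnmatchcase
from typing import Sequence

MODES = ("autopilot", "supervised")


def pauses_for_review(mode: str, turn_edits_files: bool) -> bool:
    """Cadence: Supervised stops after a turn containing file edits; Autopilot keeps going."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    return mode == "supervised" and turn_edits_files


def write_requires_approval(path: str, protected_patterns: Sequence[str]) -> bool:
    """Security: a protected path needs approval. The mode is deliberately not a parameter."""
    return any(fnmatchcase(path, pattern) for pattern in protected_patterns)


if __name__ == "__main__":
    protected = ["infra/*", "*.pem"]  # hypothetical patterns
    for mode in MODES:
        print(mode, pauses_for_review(mode, True), write_requires_approval("infra/prod/main.tf", protected))
```

Because `write_requires_approval` never sees the mode, switching between Autopilot and Supervised cannot change its answer, which is the property the text asks you to rely on.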
Kiro also has a fairly distinctive built-in verification capability: Property-Based Testing. It automatically extracts "properties" from EARS-format requirements, generates hundreds or thousands of random test cases, and uses shrinking to find the minimal counterexample (akin to a built-in red team). When a test fails, Kiro gives you three options: "change the implementation / change the test / change the requirement." This connects neatly to the spec's traceability: requirements → tasks → property tests form one chain. With this step, Kiro wires "formalized requirements" into "automated verification"; it's off by default and optionally enabled.
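To see what "generate random cases, then shrink to a minimal counterexample" means mechanically, here is a standard-library sketch. It does not show how Kiro extracts properties or runs them; the requirement, `buggy_max`, and the naive shrinker are hypothetical, and mature libraries such as Hypothesis implement generation and shrinking far more thoroughly.

```python
import random
from typing import Callable, List, Optional


def buggy_max(xs: List[int]) -> int:
    """Implementation under test. Bug: starts from 0, so all-negative input is wrong."""
    best = 0
    for x in xs:
        if x > best:
            best = x
    return best


def prop_returns_largest(xs: List[int]) -> bool:
    """Property from a hypothetical requirement: for a non-empty list, return its largest element."""
    return buggy_max(xs) == max(xs)


def find_counterexample(prop: Callable[[List[int]], bool], trials: int = 500, seed: int = 0) -> Optional[List[int]]:
    """Try random non-empty integer lists until one violates the property."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(1, 8))]
        if not prop(xs):
            return xs
    return None


def shrink(xs: List[int], prop: Callable[[List[int]], bool]) -> List[int]:
    """Greedily simplify a failing input while it keeps failing."""
    current = list(xs)
    progress = True
    while progress:
        progress = False
        candidates: List[List[int]] = []
        if len(current) > 1:  # try dropping one element, keeping the list non-empty
            candidates += [current[:i] + current[i + 1:] for i in range(len(current))]
        for i, v in enumerate(current):  # try moving one element a step toward zero
            if v != 0:
                toward_zero = v - 1 if v > 0 else v + 1
                candidates.append(current[:i] + [toward_zero] + current[i + 1:])
        for candidate in candidates:
            if not prop(candidate):
                current, progress = candidate, True
                break
    return current


if __name__ == "__main__":
    failing = find_counterexample(prop_returns_largest)
    if failing is not None:
        print("raw counterexample:", failing)
        print("shrunk:", shrink(failing, prop_returns_largest))
```

Whatever all-negative list the random search first hits, the shrinker reduces it to `[-1]`, the smallest input that still exposes the bug. That minimal case is what makes the "change the implementation / change the test / change the requirement" decision easy to reason about.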
Onboard the agent like a new hire
Combine the preceding four floors in a real campaign, using only Kiro's ready-made capabilities throughout, with no self-built agent. What's covered here is the methodology; the methodology is independent of which tool you use, and below it's mapped point by point onto Kiro's mechanisms, with the real-world constraints stated honestly. A large legacy codebase can't be handed to the agent all at once; the right approach is a five-step loop:
Each time a task finishes, write the lessons back, and the next round of the agent starts from a higher point. You wouldn't put a new hire in front of hundreds of thousands of lines of code on day one; the same goes for the agent.
Persist the project's common knowledge, conventions, and gotchas into steering, and maintain it as an engineering artifact independent of code branches as much as possible (if context is isolated by branch, the agents on different branches become "different people"). Index masses of test logs, docs, and historical code semantically with a Knowledge Base, retrieved on demand rather than loaded in full.
Carry specialized workflows in Skills, keeping them lightweight by "pointing to a central knowledge base rather than copying content"; in particular, build a debugging skill that sets "investigating a bug must start with root-cause analysis, no blind trial-and-error" as a hard constraint.
For large changes, first use a spec to think the plan through and get a human review before acting (the approval gates are precisely the structural safeguard for "don't dive in blindly across hundreds of thousands of lines"); use subagents when you need parallel exploration (remember the limit of 4).
Use a review loop / independent reviewer as the gate, lock down critical modules with protected paths, and don't let Autopilot touch core code without protection in place.
Kiro's embedding index currently has a performance bottleneck on very large codebases (tens of thousands of files, with years of git history), where indexing may be slow or struggle to complete. So on a very large monorepo, don't assume #codebase can cover the whole codebase indiscriminately — pairing it with a Knowledge Base + per-module scoping is the steadier route. This is actually consistent with the "onboard the agent like a new hire, feed it in slices" methodology: you shouldn't load the entire codebase at once in the first place.
Four key points (a summary tying the four floors together): scope isolation is the prerequisite — a large codebase must be fed in slices and expanded gradually; the debugging skill must be set as a hard constraint, since blindly changing one line can trigger a cascade of failures; context should be kept independent of code branches, or the agents on different branches amount to different individuals; and don't put faith in full indexing on a large codebase — enable a Knowledge Base when needed. This is not an out-of-the-box, do-everything solution; it needs someone to continuously maintain the context layer and advance incrementally.
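"Feed in slices and expand gradually" can be planned before any agent is involved. The sketch below groups repository-relative paths by top-level directory and packs the smallest modules into early rounds. Everything in it is an illustrative assumption: the function names, the file-count budget, and the idea that a top-level directory is a reasonable module boundary for your codebase.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def slice_by_module(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group repository-relative paths by their top-level directory."""
    slices: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        parts = PurePosixPath(path).parts
        module = parts[0] if len(parts) > 1 else "(root)"
        slices[module].append(path)
    return dict(slices)


def onboarding_rounds(slices: Dict[str, List[str]], max_files_per_round: int) -> List[List[str]]:
    """Pack modules into rounds, smallest first; an oversized module gets a round of its own."""
    rounds: List[List[str]] = []
    current: List[str] = []
    used = 0
    for module in sorted(slices, key=lambda m: (len(slices[m]), m)):
        size = len(slices[module])
        if current and used + size > max_files_per_round:
            rounds.append(current)
            current, used = [], 0
        current.append(module)
        used += size
    if current:
        rounds.append(current)
    return rounds


if __name__ == "__main__":
    # Hypothetical file listing.
    files = ["auth/a.py", "auth/b.py", "billing/x.py",
             "search/1.py", "search/2.py", "search/3.py", "README.md"]
    for number, modules in enumerate(onboarding_rounds(slice_by_module(files), 3), start=1):
        print(f"round {number}: {modules}")
```

Starting with the smallest slices gives the context layer (steering, skills) a chance to absorb lessons before the agent is pointed at the largest modules.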
Build it up, then cut it back: the harness's half-life
A harness isn't built once and done for good; it has a half-life. Once the model gets stronger, you should proactively cut scaffolding. How much you can cut depends on how strong the model is. This principle holds up: every piece of the harness bets on an assumption about "what the model can do," and assumptions expire. A patch made for some flaw in an old model becomes pure dead weight once the model upgrades and that flaw disappears, actually slowing things down. So the right mindset isn't "how do I make Kiro stronger" but "what can I stop doing."
Can be cut back as the model improves:
- The verification harness: as the model grows more reliable, review-loop rounds and Supervised's hunk-by-hunk review can both be relaxed appropriately
- Density of spec approval gates: use Quick Plan to skip gates for simple, mature features
- Specific configuration tuned for a particular version: Kiro iterates fast, and these expire
Cannot be cut:
- Separating generation from evaluation: without an independent evaluator, bugs on edge-case tasks still slip through
- Requirements analysis and quality gating: complex, high-risk work still goes through human review
- Context discipline: lean steering, context independent of branches
Three concrete takeaways: ① as the model grows more reliable, the verification scaffolding should thin out — but the division-of-labor check itself can't be removed; ② tune the density of spec approval gates to the model's capability, but don't turn them off entirely; ③ Kiro itself iterates extremely fast, so configuration you tuned for a particular version will also expire — treat configuration as something to maintain continuously and review periodically.
It comes down to one line: clearly distinguish the cuttable part (scaffolding) from the un-cuttable part (context discipline, division-of-labor verification). The space of valuable harness combinations doesn't shrink as the model advances; it just moves. The best agent framework is the one you can delete. The advice: every 3 to 6 months, or whenever you feel performance stagnating after a major model release, run a full harness audit and proactively delete assumptions that are no longer necessary.
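A periodic audit is easier to run if every piece of the harness records the assumption it bets on and when that assumption was last checked. The sketch below is one hypothetical way to keep such a register; the field names, the example entries, and the 6-month default (the upper end of the interval suggested above) are illustrative assumptions, not a Kiro feature.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class HarnessItem:
    name: str
    assumption: str      # what this piece assumes the model cannot yet do
    last_reviewed: date
    cuttable: bool       # scaffolding (True) versus discipline that stays (False)


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later`, ignoring the day of month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def due_for_audit(items: List[HarnessItem], today: date,
                  max_age_months: int = 6, model_upgraded: bool = False) -> List[HarnessItem]:
    """Everything is due after a model upgrade; otherwise only stale items are."""
    return [item for item in items
            if model_upgraded or months_between(item.last_reviewed, today) >= max_age_months]


def deletion_candidates(due: List[HarnessItem]) -> List[str]:
    """Only cuttable scaffolding is a candidate; un-cuttable items are reviewed, not deleted."""
    return [item.name for item in due if item.cuttable]


if __name__ == "__main__":
    register = [
        HarnessItem("extra review-loop rounds", "model misses edge cases", date(2025, 1, 10), True),
        HarnessItem("independent reviewer", "executors grade themselves generously", date(2025, 1, 10), False),
        HarnessItem("version-specific hook config", "current version's defaults", date(2025, 6, 1), True),
    ]
    due = due_for_audit(register, today=date(2025, 8, 1))
    print(deletion_candidates(due))
```

Keeping the `cuttable` flag explicit encodes the distinction drawn above: the audit may delete scaffolding, but division-of-labor verification and context discipline are only ever re-examined.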