How the Feature Doc Generator Works as a StudioX Mission
I build Missions for a living, so when the product team asked for a Feature Doc Generator, my first question wasn't "what should it write?" It was "what does it read, and how do we watch it reason?" Documentation that generates itself is only trustworthy if you can see exactly how it got from a commit to a claim. That's what a StudioX Mission gives you, and it's why we built this as a Mission rather than a one-shot prompt.
Let me walk through the architecture the way I'd whiteboard it for a new engineer.
A Mission is a small org chart, not a script
The Feature Doc Generator is a Mission: a self-contained agentic system that takes a plain-language goal — "document the payment-retry feature that landed this sprint" — and returns a finished artifact plus a full reasoning trace. It is not a fixed pipeline. It's a roster of specialist agents and a router that decides, one round at a time, who acts next.
There are two tiers, and keeping them straight is the whole game.
Tier 1 — the Reasoning Core (the router). It reads the request and the roster of agents and picks a single agent to act next. It runs several rounds, accumulating each agent's result, and after each round it re-reads everything gathered so far and decides — semantically, not by keyword — whether the request is fully answered. When it is, the router says "done" and we move to synthesis. The Core is the project manager: it plans and synthesizes, but it does no tool work itself.
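If I were sketching that loop on the whiteboard, it would look roughly like this. It is a minimal sketch, not the product's code: `run_router`, `MissionState`, `pick_next`, `is_answered`, and the `max_rounds` cap are all names I'm making up for illustration, and in a real Mission the two callbacks would be model calls rather than plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable

Agent = Callable[[str], str]  # hypothetical: takes the request, returns a result


@dataclass
class MissionState:
    request: str
    # Accumulated (agent name, output) pairs, in the order the agents ran.
    results: list[tuple[str, str]] = field(default_factory=list)


def run_router(
    request: str,
    roster: dict[str, Agent],
    pick_next: Callable[[MissionState, list[str]], str],
    is_answered: Callable[[MissionState], bool],
    max_rounds: int = 6,  # assumed safety cap, not a documented setting
) -> MissionState:
    """Route one agent per round until the request is judged fully answered."""
    state = MissionState(request)
    for _ in range(max_rounds):
        # The router picks exactly one agent for this round.
        name = pick_next(state, list(roster))
        state.results.append((name, roster[name](request)))
        # After each round, re-read everything gathered so far.
        if is_answered(state):
            break
    return state  # handed to synthesis afterwards
```

The router only chooses and checks. All the actual work happens inside whatever `roster[name]` points at, which is the "project manager, not worker" split described above.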
Tier 2 — the agent planner. When the router selects an agent that's backed by a StudioX bot, the planner discovers that agent's capabilities (its MCP tools, knowledge bases, and vibes), decomposes the goal into an ordered list of steps, and executes each one. Most steps run on the agent's own bot; the final reason step — the one that produces the user-facing text — runs on IRIS's own model to synthesize and format. Every plan must end with that reason step.
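The two rules in that paragraph, "every plan ends with a reason step" and "the final reason step runs on a different model," are easy to state as code. The sketch below assumes a plan is just an ordered list of typed steps. The `Step` class, the step-kind labels, and the executor names are hypothetical, chosen only to make the rules concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    kind: str          # assumed labels, e.g. "tool", "knowledge", "reason"
    instruction: str


def validate_plan(steps: list[Step]) -> list[Step]:
    """Reject any plan that does not finish with a reason step."""
    if not steps or steps[-1].kind != "reason":
        raise ValueError("plan must end with a reason step")
    return steps


def executor_for(index: int, steps: list[Step]) -> str:
    """Final reason step goes to the synthesis model; the rest to the agent's bot."""
    is_final_reason = index == len(steps) - 1 and steps[index].kind == "reason"
    return "synthesis_model" if is_final_reason else "agent_bot"
```

For a two-step plan (fetch the retry handler, then reason over it), `executor_for` sends step 0 to the agent's bot and step 1 to the synthesis model.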
Here's the roster we register for this Mission:
- Code Agent — reads the repository through an MCP server (GitHub/GitLab). It pulls the implementation for the feature under documentation. This is strictly read-only: it fetches files, diffs, and symbols. Nothing it does mutates the repo.
- History Agent — reads commit history and pull-request discussion for the same feature. Also read-only. This is where the why lives — the decisions, the rejected approaches, the edge cases argued out in review.
- Test Agent — reads the test suite and maps requirements to what's actually verified, so the test matrix reflects reality rather than intent.
- Report Agent — takes everything the other agents returned and produces the deliverable: the PRD, the spec sheet, the test matrix, or the customer PDF. Report Agent output is persisted as a first-class iris_reports record, so the artifact is durable and versioned, not just a chat message.
Observations: watching it reason, not just trusting the output
The part I care most about is observability. Every phase of the Mission emits a trace event — the routing decision and its reasoning, the planner's step list, each step's output, and the validation verdict for that step. These stream live over Server-Sent Events to the Explain rail in true execution order. You don't get a document and a shrug. You watch the Code Agent fetch the retry handler, the History Agent surface the PR where the backoff ceiling was debated, the Test Agent map three requirements to five test cases — and you see the plain-language explanation for why each of those happened.
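To make the streaming concrete, here is roughly what one trace event looks like on the wire. The frame layout (`id:`, `event:`, `data:`, then a blank line) is standard Server-Sent Events. The `TraceEvent` fields are my own guess at a reasonable shape, not the actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TraceEvent:
    seq: int      # assumed: position in execution order
    phase: str    # assumed labels: "routing", "plan", "step", "validation"
    agent: str
    detail: str   # plain-language explanation shown to the reader


def to_sse(event: TraceEvent) -> str:
    """Encode one trace event as a single Server-Sent Events frame."""
    # json.dumps escapes newlines, so the payload always fits on one data line.
    payload = json.dumps(asdict(event), separators=(",", ":"))
    # A blank line terminates the frame; the client dispatches it on receipt.
    return f"id: {event.seq}\nevent: {event.phase}\ndata: {payload}\n\n"
```

Carrying a sequence number in the `id` field is one simple way to let a client keep events in execution order and resume after a dropped connection.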
That matters for documentation specifically. When the PRD says "retries cap at five attempts," you can trace that claim back through the observation stream to the exact commit and the exact test it came from. A generated doc you can't audit is worse than no doc. The trace is what makes this one trustworthy.
The planner is also honest about validation. After each step runs, a separate model call checks whether the step actually did its job — and it's deliberately lenient: a legitimately empty result passes, raw unformatted data passes (we format later). It fails only when a step refused, errored, returned the wrong kind of content, or echoed a previous step. On failure we retry once with a corrective note. So a flaky repo read doesn't silently poison the document.
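The retry-once behavior can be sketched in a few lines. This assumes the validator returns a short verdict label. The label names and the wording of the corrective note are invented here, and `validate` stands in for the separate model call.

```python
from typing import Callable

# Assumed verdict labels for the four failure cases named above.
FAIL_VERDICTS = {"refused", "errored", "wrong_content", "echoed_previous"}


def run_step_with_validation(
    run_step: Callable[[str], str],
    validate: Callable[[str, str], str],
    instruction: str,
) -> tuple[str, str, int]:
    """Run a step, validate it, and retry exactly once on failure.

    Returns (output, final verdict, number of attempts).
    """
    output = run_step(instruction)
    verdict = validate(instruction, output)
    if verdict not in FAIL_VERDICTS:
        # Lenient path: empty or unformatted output still counts as a pass.
        return output, verdict, 1
    # Single retry, with a note telling the step what went wrong.
    corrected = f"{instruction}\n\nCorrective note: the previous attempt {verdict}."
    output = run_step(corrected)
    return output, validate(instruction, output), 2
```

The second verdict is returned as-is, so the caller can still see a step that failed twice instead of having it quietly pass.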
The decision queue: read is free, write is gated
Here's where I want to be precise, because it's easy to oversell autonomy. Reading the code, the history, and the tests is entirely read-only — those agents pull data through MCP servers and never mutate anything. There is no approval gate on reading, because there's nothing to approve.
The gate is on the write. When the Mission wants to do something with real blast radius — commit the generated docs back to the repo, or publish a customer-facing PDF externally — the synthesis step emits a [REQUEST_APPROVAL] block instead of claiming it did the thing. The chat route turns that block into a row in the decision queue, emails each reviewer a magic-link approve/reject URL, and rewrites the message the user sees into an "awaiting approval" status. A human clicks approve; only then does the publish happen. That's the human-in-the-loop boundary, and it sits exactly where irreversibility begins — not one step earlier.
Instant MCP servers: how the tools get wired
None of the read agents would work without their tools, and I don't want to run an integration project every time a customer uses a different code host. That's what instant MCP servers are for. We register a GitHub or GitLab MCP server, and the agents discover the available tools at runtime — no code change, no redeployment. Point the Generic Agent at a new MCP server tomorrow and the Mission can use it immediately. Registering a tool, not shipping a release, is what "no engineering" means here.
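Runtime discovery is less magic than it sounds. Assuming the server speaks standard MCP, tools are listed with a JSON-RPC 2.0 `tools/list` call, and each tool in the result carries a `name`. The sketch below only builds the request and parses a response. The transport is omitted, and the helper names are mine.

```python
import json


def tools_list_request(request_id: int) -> str:
    """Build the JSON-RPC 2.0 request that asks an MCP server for its tools."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def tool_names(response_text: str) -> list[str]:
    """Extract tool names from a tools/list response."""
    response = json.loads(response_text)
    if "error" in response:
        # JSON-RPC error responses carry an error object instead of a result.
        raise RuntimeError(response["error"].get("message", "tools/list failed"))
    return [tool["name"] for tool in response["result"]["tools"]]
```

Because the agent asks the server what it offers instead of relying on a compiled-in list, swapping GitHub for GitLab changes the answer to this call and nothing else.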
The portal is the surface people actually touch: they kick off the Mission, watch the observation stream, review the drafted artifact, and re-run it when the code moves. For the business case behind all of this, Harry's "why it matters" is the read; for what it looks like on a real team, see "in practice." If you want the broader picture of composing agents into shipped business applications, or how we think about autonomous AI workers generally, start there.