How the Integration Risk Scorer Works: A StudioX Mission

When Ajay describes the Integration Risk Scorer as forecasting, an architect's first question is the honest one: forecasting how, mechanically, without hand-waving? I own the reference architecture for our Missions, so let me open the hood. The Scorer is a StudioX Mission — a small, ordered team of specialist agents driven by a reasoning controller — and everything it does is observable while it does it. No black box. That constraint shapes every design choice below.

The shape of the mission

A Mission is not one model with a big prompt. It is a controller loop over a private working memory — we call it the notepad — that decides one scoped action at a time, files each result, and re-plans from what the result actually says. The reasoning layer is the project manager; each agent is a worker that owns exactly one capability: one tool call, one knowledge-base query, or one vibe. Agents do not compose, so the controller decomposes the goal into single-capability asks. This is deliberate, and it is why the Scorer stays reliable on real, messy history instead of hallucinating resemblances.
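The controller loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the StudioX implementation: `Finding`, `Notepad`, `run_mission`, and the planner signature are hypothetical names. The planner is assumed to read the notepad and return the next single-capability ask, or `None` when it has enough to render a verdict.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical names throughout: Finding, Notepad, run_mission are illustrative, not StudioX APIs.

@dataclass
class Finding:
    agent: str      # which single-capability agent produced it
    content: dict   # the small, precise result it filed

@dataclass
class Notepad:
    """The controller's private working memory; agents never see it whole."""
    findings: List[Finding] = field(default_factory=list)

    def file(self, finding: Finding) -> None:
        self.findings.append(finding)

# An agent owns one capability: scoped ask in, finding content out.
Agent = Callable[[str], dict]
# A planner reads the notepad and returns the next (agent name, scoped ask), or None when done.
Planner = Callable[[Notepad], Optional[Tuple[str, str]]]

def run_mission(plan_next: Planner, agents: Dict[str, Agent], max_steps: int = 10) -> Notepad:
    notepad = Notepad()
    for _ in range(max_steps):          # hard stop so a confused planner cannot loop forever
        step = plan_next(notepad)       # re-plan from everything filed so far
        if step is None:
            break
        name, ask = step
        notepad.file(Finding(name, agents[name](ask)))  # one scoped action per iteration
    return notepad
```

The point of the shape is that the planner runs again after every filed finding, so each next ask can depend on what the last one actually returned.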

For the Integration Risk Scorer, the roster is small and purpose-built:

Change Analysis Agent — reads the proposed change (diff, touched files, the config and interface seams it crosses) through a source-control MCP server. Read-only.
Escape History Agent — a bot backed by its own knowledge base: your post-mortems, defect records, and escape log. It answers one question well — which past escapes resemble this seam? Read-only, and knowledge-isolated: it searches escapes and nothing else.
Coverage Agent — checks the planned test suite against the risky seam via a test-management MCP server (whatever you run — TestRail, Xray, an internal system). Read-only.
Generic Agent — no fixed domain. It discovers whatever MCP tools are registered at runtime and uses them immediately, so wiring a new tracker never means a code change.
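The "one capability per agent" rule is easy to enforce if the roster is declared as data. A minimal sketch, assuming a hypothetical `AgentSpec` descriptor (not a StudioX schema); the capability names are illustrative. The Generic Agent is left out because its tools are discovered at runtime rather than declared up front.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class AgentSpec:
    # Hypothetical descriptor; field names are illustrative, not a StudioX schema.
    name: str
    capabilities: Tuple[str, ...]   # must contain exactly one entry
    backend: str
    read_only: bool

ROSTER: List[AgentSpec] = [
    AgentSpec("change_analysis", ("read_change",), "source-control MCP server", True),
    AgentSpec("escape_history", ("search_escape_kb",), "escape knowledge base", True),
    AgentSpec("coverage", ("check_planned_suite",), "test-management MCP server", True),
]

def validate(roster: List[AgentSpec]) -> bool:
    """Reject any agent that would own zero or several capabilities."""
    for spec in roster:
        if len(spec.capabilities) != 1:
            raise ValueError(f"{spec.name} owns {len(spec.capabilities)} capabilities; expected 1")
    return True
```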

One iteration at a time, and why the order matters

Here is the mechanic that makes the forecast trustworthy: the controller authors each agent's request from the previous finding, and ships only a tightly scoped ask — never its whole context. Reliability lives in the correctness of the question, not in second-guessing the answer.
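"Ship only a tightly scoped ask" amounts to an allow-list: the controller copies named fields out of the previous finding and nothing else. A minimal sketch; `scope` and `author_escape_ask` are hypothetical helpers, and the ask wording is illustrative.

```python
from typing import Mapping, Tuple

def scope(finding: Mapping[str, object], keys: Tuple[str, ...]) -> dict:
    """Copy only the named fields out of a prior finding; the rest stays in the notepad."""
    missing = [k for k in keys if k not in finding]
    if missing:
        raise KeyError(f"finding lacks {missing}")
    return {k: finding[k] for k in keys}

def author_escape_ask(change_finding: Mapping[str, object]) -> str:
    # Only the seam crosses to the Escape History Agent; the diff never does.
    ctx = scope(change_finding, ("seam",))
    return f"Which past escapes have a failure signature involving the {ctx['seam']} seam?"
```

Failing loudly on a missing field is deliberate here. If the previous finding lacks the seam, the correct move is to re-plan, not to send a vaguer question.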

Walk the loop for a real change:

Iteration 1 — discover + invoke Change Analysis. The controller asks, in effect, "identify the components and integration seams this diff crosses, with the exact interface and config touchpoints." The agent runs its one source-control capability and files a small, precise finding: this change touches the billing↔payments retry-and-idempotency seam.
Iteration 2 — invoke Escape History. Now the controller uses that seam to author the next question. It does not paste the diff in. It asks the Escape History Agent for escapes whose failure signature involves that specific seam. The finding comes back: escape 2024-1183, a retry/idempotency test-escape that cost two cycles, plus two lower-similarity neighbors.
Iteration 3 — invoke Coverage. The controller now knows the exact hazard, so it scopes a coverage question: does the planned suite exercise the idempotency-under-retry path that 2024-1183 slipped through? The finding: it does not.
Render. The controller holds all three findings in the notepad and composes the verdict itself — a risk forecast with the resemblance named, the evidence cited, and the coverage gap called out.
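The walk-through can be replayed end to end with stub agents. This is a sketch, not the real Mission. The stubs are hypothetical and simply return the findings described above. Only the top escape is modeled because the text gives no details for the two lower-similarity neighbors. The 0.86 similarity is the figure shown on the Explain rail.

```python
from typing import Dict, List, Tuple

# Stub agents (hypothetical) whose outputs mirror the worked example in the text.
def change_analysis(ask: str) -> dict:
    return {"seam": "billing-payments retry/idempotency"}

def escape_history(ask: str) -> dict:
    # Only the top match is modeled; the two lower-similarity neighbors are omitted.
    return {"escapes": [{"id": "2024-1183", "similarity": 0.86, "path": "idempotency-under-retry"}]}

def coverage(ask: str) -> dict:
    return {"covered": False}

def run_walkthrough() -> Tuple[dict, List[str]]:
    notepad: Dict[str, dict] = {}
    asks: List[str] = []

    # Iteration 1: which seams does the change cross?
    asks.append("Identify the integration seams this diff crosses, with interface and config touchpoints.")
    notepad["change"] = change_analysis(asks[-1])
    seam = notepad["change"]["seam"]

    # Iteration 2: the ask is authored from the seam, not from the diff.
    asks.append(f"Which past escapes have a failure signature involving the {seam} seam?")
    notepad["history"] = escape_history(asks[-1])
    top = notepad["history"]["escapes"][0]

    # Iteration 3: the ask names the exact hazard the escape slipped through.
    asks.append(f"Does the planned suite exercise the {top['path']} path that {top['id']} slipped through?")
    notepad["coverage"] = coverage(asks[-1])

    # Render: the controller composes the verdict itself from the three findings.
    verdict = {"seam": seam, "resembles": top["id"], "coverage_gap": not notepad["coverage"]["covered"]}
    return verdict, asks
```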

Because every ask is scoped, each agent returns a handful of relevant rows, not a raw dump — so nothing truncates, and the reasoning stays grounded in real records. Critically, the controller re-plans on what a finding contains, not on how complete it looks. A grounded "no resembling escape found" is a valid, final answer — a low-risk forecast — and the mission never manufactures a resemblance to fill space.
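Two behaviors from this paragraph can be expressed directly: return a handful of ranked rows rather than a dump, and treat an empty result as a final low-risk answer. A minimal sketch. The cutoff `k=3`, the `min_similarity=0.5` floor, and the risk labels `"low"`, `"elevated"`, and `"high"` are all hypothetical choices, not StudioX defaults.

```python
from typing import List

def top_matches(rows: List[dict], k: int = 3, min_similarity: float = 0.5) -> List[dict]:
    """Keep a handful of relevant rows, best first; k and the floor are hypothetical defaults."""
    relevant = [r for r in rows if r["similarity"] >= min_similarity]
    return sorted(relevant, key=lambda r: r["similarity"], reverse=True)[:k]

def forecast(matches: List[dict], coverage_gap: bool) -> dict:
    if not matches:
        # A grounded empty result is a final answer: low risk, no invented resemblance.
        return {"risk": "low", "resembles": None, "evidence": []}
    best = matches[0]
    # Hypothetical labels: a known resemblance plus an uncovered path is the worst case.
    risk = "high" if coverage_gap else "elevated"
    return {"risk": risk, "resembles": best["id"], "evidence": [m["id"] for m in matches]}
```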

[Figure: the controller sends a scoped ask (▼) to each agent and receives a finding (▲). Agents: Change Analysis (source-control MCP, read-only), Escape History (bot + KB of post-mortems, read-only), Coverage (test-mgmt MCP, read-only), Generic Agent (discovers MCP tools at runtime). Outputs: risk verdict ("resembles 2024-1183"), portal (observable report), decision queue (only if it recommends a state change).]

Every arrow above streams to the Explain rail as an observation while the mission runs.

Observations: watching it reason, not just trusting it

Every decision the controller makes is emitted as a trace event over Server-Sent Events and rendered on the Explain rail in true execution order. You see "Selecting Escape History Agent…", then the exact scoped ask it sent, then "Completed in 1.2s — matched 2024-1183 (0.86 similarity)." This is not logging after the fact. It is the mission narrating itself live. When a lead disagrees with a forecast, they don't file a bug against a black box — they open the trace, see which finding drove the verdict, and correct the knowledge or the agent description that produced it. That is what makes an autonomous forecast auditable enough to trust.
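On the wire, each trace event is one Server-Sent Events message: optional `id:` and `event:` fields, one or more `data:` lines, and a blank line as terminator. A minimal sketch, assuming a hypothetical `TraceStream` class and illustrative event kinds (`select`, `ask`, `complete`). The monotonically increasing `id` keeps the rail in execution order.

```python
import json
from typing import List

class TraceStream:
    """Hypothetical emitter: serializes controller decisions as SSE messages in execution order."""

    def __init__(self) -> None:
        self._seq = 0
        self.sent: List[str] = []

    def emit(self, kind: str, **payload) -> str:
        self._seq += 1
        body = json.dumps({"kind": kind, **payload}, ensure_ascii=False)
        # One SSE message: id, event type, single-line JSON data, blank-line terminator.
        message = f"id: {self._seq}\nevent: trace\ndata: {body}\n\n"
        self.sent.append(message)
        return message
```

A run would emit `select`, then `ask` carrying the scoped question, then `complete` with fields such as `seconds=1.2` and `matched="2024-1183"`. A browser `EventSource` listening for `trace` events receives them in that order.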

Where a human signs off — and where one doesn't

I want to be scrupulous here, because it is easy to oversell human-in-the-loop. The scan itself is entirely read-only. Reading a diff, querying the escape KB, checking coverage — none of it mutates your systems, so none of it needs an approval gate. The verdict lands in a portal, an observable surface where anyone can inspect it.

The decision queue only engages when the mission recommends an action that changes state — hold the merge, or add a required regression suite to the gate. Then, and only then, the mission emits a [REQUEST_APPROVAL] block instead of acting: a row appears in the Decision Queue, the reviewer gets a magic-link approve/reject, and nothing moves until a human says so. For destructive or high-blast-radius actions the mission asks; for reading history it simply reports.
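The gate reduces to one branch: read-only actions report, and everything else becomes a pending queue row plus an approval request. A minimal sketch. The action names, the `DecisionQueue` shape, and the text layout of the `[REQUEST_APPROVAL]` block are hypothetical. The magic-link delivery is not modeled.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical action names. Anything not listed here is treated as state-changing,
# so an unrecognized action defaults to asking rather than acting.
READ_ONLY_ACTIONS = {"read_change", "search_escapes", "check_coverage", "report_verdict"}

@dataclass
class DecisionQueue:
    rows: List[dict] = field(default_factory=list)

def handle(action: str, detail: str, queue: DecisionQueue) -> str:
    if action in READ_ONLY_ACTIONS:
        return f"REPORT {action}: {detail}"
    # State change: queue it and emit an approval request; nothing is executed here.
    queue.rows.append({"action": action, "detail": detail, "status": "pending"})
    return f"[REQUEST_APPROVAL]\naction: {action}\ndetail: {detail}"

def resolve(queue: DecisionQueue, index: int, approved: bool) -> dict:
    """Record the reviewer's decision; only an approved row may proceed."""
    row = queue.rows[index]
    if row["status"] != "pending":
        raise ValueError("decision already recorded")
    row["status"] = "approved" if approved else "rejected"
    return row
```

Listing the read-only actions, rather than the dangerous ones, is the safer design choice. A new tool that nobody has classified lands in the queue instead of running.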

Instant MCP, and why wiring is not a project

The reason this ships in hours instead of a quarter: the tools the agents use — source control, the escape tracker, test management — register as MCP servers, and the Generic Agent discovers them at runtime. Swap your tracker, register the new one, and the mission uses it on the next run with no code change and no redeploy. The escape history lives in a knowledge base you own, inside your own perimeter, on the enterprise AI platform.
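Runtime discovery means the Generic Agent looks up what is registered on every run instead of binding to a tool at build time. MCP itself exposes this as tool listing (`tools/list`) and invocation (`tools/call`). The sketch below mimics that shape in-process with a hypothetical `ToolRegistry` and hypothetical tool names. It is not an MCP client.

```python
from typing import Any, Callable, Dict, List

class ToolRegistry:
    """In-process stand-in (hypothetical) for the MCP servers registered at runtime."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def unregister(self, name: str) -> None:
        self._tools.pop(name, None)

    def list_tools(self) -> List[str]:
        return sorted(self._tools)

    def call(self, name: str, **kwargs: Any) -> Any:
        return self._tools[name](**kwargs)

class GenericAgent:
    def __init__(self, registry: ToolRegistry) -> None:
        self.registry = registry

    def run(self, prefix: str, **kwargs: Any) -> Any:
        # Discover on every run, so a newly registered tracker is picked up with no redeploy.
        candidates = [t for t in self.registry.list_tools() if t.startswith(prefix)]
        if not candidates:
            raise LookupError(f"no registered tool matches {prefix!r}")
        return self.registry.call(candidates[0], **kwargs)
```

Swapping trackers is then an `unregister` followed by a `register`. The agent code is unchanged.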

If you want the business case for why forecasting beats gut feel, Ajay makes it in the "why it matters" piece. To see the loop above play out on a real change with real numbers, read Patrick's "in practice" post.
