How the Deployment Risk Scorer Works: A StudioX Mission
When Ajay describes the Deployment Risk Scorer, he talks about the gap between what your organization knows and what the person shipping actually sees. My job is to explain how we close that gap mechanically — because "an AI looks at your change and scores it" is the kind of sentence that should make any architect suspicious. So let me be precise about what runs, in what order, and where a human stays in control. Everything below maps to how StudioX Missions actually execute; I'm not going to hand-wave.
A mission is an org chart, not a script
The Deployment Risk Scorer is a Mission: a small roster of specialist agents coordinated by a reasoning layer. It is not a workflow with fixed branches. Each agent is a StudioX Vibe backed by its own bot and its own knowledge base, so knowledge stays isolated — the agent that reads incident history never accidentally reaches into deploy-tooling credentials, because it simply doesn't have them.
When a change is ready, the mission receives an intent in plain language — "Score the risk of PR 4821 and propose a canary plan." From there a two-tier reasoning system runs. Tier 1 is the reasoning core: it reads the request and the roster of agents, and decides, one round at a time, which single agent should act next. It can run several rounds, accumulating results, until it judges the request answered. Tier 2 is the agent planner: once an agent is chosen, it discovers that agent's capabilities — the MCP tools, knowledge bases, and vibes available to it — decomposes the goal into an ordered set of steps, and executes them. The reasoning core is the project manager; each agent is a worker that does the tool work and hands back only the slice the core asked for.
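To make the two tiers concrete, here is a minimal sketch of the routing loop, not StudioX's actual implementation. The names `Agent`, `MissionState`, `pick_next_agent`, and `run_mission` are hypothetical, and the "reasoning" is a trivial stand-in (take the first agent that has not reported yet) where the real core makes a model-driven choice each round.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Agent:
    name: str
    # Hypothetical stand-in for Tier 2: plans and executes its own steps,
    # then returns only the slice of results the core asked for.
    run: Callable[[str, dict], str]


@dataclass
class MissionState:
    intent: str
    results: dict = field(default_factory=dict)  # agent name -> result slice


def pick_next_agent(state: MissionState, roster: list) -> Optional[Agent]:
    """Tier 1 stand-in: choose ONE agent for this round, or None when done.

    Assumption: a real reasoning core decides from the intent and the
    accumulated results; this sketch just walks the roster in order.
    """
    for agent in roster:
        if agent.name not in state.results:
            return agent
    return None


def run_mission(intent: str, roster: list, max_rounds: int = 8) -> MissionState:
    state = MissionState(intent)
    for _ in range(max_rounds):
        agent = pick_next_agent(state, roster)
        if agent is None:  # the core judges the request answered
            break
        # Pass a copy of earlier results so the agent cannot mutate shared state.
        state.results[agent.name] = agent.run(intent, dict(state.results))
    return state


# Illustrative roster with canned agent behaviour (all values are made up).
roster = [
    Agent("change_intake", lambda intent, prior: "touched: payments-api"),
    Agent("history", lambda intent, prior: f"incidents for {prior['change_intake']}"),
]
state = run_mission("Score the risk of PR 4821 and propose a canary plan.", roster)
```

The point of the shape is that each round selects exactly one agent and that later agents see what earlier ones returned, which is what lets the core accumulate results before it stops.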
The agents on this mission
The Change Intake Agent reads the change itself — the diff, the touched files, the services and config those files belong to — via the GitHub MCP tools. The History Agent takes that list of touched components and queries its knowledge base of escalations, incidents, and post-mortems, returning only the history relevant to these components, not a dump of every incident you've ever had. The Blast-Radius Agent does the dependency and impact analysis — the same "blast radius analysis" capability we run in the Security domain — to work out what else fails if this fails. Finally the Canary Planner synthesizes all of that into a concrete, phased rollout proposal, and because its name contains "report," its output is persisted as a first-class report record you can acknowledge or escalate later.
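The hand-off from Change Intake to the History Agent is the step people most often picture wrongly, so here is a rough sketch of the idea under simplifying assumptions. The `Incident` type, the component names, and the incident IDs are all invented for illustration, and the exact set-overlap match stands in for whatever retrieval the real knowledge base performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    incident_id: str
    components: frozenset
    summary: str


def relevant_history(touched: list, incidents: list) -> list:
    """Return only incidents that share at least one component with the change."""
    touched_set = set(touched)
    return [inc for inc in incidents if inc.components & touched_set]


# Hypothetical knowledge-base contents.
incidents = [
    Incident("INC-1", frozenset({"payments-api"}), "timeout after config change"),
    Incident("INC-2", frozenset({"search"}), "index rebuild stalled"),
    Incident("INC-3", frozenset({"payments-api", "ledger-db"}), "schema drift"),
]

# Components the Change Intake Agent reported as touched (also hypothetical).
touched = ["payments-api", "checkout-config"]
history_slice = relevant_history(touched, incidents)
```

Here `history_slice` holds INC-1 and INC-3 and leaves INC-2 out, which is the "only the relevant history" behaviour described above.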
Observations: you watch it reason
This is the part that matters most to me as an architect, because it's what separates a mission from a magic box. Every phase — each routing decision, the capability discovery for an agent, the plan it produced, each step and the check on that step — is recorded as a trace event and streamed live. The mission test surface pushes these to the client over Server-Sent Events, so on the Explain rail you literally watch "Selecting agent…" then "Invoking History Agent…" then "Completed" as it happens. When the Canary Planner recommends a 2% first phase, you can trace why: which incidents the History Agent surfaced, which dependency the Blast-Radius Agent flagged. Nobody programmed "prefer a slower canary for components with a bad record" — the reasoning core inferred it from the history and the analysis. Change the knowledge, the recommendation changes.
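For readers who have not worked with Server-Sent Events, this is roughly what a trace event looks like on the wire. The frame layout (`event:` and `data:` lines ended by a blank line) comes from the SSE format itself; the event type names and payload fields are my assumptions for illustration, not the actual StudioX trace schema.

```python
import json
from typing import Iterable, Iterator, Tuple


def to_sse_frame(event_type: str, payload: dict) -> str:
    """Serialize one trace event as a single SSE frame.

    json.dumps never emits raw newlines, so the payload fits on one data line.
    """
    data = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return f"event: {event_type}\ndata: {data}\n\n"


def stream_trace(events: Iterable[Tuple[str, dict]]) -> Iterator[str]:
    """Yield frames one at a time, as a streaming HTTP response body would."""
    for event_type, payload in events:
        yield to_sse_frame(event_type, payload)


# Hypothetical trace for one routing round.
trace = [
    ("routing", {"message": "Selecting agent…"}),
    ("invoke", {"message": "Invoking History Agent…", "agent": "history"}),
    ("completed", {"message": "Completed", "agent": "history"}),
]
frames = list(stream_trace(trace))
```

Because each frame is self-delimiting, a client can render every phase the moment it arrives instead of waiting for the mission to finish.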
Where the human stays in control — and where the mission is honest about being read-only
I want to be exact here, because it's easy to oversell. The forecasting work is entirely read-only. Change Intake reads the diff; the History Agent reads a knowledge base; the Blast-Radius Agent reads dependency data; every MCP server on this mission is wired for read access only. Producing the risk assessment and the proposed canary plan touches nothing in production. That's deliberate — a forecast should never have side effects.
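One way to keep that guarantee from eroding over time is to check it when the mission is assembled. The sketch below assumes a hypothetical `ServerBinding` record with an `access` field; I am not describing how StudioX stores this setting, only the shape of the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerBinding:
    name: str
    access: str  # hypothetical values: "read" or "write"


def assert_read_only(bindings: list) -> None:
    """Refuse to start a forecasting mission if any server can write."""
    writers = [b.name for b in bindings if b.access != "read"]
    if writers:
        raise PermissionError(f"write-capable servers on a read-only mission: {writers}")


# Illustrative bindings for the forecasting agents.
forecast_bindings = [
    ServerBinding("github", "read"),
    ServerBinding("incident-kb", "read"),
    ServerBinding("dependency-graph", "read"),
]
assert_read_only(forecast_bindings)  # passes silently
```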
The human-in-the-loop gate exists for the one action that isn't read-only: actually executing the rollout. If you ask the mission to kick off the canary rather than just propose it, it does not silently do so. For any high-blast-radius or irreversible action, the mission ends its turn with an approval request rather than claiming the action was taken. That request becomes a row in the Decision Queue, and each named reviewer gets a magic-link approve/reject email. Nothing executes until a person clicks approve. The plan is advisory until a human makes it real.
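The gate is easiest to reason about as a small state machine, so here is a sketch of it under my own naming. `ApprovalRequest`, `DecisionQueue`, and `execute_if_approved` are hypothetical, the reviewer address is a placeholder, and the magic-link email is left out entirely; the only behaviour I am illustrating is that nothing runs while a request is pending or rejected.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ApprovalRequest:
    action: str
    reviewers: list
    status: Status = Status.PENDING


@dataclass
class DecisionQueue:
    rows: list = field(default_factory=list)

    def submit(self, action: str, reviewers: list) -> ApprovalRequest:
        """The mission ends its turn here instead of performing the action."""
        request = ApprovalRequest(action, reviewers)
        self.rows.append(request)
        return request


def execute_if_approved(request: ApprovalRequest, run: Callable[[str], None]) -> bool:
    """Run the action only after a human has approved it."""
    if request.status is not Status.APPROVED:
        return False
    run(request.action)
    return True


queue = DecisionQueue()
request = queue.submit("start canary for PR 4821", ["reviewer@example.com"])

executed = []  # records actions that actually ran
ran_before = execute_if_approved(request, executed.append)  # still pending

request.status = Status.APPROVED  # stands in for the reviewer clicking approve
ran_after = execute_if_approved(request, executed.append)
```

In this sketch `ran_before` is `False` and `executed` stays empty until the status flips, which mirrors the plan being advisory until a person approves it.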
Wiring the tools: instant MCP servers
The reason we can stand this up against your GitHub, your incident tracker, your Kubernetes without an integration project is MCP. Tools register with the mission at runtime; the agents discover and use them immediately, with no code change and no redeployment. Swap PagerDuty for ServiceNow and you register a new server — the agent that reads escalation history calls the same conceptual tool, only the server behind it changes. That runtime capability injection is what lets one mission template serve very different toolchains.
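To show what "same conceptual tool, different server" means in practice, here is an in-memory sketch. It deliberately does not use any MCP SDK: `ToolRegistry`, the capability name `escalation_history`, and both handler functions are hypothetical stand-ins, and the returned strings are made up.

```python
from typing import Callable, Dict, Tuple


class ToolRegistry:
    """Maps a conceptual capability to whichever server currently provides it."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tuple[str, Callable[..., list]]] = {}

    def register(self, capability: str, server: str, handler: Callable[..., list]) -> None:
        # Registering again replaces the provider at runtime, with no redeploy.
        self._tools[capability] = (server, handler)

    def server_for(self, capability: str) -> str:
        return self._tools[capability][0]

    def call(self, capability: str, **kwargs) -> list:
        _, handler = self._tools[capability]
        return handler(**kwargs)


def read_escalations(registry: ToolRegistry, component: str) -> list:
    """Agent-side code: it names the capability, never the vendor."""
    return registry.call("escalation_history", component=component)


def pagerduty_history(component: str) -> list:
    return [f"pagerduty escalation for {component}"]


def servicenow_history(component: str) -> list:
    return [f"servicenow escalation for {component}"]


registry = ToolRegistry()
registry.register("escalation_history", "pagerduty", pagerduty_history)
before = read_escalations(registry, "payments-api")

# Swap the backing server; read_escalations is untouched.
registry.register("escalation_history", "servicenow", servicenow_history)
after = read_escalations(registry, "payments-api")
```

The agent function is identical before and after the swap, which is the property that lets one mission template sit on top of different toolchains.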
If you want the leadership case for all this, Ajay's "Why it matters" piece is the place to start, and Patrick's in-practice write-up shows it running against real changes. For how missions run inside your own perimeter, see "Enterprise deployment", and for the broader agentic model, see "StudioX Missions".