
How the Fault-Injection Forge Works: Agents, Observations, MCP

Mark Weber · Chief Enterprise Architect
August 14, 2025

In the companion piece I argued that field escapes come from a missing input, not a missing test — the dangerous conditions never get built into a fixture because building them by hand requires already knowing they're dangerous. This article is the other half: how the Synthetic Data & Fault-Injection Forge actually produces those conditions, framed honestly as a StudioX Mission so you can see exactly which parts reason, which parts act, and where a human stays in control.

A Mission, not a script

The Forge is a Mission: a small org chart of specialist AI workers coordinated by a reasoning layer. That distinction matters, because a script would force you back into the same trap — it can only generate the cases someone thought to encode. A Mission reasons about the request against your actual data model and your actual incident history, so it can propose conditions nobody wrote down.

When you send an intent — "give me fixtures that exercise the refund path across every currency-and-balance combination we support, plus a fault scenario for the downstream ledger timing out mid-write" — the request lands on the Mission's Reasoning Core. The Core knows nothing about synthetic data itself. All of that expertise lives in the agents. What the Core does is read the roster of available agents and their descriptions and decide, one round at a time, which single agent should act next. It runs several rounds, accumulates each agent's result, and stops when it judges the request answered. Every one of those decisions is an LLM call, not a hard-coded branch — change an agent's description and the routing changes; add an agent and it gets considered automatically.
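Here is a minimal Python sketch of that routing loop, under stated assumptions. `Agent`, `run_mission` and `stub_router` are hypothetical names, not StudioX APIs. The router is a deterministic stub standing in for the LLM call that actually picks the next agent from the roster descriptions. The sketch shows the shape of the loop: one agent per round, results accumulated and passed forward, and a stop when the router returns nothing or a round cap is reached.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

# (agent name, agent output) pairs accumulated across rounds
Results = list[tuple[str, str]]

@dataclass
class Agent:
    name: str
    description: str  # what the router reads when deciding who acts next
    run: Callable[[str, Results], str]  # (intent, results so far) -> output

# Hypothetical router contract: return the next agent's name, or None when the
# request is judged answered. In the Mission this decision is an LLM call.
Router = Callable[[str, list[Agent], Results], Optional[str]]

def run_mission(intent: str, roster: list[Agent], choose_next: Router,
                max_rounds: int = 8) -> Results:
    by_name = {agent.name: agent for agent in roster}
    results: Results = []
    for _ in range(max_rounds):  # hard cap as a safety net against endless routing
        pick = choose_next(intent, roster, results)
        if pick is None:
            break
        results.append((pick, by_name[pick].run(intent, results)))
    return results

def stub_router(intent: str, roster: list[Agent], results: Results) -> Optional[str]:
    """Deterministic stand-in for the LLM: pick the first agent that has not acted yet."""
    acted = {name for name, _ in results}
    for agent in roster:
        if agent.name not in acted:
            return agent.name
    return None

roster = [
    Agent("schema", "Reads the live data model", lambda intent, prior: "refunds(currency, balance)"),
    Agent("escape-history", "Mines past field escapes", lambda intent, prior: "negative balance, new currency"),
]
trace = run_mission("fixtures for the refund path", roster, stub_router)
print([name for name, _ in trace])  # ['schema', 'escape-history']
```

Replacing the stub with a model call changes the routing without touching the loop. Behavior lives in the agent descriptions and the router, not in hard-coded branches.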

The agents that run

Each specialist agent is backed by its own bot and its own isolated knowledge base, so a schema question never accidentally returns incident data and vice versa. For a Forge request the Core typically orchestrates a handful of them across multiple rounds:

  • A Schema Agent grounds the request in your real data model — tables, types, constraints, required combinations, the referential edges that a naive generator would violate. It queries the schema now, at runtime, not from a build-time snapshot.
  • An Escape-History Agent mines your own postmortems and defect history for the conditions that have actually hurt you before — the negative-balance-in-a-new-currency shape, the 40,000-row account, the malformed payload class. This is where the Forge gets its imagination: it doesn't invent edge cases in a vacuum, it cross-references the patterns that preceded real escapes.
  • A Fixture Agent generates realistic, referentially-valid records for the happy and near-happy paths, at whatever volume the request implies.
  • An Edge-Case Agent takes the schema plus the escape history and produces the boundary and malformed sets — the combinations that are legal enough to enter the system and pathological enough to break it.
  • A Fault-Scenario Agent builds the failure conditions themselves: the third-retry timeout, the partial write, the out-of-order event, the concurrent-update race.
  • A Coverage/Report Agent checks the generated corpus back against the escape history and reports what's now exercised and what still isn't — persisted as a first-class report you can act on.
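The Edge-Case Agent's core move is to cross schema facts with boundary values. A small Python sketch can illustrate it. Everything here is an assumption for illustration: the currency list, the minor-unit table (ISO 4217 digits for these three currencies), the hypothetical column ceiling, and the function names. In the Forge, those inputs would come from the Schema Agent at runtime rather than being hard-coded.

```python
from decimal import Decimal

# Hypothetical stand-ins for what the Schema Agent would report at runtime.
SUPPORTED_CURRENCIES = ["USD", "EUR", "JPY"]
MINOR_UNITS = {"USD": 2, "EUR": 2, "JPY": 0}  # ISO 4217 minor-unit digits
BALANCE_CEILING = Decimal(10) ** 9            # hypothetical column limit

def boundary_balances(minor_units: int, ceiling: Decimal) -> list[Decimal]:
    """Values just either side of zero, zero itself, and the upper limit."""
    unit = Decimal(1).scaleb(-minor_units)  # smallest representable amount, e.g. 0.01
    return [-unit, Decimal(0), unit, ceiling]

def edge_case_fixtures(currencies: list[str], minor_units: dict[str, int],
                       ceiling: Decimal) -> list[dict]:
    # Every supported currency crossed with every boundary balance at its own precision.
    return [
        {"currency": currency, "balance": balance}
        for currency in currencies
        for balance in boundary_balances(minor_units[currency], ceiling)
    ]

rows = edge_case_fixtures(SUPPORTED_CURRENCIES, MINOR_UNITS, BALANCE_CEILING)
print(len(rows))  # 12
```

The negative-balance rows have the same shape as the negative-balance-in-a-new-currency escape. The difference is that here they are enumerated for every supported currency, not only for the one someone happened to remember.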

The Core hands each agent a scoped goal and, crucially, a scoped return ask. The Escape-History Agent might read hundreds of incidents internally but hand back only the dozen patterns relevant to the refund path. The worker does the heavy lifting in its own large working context and returns only the slice the Core asked for, not a raw dump. That division of labor keeps a multi-agent Mission legible instead of burying the final answer in bulk data.
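A scoped return ask fits in a few lines of Python. The `Incident` record, its fields, `scoped_patterns`, and the sample history are hypothetical and for illustration only. The point is the asymmetry described above: the worker scans everything it can read, but the Core receives only a ranked, size-limited slice.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Incident:
    path: str     # hypothetical: the code path the escape hit
    pattern: str  # hypothetical: normalized triggering condition

def scoped_patterns(incidents: list[Incident], path: str, limit: int = 12) -> list[str]:
    # Heavy lifting stays inside the worker: scan every incident it can read...
    counts = Counter(incident.pattern for incident in incidents if incident.path == path)
    # ...but return only the slice the Core asked for, most frequent first.
    return [pattern for pattern, _ in counts.most_common(limit)]

# Illustrative data, not real incident history.
history = (
    [Incident("refund", "negative balance in new currency")] * 3
    + [Incident("refund", "ledger timeout mid-write")] * 2
    + [Incident("checkout", "40,000-row account")]
)
print(scoped_patterns(history, "refund"))
# ['negative balance in new currency', 'ledger timeout mid-write']
```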

Figure: the Forge Mission. Intent in plain language ("fixtures + fault scenario for refunds") · Reasoning Core (routes one agent per round, accumulates results) · Schema, Escape-History, Fixture, Edge-Case and Fault-Scenario agents · instant MCP servers as runtime tools (schema registry, test DB, CI, incident tracker) · decision queue, human-in-the-loop (approve before anything seeds a shared env) · coverage report + generated corpus (what's now exercised, what still isn't).

Watching it reason — observations

None of this is a black box, and that is deliberate. A Mission streams its reasoning as observations: as the run progresses, each step appears live on the Explain rail — "Routing to Escape-History Agent," "Discovered 3 MCP tools," "Generated 12 boundary fixtures," "Validated step: pass." You watch the Reasoning Core pick an agent, watch the agent discover its capabilities, watch it decompose the goal into ordered steps, and watch each step get validated before the next begins. When the Coverage Agent reports a gap, the trace shows you why it flagged it — which historical escape pattern it matched against. This is what makes the generated data trustworthy: you can see the reasoning that produced every fixture, not just the fixture.
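One way to picture the observation stream is as a generator of small records. This Python sketch is illustrative only. `Observation`, `run_steps`, and `coverage_gaps` are hypothetical names, not the Explain rail's actual event format. It shows two properties the paragraph relies on: each step is validated before the next one starts, and a flagged coverage gap carries the escape pattern that explains it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional

@dataclass(frozen=True)
class Observation:
    source: str
    message: str
    because: Optional[str] = None  # the escape pattern behind a flagged gap, if any

# (agent name, zero-arg action returning its output items, validator for that output)
Step = tuple[str, Callable[[], list], Callable[[list], bool]]

def run_steps(steps: Iterable[Step]) -> Iterator[Observation]:
    """Yield observations live as each step runs; stop at the first failed validation."""
    for agent, action, validate in steps:
        yield Observation(agent, f"Routing to {agent}")
        result = action()
        yield Observation(agent, f"Produced {len(result)} item(s)")
        if not validate(result):
            yield Observation(agent, "Validated step: fail")
            return  # the next step never starts on an unvalidated result
        yield Observation(agent, "Validated step: pass")

def coverage_gaps(escape_patterns: Iterable[str], exercised: set[str]) -> Iterator[Observation]:
    # Each gap names the historical pattern it was matched against, so the trace explains it.
    for pattern in escape_patterns:
        if pattern not in exercised:
            yield Observation("Coverage Agent", "Gap flagged", because=pattern)

for obs in run_steps([("Edge-Case Agent", lambda: ["-0.01 USD", "0 JPY"], lambda r: len(r) > 0)]):
    print(obs.message)  # Routing to..., Produced 2 item(s), Validated step: pass
```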

Where tools plug in — instant MCP servers

The agents don't reach your systems through bespoke integrations. Tools are registered as instant MCP servers and discovered at runtime. The schema registry, the scratch test database the Fixture Agent seeds into, the CI system that will run the change against the new corpus, and the incident tracker the Escape-History Agent reads are each an MCP server that the relevant agent discovers and uses immediately. Register a new source and the Mission can use it on the next run, with no redeployment. This is why standing up a Forge against a new service is a matter of pointing it at the service's schema and history, not a build-time integration project.
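Runtime discovery is what removes the redeployment step. The Model Context Protocol itself exposes this through its `tools/list` and `tools/call` requests. The Python sketch below does not implement the protocol. It uses a hypothetical in-memory `ToolRegistry` to illustrate the behavior described above: agents ask for the current tool set at the start of a run, so a server registered between runs is available on the next one without any change to agent code.

```python
from typing import Callable, Dict

Tool = Callable[..., object]

class ToolRegistry:
    """Hypothetical stand-in for runtime tool discovery; not the MCP SDK or wire protocol."""

    def __init__(self) -> None:
        self._servers: Dict[str, Dict[str, Tool]] = {}

    def register(self, server: str, tools: Dict[str, Tool]) -> None:
        # Registering a server is the only step: no agent code changes, no redeploy.
        self._servers[server] = dict(tools)

    def discover(self) -> Dict[str, Tool]:
        # Called at the start of each run, so newly registered servers show up immediately.
        return {
            f"{server}.{name}": tool
            for server, tools in self._servers.items()
            for name, tool in tools.items()
        }

registry = ToolRegistry()
registry.register("schema-registry", {"describe_table": lambda table: {"table": table}})
first_run = registry.discover()
registry.register("incident-tracker", {"search": lambda query: []})
second_run = registry.discover()
print(sorted(second_run))  # ['incident-tracker.search', 'schema-registry.describe_table']
```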

Where the human stays in the loop — honestly

I want to be exact about read-only versus acting, because over-claiming here erodes trust. Generation is safe by construction. The Fixture, Edge-Case, and Fault-Scenario agents produce data into an isolated scratch space; reading your schema and your incident history is read-only. Nothing about generating a corpus touches a shared environment, so that path runs autonomously.

The moment an action becomes consequential — seeding a fault scenario into a shared staging environment, or triggering a CI run that a team depends on — the Mission does not just do it. It emits an approval request, which lands in the decision queue as a pending row. A reviewer sees exactly what will be applied and approves or rejects from the portal, the UI surface where teams run and review Forge requests and browse the coverage reports. Human-in-the-loop is the gate on the acting steps, not friction on the generating ones.
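The split between autonomous generation and gated action can be modeled as a routing rule on the action's target. The Python below is a hypothetical model, not the portal's actual decision-queue schema. `Target`, `Approval`, `DecisionQueue`, and `perform` are illustrative names. Scratch-space work runs immediately. Anything aimed at a shared environment becomes a pending row, and it runs only if a reviewer approves it.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from itertools import count
from typing import Callable

class Target(Enum):
    SCRATCH = "scratch"  # isolated space: generation runs autonomously
    SHARED = "shared"    # staging, team-owned CI: needs a human decision

@dataclass
class Approval:
    id: int
    description: str  # what the reviewer sees will be applied
    action: Callable[[], object]
    status: str = "pending"

class DecisionQueue:
    def __init__(self) -> None:
        self.rows: dict[int, Approval] = {}
        self._ids = count(1)

    def request(self, description: str, action: Callable[[], object]) -> Approval:
        row = Approval(next(self._ids), description, action)
        self.rows[row.id] = row
        return row

    def decide(self, approval_id: int, approve: bool) -> object:
        row = self.rows[approval_id]
        if row.status != "pending":
            raise ValueError(f"approval {approval_id} is already {row.status}")
        row.status = "approved" if approve else "rejected"
        # The consequential action runs only on approval, and at most once.
        return row.action() if approve else None

def perform(description: str, action: Callable[[], object], target: Target,
            queue: DecisionQueue) -> object:
    """Run scratch-space work now; park anything that touches a shared environment."""
    if target is Target.SCRATCH:
        return action()
    return queue.request(description, action)
```

A scratch-targeted call returns its result directly. The same call with `Target.SHARED` returns a pending `Approval`, and nothing reaches staging until a reviewer calls `decide` with `approve=True`.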

That is the whole architecture: reason about the request, ground it in your real model and your real history, generate the conditions that actually cause escapes, show every step, and stop for a human before anything consequential happens. For why this matters at the leadership level, see the companion piece on why the Forge matters; for a lived before-and-after, see the piece on the Forge in practice.

