How the Test Stimulus & Coverage Generator Works
In the companion piece I argued that the real cost of verification isn't writing tests — it's the blank page before them, and the coverage hole nobody can see. Here I want to open the hood and show you exactly how StudioX turns "checking a change" into something a machine drafts and a human approves, without pretending it's magic. Everything below maps to how a StudioX Mission actually executes, so where a step is read-only, I'll say so plainly.
A Mission, not a script
The Test Stimulus & Coverage Generator is a Mission: a small org chart of specialist agents coordinated by a reasoning layer. It is deliberately not a fixed pipeline. When a pull request arrives, the Mission takes the change as intent — "check this" — and a two-tier reasoning system goes to work.
Tier 1 is the Reasoning Core, the router. It reads the request and the roster of available agents and decides, one round at a time, which single agent should act next. It knows nothing about memory controllers or coverage bins; all of that domain knowledge lives in the agents. It accumulates each agent's result and, every round, re-reads everything gathered so far to decide whether the change has been checked well enough or whether another specialist should run. When it judges the work complete, it stops.
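The routing loop described above can be sketched in a few lines. This is a minimal illustration, not StudioX code: `Agent`, `route`, and `first_unused` are hypothetical names, and `first_unused` is a toy stand-in for the reasoning model that would normally make the choice.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# All names here are hypothetical; this is not the StudioX API.
Result = tuple[str, str]  # (agent name, result text)


@dataclass
class Agent:
    name: str
    description: str
    run: Callable[[str, list[Result]], str]


def route(
    request: str,
    roster: dict[str, Agent],
    choose_next: Callable[[str, dict[str, Agent], list[Result]], Optional[str]],
    max_rounds: int = 10,
) -> list[Result]:
    """Pick one agent per round, passing every result gathered so far."""
    results: list[Result] = []
    for _ in range(max_rounds):
        name = choose_next(request, roster, results)
        if name is None:  # the router judges the work complete
            break
        results.append((name, roster[name].run(request, results)))
    return results


def first_unused(request, roster, results):
    """Toy stand-in for the reasoning model: run each agent once, in roster order."""
    done = {name for name, _ in results}
    return next((name for name in roster if name not in done), None)


roster = {
    "change_analysis": Agent("change_analysis", "reads the diff",
                             lambda goal, seen: "new port added"),
    "coverage": Agent("coverage", "reads existing bins",
                      lambda goal, seen: f"gaps given {len(seen)} prior result(s)"),
}
trace = route("check this", roster, first_unused)
```

Note that `route` contains no domain logic. It only sees names, descriptions, and accumulated results, which is the property the Core is described as having.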
Tier 2 is the agent planner. When the Core hands a goal to a specific agent, that agent discovers its own capabilities (its MCP tools, knowledge bases, and Vibes), decomposes the goal into an ordered set of steps, and executes them. Each agent is backed by its own StudioX bot with its own isolated knowledge base, which matters here: the agent that reasons about your design spec never accidentally reaches into the coverage database, and vice versa. Knowledge isolation is by design.
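As a rough sketch of the second tier, assume each agent holds a private knowledge base and a set of tools, and that planning simply orders its discovered capabilities. A real planner would use a model to decompose the goal; the class name, the "fifo depth" goal, and the stored strings are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical names throughout; a sketch of the idea, not StudioX internals.


@dataclass
class SpecialistAgent:
    name: str
    knowledge_base: dict[str, str]  # private to this agent
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def capabilities(self) -> list[str]:
        """Discover what this agent can do: a knowledge lookup plus its tools."""
        return ["lookup"] + sorted(self.tools)

    def plan(self, goal: str) -> list[str]:
        """Toy decomposition: consult knowledge first, then each tool by name."""
        return self.capabilities()

    def execute(self, goal: str) -> list[tuple[str, str]]:
        """Run the planned steps in order and record each output."""
        outputs = []
        for step in self.plan(goal):
            if step == "lookup":
                out = self.knowledge_base.get(goal, "no entry")
            else:
                out = self.tools[step](goal)
            outputs.append((step, out))
        return outputs


spec_agent = SpecialistAgent("spec", {"fifo depth": "spec says 16 entries"})
coverage_agent = SpecialistAgent(
    "coverage",
    {"fifo depth": "bin fifo_full exists"},
    tools={"list_bins": lambda goal: "fifo_full, fifo_empty"},
)
spec_steps = spec_agent.execute("fifo depth")
coverage_steps = coverage_agent.execute("fifo depth")
```

Isolation here falls out of the structure: each agent only ever holds a reference to its own dictionary, so the same goal yields different answers depending on which agent is asked.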
The agents on this Mission
For checking a change, the roster looks like this:
- Change Analysis Agent — reads the diff. What RTL moved? What new mode, port, or state was introduced? It reconstructs intent from the change itself, so the enumeration starts from what actually moved, not from a stale spec.
- Spec & History Agent — queries a knowledge base holding the design specification, verification plan conventions, and — critically — the history of past escapes. The corner that bit you last time is exactly the corner worth stimulating this time, and that institutional memory lives here as grounded, queryable knowledge.
- Coverage Agent — reads the existing coverage model to find what bins already exist, so the draft targets the gaps rather than restating what's already covered.
- Stimulus Agent — drafts the directed stimulus for the paths the change obviously touches and the corner-case stimulus for the straddles, inversions, and full-queue conditions a tired human skips.
- Report Agent — assembles the deliverable: the stimulus plan plus the proposed coverage targets, as a reviewable artifact.
Every one of these is a StudioX Vibe backed by a bot — registered, not coded. You add a specialist by describing what it does and pointing it at a knowledge base. The Reasoning Core considers it automatically on the next run.
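To make "registered, not coded" concrete, here is one way a roster could be modeled as plain data. `Registration`, `Roster`, and the knowledge-base identifiers are assumptions for this sketch; the actual StudioX registration surface is not shown in this article.

```python
from dataclasses import dataclass

# Hypothetical data model; field names and identifiers are illustrative only.


@dataclass(frozen=True)
class Registration:
    name: str
    description: str      # what the specialist does, in plain language
    knowledge_base: str   # identifier of the knowledge base it points at


class Roster:
    def __init__(self) -> None:
        self._entries: dict[str, Registration] = {}

    def register(self, name: str, description: str, knowledge_base: str) -> None:
        self._entries[name] = Registration(name, description, knowledge_base)

    def snapshot(self) -> list[str]:
        """What a router would read at the start of a run."""
        return [f"{r.name}: {r.description}" for r in self._entries.values()]


roster = Roster()
roster.register("Coverage Agent", "reads the existing coverage model", "kb-coverage")
before = roster.snapshot()

# Adding a specialist is a data change; the next snapshot includes it.
roster.register("Report Agent", "assembles the reviewable deliverable", "kb-reports")
after = roster.snapshot()
```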
Observations: watching it reason
The reason I trust this pattern over a black-box generator is the observations. Every decision the Mission makes streams live as a trace event — over Server-Sent Events into the Explain rail — in true execution order. You watch the Reasoning Core select the Change Analysis Agent and read why. You watch the planner decompose "check this change" into ordered steps. You watch each step run against its bot, get validated, and record its output. When the Stimulus Agent proposes a corner case, the trace shows the Spec & History result that justified it.
This isn't a log you dig up after the fact. It's the mechanism of trust: if the Mission proposes a stimulus you disagree with, you can see exactly which agent, which knowledge, and which reasoning step produced it — and fix the knowledge, not the code.
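For readers who have not worked with Server-Sent Events, the sketch below shows what a trace event might look like on the wire and how a client could read the frames back in order. The `id`, `event`, and `data` field names come from the SSE format itself; the event names and payload keys are invented for illustration.

```python
import json

# Event names and payload fields are assumptions, not StudioX's trace schema.


def format_sse(event: str, payload: dict, event_id: int) -> str:
    """Serialize one trace event as a Server-Sent Events frame."""
    return f"id: {event_id}\nevent: {event}\ndata: {json.dumps(payload)}\n\n"


def parse_sse(stream: str) -> list[dict]:
    """Parse frames back into dicts, preserving arrival order."""
    events = []
    for frame in stream.split("\n\n"):
        if not frame.strip():
            continue
        fields = {}
        for line in frame.split("\n"):
            key, _, value = line.partition(":")
            # The SSE format drops a single leading space after the colon.
            fields[key] = value[1:] if value.startswith(" ") else value
        events.append({
            "id": int(fields["id"]),
            "event": fields["event"],
            "data": json.loads(fields["data"]),
        })
    return events


stream = (
    format_sse("agent_selected",
               {"agent": "Change Analysis Agent", "why": "diff not yet read"}, 1)
    + format_sse("step_completed",
                 {"agent": "Change Analysis Agent", "step": 1}, 2)
)
events = parse_sse(stream)
```

Because frames arrive as an ordered stream, a client that appends each parsed event to a list gets the execution order without any extra bookkeeping.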
Where the human stays in control — and where nothing needs gating
Let me be honest about the control surface, because it's easy to oversell. Drafting stimulus and coverage targets is read-only. The Change Analysis Agent reads a diff. The Coverage Agent reads the existing model. The Stimulus Agent produces text. None of that mutates your repo, your regression, or your silicon — so none of it needs an approval gate. The deliverable lands as a reviewable artifact, and StudioX can render it into a portal (via a [REQUEST_PORTAL] block that becomes a clickable builder link) so the engineer reviews the plan in a proper UI surface rather than a chat blob.
The decision queue and human-in-the-loop gate matter only when you wire the Mission to take an action — opening the PR that commits the new coverage bins, or kicking off a regression run. Those are destructive or irreversible, so the Mission ends its turn with a [REQUEST_APPROVAL] block instead of claiming it acted. That creates a decision-queue row, emails the reviewer a magic-link approve/reject URL, and shows the pending action in chat as an "awaiting approval" status. Nothing runs until a human clicks approve. If you keep the Mission purely advisory, that gate simply never fires — and that's a legitimate configuration, not a missing feature.
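The gating behavior can be modeled as a small queue in which an action is recorded but not executed until someone approves it. This is a sketch under assumptions: `Decision`, `DecisionQueue`, and the token scheme are hypothetical, and email delivery and the chat status display are left out.

```python
import secrets
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical model of an approval gate; not the StudioX decision queue itself.


@dataclass
class Decision:
    action: str
    run: Callable[[], str]
    # Unguessable token of the kind that could back an approve/reject link.
    token: str = field(default_factory=lambda: secrets.token_urlsafe(16))
    status: str = "awaiting approval"
    result: Optional[str] = None


class DecisionQueue:
    def __init__(self) -> None:
        self._rows: dict[str, Decision] = {}

    def request(self, action: str, run: Callable[[], str]) -> Decision:
        """Record the proposed action without executing it."""
        decision = Decision(action, run)
        self._rows[decision.token] = decision
        return decision

    def resolve(self, token: str, approve: bool) -> Decision:
        """Execute the action only if a reviewer approves; each row resolves once."""
        decision = self._rows[token]
        if decision.status != "awaiting approval":
            raise ValueError("decision already resolved")
        if approve:
            decision.result = decision.run()
            decision.status = "approved"
        else:
            decision.status = "rejected"
        return decision


launched: list[str] = []


def start_regression() -> str:
    launched.append("regression")
    return "started"


queue = DecisionQueue()
pending = queue.request("kick off regression run", start_regression)
status_before = pending.status
ran_before = list(launched)   # still empty: nothing has executed yet
queue.resolve(pending.token, approve=True)
```

An advisory-only Mission never calls `request`, so the queue stays empty and the gate never fires.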
Wiring the tools: instant MCP servers
The agents reach your Git host, your coverage database, and your simulation tooling through MCP servers discovered at runtime. This is what makes the Mission portable across teams: the Generic Agent discovers available tools from a registered MCP server and uses them immediately — no code change, no redeployment. Register your coverage database's MCP server today and the Coverage Agent can query it on the next run. Swap regression tools next quarter and you re-point one MCP registration, not the Mission.
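Runtime discovery rests on two MCP requests, `tools/list` and `tools/call`, exchanged as JSON-RPC 2.0 messages. The sketch below uses an in-memory stand-in instead of a real transport or SDK, and the message shapes are simplified; `FakeCoverageServer`, the `list_bins` tool, and its output are hypothetical.

```python
# In-memory stand-in for a registered MCP server. Message shapes are simplified
# and the tool itself is invented for illustration.


class FakeCoverageServer:
    TOOLS = {"list_bins": "fifo_full, fifo_empty"}

    def handle(self, request: dict) -> dict:
        reply = {"jsonrpc": "2.0", "id": request["id"]}
        method = request["method"]
        if method == "tools/list":
            reply["result"] = {"tools": [
                {"name": name, "description": "List coverage bins",
                 "inputSchema": {"type": "object"}}
                for name in self.TOOLS
            ]}
        elif method == "tools/call":
            name = request["params"]["name"]
            if name in self.TOOLS:
                reply["result"] = {"content": [{"type": "text", "text": self.TOOLS[name]}]}
            else:
                reply["error"] = {"code": -32602, "message": f"Unknown tool: {name}"}
        else:
            reply["error"] = {"code": -32601, "message": "Method not found"}
        return reply


def discover_tools(server) -> list[str]:
    """Ask the server what it offers instead of hard-coding tool names."""
    reply = server.handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
    return [tool["name"] for tool in reply["result"]["tools"]]


def call_tool(server, name: str, arguments: dict) -> str:
    """Invoke a discovered tool and return its first text content item."""
    reply = server.handle({
        "jsonrpc": "2.0", "id": 2, "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })
    return reply["result"]["content"][0]["text"]


server = FakeCoverageServer()
tools = discover_tools(server)
bins = call_tool(server, tools[0], {})
```

Since the agent code only ever calls `discover_tools` and `call_tool`, swapping in a different server object changes what the agent can do without touching the agent.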
That's the whole architecture: intent in, specialist agents reasoning under an observable Core, tools injected at runtime, and a human gate exactly where an action would be taken and nowhere it wouldn't. If you want the business case, see "Why it matters"; for the lived version on a real diff, read Patrick's "In practice". The general shape lives under AI Missions and workflow automation.