How the Pre-Silicon Sim Harness Mission Works

The companion piece to this one makes the business case for checking a change before it becomes silicon. This piece is for the people who will actually have to trust the thing: the verification architects and RTL leads who, quite rightly, don't believe a system until they understand what it does on every path. So let me open the hood. The Pre-Silicon / Target Sim Harness is a StudioX Mission, and a Mission is not a script with an LLM bolted on. It's a small org chart of specialist agents coordinated by a reasoning core that decides, one action at a time, what to do next — and narrates every decision as it goes.

The shape of the Mission

A Mission takes a goal in plain language, such as "check commit 4a1c against the target before we bless it for tape-out," and hands it to the reasoning core. The core is the project manager. It holds a private working memory (we call it the notepad) containing the goal, the roster of available agents, and the findings gathered so far. On each iteration it looks at the notepad, thinks, and emits exactly one next action: discover an agent's capabilities, invoke one agent with one scoped request, or declare the goal done. Then it re-plans from what the last result actually said. It never runs a fixed, pre-baked plan; it reasons its way forward, which is what lets it adapt when a change touches something nobody anticipated.
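To make the one-action-per-iteration loop concrete, here is a minimal sketch. Everything in it is an illustrative assumption rather than the StudioX implementation: the names `Notepad`, `Agent`, and `run_mission` are hypothetical, and `toy_decide` is a deterministic stand-in for the reasoning core, which in the real system is an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Agent:
    # Hypothetical agent: a capability summary plus a request handler.
    capabilities: str
    handle: Callable[[str], str]


@dataclass
class Notepad:
    # The core's private working memory: goal, roster, findings so far.
    goal: str
    roster: List[str]
    findings: List[Tuple[str, str]] = field(default_factory=list)


def run_mission(notepad: Notepad,
                decide: Callable[[Notepad], tuple],
                agents: Dict[str, Agent],
                max_steps: int = 20) -> List[tuple]:
    """Emit one action per iteration, then re-plan from the updated notepad."""
    trace = []
    for _ in range(max_steps):
        action = decide(notepad)  # exactly one next action
        trace.append(action)
        kind = action[0]
        if kind == "done":
            break
        agent = agents[action[1]]
        if kind == "discover":
            result = agent.capabilities
        elif kind == "invoke":
            result = agent.handle(action[2])
        else:
            raise ValueError(f"unknown action kind: {kind}")
        # The result lands in the notepad, so the next decision can see it.
        notepad.findings.append((action[1], result))
    return trace


def toy_decide(notepad: Notepad) -> tuple:
    """Stand-in for the reasoning core: invoke each agent once, then finish."""
    seen = {name for name, _ in notepad.findings}
    for name in notepad.roster:
        if name not in seen:
            return ("invoke", name, f"scoped request for: {notepad.goal}")
    return ("done",)


agents = {
    "triage": Agent("classifies diffs", lambda req: "reset-domain change"),
    "divergence": Agent("compares traces", lambda req: "zero divergences"),
}
pad = Notepad(goal="check commit 4a1c", roster=["triage", "divergence"])
trace = run_mission(pad, toy_decide, agents)
```

The point of the shape is that `decide` is called again after every result, so nothing about the plan is fixed in advance; swapping `toy_decide` for a smarter policy changes the path without touching the loop.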

Each specialist agent is a StudioX Vibe backed by its own bot and its own knowledge base — knowledge isolation by design, so the agent that reads sign-off policy never accidentally answers with waveform data. For this Mission the roster looks like this:

Change Triage Agent — reads the diff and classifies what behavior the delta touches: reset sequencing, a clock-domain boundary, a FIFO depth, a power domain. This narrows everything downstream.
Sandbox Agent — stands up the simulation/emulation environment for exactly this change. It has no fixed domain of its own; it discovers the tools it needs from MCP servers at runtime.
Divergence Agent — runs the targeted stimulus against the model and reads back where target behavior and simulated behavior disagree.
History Agent — queries a knowledge base of past respins, known waivers, and prior divergences, so the verdict is grounded in your incident history, not a generic one.
Policy Agent — validates the result against your sign-off criteria.
Report Agent — persists the verdict as a first-class report record.
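Knowledge isolation in the roster above can be pictured with a very small sketch. The `Vibe` class and its keyword lookup are hypothetical simplifications, not the platform's retrieval mechanism; the only point is that each agent can search its own knowledge base and nothing else.

```python
from typing import Iterable, List


class Vibe:
    """Hypothetical agent wrapper that owns exactly one knowledge base."""

    def __init__(self, name: str, documents: Iterable[str]):
        self.name = name
        self._kb = list(documents)  # private to this agent

    def ask(self, query: str) -> List[str]:
        # Naive keyword match, standing in for real retrieval.
        terms = query.lower().split()
        return [d for d in self._kb if any(t in d.lower() for t in terms)]


policy = Vibe("Policy Agent", ["Sign-off requires zero unwaived reset divergences"])
divergence = Vibe("Divergence Agent", ["waveform dump: rst_n deasserts at cycle 2"])

policy_hits = policy.ask("waveform")
divergence_hits = divergence.ask("waveform")
```

Because the policy agent holds no waveform documents, a waveform query against it returns nothing rather than an answer borrowed from another agent's data.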

Watching it think: observations

The property that makes this trustworthy to an engineer is that you don't have to take the verdict on faith. As the Mission runs, every phase streams back as a reasoning trace over Server-Sent Events — routing, discovery, each agent invocation, each step and its validation, the final answer check. On the test page you watch it live: "Selecting Change Triage Agent… Invoking… classified: reset-domain change. Selecting Sandbox Agent…" Each event carries a plain-language explanation of why that step happened. When the divergence verdict lands, the trace shows the path to it — which stimulus exercised which signal, which historical incident the History Agent matched — the same way the network example in the Missions docs shows "Path 2 selected; Path 1 eliminated by Policy Agent." An audit that would take a person a morning to reconstruct is just the trace, recorded in execution order.
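If you want to capture that trace outside the test page, the wire format is ordinary Server-Sent Events, so a small parser is enough. The sketch below follows the standard SSE framing (`event:` and `data:` fields, a blank line ending each event, lines starting with `:` as comments). The event names and JSON payload fields in the sample are assumptions for illustration; the source does not specify the actual schema.

```python
import json
from typing import Dict, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE event; a blank line terminates an event."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format allows one optional leading space
        if name == "event":
            event = value
        elif name == "data":
            data.append(value)


# Hypothetical sample stream; real event names and fields may differ.
sample = [
    "event: agent_selected\n",
    'data: {"agent": "Change Triage Agent", "why": "diff touches reset logic"}\n',
    "\n",
    ": keep-alive\n",
    "event: agent_result\n",
    'data: {"agent": "Change Triage Agent", "finding": "reset-domain change"}\n',
    "\n",
]
events = [(e["event"], json.loads(e["data"])) for e in parse_sse(sample)]
```

Appending each parsed event to a log as it arrives gives you the audit record in execution order.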

Standing up the sandbox: instant MCP servers

The Sandbox Agent is the interesting one, because it's where "check the change" becomes a real environment. It's a Generic Agent — no hardwired domain — and it reaches your toolchain through instant MCP servers. Your simulator, your emulator, your job scheduler, Git, the waveform tooling: each registers as an MCP server, and the agent discovers those tools at runtime and uses them immediately. Register a new emulation cluster tomorrow and the agent can drive it with no code change, no redeployment. This is the mechanism that lets one Mission serve many toolchains without a per-tool integration project.
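The runtime-discovery idea can be sketched without any real MCP client code. The `ToolServer` and `Registry` classes below are hypothetical stand-ins, not an MCP SDK; they only illustrate why an agent that enumerates tools on every run picks up a newly registered server with no code change.

```python
from typing import Callable, Dict, List


class ToolServer:
    """Hypothetical stand-in for one registered MCP server."""

    def __init__(self, name: str, tools: Dict[str, Callable[..., str]]):
        self.name = name
        self._tools = tools

    def list_tools(self) -> List[str]:
        return sorted(self._tools)

    def call(self, tool: str, **kwargs) -> str:
        return self._tools[tool](**kwargs)


class Registry:
    """Hypothetical registry the agent queries at runtime."""

    def __init__(self) -> None:
        self._servers: Dict[str, ToolServer] = {}

    def register(self, server: ToolServer) -> None:
        self._servers[server.name] = server

    def discover(self) -> Dict[str, List[str]]:
        # Enumerated fresh on every call, so new servers simply appear.
        return {name: s.list_tools() for name, s in self._servers.items()}

    def call(self, server: str, tool: str, **kwargs) -> str:
        return self._servers[server].call(tool, **kwargs)


registry = Registry()
registry.register(ToolServer("simulator", {"run": lambda test: f"ran {test}"}))
before = registry.discover()

# "Register a new emulation cluster tomorrow": no agent code changes.
registry.register(ToolServer("emulator-b", {"load": lambda image: f"loaded {image}"}))
after = registry.discover()
result = registry.call("emulator-b", "load", image="4a1c.bit")
```

The agent never hardcodes a tool list; it works from whatever `discover()` returns at the moment it runs.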

A word on how the core keeps this honest under load. When it invokes the Divergence Agent, it doesn't say "dump the whole waveform." It authors a scoped, self-contained request ("report the cycles where the modelled reset deassertion and the target's disagree, each with signal name and cycle") using triggers it learned from the Triage finding it already holds in its notepad. The agent does its heavy tool work internally over the full data and hands back only the small, relevant slice. Reliability comes from the precision of the question, not from re-grading the answer. That is exactly why a legitimate "zero divergences" result is trusted as a real answer rather than retried into noise.
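Here is a minimal sketch of what "hands back only the small, relevant slice" can look like. It assumes traces are available as per-signal lists of sampled values indexed by cycle, which is a simplification; the `Divergence` record and `find_divergences` function are hypothetical names, not the Divergence Agent's actual interface.

```python
from typing import Dict, List, NamedTuple


class Divergence(NamedTuple):
    signal: str
    cycle: int
    modelled: int
    target: int


def find_divergences(modelled: Dict[str, List[int]],
                     target: Dict[str, List[int]],
                     signals: List[str]) -> List[Divergence]:
    """Return only the cycles where the scoped signals disagree."""
    out = []
    for sig in signals:
        # zip stops at the shorter trace; unequal lengths are not flagged here.
        for cycle, (m, t) in enumerate(zip(modelled[sig], target[sig])):
            if m != t:
                out.append(Divergence(sig, cycle, m, t))
    return out


modelled = {"rst_n": [0, 0, 1, 1, 1], "clk_en": [1, 1, 1, 1, 1]}
target = {"rst_n": [0, 0, 0, 1, 1], "clk_en": [1, 1, 1, 1, 1]}

# Scoped to the signal Triage flagged, not the whole waveform.
reset_slice = find_divergences(modelled, target, ["rst_n"])
clean_slice = find_divergences(modelled, target, ["clk_en"])
```

In this toy data `reset_slice` holds a single record for `rst_n` at cycle 2, and `clean_slice` is an empty list. An empty list is a complete answer to the question asked, which is the sense in which "zero divergences" is a result and not a failure to retry.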

Where the human stays in the loop

Be clear on this, because it's a common misread: the checking is read-only. The harness stands up a sandbox and reports a verdict; it does not touch hardware and it does not tape anything out. The consequential act — promoting the change to sign-off, or granting a waiver on a known-benign divergence — is gated. When the verdict warrants it, the Mission ends with a [REQUEST_APPROVAL] block instead of silently proceeding. The route turns that into a decision-queue row and emails each reviewer a magic-link approve/reject URL; the change waits until a human clicks. If someone asks for the divergence dashboard, a [REQUEST_PORTAL] block becomes a clickable Studio builder link. Human-in-the-loop isn't a bolt-on here; it's where the Mission is designed to stop.
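The gate itself can be modelled as a tiny state machine. This is a sketch under stated assumptions: the source does not describe the queue schema, the token scheme, or the URL layout, so `DecisionRow`, its fields, and the `studio.example.invalid` base URL are all hypothetical. It also assumes the first valid click decides the row.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DecisionRow:
    # Hypothetical decision-queue row; field names are assumptions.
    change: str
    reviewers: List[str]
    status: str = "pending"  # pending -> approved | rejected
    decided_by: Optional[str] = None
    tokens: Dict[str, str] = field(default_factory=dict)  # token -> reviewer

    def __post_init__(self) -> None:
        # One unguessable token per reviewer backs that reviewer's magic links.
        for reviewer in self.reviewers:
            self.tokens[secrets.token_urlsafe(16)] = reviewer

    def links(self, base_url: str) -> Dict[str, Dict[str, str]]:
        by_reviewer = {r: t for t, r in self.tokens.items()}
        return {
            r: {verb: f"{base_url}/{verb}?token={t}" for verb in ("approve", "reject")}
            for r, t in by_reviewer.items()
        }

    def click(self, token: str, verb: str) -> bool:
        """Apply a reviewer's click; returns False if it has no effect."""
        if self.status != "pending" or token not in self.tokens:
            return False
        if verb not in ("approve", "reject"):
            return False
        self.status = "approved" if verb == "approve" else "rejected"
        self.decided_by = self.tokens[token]
        return True


row = DecisionRow("4a1c", ["a@example.com", "b@example.com"])
links = row.links("https://studio.example.invalid/decisions")
```

Until some reviewer's `click` succeeds, `row.status` stays `"pending"`, which is the code-level version of "the change waits until a human clicks."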

All of it runs inside your own perimeter under your own deployment, so the RTL never leaves the building, and because the platform is model-independent, you can swap the underlying LLM without touching a single agent. For the leadership framing, see the companion piece on why it matters; for a real team's before-and-after, see the piece on the harness in practice.
