Test Stimulus & Coverage: Why the Blank Page Costs So Much
The email arrives at 4:40 on a Thursday. A verification engineer I'll call Priya has been staring at a diff for the better part of an hour. It's not a big change — a new arbitration mode bolted onto a memory controller, maybe two hundred lines of RTL. The design engineer has already moved on to the next feature. Priya's job now is to prove the change is safe, and she is paying the tax she pays on every single pull request: the blank page.
There is no test yet. There is no coverage plan yet. There is a mode she half-understands, a spec she has to re-read, and a nagging certainty that the interesting failure is not in the path she'll think of first. It's in the one she won't. Somewhere in that diff is a corner — a back-to-back request that straddles a refresh, a priority inversion that only bites when the queue is exactly full — and whether it gets tested this week depends entirely on whether one tired human happens to imagine it before the PR merges.
That's the problem I want to talk about. Not the writing of tests. The staring before the writing. And the silent thing that grows in the gap.
The blank-page tax nobody line-items
Every organization I talk to measures verification by the artifacts it produces: testbenches written, regressions run, coverage closed. Nobody measures the hour before the artifact — the ramp-up where an engineer reconstructs intent from a diff, decides what's worth stimulating, and enumerates the corners. But that hour is real, it recurs on every change, and it scales with exactly the wrong thing: your velocity. The faster your design teams ship, the more blank pages your verification team faces, and the wider the gap between "code changed" and "change checked."
I've watched senior people — the ones whose judgment you're actually paying for — spend the first third of every task not exercising that judgment but reassembling context. It's the most expensive kind of waste, because it's invisible. It never shows up as a defect. It shows up as a schedule that's always a little behind, a team that's always a little underwater, and a coverage report that's always a little optimistic.
Coverage holes are invisible until they aren't
Here's the part that should keep leadership awake. A test that was never written leaves no trace. A corner that nobody imagined doesn't appear in the coverage report as a red line — it appears as nothing at all, because coverage only measures the bins you thought to define. The hole isn't flagged. It's simply absent from the conversation.
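To make that concrete, here is a minimal Python sketch of a coverage report that reads 100% while two corners were never stimulated. Everything in it is a hypothetical stand-in: the bin names, the queue states, and the `coverage_percent` helper are illustrations, not the output of any real coverage tool.

```python
from itertools import product

# Hypothetical bins the team thought to define for the new arbitration mode.
defined_bins = {
    ("read", "queue_empty"),
    ("read", "queue_partial"),
    ("write", "queue_empty"),
    ("write", "queue_partial"),
}

# Hypothetical full space of (operation, queue state) combinations.
# Nobody defined a bin for "queue_full".
operations = ["read", "write"]
queue_states = ["queue_empty", "queue_partial", "queue_full"]
all_corners = set(product(operations, queue_states))

# Events the regression actually stimulated.
observed = [
    ("read", "queue_empty"),
    ("read", "queue_partial"),
    ("write", "queue_empty"),
    ("write", "queue_partial"),
]


def coverage_percent(bins, events):
    """Percentage of *defined* bins hit; undefined corners are not counted."""
    hit = bins & set(events)
    return 100.0 * len(hit) / len(bins)


reported = coverage_percent(defined_bins, observed)
undefined_corners = all_corners - defined_bins

print(f"reported coverage: {reported:.0f}%")
print(f"corners with no bin at all: {sorted(undefined_corners)}")
```

The report is accurate about the bins it knows. The queue-full cases never enter the denominator, so nothing turns red.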
So the change merges. The regression is green. Everyone signs off in good faith. And the corner sits there, un-stimulated, for weeks or months — until it surfaces as an escape. In silicon, an escape is a respin, a schedule slip measured in quarters, and a number with a lot of zeros. In a shipped product, it's a field failure and a customer who now audits everything you send them. The cost of the hole didn't go away when the PR merged. It just got deferred, and it accrued interest.
Why "hire more, try harder" doesn't close it
The instinct is to throw people at it: more verification headcount, longer review checklists, a mandate to "think about corner cases." I've run that experiment more than once. It doesn't scale, and it doesn't work, for a structural reason: the blank-page tax is paid per change and per person, so each added engineer starts from their own blank page on every change they pick up. And the corners you miss are, by definition, the ones your current mental models don't surface. You can't checklist your way to the case you didn't think of.
What actually helps is changing when the enumeration happens and who does the first pass. If every change arrived with a draft already in hand — directed stimulus for the paths the diff obviously touches, corner-case stimulus for the straddles and inversions a tired human skips, and an explicit list of coverage targets naming the bins that ought to exist — then the engineer's expensive hour starts at the interesting part. They're no longer generating from nothing. They're reviewing, correcting, and deepening. The judgment you pay for gets spent on judgment.
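As a rough illustration of what such a draft might contain, the sketch below builds the three parts named above for a single change. It is an assumption-laden toy, not a description of how any real generator works: `DraftPlan`, `draft_for_change`, the mode name, and the choice of boundary occupancies are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class DraftPlan:
    """Hypothetical first-draft plan that accompanies a change."""
    directed: list = field(default_factory=list)
    corner_cases: list = field(default_factory=list)
    coverage_targets: list = field(default_factory=list)


def draft_for_change(touched_modes, queue_depth):
    """Enumerate a reviewable draft from what the diff touches (illustrative only)."""
    plan = DraftPlan()
    # Boundary occupancies a tired reviewer tends to skip: empty, almost full, full.
    boundaries = [0, queue_depth - 1, queue_depth]
    for mode in touched_modes:
        # Directed stimulus for the path the diff obviously touches.
        plan.directed.append(f"exercise {mode} with a single request")
        # Corner-case stimulus at each boundary occupancy.
        for occupancy in boundaries:
            plan.corner_cases.append(
                f"{mode}: back-to-back requests at occupancy {occupancy}"
            )
    # Coverage targets: name the bins that ought to exist.
    plan.coverage_targets = list(product(touched_modes, boundaries))
    return plan


plan = draft_for_change(["new_arbitration"], queue_depth=8)
for target in plan.coverage_targets:
    print(target)
```

The point of the sketch is the shape of the artifact: the engineer receives a list to correct and deepen, including named bins for the queue-full case, instead of an empty file.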
That's the shift I care about as an architect: from a team that produces the first draft to a team that reviews it. The blank page moves from the engineer to the machine, and the coverage hole moves from "discovered in silicon" to "named in the PR."
What checking-first looks like
This is the outcome we designed the Test Stimulus & Coverage Generator to deliver, and it's why I frame it under CHECKING rather than test-writing. It doesn't replace the verification engineer's authority over what ships. It attacks the specific, recurring, invisible cost — the blank page and the unseen hole — and it does so at the one moment that matters: on the change, before a single test is written.
The mechanics of how it actually runs (the specialist agents, the reasoning you can observe in real time, and the exact points where a human stays in control) are their own story, and I've left the "how" to the companion piece, how it works. If you'd rather see it play out on a real Thursday-afternoon diff, with the cycles and hours attached, Patrick walks through a day in practice.
The deeper pattern here — encoding expert judgment into observable, human-gated AI Missions rather than another dashboard — is the same one we apply across workflow automation generally. But it starts with admitting the tax exists. Priya's Thursday hour is not a personal failing. It's a systemic cost we finally have a way to stop paying.