How the Cyber Security Responder Works: Seven Agents, One Verdict

When people hear "AI answers our security questionnaires," they usually picture a prompt: paste the questions, get back a wall of text, clean it up. That is not what the Cyber Security Responder is, and the difference is the entire point. I lead the Missions engineering team at StudioX, and what we actually built is a multi-agent AI Mission — a small org chart of specialist AI Workers that route work between themselves, ground every claim in your own records, explain each step as they go, and refuse to let anything reach a customer without a human verdict. Patrick has written about why this problem is worth solving; here I want to open the hood.

Seven specialists, not one generalist

The Responder is built on the standard StudioX Help Desk mission template, which means there is no custom code and no bespoke model. It is seven cooperating agents, each an expert in exactly one thing, each backed by its own bot and its own knowledge base:

Intake — receives the inquiry in whatever form it arrived and identifies the customer (and, importantly, their NDA status).
Classification — labels each question against the right framework (NIST CSF, ISO 27001, CMMC, or DFARS) and routes it by category and SLA.
Knowledge — the workhorse; searches your standard response library for the closest previously approved answer.
Draft — assembles a candidate answer in the correct tone, with citations, and scores its own confidence.
Review — the human-in-the-loop gate.
Response — delivers the approved answer in the customer's original format.
Report — writes everything to the audit trail and analytics.

A Reasoning Core sits above them as the orchestrator. It reads the incoming intent and the roster of agents and decides, one step at a time, who acts next — accumulating results until the inquiry is fully answered. Nothing about the domain is hard-coded into that router; all the security knowledge lives in the agents' knowledge bases. Change a routing rule in the Classification KB, and the behavior changes with no redeploy.
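
To make the step-at-a-time routing concrete, here is a minimal Python sketch of the loop. It is an illustration of the idea, not the Reasoning Core itself: the agent names come from the roster above, but the Inquiry type, the handler functions, and the deliberately naive "first agent that has not reported yet" rule are all assumptions made for this example. The real orchestrator decides from the intent and the roster rather than a fixed order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Agent names are from the article; everything else here is hypothetical.
ROSTER = ["Intake", "Classification", "Knowledge", "Draft",
          "Review", "Response", "Report"]


@dataclass
class Inquiry:
    text: str
    results: Dict[str, str] = field(default_factory=dict)  # accumulated outputs


def next_agent(inquiry: Inquiry) -> Optional[str]:
    """Stand-in routing rule: the first rostered agent with no result yet."""
    for name in ROSTER:
        if name not in inquiry.results:
            return name
    return None  # every agent has reported, so the inquiry is complete


def run(inquiry: Inquiry, handlers: Dict[str, Callable[[Inquiry], str]]) -> List[str]:
    """Route one step at a time, accumulating results until nothing is left."""
    trace = []
    while (name := next_agent(inquiry)) is not None:
        inquiry.results[name] = handlers[name](inquiry)
        trace.append(name)
    return trace
```

The point of the shape is that the loop knows nothing about security: swap the routing rule or the handlers and the loop itself does not change.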

Retrieval, not generation

The single most important design choice is that the Draft agent does not answer from the model's own knowledge. It answers from yours. The Knowledge agent searches a curated standard response library — your previously approved questionnaire answers, policy position statements, and framework-mapped responses, each stored with its source policy, last-reviewed date, and sensitivity tag. This is Enterprise Knowledge: retrieval-grounded, queryable, and yours, not a generic corpus.
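
A rough Python sketch of what a library entry and a closest-match lookup could look like. The three metadata fields (source policy, last-reviewed date, sensitivity tag) are the ones named above; the LibraryEntry type, the field names, and the token-overlap (Jaccard) similarity are assumptions for illustration and stand in for whatever search the Knowledge agent actually performs.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class LibraryEntry:
    question: str
    answer: str
    source_policy: str   # the policy the answer was derived from
    last_reviewed: date  # when a human last vetted the answer
    sensitivity: str     # tag used later for NDA gating


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets: a toy stand-in for real retrieval."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_match(question: str,
               library: Iterable[LibraryEntry]) -> Tuple[Optional[LibraryEntry], float]:
    """Return (entry, score) for the closest entry, or (None, 0.0) if nothing overlaps."""
    best, best_score = None, 0.0
    for entry in library:
        score = similarity(question, entry.question)
        if score > best_score:
            best, best_score = entry, score
    return best, best_score
```

Whatever the retrieval method, the important property is that the answer text and its provenance travel together, so a citation is never reconstructed after the fact.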

That grounding drives the confidence signal, which is the mechanism that keeps the system honest. Every draft lands in one of four states. High confidence means a close library match — a complete answer with citations. Medium means a plausible match the reviewer should verify against current policy. Low means no adequate match was found, and the Draft agent is instructed not to fabricate — it escalates instead of guessing. Needs SME flags a question the library simply cannot answer. The rule the Draft agent follows is blunt on purpose: below the confidence floor, do not draft. A confident wrong disclosure is the failure mode we are engineering against, so the system is designed to prefer an honest gap over a fluent fabrication.
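
The four states and the "below the floor, do not draft" rule can be sketched in a few lines of Python. The state names are the ones above; the numeric thresholds, the function names, and the choice to treat "no candidate at all" as Needs SME are assumptions made for this example, not the product's actual values.

```python
from enum import Enum
from typing import Optional, Tuple


class Confidence(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"
    NEEDS_SME = "Needs SME"


# Hypothetical thresholds, chosen only to make the example concrete.
HIGH_FLOOR = 0.85
DRAFT_FLOOR = 0.60


def grade(score: Optional[float]) -> Confidence:
    """Map a match score to a state; None means the library had no candidate."""
    if score is None:
        return Confidence.NEEDS_SME
    if score >= HIGH_FLOOR:
        return Confidence.HIGH
    if score >= DRAFT_FLOOR:
        return Confidence.MEDIUM
    return Confidence.LOW


def draft(score: Optional[float], matched_answer: str) -> Tuple[Confidence, Optional[str]]:
    """Below the confidence floor, return no draft at all instead of a guess."""
    state = grade(score)
    if state in (Confidence.HIGH, Confidence.MEDIUM):
        return state, matched_answer
    return state, None  # escalate: an honest gap, not a fabricated answer
```

Returning None for the draft, and not a weak attempt at one, is the design choice that matters: a reviewer can never approve text that was never written.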

Observations on the Explain rail

Because this is a Mission and not a black box, every decision the agents make is recorded as an Observation and streamed to the Explain rail in real execution order. A reviewer does not just see the final answer; they see which framework the question was classified under, which library entry the Knowledge agent matched and why, the confidence score and the citations behind it, and any similar past responses. When something looks off, you can trace the reasoning back to the exact knowledge that produced it, and then fix the knowledge rather than wrestle with a prompt. That traceability is what makes an autonomous system trustworthy enough to put in front of a customer's auditor.
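
As a sketch of the idea, an Observation can be modeled as an append-only record with a sequence number, so the order on the rail is the order things actually happened. The Observation fields, the ExplainRail class, and its method names are hypothetical, written in Python only to illustrate the pattern.

```python
from dataclasses import dataclass
from itertools import count
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Observation:
    seq: int                  # position in real execution order
    agent: str                # which agent made the decision
    decision: str             # what it decided
    evidence: Dict[str, Any]  # what the decision was based on


class ExplainRail:
    """Append-only log: observations are recorded, never rewritten."""

    def __init__(self) -> None:
        self._seq = count(1)
        self._log: List[Observation] = []

    def record(self, agent: str, decision: str, **evidence: Any) -> Observation:
        obs = Observation(next(self._seq), agent, decision, evidence)
        self._log.append(obs)
        return obs

    def trace(self, agent: Optional[str] = None) -> List[Observation]:
        """All observations in order, optionally narrowed to one agent."""
        return [o for o in self._log if agent is None or o.agent == agent]
```

For example, a reviewer tracing a suspicious answer would filter to the Knowledge agent and read off which entry was matched:

```python
rail = ExplainRail()
rail.record("Classification", "framework=NIST CSF")
rail.record("Knowledge", "matched library entry", entry_id="lib-001", score=0.91)
matched = rail.trace("Knowledge")[0].evidence["entry_id"]
```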

The Decision Queue is not optional

The Review agent implements true Human-in-the-Loop control through a Decision Queue. Nothing the AI produces is a disclosure until a reviewer makes it one. The reviewer sees every question as a row — category, question, drafted answer, confidence, status — and can approve, edit, flag, or write from scratch in seconds. Click any row and the full answer opens with its citations (NIST CSF, ISO 27001, CMMC, DFARS) and its similar past responses. High-confidence answers can move quickly; Low and Needs-SME rows demand attention. The queue is the control boundary: the AI drafts, the human decides, and the audit trail records exactly who decided what.
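
The control boundary can be sketched as a small state machine in Python. The row columns (category, question, drafted answer, confidence, status) and the reviewer actions are the ones described above; the class names, status strings, and audit tuple layout are assumptions for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Row:
    category: str
    question: str
    drafted_answer: Optional[str]  # None when the Draft agent declined to draft
    confidence: str
    status: str = "pending"
    final_answer: Optional[str] = None


class DecisionQueue:
    def __init__(self, rows: List[Row]) -> None:
        self.rows = list(rows)
        self.audit: List[Tuple[str, int, str]] = []  # (reviewer, row index, action)

    def _decide(self, i: int, reviewer: str, action: str, answer: Optional[str]) -> None:
        self.rows[i].status = action
        self.rows[i].final_answer = answer
        self.audit.append((reviewer, i, action))  # who decided what

    def approve(self, i: int, reviewer: str) -> None:
        if self.rows[i].drafted_answer is None:
            raise ValueError("no draft exists for this row; write an answer instead")
        self._decide(i, reviewer, "approved", self.rows[i].drafted_answer)

    def edit(self, i: int, reviewer: str, text: str) -> None:
        """Covers both editing a draft and writing an answer from scratch."""
        self._decide(i, reviewer, "edited", text)

    def flag(self, i: int, reviewer: str) -> None:
        self._decide(i, reviewer, "flagged", None)

    def releasable(self) -> List[Row]:
        """Only rows a human has approved or written can ever be delivered."""
        return [r for r in self.rows if r.status in ("approved", "edited")]
```

Note that nothing starts out releasable: a pending row, however confident, is not a disclosure until a named reviewer acts on it.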

Layered underneath is an NDA interlock. Every library entry is sensitivity-tagged, and the customer's NDA status determines which tiers the Knowledge agent is even allowed to retrieve during drafting. Confidential material never enters the AI's context for a customer who is not cleared for it — the AI cannot leak what it was never allowed to see. If a reviewer chooses to disclose above the automatic tier, that override is logged explicitly.
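
A minimal Python sketch of the interlock, assuming a three-tier sensitivity scheme and two NDA states. The tier names, the status names, and both function names are invented for illustration; the point is only that filtering happens before retrieval results reach the drafting context, and that a manual override leaves a record.

```python
from typing import Dict, List

# Hypothetical tiers and NDA states, ordered from least to most sensitive.
TIER_RANK = {"public": 0, "nda": 1, "restricted": 2}
CLEARANCE = {"none": "public", "signed": "nda"}  # NDA status -> highest automatic tier


def retrievable(entries: List[Dict[str, str]], nda_status: str) -> List[Dict[str, str]]:
    """Drop entries above the customer's tier before drafting ever sees them."""
    ceiling = TIER_RANK[CLEARANCE[nda_status]]
    return [e for e in entries if TIER_RANK[e["sensitivity"]] <= ceiling]


def disclose_above_tier(entry: Dict[str, str], reviewer: str, reason: str,
                        override_log: List[Dict[str, str]]) -> Dict[str, str]:
    """A human may disclose above the automatic tier, but never silently."""
    override_log.append({"entry_id": entry["id"], "reviewer": reviewer, "reason": reason})
    return entry
```

Filtering at retrieval time, instead of redacting the finished draft, is what makes the guarantee structural: there is nothing to redact because the material was never in scope.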

Meeting every customer's format through MCP

Security questionnaires arrive as email prose, Excel sheets, Word RFP sections, or pasted text, and customers want the answer back in the same shape they sent it. The Responder handles this through Model Context Protocol integrations. One MCP server connects Intake to your document store or shared mailbox to pull inquiries in; another connects Response to your mail system or customer portal to deliver answers out — an .xlsx with the customer's own columns preserved plus answer, confidence, and citation columns; a redlined .docx; or a ready-to-send .eml reply with inline citation footnotes. The agents call the same tool names regardless of what sits behind them, so swapping SharePoint for another store is a configuration change, not an engineering project.
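
The "same tool names, different backend" idea can be illustrated without any MCP machinery at all. The sketch below uses a plain Python Protocol, not the MCP SDK; the tool name fetch_inquiries and both backend classes are hypothetical stand-ins for whatever the real MCP servers expose.

```python
from typing import Dict, List, Protocol


class InquiryStore(Protocol):
    """The one tool name the Intake side calls, whatever sits behind it."""

    def fetch_inquiries(self) -> List[str]: ...


class SharedMailbox:
    def __init__(self, messages: List[str]) -> None:
        self._messages = list(messages)

    def fetch_inquiries(self) -> List[str]:
        return list(self._messages)


class DocumentStore:
    def __init__(self, files: Dict[str, str]) -> None:
        self._files = dict(files)

    def fetch_inquiries(self) -> List[str]:
        # Return file contents in filename order for a stable result.
        return [self._files[name] for name in sorted(self._files)]


def intake(store: InquiryStore) -> List[str]:
    """Intake logic is written once against the tool name, not the backend."""
    return [text.strip() for text in store.fetch_inquiries()]
```

Calling intake with a SharedMailbox or a DocumentStore requires no change to intake itself, which is the sense in which swapping the store is configuration and not engineering.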

Why this shape holds up

Everything here runs inside your own private enterprise deployment, so sensitive disclosures never leave your boundary. The result is a system that is fast because it reuses what you have already vetted, consistent because every answer flows from one governed library, honest because it admits what it does not know, and defensible because every step is observed, every disclosure is gated by a human, and every action is logged. If you want to see it run against a real inquiry end to end, Patrick covers exactly that in the Responder in practice.