
How the Explain Rail Works: Observations Inside a Mission

Trevor Solis · Lead AI Engineer, Missions
October 22, 2025

I build the reasoning layer behind StudioX Missions, and the question I get most from other engineers is deceptively simple: how does the Explain rail actually get populated? People assume it's a logging wrapper bolted on after the fact — capture some strings, pretty-print them at the end. It isn't. The rail is a live projection of the real control flow. Every observation on it corresponds to an actual decision the Mission made, emitted the instant it happens and persisted in true execution order. Let me walk through the machinery.

A Mission is two tiers of reasoning

Start with the shape of the thing. A Mission is a small org chart of specialist agents, and when a user sends a message, a two-tier system runs.

Tier 1 is the reasoning core — the project manager. It looks at the request plus the roster of available agents and decides, one round at a time, which single agent should act next. It can run several rounds, accumulating results, until it judges the request answered. Each round is an LLM call that returns a chosen agent and the specific goal to hand that agent — or a signal that no agent is needed and we're done.
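The round-by-round loop can be sketched in a few lines. This is a minimal illustration, not the production router: RoutingDecision, run_reasoning_core, the decide and run_agent callables, and the max_rounds cap are all hypothetical names introduced here, and the LLM call is replaced by a stub.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RoutingDecision:
    agent: Optional[str]  # None signals "no agent needed, we're done"
    goal: str = ""
    reasoning: str = ""


def run_reasoning_core(
    request: str,
    roster: list[str],
    decide: Callable[[str, list[str], list[str]], RoutingDecision],
    run_agent: Callable[[str, str], str],
    max_rounds: int = 5,  # hypothetical safety cap, not from the article
) -> list[str]:
    """Pick ONE agent per round and accumulate results until done."""
    results: list[str] = []
    for _ in range(max_rounds):
        decision = decide(request, roster, results)  # stands in for the LLM call
        if decision.agent is None:
            break  # the core judged the request answered
        results.append(run_agent(decision.agent, decision.goal))
    return results


# Stubbed callables: route once, then report that the request is answered.
def fake_decide(request, roster, results):
    if results:
        return RoutingDecision(agent=None, reasoning="request answered")
    return RoutingDecision(agent=roster[0], goal=f"handle: {request}")


def fake_run_agent(agent, goal):
    return f"{agent} finished '{goal}'"


outcome = run_reasoning_core(
    "check VLAN 20", ["Network Agent"], fake_decide, fake_run_agent
)
```

The point of the shape is that each round sees the accumulated results, so the decision to stop is itself a routing decision rather than a separate code path.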

Tier 2 is the agent planner — the worker. It takes the goal, discovers the chosen agent's capabilities (its MCP tools, knowledge bases, and vibes), decomposes the goal into an ordered list of steps, and executes each one. Most steps run on the agent's own bot, which has its own internal pipeline. The final step is always a reason step that produces the user-facing answer.
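A rough sketch of that worker, under stated assumptions: Step, plan_steps, and execute_plan are hypothetical names, the planner here is a toy that emits one bot step per capability, and every non-reason step is treated as a bot step for simplicity. The one property taken directly from the description is that the plan must end with a reason step.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    kind: str          # "bot" for agent-bot steps, "reason" for the final answer
    instruction: str


def plan_steps(goal: str, capabilities: list[str]) -> list[Step]:
    """Toy planner: one bot step per capability, then a closing reason step."""
    steps = [Step("bot", f"use {cap} for: {goal}") for cap in capabilities]
    steps.append(Step("reason", f"answer the user about: {goal}"))
    return steps


def execute_plan(
    steps: list[Step],
    run_bot: Callable[[str], str],
    reason: Callable[[list[str]], str],
) -> str:
    """Run every step in order; the last one must be the reason step."""
    if not steps or steps[-1].kind != "reason":
        raise ValueError("plan must end with a reason step")
    outputs: list[str] = []
    for step in steps[:-1]:
        outputs.append(run_bot(step.instruction))
    return reason(outputs)  # produces the user-facing answer


plan = plan_steps("why is VLAN 20 down", ["ping_tool", "network_kb"])
answer = execute_plan(
    plan,
    run_bot=lambda instruction: f"ok({instruction})",
    reason=lambda outs: f"{len(outs)} findings",
)
```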

Every one of those junctures is an observation. Routing decisions, capability discovery, the plan itself, each step and its validation verdict, the final answer check — each is recorded as a trace event and streamed to the rail as it occurs.

Observations are events, not a summary

Here is the part that matters architecturally. As the Mission runs, the chat route streams progress events and explainability traces back live. Each phase (routing, discovery, plan, each step plus its validation, the answer judge, any recovery pass) is written as a trace event with an event_type, a detail payload, and a monotonic sequence number, seq. The events are persisted in the iris_conversation_messages table and rendered on the Explain rail in true execution order.
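To make the event shape concrete, here is a small single-process sketch. The field names event_type, detail, and seq come from the description above; TraceEvent, TraceRecorder, and the example payload keys are hypothetical, and persistence to the table is left out entirely.

```python
import itertools
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class TraceEvent:
    seq: int                      # monotonic sequence number
    event_type: str               # e.g. "route", "discover", "plan"
    detail: dict[str, Any] = field(default_factory=dict)


class TraceRecorder:
    """Hands out increasing seq values and keeps events in emission order."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self.events: list[TraceEvent] = []

    def emit(self, event_type: str, **detail: Any) -> TraceEvent:
        event = TraceEvent(
            seq=next(self._counter), event_type=event_type, detail=detail
        )
        self.events.append(event)
        return event


recorder = TraceRecorder()
recorder.emit("route", agent="Network Agent", reasoning="request mentions a VLAN")
recorder.emit("discover", tools=2, knowledge_bases=1)
recorder.emit("plan", steps=3)
```

Because the reasoning is captured in the detail payload at emit time, answering "why this agent?" later is a lookup, not a reconstruction.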

That monotonic seq is doing quiet, essential work. Missions fan out — parallel tool calls, multi-round routing, retries — so events don't naturally arrive in a clean line. The sequence number lets the rail reconstruct the exact order things happened, whether you're watching live over Server-Sent Events or replaying the trace a week later. There is no "generate an explanation" step at the end. The explanation is the event log.
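One way a consumer could restore order from out-of-order arrivals is a small reorder buffer. This sketch assumes seq values start at 1 and have no gaps, which the article does not state; ReorderBuffer and replay are hypothetical names. For a stored trace, a plain sort by seq is enough.

```python
import heapq


class ReorderBuffer:
    """Releases events only when the next expected seq has arrived."""

    def __init__(self, first_seq: int = 1) -> None:
        self._next = first_seq
        self._heap: list[tuple[int, str]] = []

    def push(self, seq: int, event_type: str) -> list[str]:
        if seq < self._next:
            return []  # already released; ignore the duplicate
        heapq.heappush(self._heap, (seq, event_type))
        released: list[str] = []
        while self._heap and self._heap[0][0] == self._next:
            released.append(heapq.heappop(self._heap)[1])
            self._next += 1
        return released


def replay(events: list[dict]) -> list[dict]:
    """Replaying a persisted trace: sort by seq."""
    return sorted(events, key=lambda e: e["seq"])


live = ReorderBuffer()
first = live.push(2, "discover")   # held back, seq 1 not seen yet
second = live.push(1, "route")     # releases seq 1 and the buffered seq 2
third = live.push(3, "plan")

stored = [{"seq": 3, "event_type": "plan"}, {"seq": 1, "event_type": "route"}]
replayed_types = [e["event_type"] for e in replay(stored)]
```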

This is the design decision I defend most often in reviews. A post-hoc summary is a story about a run; a sequenced event log is the run itself. When an auditor asks why the Mission chose one agent over another, we don't reconstruct an answer — we point at the routing event with its reasoning field, exactly as it was written the instant the decision was made. Nothing is rewritten for presentation. That is what makes the trace admissible rather than merely reassuring.

Figure: Mission flow and the Explain rail.
User message → Tier 1, reasoning core (router): picks ONE agent per round, loops until done → Tier 2, agent planner (worker): discover capabilities → plan steps → execute (bot + KB, MCP tools, final reason step) → marker post-processing ([REQUEST_APPROVAL] · [REQUEST_PORTAL]). Results loop back to Tier 1.
Explain rail (SSE live · replay by monotonic seq):
seq 1 · route → Network Agent
seq 2 · discover tools + KBs
seq 3 · plan: 3 steps
seq 4 · tool call + citation
seq 5 · validate step ✓
seq 6 · answer judge ✓
seq 7 · [REQUEST_APPROVAL]

Where the agent actually does the work

When the planner executes a non-reason step, it chats with the agent's bot over SSE. That bot runs its own pipeline: it loads its config, runs a vibe/MCP gate that picks one handler for the turn, and then either enters an MCP agent loop, runs a vibe workflow, or does standard function-calling. The bot streams two channels back. Lines prefixed 0: carry text tokens, the answer IRIS reads as the step output. Lines prefixed 2: carry explainability events, the structured per-tool calls with their results.
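Splitting the two channels on the consuming side might look like the sketch below. The wire format is an assumption made for illustration: each line is taken to be a channel prefix, a colon, and a JSON payload (a JSON string for 0: lines, a JSON object for 2: lines). The article only names the prefixes, so treat split_channels and the sample payloads as hypothetical.

```python
import json


def split_channels(lines: list[str]) -> tuple[str, list[dict]]:
    """Separate text tokens (0:) from explainability events (2:)."""
    text_parts: list[str] = []
    tool_events: list[dict] = []
    for line in lines:
        channel, sep, payload = line.partition(":")
        if not sep:
            continue  # not a channel-prefixed line
        if channel == "0":
            text_parts.append(json.loads(payload))   # assumed: JSON string token
        elif channel == "2":
            tool_events.append(json.loads(payload))  # assumed: JSON object
    return "".join(text_parts), tool_events


stream = [
    '0:"VLAN 20 "',
    '2:{"tool": "ping_tool", "result": "timeout"}',
    '0:"is down."',
]
step_output, events = split_channels(stream)
```

The joined text becomes the step output, while the tool events are kept apart so they can be gated before reaching the rail.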

One honest constraint worth naming: those 2: tool events only reach the rail when canSeeExplainability is true for the chatting identity — super-admin, the bot's business-admin, or the bot owner. The Mission's own reasoning trace always streams; the deep per-tool detail from inside a bot is authorization-gated. Not every viewer should see raw tool payloads from every enterprise system.
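The gate reduces to a three-way check plus a filter. In this sketch the three roles come from the paragraph above, while Identity, Bot, their fields, and the "source" key on events are hypothetical stand-ins for whatever the real identity model looks like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    user_id: str
    is_super_admin: bool = False
    admin_of_businesses: frozenset[str] = frozenset()


@dataclass(frozen=True)
class Bot:
    owner_id: str
    business_id: str


def can_see_explainability(identity: Identity, bot: Bot) -> bool:
    """True for a super-admin, the bot's business-admin, or the bot owner."""
    return (
        identity.is_super_admin
        or bot.business_id in identity.admin_of_businesses
        or bot.owner_id == identity.user_id
    )


def filter_for_viewer(events: list[dict], identity: Identity, bot: Bot) -> list[dict]:
    """Mission-level events always pass; per-tool bot events are gated."""
    allowed = can_see_explainability(identity, bot)
    return [e for e in events if e["source"] == "mission" or allowed]


bot = Bot(owner_id="u1", business_id="acme")
trace = [
    {"source": "mission", "event_type": "route"},
    {"source": "bot_tool", "event_type": "tool_call"},
]
owner_view = filter_for_viewer(trace, Identity("u1"), bot)
stranger_view = filter_for_viewer(trace, Identity("u9"), bot)
```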

Instant MCP servers keep the rail honest

Agents don't have hardcoded tool lists. A generic agent discovers available tools from MCP servers at runtime — register a new tool and the agent can use it immediately, no code change, no redeploy. Because discovery is itself an observation, the moment a new capability shows up, it shows up on the rail too. The transparency layer never goes stale relative to what the Mission can actually do.
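The idea that discovery is both a runtime lookup and an observation can be shown with a toy registry. ToolRegistry below is a hypothetical in-memory stand-in, not an MCP client, and discover and the event layout are likewise illustrative. What matters is that a tool registered between two runs appears in the next discovery event with no change to the agent code.

```python
from typing import Callable


class ToolRegistry:
    """Hypothetical in-memory stand-in for an MCP server's tool listing."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self._tools[name] = fn

    def list_tools(self) -> list[str]:
        return sorted(self._tools)


def discover(registry: ToolRegistry, trace: list[dict]) -> list[str]:
    """Look up tools at runtime and record the lookup as a trace event."""
    tools = registry.list_tools()
    trace.append(
        {"seq": len(trace) + 1, "event_type": "discover", "detail": {"tools": tools}}
    )
    return tools


registry = ToolRegistry()
trace: list[dict] = []

registry.register("ping_tool", lambda host: f"pinged {host}")
before = discover(registry, trace)

# A new tool is registered; the agent code above is untouched.
registry.register("traceroute_tool", lambda host: f"traced {host}")
after = discover(registry, trace)
```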

The last mile: decision queue and portals

After the reasoning loop returns, the route scans the final text for marker blocks. A [REQUEST_APPROVAL] (or [REQUEST_DECISION]) block creates an iris_decisions row, emails each reviewer a magic-link approve/reject URL, and rewrites the marker into an "Awaiting approval" status block in chat. That's human-in-the-loop enforced structurally: state-changing actions don't fire silently, they queue for a person. A [REQUEST_PORTAL] block spins up a portal in Studio and rewrites the marker into a clickable builder link. Both events land on the rail with their own seq, so the approval gate is as visible as the reasoning that led to it.
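A sketch of the marker scan follows. The block syntax is an assumption: the article names the markers but not how a block is delimited, so this example invents a closing tag of the form [/REQUEST_APPROVAL]. The queue is a plain list standing in for the iris_decisions row and the portal creation, and the replacement strings are placeholders.

```python
import re

# Assumed block syntax: [KIND]body[/KIND]. The closing tag is hypothetical.
MARKER = re.compile(
    r"\[(REQUEST_APPROVAL|REQUEST_DECISION|REQUEST_PORTAL)\](.*?)\[/\1\]",
    re.DOTALL,
)


def process_markers(final_text: str) -> tuple[str, list[dict]]:
    """Queue each marker block and rewrite it into a status line."""
    queued: list[dict] = []

    def rewrite(match: re.Match) -> str:
        kind, body = match.group(1), match.group(2).strip()
        queued.append({"kind": kind, "body": body})  # stand-in for the real side effects
        if kind == "REQUEST_PORTAL":
            return f"[Open portal builder: {body}]"
        return f"[Awaiting approval: {body}]"

    return MARKER.sub(rewrite, final_text), queued


text = "Plan ready. [REQUEST_APPROVAL]Restart switch sw-12[/REQUEST_APPROVAL]"
rewritten, queued = process_markers(text)
```

Doing the rewrite in the same pass that queues the decision means the chat text can never show an approval marker that was not also recorded.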

That's the whole loop. Two tiers of reasoning, every juncture emitted as a sequenced event, streamed live and replayable in order, with the irreversible actions parked in a queue for a human. If you want the business case for why we obsessed over this, our founder Ajay wrote why it matters; for how it lands with customers, see Harry's field notes on observations in practice. The design principle underneath all of it, and the rest of our enterprise AI platform, is simple: a decision you can't replay in order is a decision you can't trust.

