Repo Knowledge Bot: How It Works as a StudioX Mission

A senior engineer's real job is rarely the code they write. It's the questions they answer. Where does auth get injected? Why does this retry three times? Which service owns the iris_decisions table? That knowledge lives in three people's heads, and everyone else queues for it. The Repo Knowledge Bot is how we take it out of those heads and turn it into something you can ask in plain language, getting back an answer grounded in the actual code with the exact lines cited. This post is the mechanics: how we built it as a StudioX Mission, and why the pieces fall where they do.

If you want the case for why this matters, that's the companion post. If you want to see it answering real questions on a real repo, there's an in-practice walkthrough. This one is for the engineer who wants to know what's actually running.

The mission, in one sentence

A StudioX Mission is a small org chart of specialist agents plus a reasoning core that routes between them. The Repo Knowledge Bot is a Mission whose center of gravity is one specialist — a Code Agent — backed by its own bot and its own knowledge base of the indexed repository. Around it we hang a Generic Agent that discovers external tools (git history, Jira, Confluence) at runtime. The mission takes a plain-language question, retrieves grounded evidence, and returns an answer with citations — every step of that reasoning visible as it happens.

Indexing the repo, and retrieving from it

The repository becomes a StudioX knowledge base attached to a bot. We provision that bot as a knowledge-bot type — a special-scope bot (DYNAMIC_BOT_TYPE: "knowledge-bot" in settings.envs) that stays out of the normal bot listings and is only reachable through the mission. Source files, docs, and READMEs are chunked and embedded into that KB; each chunk carries its file path and line span as metadata, which is what later lets us cite path/to/file.ts:120-138 rather than hand-wave "somewhere in the auth layer."
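To make the metadata point concrete, here is a minimal sketch of line-span chunking. It is not the StudioX indexer: the Chunk class, chunk_file, and the fixed 40-line window are assumptions chosen for illustration. The only thing it shows is how keeping the path and span next to the text makes a citation a simple string format later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical chunk record: text plus the metadata needed to cite it."""
    path: str        # repository-relative file path
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive
    text: str

    def citation(self) -> str:
        # Produces the path:start-end form used in cited answers.
        return f"{self.path}:{self.start_line}-{self.end_line}"


def chunk_file(path: str, source: str, lines_per_chunk: int = 40) -> list[Chunk]:
    """Split a file into fixed-size line windows, recording each window's span."""
    lines = source.splitlines()
    chunks = []
    for start in range(0, len(lines), lines_per_chunk):
        window = lines[start:start + lines_per_chunk]
        chunks.append(Chunk(path, start + 1, start + len(window), "\n".join(window)))
    return chunks
```

A real indexer would more likely split on function or section boundaries than on a fixed window, but the span bookkeeping stays the same.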

Retrieval is a four-step vibe workflow, the same fallback pattern we ship for any knowledge-bot:

1. queryKnowledgebaseVectors: vector search over the repo KB for the extracted question, pulling a wide net (count 100). It takes a primaryBotUrl and an optional secondary_bots array, so a single mission can search one repo's KB or fan out across several repos at once; you decide per agent which knowledge bases are in scope.
2. rerankKnowledgebaseVectors: a reranker narrows those 100 candidates down to the 10 most relevant to the actual question. Vector recall is broad; the reranker is what makes the top of the list trustworthy.
3. formatRerankedResults: shapes the surviving chunks into a grounded, cited answer, carrying each chunk's source back through.
4. sendResponse: returns it.
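The shape of that pipeline (wide recall, narrow rerank, format, send) can be sketched in a few lines. The function names below mirror the step names, but the bodies are toy stand-ins and not the StudioX implementations: token overlap replaces real embeddings, Jaccard similarity replaces a real reranker model, and the chunk dictionary keys (path, start, end, text) are assumptions.

```python
def tokens(text: str) -> set[str]:
    return set(text.lower().split())


def query_knowledgebase_vectors(question: str, chunks: list[dict], count: int = 100) -> list[dict]:
    """Stage 1 stand-in: broad recall. Keeps chunks sharing any token with the question."""
    q = tokens(question)
    scored = [(len(q & tokens(c["text"])), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored if score > 0][:count]


def rerank_knowledgebase_vectors(question: str, candidates: list[dict], top_n: int = 10) -> list[dict]:
    """Stage 2 stand-in: precision. Re-scores candidates with Jaccard similarity."""
    q = tokens(question)

    def relevance(chunk: dict) -> float:
        union = q | tokens(chunk["text"])
        return len(q & tokens(chunk["text"])) / len(union) if union else 0.0

    return sorted(candidates, key=relevance, reverse=True)[:top_n]


def format_reranked_results(ranked: list[dict]) -> str:
    """Stage 3: carry each chunk's source through as a path:start-end citation."""
    return "\n".join(
        f'[{i}] {c["path"]}:{c["start"]}-{c["end"]}\n    {c["text"]}'
        for i, c in enumerate(ranked, start=1)
    )


def send_response(answer: str) -> dict:
    """Stage 4: hand the formatted answer back to the caller."""
    return {"answer": answer}


def answer_question(question: str, chunks: list[dict]) -> dict:
    candidates = query_knowledgebase_vectors(question, chunks, count=100)
    ranked = rerank_knowledgebase_vectors(question, candidates, top_n=10)
    return send_response(format_reranked_results(ranked))
```

The two stages use different scores on purpose: the first is cheap and generous, the second is stricter and only runs on what the first let through.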

Access control rides along for free: each KB still respects who's allowed to see it, so a contractor asking the mission sees only the repos they're cleared for.

The agents that run

For a focused Q&A mission with a single Code Agent, the reasoning core doesn't waste a routing call. When a mission has exactly one enabled agent, IRIS takes the fast path — it skips the router LLM entirely and chats the agent's bot directly. That's the common, cheapest case: question in, retrieval pipeline runs, cited answer out.
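The fast path amounts to a single branch ahead of the router. A minimal sketch, assuming agents are plain dictionaries with an enabled flag and that chat_bot and run_router are callables supplied by the caller (all hypothetical names, not the IRIS API):

```python
from typing import Callable


def dispatch(
    agents: list[dict],
    question: str,
    chat_bot: Callable[[dict, str], str],
    run_router: Callable[[list[dict], str], str],
) -> tuple[str, str]:
    """Return (path_taken, answer), skipping the router when only one agent is enabled."""
    enabled = [a for a in agents if a["enabled"]]
    if not enabled:
        raise ValueError("mission has no enabled agents")
    if len(enabled) == 1:
        # Fast path: no routing call, chat the single agent's bot directly.
        return ("fast_path", chat_bot(enabled[0], question))
    # Two or more agents: let the reasoning core choose.
    return ("routed", run_router(enabled, question))
```

Note that the check counts enabled agents, so a mission with several configured agents but only one switched on still takes the cheap route.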

The moment you add a second agent, the two-tier system kicks in. Tier 1 (the reasoning core) reads the question, the agent roster, and a one-line index of every knowledge base, then picks one agent per round — accumulating results until it decides the request is answered. Tier 2 (the agent planner) takes the goal handed to the chosen agent, discovers that agent's capabilities, decomposes the goal into ordered steps, and runs each one — ending on a reason step that produces the user-facing answer. So "explain this function and show me who changed it last and why" can fan across the Code Agent and the Generic Agent, and the core stitches the results into one reply.
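Stripped of the LLM calls, the two tiers reduce to a loop inside a loop. The sketch below is an illustration under assumptions, not the IRIS code: pick_agent, plan_steps, and run_step are hypothetical callables standing in for the reasoning core, the agent planner, and step execution, and the max_rounds cap is an invented safety limit.

```python
from typing import Callable, Optional


def run_mission(
    question: str,
    agents: list[str],
    pick_agent: Callable[[str, list[str], list], Optional[tuple[str, str]]],
    plan_steps: Callable[[str, str], list[str]],
    run_step: Callable[[str, str, str, object], object],
    max_rounds: int = 5,  # assumption: a cap so a confused router cannot loop forever
) -> list[tuple[str, object]]:
    """Tier 1 picks one agent per round; Tier 2 plans and runs that agent's steps."""
    results: list[tuple[str, object]] = []
    for _ in range(max_rounds):
        choice = pick_agent(question, agents, results)  # Tier 1: (agent, goal), or None when answered
        if choice is None:
            break
        agent, goal = choice
        steps = plan_steps(agent, goal)                 # Tier 2: ordered steps for this goal
        if not steps or steps[-1] != "reason":
            raise ValueError("a plan must end on a reason step")
        output = None
        for step in steps:
            output = run_step(agent, step, goal, output)  # each step sees the previous output
        results.append((agent, output))                 # accumulated for the next routing round
    return results
```

Because the accumulated results are passed back into pick_agent, the core can decide that the Code Agent's answer is enough, or that the Generic Agent still needs a turn.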

Observations: watching it reason and cite lines

This is the part that earns trust. Every phase — routing, capability discovery, the plan, each retrieval step and its validation, the final answer check — is recorded as a trace event and streamed live to the Explain rail over Server-Sent Events, in true execution order. You don't get a black-box answer; you watch "selecting Code Agent," then the vector query, then the rerank, then the exact chunks it grounded on with their file-and-line citations, then the synthesized answer. When the bot does the work, its structured per-tool events flow back on a dedicated explainability channel (gated by a canSeeExplainability authorization flag, so only entitled users see the internals). If the mission ever cites the wrong file, the observation stream shows you which retrieval step went sideways — you fix the knowledge, not the code.
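For readers who have not worked with Server-Sent Events, here is roughly what one trace event looks like on the wire. The id/event/data framing is standard SSE; everything else is an assumption made for illustration, including the event dictionary shape, the "internal" key used to mark the bot's per-tool events, and the choice to apply the canSeeExplainability check at serialization time.

```python
import json
from typing import Iterable, Iterator


def to_sse(event_id: int, event_type: str, payload: dict) -> str:
    """Serialize one trace event as a Server-Sent Events frame."""
    return f"id: {event_id}\nevent: {event_type}\ndata: {json.dumps(payload)}\n\n"


def stream_trace(events: Iterable[dict], can_see_explainability: bool) -> Iterator[str]:
    """Yield frames in execution order, dropping internal events for unentitled users."""
    for seq, event in enumerate(events, start=1):
        if event.get("internal") and not can_see_explainability:
            continue  # sequence numbers are kept, so a gap shows something was withheld
        yield to_sse(seq, event["type"], event["payload"])
```

Numbering events before filtering keeps the ordering stable for every viewer, whatever they are entitled to see.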

Instant MCP: wiring git, Jira, and Confluence as tools

The repo KB answers "what does the code say." The Generic Agent answers "what happened around it" by discovering tools from MCP servers at runtime. With Instant MCP you stand up a server in a few clicks (import a GitLab, Jira, or Confluence OpenAPI spec, or start from a Quick Start template), and each server is just a URL you register on the bot. The bot discovers the tools automatically and starts using them; no integration code, no redeploy. Secrets are encrypted (AES-256-GCM) and auto-injected as headers, so tokens never pass through the conversation. Register a Jira MCP server and the mission can cross-reference the ticket that motivated a commit right away: the Generic Agent can consider the new tools as soon as the server is registered, because capability is injected at runtime, not compiled in.
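A toy model of "capability is injected at runtime" helps show why no redeploy is needed. The ToolRegistry class, the discover callable, and the request shape below are all hypothetical; real MCP discovery is a protocol exchange with the server, and the encryption of stored secrets is deliberately left out of the sketch.

```python
from typing import Callable, Optional


class ToolRegistry:
    """Hypothetical registry: servers are URLs, tools are looked up on demand."""

    def __init__(self, discover: Callable[[str], list[str]]):
        self._discover = discover                     # stand-in for asking a server to list its tools
        self._servers: dict[str, dict[str, str]] = {}  # url -> secret headers (plain here, encrypted in practice)

    def register(self, url: str, secret_headers: Optional[dict[str, str]] = None) -> None:
        self._servers[url] = dict(secret_headers or {})

    def tools(self) -> dict[str, str]:
        """Discover at call time, so a newly registered server shows up immediately."""
        return {tool: url for url in self._servers for tool in self._discover(url)}

    def build_request(self, tool: str, arguments: dict) -> dict:
        url = self.tools()[tool]
        return {
            "url": url,
            "headers": dict(self._servers[url]),      # secrets travel as headers only
            "body": {"tool": tool, "arguments": arguments},
        }
```

The body is the only part the model ever composes, which is why the token never needs to appear in the conversation.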

Decision queue and human-in-the-loop — honestly, not here

I'll be straight about this: a read-only Q&A mission doesn't need approvals. The decision-queue machinery exists — a synthesis step can emit a [REQUEST_APPROVAL] block for destructive or irreversible actions, which the route layer turns into an iris_decisions row and emails reviewers a magic-link approve/reject URL. But the Repo Knowledge Bot only reads. There's nothing to gate. The human-in-the-loop here isn't an approval checkpoint; it's the observation stream — the human stays in the loop by watching the reasoning and checking the citations, not by signing off on an action. That distinction matters. Bolt this same mission onto tools that write — open a ticket, cut a branch, flip a flag — and the decision queue is exactly where the approval belongs.

The portal where people ask

People don't ask over an API; they ask in a place. The mission exposes channels — chat, email, Slack, API — and for the read-only Q&A case the chat portal is home base. New hires, testers, support, and PLM open it and type a question the way they'd Slack a senior. If a mission ever needs to hand back something interactive, a [REQUEST_PORTAL] block gets rewritten into a clickable builder link. But for the Repo Knowledge Bot, the portal is deliberately plain: a box to ask, and an answer that cites its sources. The senior's queue, finally, is empty.