Agent Testing Guide
Sample prompts and validation checklists for testing the VS Code custom agents.
Prerequisites
Before You Test
- The `codewiki-mcp` MCP server must be running and configured in `.vscode/mcp.json` (included in this repo).
- All 6 `.agent.md` files must be in `.github/agents/`.
- Test in the VS Code Chat panel (Ctrl+Shift+I), not inline chat. The `.agent.md` custom agents only activate when invoked via `@codewiki` in the Chat panel. The `runSubagent` tool in regular Copilot chat does NOT wire up MCP tools to subagents.
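The file-layout prerequisites above can be checked mechanically before opening the Chat panel. The following is a minimal sketch, not part of the repo: the function name `check_prerequisites` is hypothetical, and the `mcp.json` check is a plain substring search because the exact schema of that file is not assumed here. It cannot tell whether the MCP server is actually running.

```python
from pathlib import Path

EXPECTED_AGENT_COUNT = 6  # the guide expects all 6 .agent.md files


def check_prerequisites(repo_root: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the layout looks right."""
    root = Path(repo_root)
    problems = []

    mcp_config = root / ".vscode" / "mcp.json"
    if not mcp_config.is_file():
        problems.append("missing .vscode/mcp.json")
    elif "codewiki-mcp" not in mcp_config.read_text(encoding="utf-8"):
        # Substring check only: the structure of mcp.json is not assumed.
        problems.append("codewiki-mcp is not mentioned in .vscode/mcp.json")

    agents_dir = root / ".github" / "agents"
    agent_files = sorted(agents_dir.glob("*.agent.md")) if agents_dir.is_dir() else []
    if len(agent_files) != EXPECTED_AGENT_COUNT:
        problems.append(
            f"expected {EXPECTED_AGENT_COUNT} .agent.md files in .github/agents/, "
            f"found {len(agent_files)}"
        )
    return problems
```

Calling `check_prerequisites(".")` from the repo root and printing each returned string is enough for a quick pre-flight check.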
Agent Configuration Reference
Current YAML frontmatter for each agent (must match .github/agents/ files):
Master Orchestrator (codewiki.agent.md)
name: CodeWiki
description: Master agent that routes your request to the right CodeWiki specialist
model: GPT-5.3-Codex
tools:
[read, agent, codewiki-mcp/*]
agents:
[CodeWiki Researcher, CodeWiki Code Review, CodeWiki Architecture Explorer, CodeWiki Comparison, CodeWiki Synthesizer]
Why GPT-5.3-Codex? Free/low-tier models (GPT-5 mini) produce inconsistent routing, truncated results, and skipped delegation.
Why `codewiki-mcp/*` on the master? The master must declare MCP tools so they are exposed to subagents when spawned. The master itself still acts as a router: it delegates via `agent` and does not call CodeWiki tools directly.
Subagents
# Researcher, Code Review, Architecture Explorer, Comparison:
model: GPT-5 mini
user-invokable: false
tools:
[read, codewiki-mcp/*]
# Synthesizer (needs stronger reasoning for multi-repo integration):
model: GPT-5.3-Codex
user-invokable: false
tools:
[read, codewiki-mcp/*]
| Agent File | Name | Specialty |
|---|---|---|
| `codewiki-researcher.agent.md` | CodeWiki Researcher | General exploration |
| `codewiki-reviewer.agent.md` | CodeWiki Code Review | Module/function analysis |
| `codewiki-architect.agent.md` | CodeWiki Architecture Explorer | System design |
| `codewiki-comparison.agent.md` | CodeWiki Comparison | Multi-repo comparison |
| `codewiki-synthesizer.agent.md` | CodeWiki Synthesizer | Combine parts from multiple repos |
Routing Quick Reference
| User Intent | Subagent | Signal Words |
|---|---|---|
| General exploration | CodeWiki Researcher | "what is", "explain", "tell me about", "overview" |
| Code analysis | CodeWiki Code Review | "review", "analyse", "module", "function", "code" |
| System design | CodeWiki Architecture Explorer | "architecture", "design", "structure", "hierarchy" |
| Multi-repo comparison | CodeWiki Comparison | "compare", "vs", "difference", "or" |
| Multi-repo synthesis | CodeWiki Synthesizer | "combine", "merge", "build using", "take X from A and Y from B" |
| Unindexed repo | CodeWiki Researcher | Subagent detects NOT_INDEXED and calls codewiki_request_indexing |
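When writing test prompts, it can help to predict which subagent the table above implies. The sketch below is a rough keyword heuristic for that purpose only. It does not reproduce how the master (an LLM) actually classifies requests, the priority order is an assumption, and the ambiguous signal word "or" is left out. Inflected forms such as "architected" are not matched.

```python
import re

# "take X from A and Y from B" style requests (assumed pattern for the synthesis signal).
TAKE_FROM_BOTH = re.compile(r"\bfrom\b.+\band\b.+\bfrom\b")

# Checked in priority order (an assumption): specific intents first, Researcher last.
ROUTES = [
    ("CodeWiki Synthesizer", ["combine", "merge", "build using"]),
    ("CodeWiki Comparison", ["compare", "vs", "difference"]),
    ("CodeWiki Architecture Explorer", ["architecture", "design", "structure", "hierarchy"]),
    ("CodeWiki Code Review", ["review", "analyse", "module", "function", "code"]),
    ("CodeWiki Researcher", ["what is", "explain", "tell me about", "overview"]),
]


def expected_subagent(prompt: str) -> str:
    """Guess the subagent a prompt should route to, based on the signal words."""
    text = prompt.lower()
    if TAKE_FROM_BOTH.search(text):
        return "CodeWiki Synthesizer"
    for agent, signals in ROUTES:
        for signal in signals:
            # Word boundaries keep "code" from matching inside "codewiki" or "vscode";
            # the optional "s" accepts simple plurals such as "differences".
            if re.search(rf"\b{re.escape(signal)}s?\b", text):
                return agent
    return "CodeWiki Researcher"  # general exploration is the fallback
```

A mismatch between this guess and the observed delegation is a hint to look closer, not proof of a routing bug.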
1. CodeWiki Researcher (General Exploration)
Routing trigger: General "what is", "explain", "tell me about" questions.
Sample Prompts
@codewiki What is facebook/prophet and what are its main features?
@codewiki Explain the key concepts behind pallets/flask
@codewiki What topics does CodeWiki have for microsoft/vscode?
# Bare keyword — resolves automatically (v1.2.0+)
@codewiki What is prophet and what are its main features?
Expected Behaviour
| Step | What Should Happen |
|---|---|
| 1 | Master spawns CodeWiki Researcher via the agent tool |
| 2 | Researcher calls codewiki_list_topics to discover available wiki sections |
| 3 | Researcher calls codewiki_read_structure and/or codewiki_read_contents |
| 4 | Researcher synthesises a summary from CodeWiki content |
Validation Checklist
- Master does not answer from its own knowledge
- Master uses the `agent` tool (not direct tool calls to read/search)
- Researcher cites CodeWiki sections in its answer
- Response contains real documentation content, not generic descriptions
- Master presents the full subagent response (not a brief summary)
- If a bare keyword was used, response is prefixed with a resolution note (e.g. `> Resolved: keyword "prophet" → facebook/prophet`)
2. CodeWiki Code Review (Module / Function Analysis)
Routing trigger: "review", "analyse", "what does module X do", code-level questions.
Sample Prompts
@codewiki Review the forecaster module in facebook/prophet — what does it do?
@codewiki What code patterns are used in the routing module of pallets/flask?
@codewiki Analyse the error handling approach in fastapi/fastapi
Expected Behaviour
| Step | What Should Happen |
|---|---|
| 1 | Master spawns CodeWiki Code Review via the agent tool |
| 2 | Reviewer calls codewiki_search_wiki to find relevant code documentation |
| 3 | Reviewer calls codewiki_read_contents for detailed section content |
| 4 | Reviewer provides code-level analysis with citations |
Validation Checklist
- Master delegates to CodeWiki Code Review, not Researcher
- Reviewer focuses on code structure, patterns, and implementation details
- Response references specific modules, classes, or functions
- No hallucinated code — all content sourced from CodeWiki
- Master presents the full subagent response (not a brief summary)
3. CodeWiki Architecture Explorer (System Design)
Routing trigger: "architecture", "design", "how is X structured", "component hierarchy".
Sample Prompts
@codewiki Explain the overall architecture of facebook/react
@codewiki How is the plugin system architected in vitejs/vite?
@codewiki Describe the component hierarchy and data flow in vuejs/core
Expected Behaviour
| Step | What Should Happen |
|---|---|
| 1 | Master spawns CodeWiki Architecture Explorer via the agent tool |
| 2 | Explorer calls codewiki_read_structure to map the documentation tree |
| 3 | Explorer calls codewiki_read_contents for architecture-related sections |
| 4 | Explorer produces a structured architecture overview |
Validation Checklist
- Master delegates to CodeWiki Architecture Explorer
- Response covers high-level design (layers, components, data flow)
- Includes or references structural breakdowns from CodeWiki
- Does not devolve into code-level details (that's the Reviewer's job)
- Master presents the full subagent response (not a brief summary)
4. CodeWiki Comparison (Multi-Repo)
Routing trigger: "compare", "vs", "difference between", "X or Y".
Sample Prompts
@codewiki Compare fastapi/fastapi vs pallets/flask — architecture, performance, and developer experience
@codewiki Compare facebook/react vs vuejs/core in terms of rendering strategy
@codewiki What are the differences between expressjs/express and koajs/koa?
Expected Behaviour
| Step | What Should Happen |
|---|---|
| 1 | Master spawns CodeWiki Comparison via the agent tool |
| 2 | Comparison agent calls CodeWiki tools for each repo independently |
| 3 | Agent builds a side-by-side analysis from real documentation |
| 4 | Agent produces a structured comparison table or narrative |
Validation Checklist
- Master delegates to CodeWiki Comparison, not Researcher
- Agent fetches documentation from both repos (not just one)
- Comparison is grounded in CodeWiki content, not generic knowledge
- Response includes a structured comparison (table, bullet list, or sections)
- Master presents the full subagent response (not a brief summary)
5. Request Indexing (Unindexed Repo — Subagent Handles It)
Routing trigger: Repo that returns NOT_INDEXED from any CodeWiki tool.
New in v1.3.0: The tool now uses MCP Elicitation to confirm with the user before submitting an indexing request.
Sample Prompts
@codewiki Check if Snowflake-Labs/agent-world-model is available on CodeWiki
@codewiki What does CodeWiki have for some-org/obscure-repo?
Expected Behaviour
| Step | What Should Happen |
|---|---|
| 1 | Master classifies this as a general exploration request |
| 2 | Master spawns CodeWiki Researcher via the agent tool |
| 3 | Researcher calls a CodeWiki tool and gets NOT_INDEXED error |
| 4 | Researcher calls codewiki_request_indexing to submit the repo |
| 5 | Server asks user for confirmation via MCP Elicitation (v1.3.0+) |
| 6 | Researcher reports back; Master presents the full result to user |
Validation Checklist
- Master does not call any MCP tools directly (it delegates via `agent`)
- A subagent detects `NOT_INDEXED` and calls `codewiki_request_indexing`
- User is asked to confirm indexing via elicitation prompt (v1.3.0+)
- User is informed the repo has been submitted for indexing
- Master presents the full subagent response (not a brief summary)
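The detect-then-submit behaviour in steps 3 and 4 above can be sketched as follows. This is an illustration under stated assumptions: `call_tool` is a hypothetical stand-in for whatever invokes MCP tools, and the error shape (`"code": "NOT_INDEXED"`) is assumed, since the guide does not show the raw response. The tool names and the `repo_url` parameter are the ones used elsewhere in this guide.

```python
from typing import Any, Callable

# Hypothetical tool invoker: (tool_name, arguments) -> response dict.
ToolCall = Callable[[str, dict[str, Any]], dict[str, Any]]


def read_or_request_indexing(call_tool: ToolCall, repo: str) -> dict[str, Any]:
    """List topics for a repo; if it is not indexed, submit it for indexing instead."""
    result = call_tool("codewiki_list_topics", {"repo_url": repo})
    # Assumed error shape: {"status": "error", "code": "NOT_INDEXED"}.
    if result.get("code") == "NOT_INDEXED":
        return call_tool("codewiki_request_indexing", {"repo_url": repo})
    return result
```

The elicitation prompt in step 5 happens on the server side during the second call, so it does not appear in this sketch.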
6. CodeWiki Synthesizer (Multi-Repo Solution Building)
Routing trigger: User wants to BUILD something new by combining parts from multiple repos. Distinct from Comparison which evaluates/contrasts.
Sample Prompts
@codewiki I want to build an API server that uses the routing system from pallets/flask and the async handling from fastapi/fastapi. Help me design it.
@codewiki Take the plugin architecture from vitejs/vite and the component model from vuejs/core — design a new framework that combines both.
@codewiki Combine the authentication approach from supabase/supabase with the event pipeline from apache/kafka into a real-time auth notification system.
# Intentionally vague — Synthesizer should DISCOVER which parts to take
@codewiki Can you combine the best parts from fastapi/fastapi and pallets/flask into a new web framework solution?
Expected Behaviour
| Step | What Should Happen |
|---|---|
| 1 | Master detects synthesis intent (“build”, “combine”, “take X from A and Y from B”) |
| 2 | Master spawns CodeWiki Synthesizer via the agent tool |
| 3 | Synthesizer researches each repo using CodeWiki tools (read_structure, read_contents, search_wiki) |
| 4 | Synthesizer extracts the specific parts the user requested from each repo |
| 5 | Synthesizer identifies cross-repo conflicts and proposes adapters |
| 6 | Synthesizer delivers a blueprint: architecture diagram, directory structure, integration code, implementation guide |
| 7 | For vague requests: Synthesizer shows a “Parts Selected” table explaining WHY it chose each part |
Validation Checklist
- Master delegates to CodeWiki Synthesizer, not Comparison
- Synthesizer fetches documentation from all mentioned repos
- Response includes a Parts Extracted table citing source repos
- Response includes Compatibility Analysis (conflicts + resolutions)
- Response includes Integration Architecture (Mermaid diagram or description)
- Response includes Directory Structure for the new project
- Response includes Implementation Guide with actionable steps
- For vague requests: includes Parts Selected table with reasoning
- All content is grounded in CodeWiki data, not generic knowledge
- Master presents the full subagent response (not a brief summary)
7. Keyword Resolution & Disambiguation (Bare Product Names)
Routing trigger: Any prompt using a bare keyword instead of owner/repo format.
Sample Prompts
@codewiki What is vue?
@codewiki Explain the architecture of react
@codewiki Compare vue vs react
@codewiki What topics does openclaw have?
Expected Behaviour
| Step | What Should Happen |
|---|---|
| 1 | Master delegates to the appropriate subagent (Researcher, Comparison, etc.) |
| 2 | Subagent calls a CodeWiki tool with the bare keyword (e.g. repo_url="vue") |
| 3 | Tool detects bare keyword and triggers MCP Elicitation (if multiple ambiguous repos found) |
| 4 | VS Code shows a selection prompt: “Multiple repositories match 'vue'. Which do you want?” |
| 5 | User selects the desired repo (e.g. vuejs/core for Vue 3) |
| 6 | Response includes resolution note: > Resolved: keyword "vue" → vuejs/core (52,900★) |
| 7 | Response shows top alternative candidates |
| 8 | The rest of the response contains normal CodeWiki documentation |
Elicitation is skipped in two cases:
- Canonical match: “openclaw” → `openclaw/openclaw` (owner == repo == keyword)
- Single result: only one repo found → auto-selected
Fallback: If elicitation is unavailable (client doesn’t support it), heuristic selection by star count is used.
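The resolution order described above (canonical match, single result, elicitation, star-count fallback) can be sketched as below. This is a minimal illustration, not the server's implementation: `Candidate`, `resolve_keyword`, and the `ask_user` callback are hypothetical names, and `ask_user` returning `None` stands in for elicitation being declined or unavailable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    full_name: str  # "owner/repo"
    stars: int


def resolve_keyword(
    keyword: str,
    candidates: list[Candidate],
    ask_user: Optional[Callable[[list[Candidate]], Optional[Candidate]]] = None,
) -> Candidate:
    """Pick one repository for a bare keyword."""
    if not candidates:
        raise LookupError(f"no repositories match {keyword!r}")
    kw = keyword.lower()
    # 1. Canonical match: owner == repo == keyword, no prompt needed.
    for candidate in candidates:
        if candidate.full_name.lower() == f"{kw}/{kw}":
            return candidate
    # 2. Single result: nothing to disambiguate.
    if len(candidates) == 1:
        return candidates[0]
    # 3. Elicitation, when the client supports it and the user makes a choice.
    if ask_user is not None:
        choice = ask_user(candidates)
        if choice is not None:
            return choice
    # 4. Fallback heuristic: the candidate with the most stars.
    return max(candidates, key=lambda c: c.stars)
```

Passing `ask_user=None` exercises the fallback path, which is the same path taken when the user cancels the prompt.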
Validation Checklist
- Bare keyword “vue” triggers elicitation with multiple options (vuejs/vue, vuejs/core, etc.)
- User can select `vuejs/core` (Vue 3) instead of auto-picking `vuejs/vue` (Vue 2)
- Bare keyword “openclaw” auto-resolves to `openclaw/openclaw` (canonical match, NO elicitation)
- Bare keyword “react” triggers elicitation showing facebook/react and alternatives
- Resolution note appears at the top of the response with star count
- Alternative candidates are listed
- Declining/cancelling elicitation falls back to heuristic selection
- `owner/repo` format still works as before (no resolution note, no elicitation)
- Full URLs still work as before (no resolution note, no elicitation)
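Several of the checks above depend on whether a resolution note is present and what it says. The sketch below parses the note format shown in this guide (`> Resolved: keyword "vue" → vuejs/core (52,900★)`). The helper name `parse_resolution_note` is hypothetical, and the regular expression assumes the note is the first line of the response and that the star count is optional.

```python
import re

RESOLUTION_NOTE = re.compile(
    r'^>\s*Resolved: keyword "(?P<keyword>[^"]+)" → (?P<repo>[\w.-]+/[\w.-]+)'
    r"(?: \((?P<stars>[\d,]+)★\))?"
)


def parse_resolution_note(response: str):
    """Return (keyword, repo, stars) from the first line, or None if there is no note."""
    lines = response.lstrip().splitlines()
    first_line = lines[0] if lines else ""
    match = RESOLUTION_NOTE.match(first_line)
    if match is None:
        return None
    stars = match.group("stars")
    star_count = int(stars.replace(",", "")) if stars else None
    return match.group("keyword"), match.group("repo"), star_count
```

A `None` result is the expected outcome for `owner/repo` and full-URL prompts, which should carry no resolution note.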
Full Workflow Scenario
This single prompt is designed to trigger all 5 workflow steps (Discover → Navigate → Read → Search → Synthesize) of the Researcher subagent.
Multi-Step Scenario Prompt
@codewiki I want a deep technical explanation of the React Compiler's
compilation pipeline.
Specifically:
1. First, check what topics Google CodeWiki has for facebook/react.
2. Then look at the full table of contents to find sections about
the compiler.
3. Read the section on "React Compiler Internals" to understand the
multi-stage compilation pipeline, the IRs (HIR and ReactiveFunction),
and the key optimization passes.
4. Search for "How does the React Compiler handle memoization and
reactive scopes?" to get implementation-level details.
5. Combine everything into a single technical summary that covers:
- The overall compilation pipeline (AST → HIR → ReactiveFunction → codegen)
- The key intermediate representations and their purpose
- How reactive scopes are inferred and merged
- How the compiler replaces manual useMemo/useCallback
- Cite which CodeWiki sections your answer comes from
Expected Tool Calls
| Step | Phase | Tool Call | Purpose |
|---|---|---|---|
| 1 | Discover | codewiki_list_topics("facebook/react") | Verify wiki exists, see available topics |
| 2 | Navigate | codewiki_read_structure("facebook/react") | Get full ToC with section hierarchy |
| 3 | Read | codewiki_read_contents("facebook/react", "React Compiler Internals") | Fetch detailed compiler pipeline docs |
| 4 | Search | codewiki_search_wiki("facebook/react", "How does the React Compiler handle memoization?") | Get Gemini-powered implementation details |
| 5 | Synthesize | (no tool call — agent combines results) | Produce cited summary from steps 1-4 |
Full Workflow Validation
- Step 1: Returns `status: "ok"` with topic list (expect ~26 sections)
- Step 2: Returns `status: "ok"` with hierarchical section structure
- Step 3: Returns `status: "ok"` with detailed content about HIR, ReactiveFunction, compilation passes
- Step 4: Returns `status: "ok"` OR `RETRY_EXHAUSTED` (upstream timeout is a known CodeWiki issue)
- Step 5: Agent produces a coherent summary citing specific CodeWiki sections
- All responses include `content_hash` and `idempotency_key`
- Subsequent identical calls return in <10ms (cache hit)
- Agent does NOT call the same tool >2 times for the same repo
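If the tool calls and responses from a run are copied into a list, most of the checks above can be automated. The following is a sketch under assumptions: the log format (tuples of tool name, repo, response dict) is hypothetical, and `RETRY_EXHAUSTED` is treated as a value of the `status` field, which the guide does not confirm. The cache-latency check is left out because it needs timing data.

```python
from collections import Counter

REQUIRED_FIELDS = ("content_hash", "idempotency_key")
MAX_CALLS_PER_TOOL_AND_REPO = 2


def validate_tool_log(log):
    """log: list of (tool_name, repo, response_dict). Returns a list of problems."""
    problems = []
    counts = Counter()
    for tool, repo, response in log:
        counts[(tool, repo)] += 1
        status = response.get("status")
        # Only the search step is allowed to end in an upstream timeout.
        allowed = {"ok", "RETRY_EXHAUSTED"} if tool == "codewiki_search_wiki" else {"ok"}
        if status not in allowed:
            problems.append(f"{tool}: unexpected status {status!r}")
        for field in REQUIRED_FIELDS:
            if field not in response:
                problems.append(f"{tool}: missing {field}")
    for (tool, repo), n in counts.items():
        if n > MAX_CALLS_PER_TOOL_AND_REPO:
            problems.append(f"{tool} called {n} times for {repo}")
    return problems
```

An empty return value means the logged run passed every automated check; Step 5 (summary quality and citations) still needs a manual read.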
Alternative Repos for Testing
If a repo is too large or slow during testing, try these. You can use bare keywords (v1.2.0+) — the server resolves them automatically:
| Input | Resolves To | Notes |
|---|---|---|
| `anthropics/anthropic-sdk-python` | exact match | Fast wiki generation, good for quick tests |
| `fastapi` | `fastapi/fastapi` | Bare keyword; well-indexed, good for architecture + review tests |
| `prophet` | `facebook/prophet` | Bare keyword; good for researcher + review tests |
| `vscode` | `microsoft/vscode` | Bare keyword; large repo, may be slower |
| `microsoft/vscode-copilot-chat` | exact match | Microsoft tooling |
See also: Agentic AI Guide — full agent definitions, architecture, and lessons learned.