The Bottom Line
As coding agents adopt the Model Context Protocol, it has become routine to wire up 5–15 MCP servers exposing 100–300 tools. Loading every tool schema on every turn imposes a steep tool tax: 50–140K tokens of input, slower startup, degraded tool-selection accuracy past roughly 30–50 tools, and — for some providers — hard rejections at the 128-tool API ceiling.
The vendors behind the four major coding agents have converged on the same architectural pattern (defer tool schemas, load on demand) but ship four different, non-interoperable implementations, and Codex CLI does not yet apply OpenAI's API-level version to MCP tools. The user-facing defaults differ enough to change which workloads each agent can handle without intervention.
| Agent | Mechanism | Default state | Upfront cost at 200 tools |
|---|---|---|---|
| Claude Code | API-native tool_search_tool | On by default | ~3–5K tokens |
| Cursor | Files synced to disk + grep | On by default | ~2–4K tokens |
| GitHub Copilot | “Virtual tools” embedding clusters | Threshold-gated | ~5–10K (clusters) or fail at 128 |
| Codex CLI | None (manual allowlists only) | Eager | ~50K tokens every turn |
The pattern is becoming industry-standard. The wire formats are not. Until MCP adds a discovery primitive, each agent’s behavior at scale will diverge in the ways documented below.
The Problem: What a 200-Tool Catalog Actually Costs
Each MCP tool schema — name, description, JSON-Schema parameters — costs roughly 250–700 tokens depending on verbosity. Real-world measurements:
- mcp-omnisearch (20 tools): ~14,114 tokens — ~706 tokens/tool
- playwright MCP (21 tools): ~13,647 tokens — ~650 tokens/tool
- Representative 10-server, 200-tool catalog: 50,000–140,000 tokens
Three compounding failure modes appear at scale:
- Context displacement. On a 200K-token window, eager loading consumes 25–70% of context before the agent reads a single file.
- Selection accuracy collapse. Models lose tool-selection accuracy past ~30–50 visible tools; with 200 visible, picking the wrong tool or missing the right one becomes routine.
- Hard API limits. GPT-4.1 and current Copilot agent mode enforce a 128-tool ceiling; requests that exceed it are rejected before execution.
This is not a theoretical scaling concern. It is the dominant cost item in any multi-MCP setup today.
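The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted above; the function names are illustrative and not taken from any agent's codebase.

```python
def catalog_cost(tool_count: int, tokens_per_tool: int) -> int:
    """Tokens needed to load every tool schema eagerly."""
    return tool_count * tokens_per_tool


def context_share(tokens: int, window: int = 200_000) -> float:
    """Fraction of the context window consumed by those tokens."""
    return tokens / window


# Measured catalogs quoted in the text: (tool count, total tokens).
measured = {"mcp-omnisearch": (20, 14_114), "playwright MCP": (21, 13_647)}
for name, (tools, tokens) in measured.items():
    print(f"{name}: ~{round(tokens / tools)} tokens/tool")

# Low and high per-tool estimates applied to a 200-tool catalog.
for per_tool in (250, 700):
    total = catalog_cost(200, per_tool)
    print(f"{per_tool} tokens/tool -> {total:,} tokens "
          f"({context_share(total):.0%} of a 200K window)")
```

At 250 tokens per tool the catalog comes to 50,000 tokens (25% of the window); at 700 it comes to 140,000 (70%).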
Implementation 1 — Claude Code: API-Native Tool Search
Anthropic ships tool_search_tool_regex_20251119 and tool_search_tool_bm25_20251119 as first-class API features. In Claude Code, MCP tools are deferred and discovered on demand by default; the agent sees only the search tool and any tools the user pinned.
The flow:
- The agent decides it needs a capability.
- It issues a regex or BM25 query against the registered tool index.
- The API returns 3–5 matching tool_reference blocks (name + brief).
- The chosen reference expands into a full schema only at the moment of use.
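The flow above can be approximated locally with a toy catalog. This is a simulation of the search-then-expand pattern, not Anthropic's API: the `DeferredCatalog` class, its methods, and the example tools are all hypothetical, and real matching happens server-side.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    brief: str
    schema: dict  # full JSON-Schema parameters, sent only on expansion


@dataclass
class DeferredCatalog:
    tools: dict = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def search(self, pattern: str, limit: int = 5) -> list:
        """Return lightweight references (name + brief) for matching tools."""
        rx = re.compile(pattern, re.IGNORECASE)
        hits = [t for t in self.tools.values()
                if rx.search(t.name) or rx.search(t.brief)]
        return [{"name": t.name, "brief": t.brief} for t in hits[:limit]]

    def expand(self, name: str) -> dict:
        """Return the full schema for one tool, at the moment of use."""
        t = self.tools[name]
        return {"name": t.name, "description": t.brief, "input_schema": t.schema}


catalog = DeferredCatalog()
catalog.register(Tool("github_create_issue", "Open an issue in a repository",
                      {"type": "object", "required": ["title"]}))
catalog.register(Tool("github_list_issues", "List open issues",
                      {"type": "object", "required": []}))
catalog.register(Tool("browser_click", "Click an element on the page",
                      {"type": "object", "required": ["selector"]}))

refs = catalog.search(r"issue")                      # cheap: names and briefs only
full = catalog.expand("github_create_issue")         # expensive: one full schema
print([r["name"] for r in refs], full["input_schema"])
```

Only the references enter context during search; the schema for a single tool is paid for when that tool is actually chosen.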
Limits and caveats worth knowing before you trust the default:
- Catalog ceiling: 10,000 tools.
- Tool descriptions and server instructions are silently truncated at 2 KB each.
- Disabled by default on Vertex AI and non-first-party proxy hosts (the tool_search beta header is not forwarded).
- Requires Sonnet 4+ or Opus 4+; Haiku models do not support tool search.
For 200 tools, this collapses the upfront cost from ~50K tokens to ~3–5K, a 90–94% reduction. Anthropic’s published figure is >85% across typical multi-server setups.
Implementation 2 — Cursor: Files-as-Discovery-Surface
Cursor explicitly rejected the dedicated-search-tool approach and instead syncs MCP tool descriptions to a folder on disk. The agent receives only tool names in static context and discovers full schemas by grep-ing or semantic-searching the synced folder when a task calls for it.
This is part of a broader “everything is a file” strategy: Cursor applies the same primitive to Agent Skills, long terminal output, and oversized tool responses. The file surface unlocks capabilities the API-level approach cannot — notably, surfacing MCP server status (re-authentication needed, server unhealthy) without forgetting the tools entirely.
A/B test results published January 2026 reported a 46.9% reduction in total agent tokens on runs that called an MCP tool. Schema-only reduction is higher; the 46.9% is across all session content.
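A minimal sketch of the files-as-discovery idea follows. The on-disk layout (one JSON file per tool, grouped by server) and the function names are assumptions for illustration; Cursor's actual folder structure and search tooling are not described in this article.

```python
import json
import tempfile
from pathlib import Path


def sync_tools(tools: list, root: Path) -> list:
    """Write one JSON file per tool; return only the names for static context."""
    for tool in tools:
        server_dir = root / tool["server"]
        server_dir.mkdir(parents=True, exist_ok=True)
        (server_dir / f"{tool['name']}.json").write_text(json.dumps(tool, indent=2))
    return [t["name"] for t in tools]


def grep_tools(root: Path, needle: str) -> list:
    """Case-insensitive substring search over synced files, like `grep -ril`."""
    needle = needle.lower()
    return sorted(p for p in root.rglob("*.json")
                  if needle in p.read_text().lower())


# Hypothetical tools from two hypothetical servers.
tools = [
    {"server": "browser", "name": "browser_screenshot",
     "description": "Capture a screenshot of the current page"},
    {"server": "browser", "name": "browser_click",
     "description": "Click an element on the page"},
    {"server": "db", "name": "db_query",
     "description": "Run a read-only SQL query"},
]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    names = sync_tools(tools, root)            # all the agent sees up front
    matches = grep_tools(root, "screenshot")   # discovery happens on demand
    schema = json.loads(matches[0].read_text())

print(names, schema["name"])
```

Because the discovery surface is just files, the same search step could in principle cover other synced artifacts, which is the design point the section makes.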
Implementation 3 — GitHub Copilot: Virtual Tools
Copilot took a third path: embedding-guided clustering. Functionally similar tools are grouped using an internal embedding model and cosine similarity, with each cluster summarized by a single model call (cached locally). The agent sees the cluster summaries and expands a cluster only when its task matches.
In published benchmarks (SWE-Lancer, SWEbench-Verified), this approach improved success rates by 2–5 percentage points with both GPT-5 and Sonnet 4.5, and cut response latency by ~400ms.
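The clustering idea can be illustrated with a toy version. Copilot uses an internal embedding model; the bag-of-words `embed` function, the greedy grouping, and the 0.3 threshold below are stand-in assumptions chosen only to keep the sketch self-contained.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use a learned model."""
    return Counter(text.lower().replace("_", " ").split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster(tools: dict, threshold: float = 0.3) -> list:
    """Greedy grouping: join the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed vector, member names)
    for name, description in tools.items():
        vec = embed(f"{name} {description}")
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((vec, [name]))
    return [members for _, members in clusters]


# Hypothetical tool names and descriptions.
tools = {
    "github_create_issue": "create an issue on github",
    "github_close_issue": "close an issue on github",
    "browser_click": "click an element in the browser",
    "browser_screenshot": "take a screenshot in the browser",
}
groups = cluster(tools)
print(groups)
```

With these inputs the GitHub tools and the browser tools land in separate groups. The agent would then see one summary per group rather than four schemas.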
Two constraints that make this approach more conditional than the others:
- 128-tool hard cap. Even with virtual tools enabled, VS Code 1.109 still enforces the 128-tool API limit at request time. Multi-MCP setups exceeding 128 are rejected before the agent can act, regardless of clustering. This is an open issue as of January 2026.
- Threshold-gated. Virtual tools activate when the github.copilot.chat.virtualTools.threshold setting is crossed; below threshold, tools are still loaded eagerly.
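Taken together, the two constraints imply three possible outcomes for a given tool count. The sketch below encodes that reading. The function name, the example threshold of 64, and the assumption that "crossed" means strictly greater than are all illustrative rather than documented behavior.

```python
API_TOOL_CAP = 128  # hard cap cited in the text


def request_mode(tool_count: int, threshold: int) -> str:
    """Classify what happens to a request with `tool_count` selected tools."""
    if tool_count > API_TOOL_CAP:
        return "rejected"   # the cap applies regardless of clustering
    if tool_count > threshold:
        return "virtual"    # cluster summaries shown, expanded on demand
    return "eager"          # every schema loaded up front


for count in (40, 100, 200):
    print(count, request_mode(count, threshold=64))
```

The practical consequence is that lowering the threshold widens the "virtual" band but never rescues a setup above the cap.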
Implementation 4 — OpenAI Codex CLI: No Dynamic Loading
Codex CLI exposes MCP tools alongside built-ins on session start, with no deferral mechanism. The OpenAI API supports tool_search with defer_loading: true on GPT-5.4+, but Codex CLI does not wire it up to MCP tools as of the current release.
Available controls are static:
- enabled_tools = […] and disabled_tools = […] allow/deny lists per server.
- enabled = false to disable a server without removing config.
- Project-scoped .codex/config.toml for narrower scoping.
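To make the effect of these static controls concrete, the sketch below resolves which tools remain active for each server. The config is shown as a Python dict mirroring the per-server keys named above; the top-level mcp_servers key, the server and tool names, and the precedence rule (allowlist first, then denylist) are assumptions, not confirmed Codex CLI behavior.

```python
def effective_tools(server: dict, advertised: list) -> list:
    """Apply enabled / enabled_tools / disabled_tools to a server's tool list."""
    if not server.get("enabled", True):
        return []                                  # server switched off entirely
    allow = server.get("enabled_tools")            # None means "no allowlist"
    deny = set(server.get("disabled_tools", []))
    selected = advertised if allow is None else [t for t in advertised if t in allow]
    return [t for t in selected if t not in deny]


# Hypothetical config; names are made up for illustration.
config = {
    "mcp_servers": {
        "github": {"enabled_tools": ["create_issue", "list_issues"]},
        "browser": {"disabled_tools": ["evaluate_js"]},
        "legacy_db": {"enabled": False},
    }
}
# Tools each hypothetical server would advertise at startup.
advertised = {
    "github": ["create_issue", "list_issues", "delete_repo"],
    "browser": ["click", "screenshot", "evaluate_js"],
    "legacy_db": ["query"],
}

active = {name: effective_tools(cfg, advertised[name])
          for name, cfg in config["mcp_servers"].items()}
print(active, sum(len(tools) for tools in active.values()))
```

Because nothing is deferred, the total printed at the end is the number of schemas that would be loaded on every turn.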
Known scaling issues with many servers:
- MCP startup and tool discovery sit on the first-turn critical path. One slow or unhealthy server can stall the session until timeout.
- Sub-agents inherit the full parent tool set with no scoping. On GPT-4.1 (128-tool API limit), spawning sub-agents in any environment with 150+ tools fails outright. Workarounds require disabling servers globally.
For 200 tools, Codex CLI pays the full ~50K-token cost on every turn. Across a 10-turn session, that’s ~500K tokens of pure tool-definition overhead.
Token Economics at 200 Tools
Assuming a representative MCP catalog with ~250 tokens per tool schema (conservative; verbose servers run 2–3× higher):
| Agent | Upfront | Per-turn | 10-turn overhead |
|---|---|---|---|
| Claude Code (tool_search on) | 3–5K | 3–5K + ~500 per expansion | 30–50K |
| Cursor (dynamic context discovery on) | 2–4K | 2–4K + ~400 per file read | 20–40K |
| GitHub Copilot (virtual tools, <128) | 5–10K | 5–10K + cluster expansions | 50–100K |
| Codex CLI (no deferral) | 50K | 50K | 500K |
Key implications. On a 200K-token window, Codex CLI consumes 25% of context before any work starts; with verbose servers, this rises to 70%. The other three agents leave essentially all of context available for the actual task. Over a multi-turn session, Codex’s per-turn cost accumulates linearly: what costs $X in the first turn costs ~$10X across ten turns.
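The 10-turn column can be reproduced with a few lines. The sketch uses the per-turn ranges from the table; the optional expansion terms are an illustrative addition, set to zero by default because the table's totals exclude them.

```python
def session_overhead(per_turn: int, turns: int = 10,
                     expansions: int = 0, expansion_cost: int = 0) -> int:
    """Tool-definition tokens paid over a session, plus optional expansions."""
    return per_turn * turns + expansions * expansion_cost


# (low, high) per-turn token estimates from the table above.
per_turn_ranges = {
    "Claude Code": (3_000, 5_000),
    "Cursor": (2_000, 4_000),
    "GitHub Copilot": (5_000, 10_000),
    "Codex CLI": (50_000, 50_000),
}

for agent, (low, high) in per_turn_ranges.items():
    print(f"{agent}: {session_overhead(low):,} to {session_overhead(high):,} tokens")

# Example with expansions: Claude Code at 5K per turn plus four ~500-token expansions.
print(session_overhead(5_000, expansions=4, expansion_cost=500))
```

Even with a handful of expansions added, the deferred approaches stay an order of magnitude below the eager 500K figure.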
Practical Guidance
For multi-MCP workloads (5+ servers, 100+ tools), the agent choice materially affects what’s possible:
- Verify defaults are active.
  - Claude Code: confirm ENABLE_TOOL_SEARCH is not disabled (it silently turns off on proxies and Vertex AI).
  - Copilot: set github.copilot.chat.virtualTools.threshold low enough to activate.
  - Cursor: dynamic context discovery is on by default; no action needed.
- For Codex CLI: scope aggressively with project-level .codex/config.toml and enabled_tools allowlists. Treat global MCP config as a superset; only enable per-project what each session actually needs. Expect 30–50 active tools max for healthy operation.
- For Copilot at 128+ tools: the cap is enforced at the VS Code layer before the agent processes anything. Virtual tools clustering helps the model reason but does not bypass the API limit. Trim selected tools manually, or split work across sessions.
- MCP server design: prefix tool names by server (matrix_query_*, quality_metric_*) so search and clustering have clean signal. Keep tool descriptions concise; Claude Code silently truncates anything past 2 KB.
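Server authors could check the last point mechanically before publishing. The sketch below is a hypothetical lint, assuming that 2 KB means 2,048 bytes of UTF-8 (the article does not specify the unit precisely) and that the prefix is separated from the rest of the name by an underscore.

```python
MAX_DESCRIPTION_BYTES = 2 * 1024  # assumed reading of the 2 KB truncation limit


def lint_tool(server_prefix: str, name: str, description: str) -> list:
    """Return a list of problems; an empty list means the tool passes."""
    problems = []
    if not name.startswith(f"{server_prefix}_"):
        problems.append(f"{name}: missing '{server_prefix}_' prefix")
    size = len(description.encode("utf-8"))
    if size > MAX_DESCRIPTION_BYTES:
        problems.append(f"{name}: description is {size} bytes, "
                        f"over the {MAX_DESCRIPTION_BYTES}-byte limit")
    return problems


# Hypothetical tools; the prefix follows the naming example in the text.
print(lint_tool("matrix_query", "matrix_query_run", "Run a matrix query."))
print(lint_tool("matrix_query", "run", "x" * 3000))
```

Counting bytes rather than characters is the conservative choice here, since non-ASCII descriptions grow when encoded.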
Where the Standard Is Heading
All four implementations solve the same problem and converge on the same architectural shape: a small persistent surface (search tool, file index, cluster summaries) plus on-demand schema expansion. None of the wire formats interoperate. MCP itself ships notifications/tools/list_changed for dynamic catalogs but no standardized search or defer primitive that clients and servers agree on.
The pattern is industry-standard. The interface is not. Anthropic’s tool_search_tool_*, OpenAI’s tool_search + defer_loading, Cursor’s file sync, and Copilot’s virtual tools each remain vendor-specific. Until MCP adds a discovery primitive, each agent’s behavior at scale will diverge in the ways documented here — and the choice of agent will continue to be, in part, a choice of how much context tax you’re willing to pay per turn.
Sources
- Anthropic: tool_search_tool documentation, Claude Code MCP docs
- Cursor blog: “Dynamic context discovery” (January 2026)
- GitHub blog: “How we’re making GitHub Copilot smarter with fewer tools” (November 2025)
- OpenAI Developers: Codex CLI MCP reference, tool_search API guide
- Real-world MCP tool-cost measurements: Scott Spence, October 2025
- VS Code issue #290356 (128-tool cap, January 2026)
- GitHub Copilot CLI issue #2992 (sub-agent tool scoping, April 2026)
- Codex CLI issue #21318 (MCP startup blocking, May 2026)