The MCP Tool Tax: How Four Coding Agents Solve the Same Problem Four Ways

The Bottom Line

As coding agents adopt the Model Context Protocol, it has become routine to wire up 5–15 MCP servers exposing 100–300 tools. Loading every tool schema on every turn imposes a steep tool tax: 50–140K tokens of input, slower startup, degraded tool-selection accuracy past roughly 30–50 tools, and — for some providers — hard rejections at the 128-tool API ceiling.

Three of the four major coding agents have converged on the same architectural pattern (defer tool schemas, load on demand) but ship three different, non-interoperable implementations; the fourth, Codex CLI, still loads every schema eagerly. The user-facing defaults differ enough to change which workloads each agent can handle without intervention.

Agent	Mechanism	Default state	Upfront cost at 200 tools
Claude Code	API-native `tool_search_tool`	On by default	~3–5K tokens
Cursor	Files synced to disk + grep	On by default	~2–4K tokens
GitHub Copilot	“Virtual tools” embedding clusters	Threshold-gated	~5–10K tokens (clusters) below 128 tools; rejected above
Codex CLI	None (manual allowlists only)	Eager	~50K tokens every turn

The pattern is becoming industry-standard. The wire formats are not. Until MCP adds a discovery primitive, each agent’s behavior at scale will diverge in the ways documented below.

The Problem: What a 200-Tool Catalog Actually Costs

Each MCP tool schema — name, description, JSON-Schema parameters — costs roughly 250–700 tokens depending on verbosity. Real-world measurements:

mcp-omnisearch (20 tools): ~14,114 tokens — ~706 tokens/tool
playwright MCP (21 tools): ~13,647 tokens — ~650 tokens/tool
Representative 10-server, 200-tool catalog: 50,000–140,000 tokens

Three compounding failure modes appear at scale:

Context displacement. On a 200K-token window, eager loading consumes 25–70% of context before the agent reads a single file.
Selection accuracy collapse. Models lose tool-selection accuracy past ~30–50 visible tools; with 200, they reliably pick the wrong tool or miss the right one.
Hard API limits. GPT-4.1 and current Copilot agent mode enforce a 128-tool ceiling; requests above are rejected before execution.
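
The arithmetic behind these figures is easy to reproduce. The sketch below is illustrative only: it takes the per-tool range quoted above (250–700 tokens), the 200K-token window, and the 128-tool ceiling as inputs, and the function name `eager_cost` is hypothetical.

```python
CONTEXT_WINDOW = 200_000  # tokens; the window size used in the text
API_TOOL_CAP = 128        # ceiling cited for GPT-4.1 and Copilot agent mode


def eager_cost(num_tools: int, tokens_per_tool: int) -> dict:
    """Estimate what loading every schema up front costs on each turn."""
    total = num_tools * tokens_per_tool
    return {
        "tokens": total,
        "context_share": total / CONTEXT_WINDOW,
        "over_api_cap": num_tools > API_TOOL_CAP,
    }


low = eager_cost(200, 250)   # terse schemas
high = eager_cost(200, 700)  # verbose schemas
print(f"{low['tokens']:,} to {high['tokens']:,} tokens per turn")
print(f"{low['context_share']:.0%} to {high['context_share']:.0%} of the window")
print("exceeds the 128-tool cap:", low["over_api_cap"])
```

Substituting measured per-tool costs from your own servers gives a quick estimate of where a given catalog lands in the 25–70% range.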

This is not a theoretical scaling concern. At these catalog sizes, tool definitions are often the largest single cost item in a multi-MCP setup.

Implementation 1 — Claude Code: API-Native Tool Search

Anthropic ships tool_search_tool_regex_20251119 and tool_search_tool_bm25_20251119 as first-class API features. In Claude Code, MCP tools are deferred and discovered on demand by default; the agent sees only the search tool and any tools the user pinned.

The flow:

The agent decides it needs a capability.
It issues a regex or BM25 query against the registered tool index.
The API returns 3–5 matching tool_reference blocks (name + brief).
The chosen reference expands into a full schema only at the moment of use.
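
The steps above can be mimicked locally to show the shape of the pattern. This is a toy simulation, not Anthropic's API or its index: the catalog entries, `search_tools`, and `expand` are all hypothetical names, and only the regex variant is sketched.

```python
import re

# Hypothetical catalog: full schemas live here and stay out of the prompt.
CATALOG = {
    "github_create_issue": {
        "description": "Create an issue in a GitHub repository.",
        "input_schema": {"type": "object", "properties": {"title": {"type": "string"}}},
    },
    "github_list_pulls": {
        "description": "List open pull requests.",
        "input_schema": {"type": "object", "properties": {"repo": {"type": "string"}}},
    },
    "browser_click": {
        "description": "Click an element on the current page.",
        "input_schema": {"type": "object", "properties": {"selector": {"type": "string"}}},
    },
}


def search_tools(pattern: str, limit: int = 5) -> list[dict]:
    """Return lightweight references (name + brief), never full schemas."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = [
        {"name": name, "brief": spec["description"]}
        for name, spec in CATALOG.items()
        if rx.search(name) or rx.search(spec["description"])
    ]
    return hits[:limit]


def expand(name: str) -> dict:
    """Load the full schema only when the tool is about to be called."""
    return {"name": name, **CATALOG[name]}


refs = search_tools("github")               # cheap: names and briefs only
schema = expand("github_create_issue")      # full cost paid at the moment of use
```

The point of the split is that `refs` stays small no matter how large the catalog grows, while the cost of `expand` is paid only for tools that are actually used.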

Limits and caveats worth knowing before you trust the default:

Catalog ceiling: 10,000 tools.
Tool descriptions and server instructions truncated at 2 KB each — silently.
Disabled by default on Vertex AI and non-first-party proxy hosts (the tool_search beta header is not forwarded).
Requires Sonnet 4+ or Opus 4+; Haiku models do not support tool search.
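
Because the truncation is silent, it can be worth checking description lengths before relying on them. The sketch below assumes "2 KB" means 2,048 bytes of UTF-8, which the source does not specify; the function name `oversized` and the sample tools are hypothetical.

```python
LIMIT_BYTES = 2 * 1024  # assumption: "2 KB" means 2,048 bytes of UTF-8


def oversized(descriptions: dict[str, str], limit: int = LIMIT_BYTES) -> dict[str, int]:
    """Map each offending tool name to the number of bytes past the limit."""
    report = {}
    for name, text in descriptions.items():
        size = len(text.encode("utf-8"))
        if size > limit:
            report[name] = size - limit
    return report


sample = {
    "short_tool": "Fetch one record by id.",
    "long_tool": "x" * 3000,
}
report = oversized(sample)
print(report)
```

Any tool that shows up in the report would lose the tail of its description, which is often where usage constraints and examples live.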

For 200 tools, this collapses the upfront cost from ~50K tokens to ~3–5K, a 90–94% reduction. Anthropic’s published figure is >85% across typical multi-server setups.

Implementation 2 — Cursor: Files-as-Discovery-Surface

Cursor explicitly rejected the dedicated-search-tool approach and instead syncs MCP tool descriptions to a folder on disk. The agent receives only tool names in static context and discovers full schemas by grep-ing or semantic-searching the synced folder when a task calls for it.
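
The sync-then-grep idea can be sketched in a few lines. This is an illustration of the general technique, not Cursor's implementation: the folder layout, file format, and function names are assumptions, and only plain substring search is shown (no semantic search).

```python
import json
import tempfile
from pathlib import Path


def sync_tools(folder: Path, tools: dict[str, dict]) -> list[str]:
    """Write one JSON file per tool; return only the names for static context."""
    folder.mkdir(parents=True, exist_ok=True)
    for name, schema in tools.items():
        (folder / f"{name}.json").write_text(json.dumps(schema, indent=2), encoding="utf-8")
    return sorted(tools)


def grep_tools(folder: Path, needle: str) -> list[str]:
    """Find tools whose synced file mentions the needle (case-insensitive)."""
    needle = needle.lower()
    return sorted(
        path.stem
        for path in folder.glob("*.json")
        if needle in path.read_text(encoding="utf-8").lower()
    )


def read_schema(folder: Path, name: str) -> dict:
    """Read the full schema from disk at the moment it is needed."""
    return json.loads((folder / f"{name}.json").read_text(encoding="utf-8"))


tools = {
    "jira_search": {"description": "Search Jira tickets with JQL.", "parameters": {"jql": "string"}},
    "slack_post": {"description": "Post a message to a Slack channel.", "parameters": {"channel": "string"}},
}
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "mcp-tools"
    names = sync_tools(folder, tools)        # all the agent sees up front
    matches = grep_tools(folder, "ticket")   # discovery when a task calls for it
    schema = read_schema(folder, matches[0])
```

Only `names` would sit in static context; everything else is fetched through ordinary file operations the agent already has.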

This is part of a broader “everything is a file” strategy: Cursor applies the same primitive to Agent Skills, long terminal output, and oversized tool responses. The file surface unlocks capabilities the API-level approach cannot, notably surfacing MCP server status (re-authentication needed, server unhealthy) so that the agent still knows the affected tools exist instead of losing them from view.

A/B test results published in January 2026 reported a 46.9% reduction in total agent tokens on runs that called an MCP tool. That figure is measured across all session content, not just tool definitions, so the reduction in schema tokens alone is presumably higher.

Implementation 3 — GitHub Copilot: Virtual Tools

Copilot took a third path: embedding-guided clustering. Functionally similar tools are grouped using an internal embedding model and cosine similarity, with each cluster summarized by a single model call (cached locally). The agent sees the cluster summaries and expands a cluster only when its task matches.
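
A minimal sketch of similarity-based grouping follows. It is not Copilot's algorithm: a bag-of-words count vector stands in for the internal embedding model, the greedy single-pass clustering and the 0.3 threshold are assumptions chosen for illustration, and the summarization step is omitted.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().replace("_", " ").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(tools: dict[str, str], threshold: float = 0.3) -> list[list[str]]:
    """Greedy clustering: join the first cluster whose seed tool is similar enough."""
    clusters: list[tuple[Counter, list[str]]] = []
    for name, description in tools.items():
        vec = embed(f"{name} {description}")
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((vec, [name]))  # no match: this tool seeds a new cluster
    return [members for _, members in clusters]


groups = cluster({
    "git_commit": "commit staged changes to git",
    "git_push": "push commits to a git remote",
    "browser_click": "click an element in the browser page",
    "browser_type": "type text into a browser page field",
})
print(groups)
```

With these four hypothetical tools the git and browser tools land in separate groups, so the model would see two summaries instead of four schemas.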

In published benchmarks (SWE-Lancer, SWE-bench Verified), this approach improved success rates by 2–5 percentage points with both GPT-5 and Sonnet 4.5, and cut response latency by ~400 ms.

Two constraints that make this approach more conditional than the others:

128-tool hard cap. Even with virtual tools enabled, VS Code 1.109 still enforces the 128-tool API limit at request time. Multi-MCP setups exceeding 128 are rejected before the agent can act, regardless of clustering. This is an open issue as of January 2026.
Threshold-gated. Virtual tools activate when the github.copilot.chat.virtualTools.threshold setting is crossed; below threshold, tools are still loaded eagerly.
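
The interaction between the two constraints can be stated as a small decision function. This is a reading of the behavior described above, not Copilot's code: it assumes "crossed" means strictly greater than the threshold, and the threshold value of 64 in the example is hypothetical.

```python
API_TOOL_CAP = 128  # request-time ceiling described above


def loading_mode(tool_count: int, threshold: int) -> str:
    """Classify how a catalog of this size would be handled."""
    if tool_count > API_TOOL_CAP:
        return "rejected"  # clustering does not lift the cap
    if tool_count > threshold:
        return "virtual"   # cluster summaries, expanded on demand
    return "eager"         # every schema sent on every turn


for count in (40, 100, 200):
    print(count, loading_mode(count, threshold=64))
```

Under these assumptions, virtual tools only help in the band between the threshold and the cap; below it nothing is deferred, and above it the request never reaches the model.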

Implementation 4 — OpenAI Codex CLI: No Dynamic Loading

Codex CLI exposes MCP tools alongside built-ins on session start, with no deferral mechanism. The OpenAI API supports tool_search with defer_loading: true on GPT-5.4+, but Codex CLI does not wire it up to MCP tools as of the current release.

Available controls are static:

enabled_tools = […] and disabled_tools = […] allow/deny lists per server.
enabled = false to disable a server without removing config.
Project-scoped .codex/config.toml for narrower scoping.
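
The effect of these static controls on what the model sees can be modeled as a simple filter. The sketch below is an assumption about the semantics, not Codex CLI's code: in particular, applying the allowlist first and the denylist second is a guess, and the function and tool names are hypothetical.

```python
from typing import Iterable, Optional


def visible_tools(
    server_tools: Iterable[str],
    enabled: bool = True,
    enabled_tools: Optional[list[str]] = None,
    disabled_tools: Optional[list[str]] = None,
) -> list[str]:
    """Return the tools one server would expose under the given settings."""
    if not enabled:
        return []  # server kept in config but switched off
    tools = list(server_tools)
    if enabled_tools is not None:
        tools = [t for t in tools if t in enabled_tools]      # allowlist (assumed first)
    if disabled_tools:
        tools = [t for t in tools if t not in disabled_tools]  # denylist (assumed second)
    return tools


server = ["db_query", "db_migrate", "db_drop", "db_backup"]
print(visible_tools(server, enabled_tools=["db_query", "db_backup"]))
print(visible_tools(server, disabled_tools=["db_drop"]))
print(visible_tools(server, enabled=False))
```

Whatever survives the filter is still loaded eagerly on every turn, so these lists reduce the size of the tax without changing how it is charged.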

Known scaling issues with many servers:

MCP startup and tool discovery sit on the first-turn critical path. One slow or unhealthy server can stall the session until timeout.
Sub-agents inherit the full parent tool set with no scoping. On GPT-4.1 (128-tool API limit), spawning sub-agents in any environment with 150+ tools fails outright. Workarounds require disabling servers globally.

For 200 tools, Codex CLI pays the full ~50K-token cost on every turn. Across a 10-turn session, that’s ~500K tokens of pure tool-definition overhead.

Token Economics at 200 Tools

Assuming a representative MCP catalog with ~250 tokens per tool schema (conservative; verbose servers run 2–3× higher):

Agent	Upfront	Per-turn	10-turn overhead
Claude Code (`tool_search` on)	3–5K	3–5K + ~500 per expansion	30–50K
Cursor (dynamic context discovery on)	2–4K	2–4K + ~400 per file read	20–40K
GitHub Copilot (virtual tools, catalog kept under the 128-tool cap)	5–10K	5–10K + cluster expansions	50–100K
Codex CLI (no deferral)	50K	50K	500K

Key implications. On a 200K-token window, Codex CLI consumes 25% of context before any work starts; with verbose servers, this rises to 70%. The other three agents spend at most about 5% of the window on tool definitions, leaving nearly all of it for the actual task. Over a multi-turn session, Codex’s cost accumulates linearly because the full catalog is resent on every turn: what costs $X in the first turn costs ~$10X across ten turns, before any prompt-caching discount.
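
The 10-turn column can be reproduced with a short calculation. This sketch simplifies deliberately: it charges each schema expansion once and does not model expanded schemas persisting in later turns or any caching. The function name and the choice of four expansions are hypothetical.

```python
def session_overhead(per_turn: int, turns: int = 10, expansions: int = 0, per_expansion: int = 0) -> int:
    """Tool-definition tokens over a session: fixed per-turn cost plus one-off expansions."""
    return per_turn * turns + expansions * per_expansion


codex = session_overhead(50_000)                                      # full catalog every turn
claude_high = session_overhead(5_000, expansions=4, per_expansion=500)  # upper end of 3–5K
cursor_high = session_overhead(4_000, expansions=4, per_expansion=400)  # upper end of 2–4K

print(f"Codex CLI:   {codex:,}")
print(f"Claude Code: {claude_high:,}")
print(f"Cursor:      {cursor_high:,}")
print(f"Codex / Claude Code ratio: {codex / claude_high:.1f}x")
```

Even at the upper end of the deferred-loading ranges, the eager approach costs roughly an order of magnitude more over ten turns.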

Practical Guidance

For multi-MCP workloads (5+ servers, 100+ tools), the agent choice materially affects what’s possible:

Verify defaults are active.
- Claude Code: confirm ENABLE_TOOL_SEARCH is not disabled (it silently turns off on proxies and Vertex AI).
- Copilot: set github.copilot.chat.virtualTools.threshold low enough to activate.
- Cursor: dynamic context discovery is on by default — no action needed.
For Codex CLI: scope aggressively with project-level .codex/config.toml and enabled_tools allowlists. Treat global MCP config as a superset; only enable per-project what each session actually needs. Expect 30–50 active tools max for healthy operation.
For Copilot at 128+ tools: the cap is enforced at the VS Code layer before the agent processes anything. Virtual tools clustering helps the model reason but does not bypass the API limit. Trim selected tools manually, or split work across sessions.
MCP server design: prefix tool names by server (matrix_query_*, quality_metric_*) so search and clustering have clean signal. Keep tool descriptions concise — Claude Code truncates anything past 2 KB silently.

Where the Standard Is Heading

All four agents face the same problem, and three of them converge on the same architectural shape: a small persistent surface (search tool, file index, cluster summaries) plus on-demand schema expansion. Codex CLI is the exception, although the OpenAI API underneath it already exposes a comparable primitive. None of the wire formats interoperate. MCP itself ships notifications/tools/list_changed for dynamic catalogs but no standardized search or defer primitive that clients and servers agree on.

The pattern is industry-standard. The interface is not. Anthropic’s tool_search_tool_*, OpenAI’s tool_search + defer_loading, Cursor’s file sync, and Copilot’s virtual tools each remain vendor-specific. Until MCP adds a discovery primitive, each agent’s behavior at scale will diverge in the ways documented here — and the choice of agent will continue to be, in part, a choice of how much context tax you’re willing to pay per turn.

Sources

Anthropic: tool_search_tool documentation, Claude Code MCP docs
Cursor blog: “Dynamic context discovery” (January 2026)
GitHub blog: “How we’re making GitHub Copilot smarter with fewer tools” (November 2025)
OpenAI Developers: Codex CLI MCP reference, tool_search API guide
Real-world MCP tool-cost measurements: Scott Spence, October 2025
VS Code issue #290356 (128-tool cap, January 2026)
GitHub Copilot CLI issue #2992 (sub-agent tool scoping, April 2026)
Codex CLI issue #21318 (MCP startup blocking, May 2026)