Issue 008 · May 2026

The MCP Tool Tax: How Four Coding Agents Solve the Same Problem Four Ways

Loading 200 MCP tool schemas upfront costs 50–140K tokens, breaks selection accuracy past ~30–50 tools, and slams into a hard 128-tool API ceiling on two of the four major providers. Claude Code, Cursor, Codex CLI, and GitHub Copilot agree on the architecture and disagree on every implementation detail.


The Bottom Line

As coding agents adopt the Model Context Protocol, it has become routine to wire up 5–15 MCP servers exposing 100–300 tools. Loading every tool schema on every turn imposes a steep tool tax: 50–140K tokens of input, slower startup, degraded tool-selection accuracy past roughly 30–50 tools, and — for some providers — hard rejections at the 128-tool API ceiling.

The four major coding agents have converged on the same architectural pattern — defer tool schemas, load on demand — but ship four different, non-interoperable implementations. The user-facing defaults differ enough to change which workloads each agent can handle without intervention.

Agent          | Mechanism                          | Default state   | Upfront cost at 200 tools
Claude Code    | API-native tool_search_tool        | On by default   | ~3–5K tokens
Cursor         | Files synced to disk + grep        | On by default   | ~2–4K tokens
GitHub Copilot | “Virtual tools” embedding clusters | Threshold-gated | ~5–10K (clusters) or fail at 128
Codex CLI      | None (manual allowlists only)      | Eager           | ~50K tokens every turn

The pattern is becoming industry-standard. The wire formats are not. Until MCP adds a discovery primitive, each agent’s behavior at scale will diverge in the ways documented below.

The Problem: What a 200-Tool Catalog Actually Costs

Each MCP tool schema (name, description, JSON-Schema parameters) costs roughly 250–700 tokens depending on verbosity. At 200 tools, that works out to the 50–140K tokens cited above.

Three compounding failure modes appear at scale:

  1. Context displacement. On a 200K-token window, eager loading consumes 25–70% of context before the agent reads a single file.
  2. Selection accuracy collapse. Models lose tool-selection accuracy past ~30–50 visible tools; with 200, they reliably pick the wrong tool or miss the right one.
  3. Hard API limits. GPT-4.1 and current Copilot agent mode enforce a 128-tool ceiling; requests above it are rejected before execution.
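The arithmetic behind the first and third failure modes is easy to check. The sketch below is illustrative only: the function and class names are hypothetical, and the constants are the figures quoted in this article (a 200K-token window, 250–700 tokens per schema, a 128-tool cap).

```python
from dataclasses import dataclass

CONTEXT_WINDOW = 200_000   # tokens, the window size used in this article
API_TOOL_CAP = 128         # ceiling cited for GPT-4.1 and Copilot agent mode


@dataclass
class EagerLoadEstimate:
    schema_tokens: int      # total tokens spent on tool schemas per turn
    context_share: float    # fraction of the window those schemas consume
    over_cap: bool          # True if the catalog exceeds the 128-tool ceiling


def estimate_eager_load(tool_count: int, tokens_per_tool: int) -> EagerLoadEstimate:
    """Estimate the cost of sending every schema on every turn."""
    schema_tokens = tool_count * tokens_per_tool
    return EagerLoadEstimate(
        schema_tokens=schema_tokens,
        context_share=schema_tokens / CONTEXT_WINDOW,
        over_cap=tool_count > API_TOOL_CAP,
    )


for per_tool in (250, 700):  # terse vs. verbose schemas
    est = estimate_eager_load(200, per_tool)
    print(f"{per_tool} tok/tool: {est.schema_tokens:,} tokens, "
          f"{est.context_share:.0%} of window, over cap: {est.over_cap}")
```

At 250 tokens per tool this gives 50,000 tokens (25% of the window); at 700 it gives 140,000 (70%). In both cases a 200-tool catalog is over the 128-tool cap.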

This is not a theoretical scaling concern. It is the dominant cost item in any multi-MCP setup today.

Implementation 1 — Claude Code: API-Native Tool Search

Anthropic ships tool_search_tool_regex_20251119 and tool_search_tool_bm25_20251119 as first-class API features. In Claude Code, MCP tools are deferred and discovered on demand by default; the agent sees only the search tool and any tools the user pinned.

The flow:

  1. The agent decides it needs a capability.
  2. It issues a regex or BM25 query against the registered tool index.
  3. The API returns 3–5 matching tool_reference blocks (name + brief).
  4. The chosen reference expands into a full schema only at the moment of use.
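The steps above can be mimicked locally to make the shape of the flow concrete. This is a toy stand-in, not Anthropic's implementation or API: the class, the method names, and the three example tools are all hypothetical, and only the regex variant is sketched.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolReference:
    """Lightweight search hit: name plus a one-line brief, no parameters."""
    name: str
    brief: str


class DeferredToolIndex:
    """Toy deferred catalog: full schemas stay out of the model's context."""

    def __init__(self, schemas: dict[str, dict]):
        self._schemas = schemas

    def search(self, pattern: str, limit: int = 5) -> list[ToolReference]:
        """Regex-match names and descriptions; return at most `limit` hits."""
        rx = re.compile(pattern, re.IGNORECASE)
        hits: list[ToolReference] = []
        for name, schema in self._schemas.items():
            if rx.search(name) or rx.search(schema["description"]):
                hits.append(ToolReference(name, schema["description"][:80]))
            if len(hits) == limit:
                break
        return hits

    def expand(self, ref: ToolReference) -> dict:
        """Load the full schema only when the tool is about to be called."""
        return self._schemas[ref.name]


# Hypothetical three-tool catalog for illustration.
catalog = {
    "github_create_issue": {
        "description": "Create an issue in a GitHub repository.",
        "input_schema": {"type": "object", "properties": {"title": {"type": "string"}}},
    },
    "github_list_pulls": {
        "description": "List pull requests for a repository.",
        "input_schema": {"type": "object", "properties": {"repo": {"type": "string"}}},
    },
    "slack_post_message": {
        "description": "Post a message to a Slack channel.",
        "input_schema": {"type": "object", "properties": {"text": {"type": "string"}}},
    },
}

index = DeferredToolIndex(catalog)
refs = index.search(r"issue")      # step 2: query the index
schema = index.expand(refs[0])     # step 4: expand only the chosen tool
```

The point of the split is that `search` returns only names and briefs, so the parameter schemas, which account for most of the token cost, enter context one tool at a time.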

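Limits and caveats worth knowing before you trust the default: tool search silently turns off behind proxies and on Vertex AI, and tool descriptions longer than 2 KB are silently truncated.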

For 200 tools, this collapses the upfront cost from ~50K tokens to ~3–5K, a reduction of 90–94%. Anthropic’s published figure is >85% across typical multi-server setups.

Implementation 2 — Cursor: Files-as-Discovery-Surface

Cursor explicitly rejected the dedicated-search-tool approach and instead syncs MCP tool descriptions to a folder on disk. The agent receives only tool names in static context and discovers full schemas by grep-ing or semantic-searching the synced folder when a task calls for it.
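A minimal sketch of the files-as-discovery idea follows. It assumes one JSON file per tool and a plain substring search standing in for grep. The folder name, file layout, helper names, and example tools are all hypothetical; Cursor's actual on-disk format is not described in this article.

```python
import json
import tempfile
from pathlib import Path


def sync_tools_to_disk(tools: dict[str, dict], folder: Path) -> list[str]:
    """Write one JSON file per tool; return only the names for static context."""
    folder.mkdir(parents=True, exist_ok=True)
    for name, schema in tools.items():
        (folder / f"{name}.json").write_text(json.dumps(schema, indent=2))
    return sorted(tools)


def grep_tools(folder: Path, needle: str) -> list[str]:
    """Return names of tools whose synced file mentions `needle` (case-insensitive)."""
    needle = needle.lower()
    return sorted(
        path.stem
        for path in folder.glob("*.json")
        if needle in path.read_text().lower()
    )


def load_schema(folder: Path, name: str) -> dict:
    """Read the full schema from disk only when the task calls for it."""
    return json.loads((folder / f"{name}.json").read_text())


# Hypothetical tools for illustration.
tools = {
    "db_run_query": {"description": "Run a read-only SQL query.", "parameters": {"sql": "string"}},
    "db_list_tables": {"description": "List tables in the database.", "parameters": {}},
    "jira_create_ticket": {"description": "Create a Jira ticket.", "parameters": {"summary": "string"}},
}

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "mcp-tools"
    names = sync_tools_to_disk(tools, folder)   # only names go into static context
    matches = grep_tools(folder, "sql")         # discovery happens via file search
    schema = load_schema(folder, matches[0])    # full schema read on demand
```

Because the discovery surface is an ordinary folder, the agent needs no dedicated search tool: whatever file-search capability it already has doubles as tool discovery.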

This is part of a broader “everything is a file” strategy: Cursor applies the same primitive to Agent Skills, long terminal output, and oversized tool responses. The file surface unlocks capabilities the API-level approach cannot — notably, surfacing MCP server status (re-authentication needed, server unhealthy) without forgetting the tools entirely.

A/B test results published in January 2026 reported a 46.9% reduction in total agent tokens on runs that called an MCP tool. The schema-only reduction is higher, because the 46.9% figure is measured across all session content.

Implementation 3 — GitHub Copilot: Virtual Tools

Copilot took a third path: embedding-guided clustering. Functionally similar tools are grouped using an internal embedding model and cosine similarity, with each cluster summarized by a single model call (cached locally). The agent sees the cluster summaries and expands a cluster only when its task matches.
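To illustrate clustering by cosine similarity, the sketch below groups tools greedily. It is not Copilot's algorithm. A bag-of-words count vector stands in for the internal embedding model, the threshold value is an arbitrary assumption, and the model-generated cluster summaries are omitted entirely.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts. A real system would use a learned model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_tools(tools: dict[str, str], threshold: float = 0.25) -> list[list[str]]:
    """Greedy clustering: join the first cluster whose seed tool is similar enough."""
    clusters: list[tuple[Counter, list[str]]] = []
    for name, description in tools.items():
        vec = embed(f"{name.replace('_', ' ')} {description}")
        for seed_vec, members in clusters:
            if cosine(vec, seed_vec) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((vec, [name]))  # no match: start a new cluster
    return [members for _, members in clusters]


# Hypothetical tools for illustration.
tools = {
    "github_create_issue": "Create a GitHub issue",
    "github_close_issue": "Close a GitHub issue",
    "slack_post_message": "Post a Slack message",
    "slack_list_channels": "List Slack channels",
}
groups = cluster_tools(tools)
```

With these four tools the GitHub pair and the Slack pair land in separate clusters, so the agent would see two summaries instead of four schemas.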

In published benchmarks (SWE-Lancer, SWE-bench Verified), this approach improved success rates by 2–5 percentage points with both GPT-5 and Sonnet 4.5, and cut response latency by ~400 ms.

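Two constraints make this approach more conditional than the others: clustering is threshold-gated, so it activates only once the tool count crosses github.copilot.chat.virtualTools.threshold, and it does not lift the 128-tool cap, which is enforced at the VS Code layer before the agent processes anything.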

Implementation 4 — OpenAI Codex CLI: No Dynamic Loading

Codex CLI exposes MCP tools alongside built-ins on session start, with no deferral mechanism. The OpenAI API supports tool_search with defer_loading: true on GPT-5.4+, but Codex CLI does not wire it up to MCP tools as of the current release.

Available controls are static: project-level .codex/config.toml files and enabled_tools allowlists.

Known scaling issues with many servers:

For 200 tools, Codex CLI pays the full ~50K-token cost on every turn. Across a 10-turn session, that’s ~500K tokens of pure tool-definition overhead.
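Without deferral, the only lever is shrinking the catalog before the session starts. The sketch below shows what allowlist scoping buys in token terms. It is not Codex CLI code: the function name is hypothetical, and the catalog is synthetic (10 servers × 20 tools at an assumed 250 tokens each).

```python
def scope_catalog(catalog: dict[str, int], enabled: set[str]) -> dict[str, int]:
    """Filter a catalog (tool name -> estimated schema tokens) to an allowlist."""
    unknown = enabled - set(catalog)
    if unknown:
        # A typo in an allowlist would otherwise silently drop a tool.
        raise KeyError(f"allowlist names not in catalog: {sorted(unknown)}")
    return {name: tokens for name, tokens in catalog.items() if name in enabled}


# Synthetic catalog: 10 servers x 20 tools, 250 tokens per schema (assumed).
catalog = {f"server{i // 20}_tool{i % 20}": 250 for i in range(200)}

# Enable only the two servers this hypothetical project needs.
enabled = {n for n in catalog if n.startswith(("server0_", "server1_"))}

scoped = scope_catalog(catalog, enabled)
per_turn_before = sum(catalog.values())   # every schema, every turn
per_turn_after = sum(scoped.values())     # only allowlisted schemas
```

Scoping to 40 of the 200 tools cuts the per-turn cost from 50,000 to 10,000 tokens in this example, and over ten turns from 500K to 100K.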

Token Economics at 200 Tools

Assuming a representative MCP catalog with ~250 tokens per tool schema (conservative; verbose servers run 2–3× higher):

Agent                                        | Upfront | Per-turn                    | 10-turn overhead
Claude Code (tool_search on)                 | 3–5K    | 3–5K + ~500 per expansion   | 30–50K
Cursor (dynamic context discovery on)        | 2–4K    | 2–4K + ~400 per file read   | 20–40K
GitHub Copilot (virtual tools, <128)         | 5–10K   | 5–10K + cluster expansions  | 50–100K
Codex CLI (no deferral)                      | 50K     | 50K                         | 500K

Key implications. On a 200K-token window, Codex CLI consumes 25% of context before any work starts; with verbose servers, this rises to 70%. The other three agents spend at most about 5% of the window upfront, leaving nearly all of it for the actual task. Over a multi-turn session, Codex’s per-turn cost accumulates linearly: what costs $X in the first turn costs ~$10X across ten turns.
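The 10-turn column is simply the fixed per-turn overhead multiplied by the number of turns. The sketch below reproduces it from the table's own figures. It deliberately ignores the variable per-expansion and per-file-read costs, which depend on how many tools a session actually touches; the function name is hypothetical.

```python
TURNS = 10

# (low, high) fixed per-turn schema overhead in tokens, taken from the table above.
PER_TURN = {
    "Claude Code": (3_000, 5_000),
    "Cursor": (2_000, 4_000),
    "GitHub Copilot": (5_000, 10_000),
    "Codex CLI": (50_000, 50_000),
}


def session_overhead(per_turn: tuple[int, int], turns: int) -> tuple[int, int]:
    """Fixed overhead grows linearly with the number of turns."""
    low, high = per_turn
    return low * turns, high * turns


for agent, per_turn in PER_TURN.items():
    low, high = session_overhead(per_turn, TURNS)
    label = f"{low:,}" if low == high else f"{low:,}-{high:,}"
    print(f"{agent}: {label} tokens over {TURNS} turns")
```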

Practical Guidance

For multi-MCP workloads (5+ servers, 100+ tools), the agent choice materially affects what’s possible:

  1. Verify defaults are active.
    • Claude Code: confirm ENABLE_TOOL_SEARCH is not disabled (it silently turns off on proxies and Vertex AI).
    • Copilot: set github.copilot.chat.virtualTools.threshold low enough to activate.
    • Cursor: dynamic context discovery is on by default — no action needed.
  2. For Codex CLI: scope aggressively with project-level .codex/config.toml and enabled_tools allowlists. Treat global MCP config as a superset; only enable per-project what each session actually needs. Expect 30–50 active tools max for healthy operation.
  3. For Copilot at 128+ tools: the cap is enforced at the VS Code layer before the agent processes anything. Virtual tools clustering helps the model reason but does not bypass the API limit. Trim selected tools manually, or split work across sessions.
  4. MCP server design: prefix tool names by server (matrix_query_*, quality_metric_*) so search and clustering have clean signal. Keep tool descriptions concise — Claude Code truncates anything past 2 KB silently.
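The server-design advice in item 4 can be turned into a pre-publish check. The lint below is a hypothetical helper, not part of any MCP SDK. It assumes "2 KB" means 2,048 bytes of UTF-8, which the article does not specify.

```python
MAX_DESCRIPTION_BYTES = 2048  # assumption: "2 KB" read as 2,048 UTF-8 bytes


def lint_tool(server_prefix: str, name: str, description: str) -> list[str]:
    """Return a list of problems; an empty list means the tool passes."""
    problems = []
    if not name.startswith(f"{server_prefix}_"):
        problems.append(f"{name}: missing '{server_prefix}_' prefix")
    size = len(description.encode("utf-8"))
    if size > MAX_DESCRIPTION_BYTES:
        problems.append(
            f"{name}: description is {size} bytes, over {MAX_DESCRIPTION_BYTES}"
        )
    return problems


# Example: one well-formed tool, one that fails both checks.
ok = lint_tool("matrix", "matrix_query_rows", "Query rows from a matrix.")
bad = lint_tool("matrix", "query_rows", "x" * 3000)
```

Running a check like this in the server's CI catches oversized descriptions before a client truncates them without warning.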

Where the Standard Is Heading

All four implementations solve the same problem and converge on the same architectural shape: a small persistent surface (search tool, file index, cluster summaries) plus on-demand schema expansion. None of the wire formats interoperate. MCP itself ships notifications/tools/list_changed for dynamic catalogs but no standardized search or defer primitive that clients and servers agree on.

The pattern is becoming industry-standard. The interface is not. Anthropic’s tool_search_tool_*, OpenAI’s tool_search + defer_loading, Cursor’s file sync, and Copilot’s virtual tools each remain vendor-specific. Until MCP adds a discovery primitive, each agent’s behavior at scale will diverge in the ways documented here, and the choice of agent will continue to be, in part, a choice of how much context tax you’re willing to pay per turn.

