
Pin one MCP per system, then write thin skills on top
Chris Harper
1 min read
Jun 9, 2026 · 12:30 UTC
A clean architectural pattern is hardening for managing context cost. MCP tool definitions are expensive: connecting three servers (GitHub, Slack, Sentry) can load 55,000 tokens of tool definitions before the agent reads its first message, and one documented setup hit 143,000 tokens — 72% of a 200K window — before doing any work. Benchmarks also show MCP costing 4–32× more tokens per operation than an equivalent CLI call.
Bigger 1M-token windows give margin but don't change the multiplier. The durable fix is progressive disclosure: each query returns only what the agent needs for its current decision, with full records available on request. The practical shape — pin one MCP per external system you genuinely need live (GitHub, Postgres, Linear, Sentry), and push orchestration logic into thin skills rather than connecting yet another server.
Sources: Roadie: Why your MCP server is eating your context window, Claude Skills and MCP Servers: a practitioner's guide (Codersera), Red Hat Developer: MCP servers vs skills