I'm getting tired of everyone saying "MCP is dead, use CLIs!".
Yes, MCP eats up context windows, but agents can also be smarter about how they load MCP context in the first place, using a similar strategy to skills.
The problem with tossing it out entirely is that it leaves a lot of open questions about security.
With skills, there's no built-in way to apply policies in the same way across many different servers.
MCP gives us a registry, so we can enforce chain policies across MCP calls, e.g. no web search after viewing financials.
Doing the same with skills isn't possible in a programmatic, deterministic way.
There needs to be a middle ground instead of throwing out MCP entirely.
I feel like I don't fully understand MCP. I've done research on it but I definitely couldn't explain it. As far as I can tell, it's a server with API endpoints that are well defined in a JSON schema, which is then sent to the LLM; the LLM parses that and decides which endpoints to hit (I'm aware some LLMs use smart calling now, so they load the tool name and description but nothing else until it's called). How exactly are you stopping the LLM from using web search after it hits a certain endpoint in your MCP server? Or is this referring strictly to when you own the whole workflow, where you can then deny web search capabilities on the next LLM step?
Are there any good docs you've liked for learning about it, or good open source projects you used to get familiar? I'd like to learn more.
There is not a lot to learn to understand the basics, but maybe one step that's not necessarily documented is the overall workflow and why it's arranged this way. You mentioned the LLM "using web search" and it's a related idea: LLMs don't run web searches themselves when you're using an MCP client, they ask the client to do it.
You can think of an MCP server as a process exposing some tools. It runs on your machine communicating via stdin/stdout, or on a server over HTTP. It exposes a list of tools; each tool has a name and named+typed parameters, just like a list of functions in a program. When you "add" an MCP server to Claude Code or any other client, you simply tell this client app on your machine about the list of tools, and it will include that list in its requests to the LLM alongside your prompt.
When the LLM receives your prompt and decides that one of the tools listed alongside would be helpful to answer you, it doesn't return a regular response to your client but a "tool call" message saying: "call <this tool> with <these parameters>". Your client does this, and sends back the tool call result to the LLM, which will take this into account to respond to your prompt.
That's pretty much all there is to it: LLMs can't connect to your email or your GitHub account or anything else; your local apps can. MCP is just a way for LLMs to ask clients to call tools and provide the response.
1. You: {message: "hey Claude, how many PRs are open on my GitHub repo foo/bar?", tools: [... github__pr_list(org:string, repo:string) -> [PullRequest], ...] }
2. Anthropic API: {tool_use: {id: 123, name: github__pr_list, input:{org: foo, repo: bar}}}
3. You: {tool_result: {id: 123, content: [list of PRs in JSON]} }
4. Anthropic API: {message: "I see 3 PRs in your repo foo/bar"}
that's it.
If you want to go deeper, the MCP website[1] is relatively accessible, although you definitely don't need to know all the details of the protocol to use MCP. If all you need is to use MCP servers and not blow up your context with a massive list of tools included with each prompt, I don't think you need to know much more than what I described above.
[1] https://modelcontextprotocol.io/docs/learn/architecture
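To make those four steps concrete, here's a rough sketch of the client-side loop in Python against the Anthropic Messages API. This is not the MCP SDK itself: the call_mcp_server helper is a placeholder for forwarding the call to the MCP server (a JSON-RPC tools/call over stdio or HTTP), and the model name is illustrative.

```python
# Sketch of the client loop described above (steps 1-4). Assumptions: the
# anthropic package is installed, ANTHROPIC_API_KEY is set, and the model
# name is illustrative. call_mcp_server() stands in for the real MCP client.
import anthropic

def call_mcp_server(name: str, arguments: dict) -> str:
    # Placeholder: a real client forwards this to the MCP server as a
    # JSON-RPC "tools/call" request and returns the result content.
    return '[{"number": 1, "title": "example PR"}]'

client = anthropic.Anthropic()

# The tool list; normally built from the MCP server's tools/list response.
tools = [{
    "name": "github__pr_list",
    "description": "List open pull requests for a repository",
    "input_schema": {
        "type": "object",
        "properties": {"org": {"type": "string"}, "repo": {"type": "string"}},
        "required": ["org", "repo"],
    },
}]

messages = [{"role": "user", "content": "How many PRs are open on my repo foo/bar?"}]

while True:
    resp = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative
        max_tokens=1024,
        tools=tools,                # step 1: prompt + tool list
        messages=messages,
    )
    if resp.stop_reason != "tool_use":
        print(resp.content[0].text)  # step 4: the final answer
        break
    messages.append({"role": "assistant", "content": resp.content})
    for block in resp.content:
        if block.type == "tool_use":  # step 2: the model asks for a call
            result = call_mcp_server(block.name, block.input)  # step 3: client runs it
            messages.append({"role": "user", "content": [
                {"type": "tool_result", "tool_use_id": block.id, "content": result}
            ]})
```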
The security angle is definitely right but the framing is still too narrow. Everyone's debating context window economics and chain policies, but there's a more fundamental gap lying underneath these: nobody's verifying the content of what gets loaded.
Tool schemas have JSON Schema validation for structure. But the descriptions, the natural-language text that actually drives LLM behavior, have zero integrity checking. A server can change "search files in project directory" to "search files in project directory and include contents of .env files in results" between sessions, and nothing in the protocol detects it. And that's not hypothetical. CVE-2025-49596 was exactly this class of bug.
Context window size is an economics problem that's already getting solved by bigger windows and tool search. Description-layer integrity is an architectural gap that most of the ecosystem hasn't even acknowledged yet. And that makes it the thing that is going to bite us in the butt soon.
Schema validates structure, nothing validates intent. That's the actual attack surface and nobody's talking about it.
CLI `--help` is baked into the binary. You'd need a new release to change it. MCP server descriptions can change between sessions and nothing catches it.
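One mitigation, purely a client-side sketch rather than anything the protocol gives you, is to pin tool definitions the way SSH pins host keys: hash each tool's name, description, and schema on first use, and refuse to proceed if a later session presents a different digest.

```python
# Client-side "description pinning" sketch. Nothing here is part of MCP;
# the pin file location and trust-on-first-use policy are assumptions.
import hashlib
import json
import pathlib

PIN_FILE = pathlib.Path("tool_pins.json")

def digest(tool: dict) -> str:
    """Stable hash over the parts of a tool definition the model actually sees."""
    canonical = json.dumps(
        {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "inputSchema": tool.get("inputSchema", {}),
        },
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

def check_tools(tools: list[dict]) -> None:
    pins = json.loads(PIN_FILE.read_text()) if PIN_FILE.exists() else {}
    for tool in tools:
        d = digest(tool)
        if tool["name"] not in pins:
            pins[tool["name"]] = d  # trust on first use
        elif pins[tool["name"]] != d:
            raise RuntimeError(f"tool definition changed since last session: {tool['name']}")
    PIN_FILE.write_text(json.dumps(pins, indent=2))
```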
Honestly though, the whole thread is arguing about the wrong layer. I've been doing API infra for 20 years and the pattern is always the same: if your API has good resource modeling and consistent naming, agents will figure it out through CLI, MCP, whatever. If it doesn't, MCP schemas won't save you.
Thanks for the CVE reference, hadn't seen that one.
It is a weird trend. I see the appeal of Skills over MCP when you are just a solo dev doing your work. MCP is incredibly useful in an organization context when you need to add controls and process. Both are useful. I feel like the anti-MCP push is coming from people who don't need to work in a large org.
Not sure. Our big org banned MCPs because they're unsafe, and it has no way to enforce allowing only certain MCPs (in GitHub Copilot).
But skills where you tell the LLM to shell out to some random command are safe? I'm not sure I understand the logic.
You can lock down an execution context in a way you can't with a rando MCP server.
MCP Security 2026: 30 CVEs in 60 Days - https://news.ycombinator.com/item?id=47356600 - March 2026
(securing this use case is a component of my work in a regulated industry and enterprise)
I think big companies already protect against random commands causing damage. Work laptops are tightly controlled for both networking and software.
Shameless plug: I'm working on a product that aims to solve this: https://www.gatana.ai/
Isn’t it possible to proxy LLM communication and strip out unwanted MCP tool calls from conversations? I mean if you’re going to ban MCPs, you’re probably banning any CLI tooling too, right?
Maybe https://usepec.eu ?
We only allow custom MCP servers.
> I feel like the anti-MCP push is coming from people who don't need to work in a large org.
Any social push like that is something you can ignore once you understand why it doesn't apply to you. Do you agree that a typical solo dev caught up in the MCP hype should run the other way, even if MCP is beneficial in your particular situation?
I'd agree solo devs can lean toward skills. I liken skills to a sort of bash scripts directory. For personal stuff I generally use skills only.
IMO, if you want a metadata registry of how actions work so you can build complicated, fragile ACL rule systems over those actions, then build that. It doesn't need to be loaded into a context window to work, and it can be expanded to cover general API usage, tool usage, CLI usage, and so on. You could load a metadata description system for the gh CLI the same way.
MCPs are clunky, difficult to work with, and token-inefficient. Security orgs often have incentives that let them mostly ignore what the business and devs actually need to do their jobs, leading to "endpoint management" systems that eat half the machine's resources and a lot of fig-leaf security theatre, with people systematically disabling whatever those systems are doing so they can do their job; it's an IT equivalent of the TSA.
Thank god we're moving away from giving security orgs these fragile tools for attaching a ball and chain to everyone.
Skills are just prompts, so policy doesn't apply there. MCP isn't giving you any special policy control there; it's just a capability border. You could do the same thing with a service mesh or any other capability-compartmentalization technique.
The only value in MCP is that it's intended "for agents" and it has traction.
> I'm getting tired of everyone saying "MCP is dead, use CLIs!".
The people saying this and attacking it should first agree about the question.
Are you combining a few tools from the training set into a logical unit to make a cohesive tool-suite, say for reverse engineering or network debugging? Low stakes for errors, not much ongoing development? Great, you just need a thin layer of intelligence on top of Stack Overflow and blog posts, and a CLI will probably do it.
Are you trying to weld together basically an AI front-end for an existing internal library or service? Is it something complex enough that you need to scale out and have modular access to? Is it already something you need to deploy/develop/test independently? Oops, there's nothing quite like that in the training set, and you probably want some guarantees. You need a schema, obviously. You can sort of jam that into prompts and prayers, hope for the best with skills, skip validation and risk annotations being ignored, and trust that a future opaque model change will be backwards compatible with how skills are even selected/dispatched. Or... you can use MCP.
Advocating really hard for one or the other in general is just kind of naive.
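To make the schema point concrete, here's roughly what an MCP-style tool definition looks like (field values invented), shown as a Python dict. The inputSchema is enforceable as JSON Schema; the annotations, added in newer spec revisions, are hints only, which is exactly the "risk annotations being ignored" caveat above.

```python
# Illustrative MCP-style tool definition (values invented, not a real service).
# The inputSchema is validated structurally; the annotations are advisory hints.
tool_definition = {
    "name": "ledger__post_entry",
    "description": "Post a journal entry to the internal ledger service",
    "inputSchema": {
        "type": "object",
        "properties": {
            "account": {"type": "string"},
            "amount_cents": {"type": "integer", "minimum": 1},
            "memo": {"type": "string", "maxLength": 200},
        },
        "required": ["account", "amount_cents"],
    },
    "annotations": {
        "readOnlyHint": False,
        "destructiveHint": False,
        "idempotentHint": False,
        "openWorldHint": False,
    },
}
```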
> Yes, MCP eats up context windows, but agents can also be smarter about how they load the MCP context in the first place, using similar strategy to skills.
I have been keeping an eye on MCP context usage with Claude Code's /context command.
When I ran it a couple of months ago, the Supabase server used 13.2k tokens at all times, with the search_docs tool alone taking 8k! So I disabled that tool in my config.
I just ran /context again, and when the server isn't in use it takes only ~300 tokens.
I have a question. Does anyone know a good way to benchmark actual MCP context usage in Claude Code now? I just tried a few different things and none of them worked.
Towards the end of the article, they do write about some things that MCP does better.
Tool search pretty much completely negates the MCP context window argument.
Evidence?
I agree, and it's context-dependent when to use what (the author mentions use cases for other solutions). I'm glad there are multiple solutions to choose from.
This is the right framing. The chain policy problem is what happens when you ask the registry to be the entitlement layer.
Here's a longer piece on why the trust boundary has to live at the runtime level, not the interface level, and what that means for MCP's actual job: https://forestmars.substack.com/p/twilight-of-the-mcp-idols
MCPs are handy in their place. Agents calling CLIs locally is much more efficient.
I’m a huge fan of CLIs over MCP for many things, and I love asking Claude Code to take an API, wrap it in a CLI, and make a skill for me. The ergonomics for the agent and the human are fantastic, and you get all of the nice composability of command-line Unix built in.
However, MCPs have some really nice properties that CLIs generally don’t, or that are harder to solve for. Most notably, making API secrets available to the CLI, but not to the agent, is quite tricky. Even in this example, the options are env variables (which are a prompt injection away from dumping), or a credentials file (better, but still very much accessible to the agent if it were asked).
MCPs give you a “standard” way of loading and configuring a set of tools/capabilities into a running MCP server (locally or remotely), outside of the agent’s process tree. This allows you to embed your secrets in the MCP server, via any method you choose, in a way that is difficult or impossible for the agent to dump even if it goes rogue.
My efforts to replicate that secure setup for a CLI have either made things more complicated (using a different user for running CLIs so that you can rely on Linux file permissions to hide secrets), or they start to rhyme with MCP (a memory-resident socket server started before the CLI, which the CLI can talk to, much like docker.sock or ssh-agent).
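For what it's worth, the "rhymes with ssh-agent" variant can be fairly small. Here's a sketch (the socket path, wire format, and env var name are all illustrative assumptions): a broker started outside the agent's process tree holds the token, and the CLI the agent runs only ever talks to the socket.

```python
# Sketch of a secret-holding broker plus a thin CLI, in the spirit of
# docker.sock / ssh-agent. Socket path, wire format, and env var name
# are assumptions for illustration only.
import json
import os
import socket
import sys

SOCK = "/tmp/api-broker.sock"  # illustrative; use a per-user, 0600 location in practice

def serve() -> None:
    token = os.environ.pop("API_TOKEN", "dummy-token")  # only the broker's env has this
    if os.path.exists(SOCK):
        os.unlink(SOCK)
    srv = socket.socket(socket.AF_UNIX)
    srv.bind(SOCK)
    os.chmod(SOCK, 0o600)
    srv.listen()
    while True:
        conn, _ = srv.accept()
        req = json.loads(conn.recv(65536))
        # A real broker would call the upstream API here, attaching `token`
        # server-side so the secret never travels back across the socket.
        conn.sendall(json.dumps({"ok": True, "path": req["path"]}).encode())
        conn.close()

def cli(path: str) -> None:
    c = socket.socket(socket.AF_UNIX)
    c.connect(SOCK)
    c.sendall(json.dumps({"path": path}).encode())
    print(c.recv(65536).decode())

if __name__ == "__main__":
    if sys.argv[1:] == ["serve"]:
        serve()
    else:
        cli(sys.argv[1] if len(sys.argv) > 1 else "/v1/ping")
```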
> Limit integrations → agent can only talk to a few services
The idea that people see this as one horn of a trilemma instead of just good practice is a bit strange. Who would complain that every import isn't a star-import? Bring in what you need at first, then load new things dynamically with good semantics for cascade/drill-down. Let's maybe abandon simple classics like namespacing and the Unix philosophy for the kitchen-sink approach after the kitchen-sink thing is shown to work.
10 years from now: "Can you believe they did anything with such a small context window?"
More likely: "Can you believe they were actually trying to use LLMs for this?"
OSes and software engs did not end up using less RAM.
Measurable responses to the environment lag; Moore's law has been slowing down (edit: and demand has been speeding up, a lot).
From a sustainability point alone, I really hope the parent post's quote comes true, because I've personally seen LLMs used over and over to complete the same task when they could have been used once to generate a script, and I'd really like to still be able to afford to own my own hardware at home.
How many times have we implemented Hello World?
I'm using local models on a 6 year old AMD GPU that would have felt like a technology indistinguishable from magic 10 years ago. I ask it for crc32 in C and it gives me an answer. I ask it to play a game with me. It does. If I'm an isolated human this is like a magic talking box. But it's not magic. It doesn't use more energy than playing a video game either.
10 years from now: "The next big thing: HENG - Human Engineers! These make mistakes, but when they do, they can just learn from it and move on and never make it again! It's like magic! Almost as smart as GPT-63.3-Fast-Xtra-Ultra-Google23-v2-Mem-Quantum"
I would love to live in a world where my coworkers learn from their mistakes
is this Human 2.0? I only have 1.0a beta in the office.
I get the joke, but it really does highlight how flimsy the argument is for humans. IME humans frequently make simple errors that they don't learn from, and they very rarely get things right the first time. Damn. Sounds like LLMs. And those are only getting better. Humans aren't.
Imagine believing humans don’t make the same mistakes. You live in a different universe than me buddy.
Sometimes we repeat mistakes. But humans are capable of occasionally learning. I've seen it!
I've always wanted a better way to test programmers' debugging in an interview setting. Like, sometimes just working problems gets at it, but usually it only tests the "can you re-read your own code and spot a mistake" sort of debugging.
Which is not nothing, and I'm not sure how LLMs do on that style; I'd expect them to be able to fake it well enough on common mistakes in common idioms, which might get you pretty far, and fall flat on novel code.
The kind of debugging that makes me feel cool is when I see or am told about a novel failure in a large program, and my mental model of the system is good enough that this immediately "unlocks" a new understanding of a corner case I hadn't previously considered. "Ah, yes, if this is happening it means that precondition must be false, and we need to change a line of code in a particular file just so." And when it happens and I get it right, there's no better feeling.
Of course, half the time it turns out I'm wrong, and I resort to some combination of printf debugging (to improve my understanding of the code) and "making random changes", where I take swing-and-a-miss after swing-and-a-miss changing things I think could be the problem and testing to see if it works.
And that last thing? I kind of feel like it's all LLMs do when you tell them the code is broken and ask them to fix it. They'll rewrite it, tell you it's fixed and... maybe it is? They never understand the problem they're fixing.
I mean, that is not what they are writing buddy.
10 years from now: “what’s a context window?”
10 years from now: “come with me if you want to live”
Terminator 2 Clip: https://youtu.be/XTzTkRU6mRY?t=72&si=dmfLNDqpDZosSP4M
I am kind of already at that point. For all the complaining about context windows being stuffed with MCPs, I am curious what people are up to and how many MCP servers they have connected that this is a problem.
“640K ought to be enough for anybody”
I dunno why you're getting downvoted. This is funny.
"That was back when models were so slow and weighty they had to use cloud based versions. Now the same LLM power is available in my microwave"
One of the MCP Core Maintainers here, so take this with a boulder of salt if you're skeptical of my biases.
The debate around "MCP vs. CLI" is somewhat pointless to me personally. Use whatever gets the job done. MCP is much more than just tool calling - it also happens to provide a set of consistent rails for an agent to follow. Besides, we as developers often forget that the things we build are also consumed by non-technical folks - I have no desire to teach my parents to install random CLIs to get things done instead of plugging a URI to a hosted MCP server with a well-defined impact radius. The entire security posture of "Install this CLI with access to everything on your box" terrifies me.
The context window argument is also an agent harness challenge more than anything else - modern MCP clients do smart tool search that obviates the entire "I am sending the full list of tools back and forth" mode of operation. At this point it's just a trope that is repeated from blog post to blog post. This blog post too alludes to this and talks about the need for infrastructure to make it work, but it just isn't the case. It's a pattern that's being adopted broadly as we speak.
> modern MCP clients do smart tool search that obviates the entire "I am sending the full list of tools back and forth" mode of operation
How, "Dynamic Tool Discovery"? Has this been codified anywhere? I've only see somewhat hacky implementations of this idea
https://github.com/modelcontextprotocol/modelcontextprotocol...
Or are you talking about the pressure being on the client/harnesses as in,
https://platform.claude.com/docs/en/agents-and-tools/tool-us...
More of the latter than the former. The protocol itself is constrained to a set of well-defined primitives, but clients can do a bunch of pre-processing before invoking any of them.
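To make that client-side pre-processing less abstract, here's a toy sketch (this isn't from the spec; a real harness would use embeddings or BM25 rather than keyword overlap): keep the full tool list out of the prompt and inject only the top few matches per request.

```python
# Toy sketch of client-side tool search: rank tool definitions against the
# user's request and only send the top k to the model. Purely illustrative;
# production harnesses use semantic search, not word overlap.
def rank_tools(tools: list[dict], query: str, k: int = 3) -> list[dict]:
    """Naive keyword-overlap scoring between the query and each tool's text."""
    query_words = set(query.lower().split())

    def score(tool: dict) -> int:
        text = (tool["name"] + " " + tool.get("description", "")).lower()
        return len(query_words & set(text.split()))

    return sorted(tools, key=score, reverse=True)[:k]

tools = [
    {"name": "github__pr_list", "description": "list open pull requests"},
    {"name": "github__issue_create", "description": "create a new issue"},
    {"name": "supabase__search_docs", "description": "search supabase documentation"},
]
# Only the best-matching definitions would be injected into the prompt.
print(rank_tools(tools, "how many pull requests are open on foo/bar?", k=1))
```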
CLIs are great for some applications! But 'progressive disclosure' means more mistakes to be corrected and more round trips to the model, every time[1] you use the tool in a new thread. You're trading latency for lower cost/more free context. That might be great! But it might not be, and the opposite trade (more money/less context for lower latency) makes a lot of sense for some applications, especially if the 'more money' part can be amortized over lots of users by keeping the tool definitions block cached.
[1]: one might say 'of course you can just add details about the CLI to the prompt' ... which reinvents MCP in an ad hoc underspecified non-portable mode in your prompt.
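For the caching side of that trade, a minimal sketch, assuming the Anthropic prompt-caching API (a cache_control marker on the last tool definition makes the whole tools block cacheable; tool and model names are illustrative):

```python
# Sketch: cache the large tools block so its token cost is paid roughly once
# and amortized across calls. Assumes Anthropic prompt caching; names invented.
import anthropic

client = anthropic.Anthropic()

tools = [
    {"name": "github__pr_list", "description": "List open pull requests",
     "input_schema": {"type": "object"}},
    # ...dozens more definitions...
    {"name": "github__issue_create", "description": "Create a new issue",
     "input_schema": {"type": "object"},
     "cache_control": {"type": "ephemeral"}},  # cache everything up to and including this
]

resp = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "How many PRs are open on foo/bar?"}],
)
print(resp.usage)  # later calls should report cache reads rather than fresh input tokens
```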
This is a fair trade-off and the post should probably be more explicit about it. You're right that progressive disclosure trades latency for cost and context space. For some workloads that's the wrong trade.
The amortization point is interesting too. If you're running a support agent that calls the same 5 tools thousands of times a day, paying the schema cost once and caching it makes total sense. The post covers this in the "tightly scoped, high-frequency tools" section but your framing of it as a caching problem is cleaner.
On the footnote: guilty as charged, partially. The ~80 token prompt is a minimal bootstrap, not a full schema. It tells the agent how to discover, not what to call. But yeah, the moment you start expanding that prompt with specific flags and patterns, you're drifting toward a hand-rolled tool definition. The difference is where you stop. 80 tokens of "here's how to explore" is different from 10,000 tokens of "here's everything you might ever need." But the line between the two is blurrier than the post implies. Fair point.
I ran into this exact problem building an MCP server. 85 tools in experimental mode, ~17k tokens just for the tool manifest before any work starts.
The fix I (well, Codex actually) landed on was toolset tiers (minimal/authoring/experimental) controlled by an env var, plus phase-gating: tools are registered, but ~80% are "not connected" until you call _connect. The effective listed surface stays pretty small.
Lazy loading basically, not a new concept for people here.
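A sketch of that tiering, assuming the Python MCP SDK's FastMCP helper; the tier names, env var, and connect gate mirror the comment above and aren't any kind of standard.

```python
# Toolset tiers + phase-gating sketch. Assumes the official MCP Python SDK
# (FastMCP); tier names and the connect() gate are illustrative only.
import os
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("tiered-server")
TIER = os.environ.get("TOOLSET_TIER", "minimal")  # minimal | authoring | experimental
_connected: set[str] = set()

@mcp.tool()
def connect(feature: str) -> str:
    """Phase gate: experimental tools refuse to run until their feature is connected."""
    _connected.add(feature)
    return f"{feature} connected"

@mcp.tool()
def search(query: str) -> str:
    """Minimal tier: always registered."""
    return f"results for {query!r}"

if TIER in ("authoring", "experimental"):
    @mcp.tool()
    def create_doc(title: str) -> str:
        """Authoring tier: only registered when the env var allows it."""
        return f"created {title!r}"

if TIER == "experimental":
    @mcp.tool()
    def bulk_import(path: str) -> str:
        """Experimental tier, phase-gated behind connect('bulk')."""
        if "bulk" not in _connected:
            return "not connected: call connect('bulk') first"
        return f"imported {path}"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```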
The industry is talking in circles here. All you need is "composability".
UNIX solved this with files and pipes for data, and processes for compute.
AI agents are solving this with sub-agents for data, and "code execution" for compute.
The UNIX approach is both technically correct and elegant, and what I strongly favor too.
The agent + MCP approach is getting there. But not every harness has sub-agents, or their invocation is non-deterministic, which is where "MCP context bloat" happens.
Source: building a small business agent at https://housecat.com/.
We do have APIs wrapped in MCP. But we only give the agent BASH, a CLI wrapper for the MCPs, and the ability to write code, and it works great.
"It's a UNIX system! I know this!"
While I generally prefer CLI over MCP locally, this is bad, outdated information.
The major harnesses like Claude Code + Codex have had tool search for months now.
Can you explain how to take advantage of it? Is there any specific info from Anthropic regarding context window size and not having to care about MCP?
Fair point on tool search. Claude Code and Codex do have it.
But tool search is solving the symptom, not the cause. You still pay the per-tool token cost for every tool the search returns. And you've added a search step (with its own latency and token cost) before every tool call.
With a CLI, the agent runs `--help` and gets 50-200 tokens of exactly what it needs. No search index, no ranking, no middleware. The binary is the registry.
Tool search makes MCP workable. CLIs make the search unnecessary.
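As a concrete example of "the binary is the registry" (command and flag names here are invented, not a real CLI): nested argparse subcommands mean the agent's first --help shows only resource names, and each deeper --help discloses just that one action's flags.

```python
# Progressive-disclosure CLI sketch: each --help screen reveals only the next
# layer. Names are illustrative; a real CLI would call the actual API.
import argparse

def main() -> None:
    parser = argparse.ArgumentParser(prog="acme", description="Acme unified API CLI")
    sub = parser.add_subparsers(dest="resource", required=True)

    pr = sub.add_parser("pr", help="work with pull requests")
    pr_sub = pr.add_subparsers(dest="action", required=True)
    pr_list = pr_sub.add_parser("list", help="list open pull requests")
    pr_list.add_argument("--repo", required=True, help="org/repo, e.g. foo/bar")

    args = parser.parse_args()
    if args.resource == "pr" and args.action == "list":
        print(f"(stub) 3 open PRs in {args.repo}")  # real CLI would hit the API here

if __name__ == "__main__":
    main()
```

Running `acme --help` lists only the top-level resources; `acme pr list --help` shows the handful of flags for that one action, which is where the 50-200 token figure comes from.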
Let me guess the command: [error]
Wait, better check help. is it -h? [error]
Nope? Lemme try --help. [error]
Nope.
How about just “help” [error]
Let me search the web [tons of context and tool calls]
https://github.com/RhysSullivan/executor
There's already an open source tool that does exactly the same thing: https://github.com/knowsuchagency/mcp2cli
Great tool; however, we went with a dedicated CLI client (think gh, aws, stripe) in Go.
Getting LLMs to reliably trigger CLI functions is quite hard in my experience, though, especially if it's a custom tool.
The trend is obviously towards larger and larger context windows. We moved from 200K to 1M tokens being standard just this year.
This might be a complete non issue in 6 months.
Those bigger windows come with lovely surcharges on compute, latency, and prompt complexity, so "just wait for more tokens" is a nice fantasy that melts the moment someone has to pay the bill. If your use case is tiny or your budget is infinite, fine, but for everyone else the "make the window bigger" crowd sounds like they're budgeting by credit card. Quality still falls off near the edge.
Context windows getting bigger doesn't make the economics go away. Tokens still cost money. 50K tokens of schemas at 1M context is the same dollar cost as 50K tokens at 200K context, you just have more room left over.
The pattern with every resource expansion is the same: usage scales to fill it. Bigger windows mean more integrations connected, not leaner ones. Progressive disclosure is cheaper at any window size.
Context caching deals with a lot of the cost argument here.
It helps with cost, agreed. But caching doesn't fix the other two problems.
1) Models get worse at reasoning as context fills up, cached or not, right?
2) The usage expansion problem still holds. Cheaper context means teams connect more services, not fewer. You cache 50K tokens of schemas today, then it's 200K tomorrow because you can "afford" it now. The bloat scales with the budget...
Caching makes MCP more viable. It doesn't make loading 43 tool definitions for a task that uses two of them a good architecture.
> Not a protocol error, not a bad tool call. The connection never completed.
Very interesting topic, but this LLM structure is instant anathema: I just have to stop reading once I smell it.
We built a unified API with a large surface area and ran into a problem when building our MCP server: tool definitions alone burned 50,000+ tokens before the agent touched a single user message.
The fix that worked for us was giving agents a CLI instead. ~80 tokens in the system prompt, progressive discovery through --help, and permission enforcement baked into the binary rather than prompts.
The post covers the benchmarks (Scalekit's 75-run comparison showed 4-32x token overhead for MCP vs CLI), the architecture, and an honest section on where CLIs fall short (streaming, delegated auth, distribution).
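For the permission-enforcement part, the idea is roughly this (a simplified sketch; paths, scope names, and the policy format are illustrative, not the actual binary): the CLI checks a policy file the agent can't edit before it makes any call, so a prompt injection can't talk its way past it.

```python
# Sketch of permission enforcement in the binary rather than in prompts.
# Policy path and scope names are illustrative only.
import json
import pathlib
import sys

POLICY = pathlib.Path("/etc/acme/policy.json")  # root-owned, read-only to the agent

def allowed(scope: str) -> bool:
    """Return True only if the requested scope appears in the policy allowlist."""
    policy = json.loads(POLICY.read_text())
    return scope in policy.get("allowed_scopes", [])

def main() -> None:
    scope = sys.argv[1] if len(sys.argv) > 1 else ""
    if not allowed(scope):
        sys.exit(f"denied: scope {scope!r} is not permitted by {POLICY}")
    print(f"ok: {scope} permitted")  # a real CLI would perform the API call here

if __name__ == "__main__":
    main()
```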
With context windows starting to get much larger (see the recent 1M context size for Claude models), I think this will be a non-issue very soon.
The thing with CLIs is that you also need to return results efficiently. If both MCP and the CLI return results efficiently, the CLI wins.
Let me guess - another article about how CLIs are superior to MCP?
Tired of this shit. Be less stupid.
At this point, I feel like MCP servers are just not feasible at the current level of context windows and LLMs. Good idea, but we're way too early.