With Agent Client Protocol (ACP) you can keep the same UI and switch not models, but entire agents, that means using tools/prompts/compaction/etc that are tailored for the model.
Linked your own project with an "All rights reserved" license? The only thing my company will allow me to do with that software is have AI steal it </s>
We don't have any client-side telemetry. Conversations with Poolside models are stored, but you can use any ACP agent with pool. And we have plans to open-source it eventually.
I just did a build in Nemesis8 (containerized agents) and Pi appears to be working fine. Opencode is a good choice too if you're interested in checking out GLM 5.2 from z.ai.
I haven't tried opencode, but when I opened pi I was able to complain about that silly and stupid left-padding that LLM TUIs have started using that prevents basic copy-paste operation, and pi was able to edit itself to fix it.
They are different models. OpenCode is trying to be a claude code/codex replacement, where-as pi is something you build yourself, kind of trying to be an emacs type thing compared to vs-code. As in emacs it is more common to write your own extensions, where as in vs-code most people just download them.
I keep butting into the question of; why opencode, when you've got codex available? Codex is open source as well, and i can't seem to picture a situation where one would want Opencode over Codex.
As far as I can tell, they tick the same boxes- but one has the support of a big boy model provider.
Well, the reason is simple: over the past several months, it has become very difficult to use Codex with non-OpenAI models. They removed the old edit tool that didn't require OpenAI's free form tool calling (that no other LLM host supports), they are adding tools to every request of a type that break most LLM hosts unless you use a proxy to filter them out, they add a "developer" role to some messages which breaks some chat templates, etc.
If someone wanted to fork Codex and make a community-maintained version that supports third party models, that would be great, because I liked Codex better than OpenCode for the most part.
Maybe you've found workarounds. Maybe you're using an old version of Codex. Maybe you have your own soft fork. I don't know. But I used to be able to use Codex with self-hosted models, and I gave up on that about a month ago as they kept breaking that.
Ah, I wasn't aware things regressed there. Yea certainly workarounds n soft fork sorts of things definitely would work- but thats a bummer than things have changed.
From watching Pr's and issues- seems like openai at least wants to come across as if theyre supporting non-oai models :/
You would think they would support their own GPT-OSS model, but, not really anymore. I wish they would release a GPT-OSS 2, but this doesn't fill me with confidence.
If you care about privacy at all, you can route your Opencode requests through an inference provider that does not retain any logs or data. It is also much cheaper. So if your boxes include `Privacy` and `Affordability`, then no, they don't tick the same boxes.
oh-my-pi is a bit of a cross between the two; comes with basically everything OpenCode does, but still easy to customise.
OpenCode is nice if you don't want to do a lot of research and just want to get started right away. The OpenCode Go plan for $5 a month for your first month is a great way to do this, with good models to choose from and reasonable usage limits for a beginner.
I use Go plan precisely with Opencode IDE (and also Jetbrains IDE suite), but now also have access Gemini Pro and Claude Pro. And wonder which tooling to invest my time into, especially that MCP servers also potentially come into play here, and I want at least some models/tools to handle private tasks, like handling my increasingly-complex Home Assistant setup. And I also want to start using models according to needs (plan, execution, reviews). This shit gets extremely complicated extremely quickly, not to mention how often this field shifts direction.
I use “all of them”. My primary harness is oh-my-pi. I probably use 10 different models on a regular basis.
I occasionally use OpenCode.
I try to use Codex and Antigravity as much as I can, often using it as a secondary agent (due to different usage pricing models than API). The same skills and MCPs work across harnesses.
Edit: I don’t use Claude Code simply because I already have enough to deal with and don’t see a major advantage to their harness. I use Opus credits from my Google subscription on the rare occasion I need them.
Cursor is also worth checking out particularly at the $20 a month price tier. If you have Grok you effectively already have it too.
I expect to have a completely different answer a year from now. The main “lift” we’ve gotten from AI tools is our clients now get an Android + iOS app + macOS app + Electron + PWA to go with whatever web based app they want us to build, at essentially the same original price. (There’s also a CLI and a TUI, but so far none of them care about that…)
We just made the decision to start adding MCPs to apps. Gonna be an interesting conversation in a few weeks when I can tell my business contact he can use his favourite chatbot to now plug in directly to the custom app he bought from me.
I am genuinely curious what it tells you, as "curl https//.. | sh" has long been an enormously popular approach to distribution in the open source world. Homebrew, to name just one example, advertises a similar method.
(pi.sh also documents other install methods, like `npm`, on their homepage)
If trust and security is the issue, unfortunately "better" ideas like hashpipe [1] never achieved critical mass
I really hate the `curl <url> | sh` specifically because if your connection drops at a specifically unlucky point in time you are left with a partially executed script which if you are unlucky enough may just have been executing `rm -r ~/.cache/<pkg>/download` but it stopped at `rm-r ~/`.
Is it likely? No. Can it happen? Yea.
Just make it `curl -o <file> <url> && sh <file>` and this entire problem is gone.
Correct, and/or in addition, most nowadays prepend something like `set -euo pipefail` to the scripts in the line immediately after the shebang which results in stopping on errors, including things such as syntax errors stemming from e.g. incomplete installer transmission over wire.
(At least for bash scripts, I’m not sure whether these are POSIX syntax to be frank.)
Package managers: ecosystem is fragmented, requiring a long list of distro- and package-manager-specific instructions. Many scripts already install through package managers, they simply make the user’s life easier.
Flatpaks: These are clearly designed for desktop applications, with CLIs treated as an afterthought. They may be the best long-term hope, but today they are definitely not as convenient or widely available as a simple script.
If you care about adoption, `curl | sh` is the only real option today, which is why virtually all project show it as the first option.
The "like an adult" is what has and will continue to hold back linux on the desktop. Always gatekeeping less technical users instead of acknowledging adoption and ease of use are critical.
i dunno, nothing about most computing is particularly easy to use or intuitive.
what has worked over time is having computers of various types in schools, where teachers teach students and let them play with it.
nobody teaches about the command line, so nobody knows what to do with it. its also inscrutible without a useable help view, unless you already know how to use the terminal
Windows, macOS, iOS, and Android are definitely much easier to use and more intuitive than Linux today. That’s because their developers are incentivized to put themselves in the shoes of less-skilled users and figure out how to build a good experience for them.
I’m all for higher Linux adoption on desktop, but there’s still a lot of resistance to making less-skilled users the primary target instead of power users.
Teaching can help, but if it takes 50 hours to learn the basics of Linux versus 5 hours for Windows, it’s a losing battle.
A lot of those scripts are wrappers around package managers. Creating them is extra work for distributors, but they still do it because package-manager installs are not truly one-liners and offer far less control over the installation experience.
Users need to figure out which of the 10+ package managers they should be using, then run several commands. If something fails, the error messages are often cryptic and not easily configurable by the distributor.
And that’s before getting into the many rough edges of package managers. Most of them flat-out refuse to handle configuration and leave that part to the end user. Now you also need to document how to edit YAML and restart a systemd service. With an install script this is also solved.
For power users, this always looks trivial. In practice it raises the barrier to entry and can meaningfully affect adoption if your product is often used by less technical people.
In what world does a user have to choose between 10 package managers? Each distro has exactly one. There are also only about three, maybe four main package managers out there.
A shell script being piped into bash has so many more ways to break than a package. And if yhe theory is that package managers are fickle (they aren't), then how does adding more complexity help?
It is much simpler, much safer, and easier to maintain a package than an install.sh, eapecially for a big project.
Configuration can be handled by a script, yes. Here's a crazy idea: Your package can include scripts for configuring the software. It's almost as if most packages do. The scripts/utilities could even restart a systemd service for you.
Unless you're talking about configuring your build, in which case we're dealing with an experienced developer who will have no trouble just cloning the repo and building from source.
My biggest issue is: if we're dealing with someone who can't use a package manager, we're dealing with someone who doesn't have the capacity to judge how safe a script downloaded off the internet is. This does not drive linux adoption, it drives botnet adoption.
It's crazy to me that even after seeing so many major software distributors choose `curl | sh` as their entry point, people like you will still argue to the ends of the earth that there’s no problem with the package manager ecosystem.
I'll stop there. I'm not interested in continuing this discussion when it's being conducted in bad faith.
Most official repositories have policies that are incompatible with the needs of software vendors (release timing, supported versions, bundled dependencies, etc...).
IMO a lot of the blame falls onto the package manager ecosystem refusing to take into account very valid needs and claiming they aren't real / desirable.
The ideas aren't mutually exclusive, and I've never seen an open source project support "curl | sh" without also supporting those methods.
Indeed, plenty of these scripts often act as a "what OS and packager do we have" mux. Just look at the source of this one, for example.
When you support an open source project at scale and/or with less savvy users, you come to see the benefit of "here, just f'ing slam this into your shell and we'll figure it out" installers. I know I have.
> I am genuinely curious what it tells you, as "curl https//.. | sh" has long been an enormously popular approach to distribution in the open source world.
It's plain horrible. You could have, for example, a compromised server serving malware but only one out of every 100 download. The only signature you rely on is TLS.
Proper package distribution are using proper signatures schemes, are decentralized, even for some offer reproducible builds (meaning you can rebuild the whole package yourself and verify your build matches), etc.
Hashpipe is an attempt at reproducing some of those guarantees. Not unlike container pining using hashes. It at least fixes the "Jack and John installed this already and I know I'm getting the same version as they did".
Proper software distribution is signed, reproducible and ideally also uses some proof-of-existence for the hashes.
My bet is this: in the face of the countless supply chain attacks, we'll see more and more people getting very serious about security, including the security of software distribution. And curl bash'ing won't be part of it.
There is no threat model that doesn't also apply to pretty much every other distribution method.
It's just people who have internalized "don't paste commands from the Internet into your terminal" and aren't thinking about exactly what makes pasting commands from the Internet into your terminal dangerous, and how that applies to this specific case.
it tells you they're just like basically every other CLI targeting project for the last 15 years? I mean is it a big security hole we all accept, yes, it is. But it's not really indicative of much. That's also how I install rust.
Further - what the flicking fuck do you think an installer is going to do on your system? Not run any commands? Because I've written installers for every platform... they ALL can run commands.
So what exactly is the complaint in this comment? If you want to go read the install script - knock yourself out (or hell, point your agent at it...).
Understand that 99% are comfortable trusting downloads. They know that it's just as easy to sneak backdoors into source code as it is to sneak backdoors into executables.
99% of developers are most definitely not comfortable piping a script into the shell.
I would never runa script without reviewing it. I would install a package from a distros repository without reviewing the contents, however, because I can trust that a distro maintainer has reviewed it, that anyone else in the community can review it, and that that the bytes I'm downloading are the specific bytes I'm supposed to be downloading.
If you run a script off the open internet, you're being massively irresponsible. There are so many attack vectors that could be used here, and they are much easier to implement than something like the massive social engineering attack that was XZ.
I have been developing software since the late 80s, mostly CAM software for metal cutting machines, and I have been refereeing tabletop roleplaying games like Dungeons & Dragons since the late 70s.
I get the power of LLMs, and I do find them useful. But I find them useful in much the same way I find a really good set of random tables useful, or a good set of rules for procedurally generating something like a star sector for a science fiction campaign.
For my day job developing software, and for the RPG campaigns and books I run and publish today, LLMs are, in many cases, random tables on steroids. After using them for two years, even with all their improvements, I am continually reminded by the results I get that, at the heart of it, I am still dealing with what amounts to randomly generated content.
Yes, I know it is more accurate to call the process probabilistic rather than random. And yes, somebody can construct a technically deterministic setup with fixed weights, fixed seeds, fixed sampling parameters, and a frozen runtime environment. But that is like saying you can recreate a rainstorm if you get a thousand butterflies to flap their wings in exactly the right way. It may be technically true, but it is not how the technology behaves in normal day-to-day use.
For practical purposes, given the same prompt and the same apparent starting conditions, the result can differ each time you use a model. The outputs will often be highly correlated, and often useful, but they are not deterministic software in the ordinary sense.
So far, I am failing to see how the inherent probabilistic nature of the technology can be fully overcome. I understand how we got to where we are today from older neural net technology, including the systems used for vision and sound. What we have now can be very useful. But my view is that it is being badly oversold and overhyped. Its probabilistic nature is being vastly underestimated, and that is a major reason for much of the weirdness and many of the failures we keep seeing.
In tabletop roleplaying, there have been times when hobbyists relied too much on procedurally generated content and ultimately got burned by it, either through campaigns that were not as fun or products that were subpar. Each time, the lesson was the same: there is no substitute for human judgment.
Any workflow or technology incorporating LLMs has to keep humans in the loop, and not merely as rubber stamps. The human has to remain the primary decision maker.
> So far, I am failing to see how the inherent probabilistic nature of the technology can be fully overcome
I deeply hope we never reach the point where that’s overcome. What we’ve seen over the past few years is how AI will destroy humanness from pretty much the entire digital realm. It’s by far the most evil, anti-human technology ever created, corrupting everything it touches. The last thing we need is for it to become reliable
The trouble is: there is no deterministic algorithm that can do the things neural networks can do.
For many of these problems, I think it is likely that no deterministic algorithm can exist because the problems are fundamentally underspecified. E.g. a common task in computer vision is generating a 3D depth map from a 2D image. This is inverting a lossy projection, so any solution must be a least partially a hallucination.
I think we just have to accept this. It's a different type of algorithm, built out of statistics instead of logic, with different strengths and weaknesses compared to traditional software.
I feel the same way. Analogy: we’ve been geologists this whole time, building our dynamic and interesting mechanical planet.
Now, biology exists. It’s wet and messy and impossible to understand (we haven’t invented the microscope yet). That doesn’t mean biological study is not worth doing.
> I am failing to see how the inherent probabilistic nature of the technology can be fully overcome.
This is common in image generation pipelines because if you find an image you really like, you can store the seed and then reproduce it with small tweaks, otherwise - to quote Borges - “Look at it well. You will never see it again" User-facing deterministic pipelines do exist for generative AI.
I know you make this argument in your post, but that's really the answer if you want repeatable results. For a classifier or a detector, determinism is a requirement, but for an LLM non-determinism desirable property because it feels like a more natural conversation. The downside is it's extremely difficult to replicate a response without pointing the model to an earlier conversation.
And specifically for the RPG case, don't you want non-determinism? You don't want the model spinning up the same identical person if you say "Generate me an NPC character sheet for an innkeeper". This was a complaint that people had in the past, that models would regurgitate the same scenarios or the same jokes.
Where I suspect DMs run into trouble is not randomness, but lack of self-consistency in worldbuilding. Say you generate an NPC and then refer back to them later and the model gets some details wrong. You could compare to a system like Dwarf Fortress where everything down to the genealogy and faction relationships are rigidly generated.
Setting aside that we're living in a universe that's full of (practically) deterministic processes built over probabilistic components (and which behave sufficiently reliably without any human in the loop), I think the specific failure mode you're citing is that there aren't enough gates and constraints applied to the processes you've seen.
LLMs can contribute quite reliably given very narrow prompts and short horizons (keeping turns low and context brief). If you chain a bunch of these narrow contributions together and define guardrails (structured outputs, online evals, other-llm-as-judge/jury, etc...) you can produce a very repeatable workflow that reliably delivers to defined service levels.
The obvious issue being - you've got to define the workflow and implement all the guardrails, not hope that the LLM will infer them during a session or a one-shot prompt.
That doesn’t disqualify humans. It highlights the difference I am talking about.
Those chemical interactions and quantum effects lead to emergent properties like judgment, experience, context, accountability, and an understanding of consequences. Those are not properties that LLMs possess, regardless of how useful their output can be.
That is not to say that, in the future, LLMs won’t be used as part of other systems that add some of those properties. But that is not what we have today, or what can be seen in the foreseeable near future.
> Those are not properties that LLMs possess, regardless of how useful their output can be.
What makes you say that? Other than the usual "I'm a human, and humans must be very special, so when something that's not a human does X, it's either not real X, or X wasn't important in the first place".
It highlights, in my eyes, that "critical flaws" of LLMs are the same exact flaws that humans routinely suffer from. Sometimes LLMs have it worse, but sometimes they have it better too.
LLMs do improve release to release though. Humans are more of a mixed bag.
Actual 90d uptime: 97.6838% (calculated by Codex from live data)
Computed from the page’s own data for 2026-03-26 through 2026-06-23:
- Partial outage: 43h 15m 1s
- Major outage: 6h 46m 48s
- Total affected time: 50h 1m 49s
- Major-only uptime: 99.6861%
Thanks, never thought about that. Definitely makes sense for situations where you don't have 24/7 requirement for a service.
These stupid SLAs in the SaaS contracts should be reframed like this: 99.9% during working hours, not 99.9% overall. Would also give the SaaS vendor leeway to only guarantee 90% availability outside working hours, and then do their maintenance tasks in those windows.
It started failing two days ago, when it suddenly couldn't access gmail threads reliably. Then it started popping up warnings that I was over quota when I wasn't. It even let me use Fable briefly, or pretended to. Meanwhile search finally started working, so there's that.
Out of desperation, I moved to ChatGPT and it's working better than I remember. All these companies are playing games under load, under failure. No wonder we can't agree on what's good for what.
I signed up for paid plan on Claude just 3 hours ago for the first time and was scratching my head on how that thing gets praised so much if I can't even send a question half of the time....
Yeah it's one of those situations in which you reluctantly check for downtime as a last resort, only to find out you indeed just had a bad luck. Which is good, because I thought it was a beginner's brain type of friction.
If the government wanted people to take holidays off they could just legislate that people can't work on those days. I doubt there is any political will to do this, though.
Perhaps I am taking this idea a bit too seriously but I imagine that it might not work because of VPN's.
but VPN's can be detected and perhaps already are by these AI companies and it can lead to the ban/restriction of an account so not many people would prefer to use VPN.
Then, theoretically speaking, I suppose that it might be possible to perhaps toggle off these AI companies for enterprises or licenses of dev's
Though I imagine that it would mean taking an ID and having a special dev tag so as to not remove the general purpose chat bots that these sites still operate.
I do imagine that it might be really interesting to have a single day where AI esp closed source is/are turned off and see how that pans out but looks like till then claude is sprinkling its downtime throughout any part of the day/month randomly with their downtimes.
I have two sessions going. One is fine, one keeps timing out. Both Opus 4.8 in Claude code in terminal. Must have them routed to different to different infra that isn’t equally impacted.
I don’t prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops.
— Boris Cherny, head of Claude Code
Reliability is a direct reflection of the quality of the underlying infrastructural code. If even Anthropic, the company with the world's best agentic vibecoders, has horribly unreliable infrastructure, it really says something about the quality of the world's best agentically produced code.
Is there any indication these errors are related to Anthropic-written code as opposed to operational issues from the fastest-growing infra buildout ever?
Layer-wise, the app is pretty far removed from request routing to GPU pools.
This is almost certainly a software issue, though. Even if it's due to scaling, they still built a system that failed catastrophically rather than degrading gracefully.
Sure. But could it be k8s config? Could it be Nvidia Bright Cluster? Could it be load balancing?
I'm not saying Anthropic isn't to blame for a system that is literally approaching one-nine uptime; they certainly are. I am saying that jumping to the "it must be vibe coding's fault" is an emotional confirmation-bias belief, not an evidence-based belief.
I'd expect that they're also managing their k8s config and other infra using LLMs (it's actually quite good at this, at least for my simple homelab use-cases).
Right. If this were truly a pure scaling issue, I’d expect the interface would offer an archive.is-esque “Claude is at capacity; your prompt is #XXX/YYY in the queue; estimated time remaining: ZZZ seconds”
Instead, the whole system just shits the bed, catastrophically.
But such messages would suggest that Claude has engineered limits, which isn't what the market wants to hear. Completely falling over and being unavailable is just another Tuesday on the internet, will be forgotten by the weekend.
> being unavailable is just another Tuesday on the internet, will be forgotten by the weekend.
This is true when you have like one failure a year, but Anthropic is starting to look a lot like github lately when it comes to uptime.
After a certain point the reputation for unreliability starts sticking to you, especially when you position yourself as an indispensable tool for completing work people need done.
I'm not sure if that's really an Anthropic problem you're pointing to vs a problem that their infra layer handles (Amazon, Google, whatever hyperscaler). i.e, they might be scaling quickly but they are running on top of established infrastructure.
I would bet that they have inference setup for internal use on a separate system from the customer-facing production environment. The same way telemetry infrastructure needs to be run separate from normal production systems, so you aren't "blind" when you need it most.
He is a salesman at this point and is not talking to you. He is talking to the investors who want to vibe code loops to waste tokens on building slop to get rid of you.
Goes to show how fake this industry has become when VC dollars have flooded it.
Somehow it is fine to vibe code infrastructure or security because someone (with a clear vested interest) wants you to spend more tokens at their casino because that is how they "win" at the casino (which they work at).
Except in reality, this part of software is critical and irresponsible to
'write loops" and we all know that he doesn't believe what he is saying.
It's very very clear they're eating their own dog food, in a product space built on tech that didn't really exist publicly 5 years ago, to the success of billions, that people increasingly depend on. Maybe I'm an optimist, but I can't fathom the intense negativity or perspective of failure here.
Don't use it. Maybe wait a few more years. If it's not valuable/useful, then not using it, while everything matures, will not be a problem.
> If even Anthropic, the company with the world's best agentic vibecoders...
But that's really not what they have. They have AI experts who are creating incredible LLMs.
Everything else is more than meh: Claude Code is really bad. Such a turd would never have gained any traction if it wasn't for the LLMs behind it.
I use LLMs to code daily (Claude Code still, mind you, for I didn't take the time to switch yet) and these modesl are both amazing and pathetic.
If you don't verify everything they output, they do the absolute craziest thing imaginable.
One example is I got an Anthropic model notice a "pattern" in range bound integer values. I had them range bound between, e.g., 0xCAFE0000 and 0xCAFEFFFF. And at some point a comparison/validation was needed and instead of doing an integer comparison the Anthropic model went ballistic: instead of doing an integer comparison it converted the numbers to a string, then started doing substring matching on "0xCAFE" and went even more "expert" by verifying at which position the match was happening. All that while explaining why it couldn't possibly fail.
Why did it do that? Very likely because, in a comment, it saw "0xCAFE..." as a string. And the thing saw a pattern.
Can you believe it? There's a pattern. So it must light up connections. We've got a pattern!
Now amount of kludge, hidden pre-processing, hidden post-processing is fixing the "quality" of the code produced by something that, instead of doing an integer comparison, converts things to string and then does substring searches and indexes computation.
There's no fixing that.
Yesterday: had to use three guard clauses before pushing data... Two of the three "logic gates" (as the model would explain they were, which is kinda right) he got right. The third one: same thing... It was planning to go ballistic, introduce countless lines of code, insane abstractions, to make a test that was solved with a one line timestamp comparison.
It's because it does things like that that the people who explain that they don't code anymore are delusional if they think this gives, as of today, quality code.
It's like that other dude who was happy to produce 37 K LOC per day and counting.
> ... it really says something about the quality of the world's best agentically produced code
Oh it is totally shit code. But if you monitor everything and vet everything they do, it's helpful.
I find these LLMs way more helpful at finding the source of bugs (not fixing them: finding them, which is 90% of the job anyway) and at acting like rubber-ducks then at writing code.
Claude Code sucks. Claude Code CLI sucks. Their only "solutions" to all problems is to create VMs, headless browsers, and resort to incredible hacks (the infamous "game loop" that modifies the characters output by the LLM is just shameful) etc. to try to hide the misery. It's miserable kludges everywhere.
And the only reason these miserable kludges are not entirely falling apart is because they rest on the shoulders of actual giants: projects like Linux, QEMU, etc. that were not vibe-coded.
It's sad to have useful tools (the models) and to make such poor use of them.
I'm pretty sure that, in the end, it's just like open-source powering the entire world by now: we'll have open-source projects like Pi and then newer ones that are going to come out and fix the mess we have now. And they're not going to be 100% vibe-coded by people whose jobs is "to write loops".
Meh, this is the "must be the veganism" fallacy: if someone knows you're vegan, then any ailment you might have, no matter how ubiquitous in the population, must be somehow due to your vegan diet and no more details are required.
Except now it's the "AI did it" fallacy where if you know a company uses AI, even infra scaling issues must be due to AI, and if you had just used less or no AI, you would have been spared even though that has never been true.
The usual response to this goes something like "well they made claims that AI is good" therefore anything short of perfection supposedly debunks the claim.
This is literally they saying they are letting their LLM run wild(ish) and seeing the status.claude.com we can see the result.
This is a case where the outcome is the direct result of the engineering practices like the ones they describe.
PS: Yes I use Claude, Coded, Amp and Cursor agents every day so I am not saying here LLMs are not valuable.
LE: They did not made claims that "AI is good" they made claims that developers/computer engineers are not needed anymore in the near future. Thats is a stronger claim and has a direct relation with a product they have which needs computer engineering (yes infra counts too) and which seems to be down more than we expect as a good quality bar.
You just said "it's not the 'must be veganism' thing, it's the 'must be veganism thing'"
Unless you have inside knowledge of their infra ops and management tools, it is just guessing and blaming veganism. For all we know it could be tools from Nvidia or anyone else failing under massive load.
It could be the veganism. Some things are. Leaping to it as the only possible explanation for every ailment is exactly the fallacy.
No. We dont need metaphors like that with veganism (which touches ideologies also) when talking about engineering and a company that promotes out loud that engineering is done.
I have not stated anything. I just replied to a metaphor which is not needed cause here we talk about engineering problems handled by engineers in a tech company. I give you something else where this line of thought could be wrong: culture beats (and destroys) engineering practices unless regulated by law.
In this case yes this is not because of LLMs but because of company culture.
Still hard to know where the line draws because Anthropic talks about solving computer science for good as in humans need not apply.
If you're really committed to the "no difference between datacenter hardware engineering and claude code harness engineering, they must all use the same practices, anything true for one is true for the other" bit, fair enough. It seems fairly ideological to me.
It feels to me that you are really trying hard to brainstorm root causes for their failures.
Could be a hardware issue, could be a datacenter issue. Could be anything. So it could also be a software issue right?
This is not ideology. This is talking about root causes and I replied to someone that started saying this is not because of them promoting and using LLMs to the maximum. Could be it is not because of that. But it could be because of LLMs.
Keeping a company accountable when they try to sell a service that will replace engineers is not ideology. Ideology will be to not use any company that uses LLMs. But pointing out the disconnect between the public discourse and the status.claude.com is a simple idea.
Can you tell me that all those red lines there are infra?
But it is like that. You have zero insight into the infrastructure issue. And the person quoted above is a Claude Code developer. So because this guy uses Claude generously to build Claude Code, then Anthropic's API scaling issues must necessarily be caused by his agent loops even though scaling issues plague every tech company, no less often pre-AI.
The issue is that it's a thought-terminating cliche, and it would be nice to have one place on the internet that isn't just who can post one the fastest with the most glee to the giddy seal-clapping of the audience.
Engineering practices or best practices are much more than writing code.
So not sure what we are debating here: I see first hand companies jumping full on using LLM for _everything_ for the last 6 months (of course Anthropic longer) and without guardrails and good engineering practices the number of incidents, downtime is increasing.
Look at status.claude.com - Anthropic could at any point come out and say all those are due to third party providers.
I am also not saying here Anthropic is worse than other scaleups. But they do something different: they come in front of us and tell us they have better engineering practices.
> Anthropic could at any point come out and say all those are due to third party providers.
Why can't it be simply the case that Anthropic is struggling by their own accord? Infra scaling isn't a solved problem, much less with new, complicated, ever-changing, stateful LLM requests.
Pretty much every API-service-centric company I've worked at was in some constant state of either triaging or thinking about infrastructure health, often due to the familiar cascading problems of a necessarily distributed system.
But now with the AI scapegoat, we rewrite history to pretend us humans solved infra scaling, so any issues today must be caused by AI and any related superstitions we want to tack on.
> Why can't it be simply the case that Anthropic is struggling by their own accord?
They can and it is normal. I have said it is normal for scaleups specifically at a similarity growth rate.
What we (or at least I) critique here is coming out in the world and announcing that coding is done while having a product that has a status page full with red stripes. Yes, could be infrastructure, could be third party integrations could be a lot. But a lot of what is there is software. And yes, some parts is hardware. Unless the root cause is culture. In that case as I mentioned in another comment there I give them that: LLMs cannot solve culture.
Again the difference here is that the other scale-ups with similar _scaling_ issues are not talking about how we should all just use LLMs for everything and that learning to code is not required anymore.
So I am not saying the real issue is not infra or integration with third parties. What I am pointing at is: "don't talk that you don't need engineering while you - yourself - have engineering problems that need engineering solutions and still have not solve them".
Also you are getting out of your way to brainstorm possible root causes that will let them get away with this cognitive dissonance (or is there a better them in communication). Let them do the explanation and defend their position as they are the ones attacking the computer science engineering.
> Again the difference here is that the other scale-ups with similar _scaling_ issues are not talking about how we should all just use LLMs for everything and that learning to code is not required anymore.
> Let them do the explanation and defend their position as they are the ones attacking the computer science engineering.
This once again boils back down into: because they make claims about LLMs being good, I get to make any claim I want, and if if they didn't want me to make my claim, they shouldn't have made theirs.
It seems reactionary rather than earnest.
You've accused me and someone else of "brainstorming" reasons why they might have infra scaling issues, but I'm not. I'm pointing out that everyone has them especially pre-AI, and all of those reasons are on the table, not less likely. You have done the opposite: committed to a suspicion. That is the end result of the thought-terminating cliche.
Another data point: GitHub is extremely insistent its employees maximally use AI for internal development [0], and we’ve concomitantly seen its reliability fall off a cliff in the last year or so.
>GitHub is extremely insistent its employees maximally use AI for internal development
Or it could be that GitHub saw a 14x increase in commit volume last year[0], and we've concomitantly seen its reliability fall of a cliff in the last year or so. Given that Microsoft is leasing additional space on AWS(!)[1] to handle the additional commit volume, my personal money is on commit volume growth being a bigger issue than internal use of AI.
Internal use of AI may have been an issue. Commit volume growth may have been an issue. Unless one has direct knowledge of their infrastructure issues, claiming to know is quite literally making exactly the "they are vegan, their illness must be caused by their veganism" argument the GP commenter was talking about.
There’s a difference between having normal levels of difficulty and bad luck, and having people blame those on the wrong thing, vs having extraordinarily miserable quality and having people find the obvious difference. Potentially yes, they might have terrible wiring in their office or a crippling fondness for vim. But if I were their PR department I’d be talking about that if it was the problem.
If you go around bragging that you use AI for everything as part of your marketing plan, then don't be surprised that people blame you heavy AI usage when you have a problem.
"..people who followed a vegan diet had noticeably low levels of iodine in their bodies, an element that is essential for growth, bones, and brain function. In addition, vegans had lower bone health scores..." - https://www.bfr.bund.de/en/press-release/vegan-vegetarian-be...
There are a lot of nutritional blind spots in vegan diets.
It is a diet that requires exceptional planning and intentionality to be at a baseline of health similar to a balanced omnivorous diet.
So indeed, the "it must be veganism" is not an unfounded concern when health complications arise, in a very similar way to "it must be the AI" is a valid concern when software issues arise.
This isn't really the place for this, nor does it matter to my analogy.
But I was more getting at, say, staying out of the sun or being skinnyfat as a vegan, and suddenly you look "sickly"/"frail" when you'd be given the grace of looking like most people otherwise.
A similar analogy would be someone saying "well, of course you do" if you have any malady while having been vaccinated. My point being to bring up the thought terminating cliche of it compared to doing the necessary further analysis to link the malady with the suspected cause.
---
> "Vegans and vegetarians may have higher stroke risk"
It was a lump vegetarian + vegan group with a weak CI bounded at 1.02 for 3/1000 cases over a decade. The same group also had a more robust benefit of less heart disease than meat eaters. The stroke outcomes aren't replicated in other cohorts either, afaik. But the heart disease benefits are.
> "Vegans had a 43% higher risk of fractures overall compared to nonvegetarians, as well as higher risks of hip, leg, and vertebral fractures."
The study used a single baseline questionnaire for 17+ years and looked at vegans with correctable nutrition deficiencies to see +15/1000 hip fractures over 10 years. I'll grant that a poorly planned diet, especially 30 years ago with less nutritional understanding, has worse health outcomes. Just like I wouldn't use the average American's diet to lambast an omnivore diet (compared to, say, the "Mediterranean" diet).
> "vegans had lower iodine, bone health scores" (RBVD study)
On bones: p=0.02 in 72 people with 5% less QUS score in their heel bone (not DXA nor bone density tested). No body weight mediation nor data about health outcomes like fractures, osteoporosis, and no time dimension since it was just a snapshot (cross-sectional).
On iodine: It's a surrogate biomarker from a single pee test. Study didn't look at iodine-related health outcomes like thyroid dysfunction, goiter, or clinical consequences.
Protip: in the olden days we used to be able to read and write code ourselves. Worth trying while Claude is down! You might have fun and learn something!
Imagine a future where Anthropic holds your company hostage because no one can code properly anymore by hand and demands paying 200% higher price for the usage.
Developers who can code without LLMs will go extinct in couple years and there will be legends about them, you should at least have some decent open weight model as a backup
There is something to be said for how the technology stack keeps growing for businesses and what this might mean for the future.
Thirty years ago, you had an OS and you installed applications. No problem.
Later, you had to build and use apps on the internet, an infrastructure that is susceptible to DDOS attacks, government firewalls, and other security risks. Still fine, sort of.
Now, you not only have to build apps on the internet, you also have use LLMs to build apps to remain competitive with other developers. Future (human) maintainers of your code might not properly understand how it works, and if the providers of the LLMs screw up or go rogue, you are properly fucked.
There is a dependency/technology stack debt that is creating risks that need to be acknowledged.
I would guess that they would want to at the very least 10x their prices. Remember they need to make up for training, marketing, etc.. and make a big chunk of profit on top of that to justify their trillion dollar evaluation
No need to make up speculative futures based on a company only giving one model to their employees. I use Codex, Antigravity, Claude and GLM-5.2 interchangeably. Any sensible employer will do the same.
400k isn’t crazy for the FANG set but it’s still a subset of the developer market and hundreds of thousands of those jobs have been cut in the last few years as they all collectively work to lower SWE pay.
60k a year it needs to be a full irreplaceable part of the infrastructure for I think. There are very few kinds of software that meet that bar right now (certain design tools etc that have no replacement). 12k/year is in the expensive but reasonable for the right tooling category (Matlab etc.).
I don’t know what the future holds. I know the big AI companies are banking on being able to charge for a replacement SWE that works 24/7. Still not convinced these are it yet, as useful as they can be under the right circumstances.
Incredible how we can claim productivity increases when its either Claude or Github shitting the bed every other day. It must even itself out to a net neutral gain in the long term.
We are back to the baseline. The availability of our tools isn't adding anything in the long term because the productivity increase we get from the tooling is negated by the time we're back to doing it the old fashioned way due to downtime, so there is no claimed productivity increase espoused by the pontificators of the tooling.
The bunch of MD files in the codebase is becoming "tech" debt. It's just English prose, sure, but thousands of lines of English prose. Terse. Succinct. Difficult (if not impossible) to maintain manually without LLMs. That's not "baseline"
Both articles use 2017 as the turning point date. TTS is a lot older than that. It's not difficult to find data to fit the desired point if you choose a narrow enough time range. Or location selectivity - both of those are just about the United States.
When that downtime happens is way more important than the amount of it. Imagine if your payroll system was down for 8 hours a month, but it just so happened to be the day payroll do their calculations?
Totally. The uptime metrics are deceiving imo. A more useful measure for a productivity tool like Claude Code is uptime during work hours for a given time zone. I strongly suspect at least for the three US time zones, we would be looking at a single nine of uptime for that measure.
Works out at even more days when you consider working hours. And these downtime events never happen when I'm sleeping, always smack in the middle of the afternoon when I'm working.
Business/Office productivity tools can be productive at that rate. Core systems like ERP or arguably CRM can't, but MS Teams is probably already that low, Figma, Canva and several others could absolutely afford to be one nine before it affects their churn materially. I suspect OpenAI and Anthropic make most of their profit on business use cases rather than dev use cases (likely higher revenue but less profit) so this may be what sets the standard of uptime.
Hey you. Touch grass. Go outside. If a minor downtime of a developer tool triggers you, it means you likely have heavy anxiety. Don’t worry about it and calm down.
Anthropic has massive capability issues due to massive user growth. It happens often when EU and US work hours collide. They have smart people working on it. Don’t waste your energy complaining.
I really wish people wouldn't pretend these actually matter compared to, say, the proliferation of personal internal combustion cars, or shipping using bunker fuel.
I'm perfectly capable of programming without internet or AI but I would admit it would take longer and in the modern world we live in it's often not economical to do so. After programming for over 20 years you start to get in that flow automatically at least you used to do so. I don't know if people starting out to program will be able to, but most experienced developers will feel this way I assume.
> I'm perfectly capable of programming without internet
Me too, but let's be honest, I'm not talking about "Hello world!" experiments, I'm talking about developing usable software. I'm pretty sure, you won't be patching a Linux kernel driver on your own machine without googling stuff.
I've learned to code years before the Internet, but we've had it for so long, I'm honestly not sure anymore if I'm truly capable of building [real] stuff while offline. And I can't just ignore it, there's a feeling now, that with AI advancements, I may soon no longer be able to code efficiently without any AI.
It depends on the language but agreed. If I didn't have internet or AI access, I'd still be able to pull out manpages or dig into source code.
I wouldn't like it and it'd be slower, but I still understand my environment in sufficient depth to work without external info if I absolutely have to. Even with AI, once in a while I ask it to just give me some hints instead of solving something for me, so I'm forced to do the work.
It would be hilarious if they don't know how to fix it because this was built by "running loops calling Claude" and they haven't the faintest idea of the present underlying architecture.
I request an official statement from Anthropic explaining how they're going to limit outages in the future. Elevated errors almost always means its down for me and I can't be that unlucky statistically speaking. It seems that Anthropic does not have a good grip on the ops side of things.
I suppose it's a good time to encourage people trying out pi[1] with any cheap model from the openrouter rankings page[1].
[1] https://pi.dev/ [2] https://openrouter.ai/rankings
With Agent Client Protocol (ACP) you can keep the same UI and switch not models, but entire agents, that means using tools/prompts/compaction/etc that are tailored for the model.
Try Zed[1] for GUI and pool[2] for TUI.
[1] https://zed.dev/
[2] https://github.com/poolsideai/pool
Linked your own project with an "All rights reserved" license? The only thing my company will allow me to do with that software is have AI steal it </s>
We don't have any client-side telemetry. Conversations with Poolside models are stored, but you can use any ACP agent with pool. And we have plans to open-source it eventually.
https://pi.dev/models is throwing an internal server error for me.
I just did a build in Nemesis8 (containerized agents) and Pi appears to be working fine. Opencode is a good choice too if you're interested in checking out GLM 5.2 from z.ai.
https://github.com/deepbluedynamics/nemesis8
Is pi better than opencode?
I haven't tried opencode, but when I opened pi I was able to complain about that silly and stupid left-padding that LLM TUIs have started using that prevents basic copy-paste operation, and pi was able to edit itself to fix it.
So I'm sold on that level alone. Good stuff.
They are different models. OpenCode is trying to be a claude code/codex replacement, where-as pi is something you build yourself, kind of trying to be an emacs type thing compared to vs-code. As in emacs it is more common to write your own extensions, where as in vs-code most people just download them.
I keep butting into the question of; why opencode, when you've got codex available? Codex is open source as well, and i can't seem to picture a situation where one would want Opencode over Codex.
As far as I can tell, they tick the same boxes- but one has the support of a big boy model provider.
Well, the reason is simple: over the past several months, it has become very difficult to use Codex with non-OpenAI models. They removed the old edit tool that didn't require OpenAI's free form tool calling (that no other LLM host supports), they are adding tools to every request of a type that break most LLM hosts unless you use a proxy to filter them out, they add a "developer" role to some messages which breaks some chat templates, etc.
If someone wanted to fork Codex and make a community-maintained version that supports third party models, that would be great, because I liked Codex better than OpenCode for the most part.
Maybe you've found workarounds. Maybe you're using an old version of Codex. Maybe you have your own soft fork. I don't know. But I used to be able to use Codex with self-hosted models, and I gave up on that about a month ago as they kept breaking that.
Ah, I wasn't aware things regressed there. Yea certainly workarounds n soft fork sorts of things definitely would work- but thats a bummer than things have changed.
From watching Pr's and issues- seems like openai at least wants to come across as if theyre supporting non-oai models :/
Yeah... one of the relevant issues: https://github.com/openai/codex/issues/11940#issuecomment-45...
You would think they would support their own GPT-OSS model, but, not really anymore. I wish they would release a GPT-OSS 2, but this doesn't fill me with confidence.
If you care about privacy at all, you can route your Opencode requests through an inference provider that does not retain any logs or data. It is also much cheaper. So if your boxes include `Privacy` and `Affordability`, then no, they don't tick the same boxes.
You can use the Codex harness with non-openai providers if you want.
Pretty sure you need to use an older version of Codex for this to work.
I think they meant using Codex with non-openai providers?
oh-my-pi is a bit of a cross between the two; comes with basically everything OpenCode does, but still easy to customise.
OpenCode is nice if you don't want to do a lot of research and just want to get started right away. The OpenCode Go plan for $5 a month for your first month is a great way to do this, with good models to choose from and reasonable usage limits for a beginner.
I use Go plan precisely with Opencode IDE (and also Jetbrains IDE suite), but now also have access Gemini Pro and Claude Pro. And wonder which tooling to invest my time into, especially that MCP servers also potentially come into play here, and I want at least some models/tools to handle private tasks, like handling my increasingly-complex Home Assistant setup. And I also want to start using models according to needs (plan, execution, reviews). This shit gets extremely complicated extremely quickly, not to mention how often this field shifts direction.
I use “all of them”. My primary harness is oh-my-pi. I probably use 10 different models on a regular basis.
I occasionally use OpenCode.
I try to use Codex and Antigravity as much as I can, often using it as a secondary agent (due to different usage pricing models than API). The same skills and MCPs work across harnesses.
Edit: I don’t use Claude Code simply because I already have enough to deal with and don’t see a major advantage to their harness. I use Opus credits from my Google subscription on the rare occasion I need them.
Cursor is also worth checking out particularly at the $20 a month price tier. If you have Grok you effectively already have it too.
I expect to have a completely different answer a year from now. The main “lift” we’ve gotten from AI tools is our clients now get an Android + iOS app + macOS app + Electron + PWA to go with whatever web based app they want us to build, at essentially the same original price. (There’s also a CLI and a TUI, but so far none of them care about that…)
We just made the decision to start adding MCPs to apps. Gonna be an interesting conversation in a few weeks when I can tell my business contact he can use his favourite chatbot to now plug in directly to the custom app he bought from me.
Nice, thanks for the write-up!
I like it.
One caveat is that it doesn't do MCP tools, but can wire them up with bash (or use CLIs if those are available).
I can vouch for ohmypi, it's quite good out of the box and works great with your codex subscription or openrouter or fireworks etc. Very good harness.
https://omp.sh/
website is super laggy and has low FPS
Except I was having connection issue and errors through open router too
"curl -fsSL https://pi.dev/install.sh | sh" — seriously? That tells me a lot about the whole project, unfortunately.
I am genuinely curious what it tells you, as "curl https//.. | sh" has long been an enormously popular approach to distribution in the open source world. Homebrew, to name just one example, advertises a similar method.
(pi.sh also documents other install methods, like `npm`, on their homepage)
If trust and security is the issue, unfortunately "better" ideas like hashpipe [1] never achieved critical mass
I really hate the `curl <url> | sh` specifically because if your connection drops at a specifically unlucky point in time you are left with a partially executed script which if you are unlucky enough may just have been executing `rm -r ~/.cache/<pkg>/download` but it stopped at `rm-r ~/`.
Is it likely? No. Can it happen? Yea.
Just make it `curl -o <file> <url> && sh <file>` and this entire problem is gone.
Most scripts now put all the code into a shell function and call it in the last line of the script, so this bug can't happen.
Correct, and/or in addition, most nowadays prepend something like `set -euo pipefail` to the scripts in the line immediately after the shebang which results in stopping on errors, including things such as syntax errors stemming from e.g. incomplete installer transmission over wire.
(At least for bash scripts, I’m not sure whether these are POSIX syntax to be frank.)
What about better ideas like installing from source, or using a package manager? Or even flatpaks.
From source: creates much more work for the user.
Package managers: ecosystem is fragmented, requiring a long list of distro- and package-manager-specific instructions. Many scripts already install through package managers, they simply make the user’s life easier.
Flatpaks: These are clearly designed for desktop applications, with CLIs treated as an afterthought. They may be the best long-term hope, but today they are definitely not as convenient or widely available as a simple script.
If you care about adoption, `curl | sh` is the only real option today, which is why virtually all project show it as the first option.
Bullshit.
There's plenty of big projects that don't suggest you curl a script right into your shell.
If you have curl, you're probably on Linux. Just use the package manager like an adult.
The "like an adult" is what has and will continue to hold back linux on the desktop. Always gatekeeping less technical users instead of acknowledging adoption and ease of use are critical.
i dunno, nothing about most computing is particularly easy to use or intuitive.
what has worked over time is having computers of various types in schools, where teachers teach students and let them play with it.
nobody teaches about the command line, so nobody knows what to do with it. its also inscrutible without a useable help view, unless you already know how to use the terminal
Windows, macOS, iOS, and Android are definitely much easier to use and more intuitive than Linux today. That’s because their developers are incentivized to put themselves in the shoes of less-skilled users and figure out how to build a good experience for them.
I’m all for higher Linux adoption on desktop, but there’s still a lot of resistance to making less-skilled users the primary target instead of power users.
Teaching can help, but if it takes 50 hours to learn the basics of Linux versus 5 hours for Windows, it’s a losing battle.
Is this stance gate keeping users? Isn't a pkg manager installation also a one liner? This seems more like gate keeping lazy distributors.
A lot of those scripts are wrappers around package managers. Creating them is extra work for distributors, but they still do it because package-manager installs are not truly one-liners and offer far less control over the installation experience.
Users need to figure out which of the 10+ package managers they should be using, then run several commands. If something fails, the error messages are often cryptic and not easily configurable by the distributor.
And that’s before getting into the many rough edges of package managers. Most of them flat-out refuse to handle configuration and leave that part to the end user. Now you also need to document how to edit YAML and restart a systemd service. With an install script this is also solved.
For power users, this always looks trivial. In practice it raises the barrier to entry and can meaningfully affect adoption if your product is often used by less technical people.
Your arguments do not make even a little sense.
In what world does a user have to choose between 10 package managers? Each distro has exactly one. There are also only about three, maybe four main package managers out there.
A shell script being piped into bash has so many more ways to break than a package. And if yhe theory is that package managers are fickle (they aren't), then how does adding more complexity help?
It is much simpler, much safer, and easier to maintain a package than an install.sh, eapecially for a big project.
Configuration can be handled by a script, yes. Here's a crazy idea: Your package can include scripts for configuring the software. It's almost as if most packages do. The scripts/utilities could even restart a systemd service for you.
Unless you're talking about configuring your build, in which case we're dealing with an experienced developer who will have no trouble just cloning the repo and building from source.
My biggest issue is: if we're dealing with someone who can't use a package manager, we're dealing with someone who doesn't have the capacity to judge how safe a script downloaded off the internet is. This does not drive linux adoption, it drives botnet adoption.
It's crazy to me that even after seeing so many major software distributors choose `curl | sh` as their entry point, people like you will still argue to the ends of the earth that there’s no problem with the package manager ecosystem.
I'll stop there. I'm not interested in continuing this discussion when it's being conducted in bad faith.
It's about trust and having an official account for packaging on each platform where my customers getting their software from.
Most official repositories have policies that are incompatible with the needs of software vendors (release timing, supported versions, bundled dependencies, etc...).
IMO a lot of the blame falls onto the package manager ecosystem refusing to take into account very valid needs and claiming they aren't real / desirable.
The ideas aren't mutually exclusive, and I've never seen an open source project support "curl | sh" without also supporting those methods.
Indeed, plenty of these scripts often act as a "what OS and packager do we have" mux. Just look at the source of this one, for example.
When you support an open source project at scale and/or with less savvy users, you come to see the benefit of "here, just f'ing slam this into your shell and we'll figure it out" installers. I know I have.
> I am genuinely curious what it tells you, as "curl https//.. | sh" has long been an enormously popular approach to distribution in the open source world.
It's plain horrible. You could have, for example, a compromised server serving malware but only one out of every 100 download. The only signature you rely on is TLS.
Proper package distribution are using proper signatures schemes, are decentralized, even for some offer reproducible builds (meaning you can rebuild the whole package yourself and verify your build matches), etc.
Hashpipe is an attempt at reproducing some of those guarantees. Not unlike container pining using hashes. It at least fixes the "Jack and John installed this already and I know I'm getting the same version as they did".
Proper software distribution is signed, reproducible and ideally also uses some proof-of-existence for the hashes.
My bet is this: in the face of the countless supply chain attacks, we'll see more and more people getting very serious about security, including the security of software distribution. And curl bash'ing won't be part of it.
Claude Code does it the same way (which doesn't excuse it obviously) but still.
curl -fsSL https://claude.ai/install.sh | bash
https://code.claude.com/docs/en/quickstart
Yep, that's not an excuse. Claude goes down all the time, should pi also go down?
Oh wait (from another comment under this article): > https://pi.dev/models is throwing an internal server error for me.
Seriously, what is the threat model here?
There is no threat model that doesn't also apply to pretty much every other distribution method.
It's just people who have internalized "don't paste commands from the Internet into your terminal" and aren't thinking about exactly what makes pasting commands from the Internet into your terminal dangerous, and how that applies to this specific case.
Nah bro package manager where you copy and paste their custom repo and key from the same website that hosts the `.sh` is definitely safer, trust me
/s
it tells you they're just like basically every other CLI targeting project for the last 15 years? I mean is it a big security hole we all accept, yes, it is. But it's not really indicative of much. That's also how I install rust.
We also accepted the security risks of npm and such and we get one supply chain attack after another.
Maybe security should be at a higher position on our priority list.
The careless days are ultimately over but we still don’t act like that.
I get this, and would recently have had a similar reaction. But I have to ask: do you typically run your agent harness in yolo mode?
Yeah, totally reasonable comment given the utter security that must come from anthropic with their installer, amiright?
oh wait...
"curl -fsSL https://claude.ai/install.sh | bash"
(right from https://claude.com/product/claude-code)
Further - what the flicking fuck do you think an installer is going to do on your system? Not run any commands? Because I've written installers for every platform... they ALL can run commands.
So what exactly is the complaint in this comment? If you want to go read the install script - knock yourself out (or hell, point your agent at it...).
And you can simply look at the installer by pulling it up in the browser.
You can simply look at the installer by leaving off the "| bash".
both the Julia and Rust programming languages use curl -> sh to install
Both of them provide that option. I've never installed rust without a package manager. Why would I?
> Why would I?
Because then you can install it without depending on a package manager?
Yeah, from source in that case. Or using a verified binary if I absolutely had to.
Yes, if you want to, you can do that.
Understand that 99% are comfortable trusting downloads. They know that it's just as easy to sneak backdoors into source code as it is to sneak backdoors into executables.
See also: XZ hack.
99% of developers are most definitely not comfortable piping a script into the shell.
I would never runa script without reviewing it. I would install a package from a distros repository without reviewing the contents, however, because I can trust that a distro maintainer has reviewed it, that anyone else in the community can review it, and that that the bytes I'm downloading are the specific bytes I'm supposed to be downloading.
If you run a script off the open internet, you're being massively irresponsible. There are so many attack vectors that could be used here, and they are much easier to implement than something like the massive social engineering attack that was XZ.
My dude - if you're going to trust them then you're going to trust them.
You think it's hard to obfuscate shell calls from inside a built executable?
What it tells us is that you're probably searching for reasons to grouse about AI.
In general I agree with you, but on the other hand it is an agentic coding agent you should have isolated in a container or VM anyway
I have been developing software since the late 80s, mostly CAM software for metal cutting machines, and I have been refereeing tabletop roleplaying games like Dungeons & Dragons since the late 70s.
I get the power of LLMs, and I do find them useful. But I find them useful in much the same way I find a really good set of random tables useful, or a good set of rules for procedurally generating something like a star sector for a science fiction campaign.
For my day job developing software, and for the RPG campaigns and books I run and publish today, LLMs are, in many cases, random tables on steroids. After using them for two years, even with all their improvements, I am continually reminded by the results I get that, at the heart of it, I am still dealing with what amounts to randomly generated content.
Yes, I know it is more accurate to call the process probabilistic rather than random. And yes, somebody can construct a technically deterministic setup with fixed weights, fixed seeds, fixed sampling parameters, and a frozen runtime environment. But that is like saying you can recreate a rainstorm if you get a thousand butterflies to flap their wings in exactly the right way. It may be technically true, but it is not how the technology behaves in normal day-to-day use.
For practical purposes, given the same prompt and the same apparent starting conditions, the result can differ each time you use a model. The outputs will often be highly correlated, and often useful, but they are not deterministic software in the ordinary sense.
So far, I am failing to see how the inherent probabilistic nature of the technology can be fully overcome. I understand how we got to where we are today from older neural net technology, including the systems used for vision and sound. What we have now can be very useful. But my view is that it is being badly oversold and overhyped. Its probabilistic nature is being vastly underestimated, and that is a major reason for much of the weirdness and many of the failures we keep seeing.
In tabletop roleplaying, there have been times when hobbyists relied too much on procedurally generated content and ultimately got burned by it, either through campaigns that were not as fun or products that were subpar. Each time, the lesson was the same: there is no substitute for human judgment.
Any workflow or technology incorporating LLMs has to keep humans in the loop, and not merely as rubber stamps. The human has to remain the primary decision maker.
> So far, I am failing to see how the inherent probabilistic nature of the technology can be fully overcome
I deeply hope we never reach the point where that’s overcome. What we’ve seen over the past few years is how AI will destroy humanness from pretty much the entire digital realm. It’s by far the most evil, anti-human technology ever created, corrupting everything it touches. The last thing we need is for it to become reliable
The trouble is: there is no deterministic algorithm that can do the things neural networks can do.
For many of these problems, I think it is likely that no deterministic algorithm can exist because the problems are fundamentally underspecified. E.g. a common task in computer vision is generating a 3D depth map from a 2D image. This is inverting a lossy projection, so any solution must be a least partially a hallucination.
I think we just have to accept this. It's a different type of algorithm, built out of statistics instead of logic, with different strengths and weaknesses compared to traditional software.
I feel the same way. Analogy: we’ve been geologists this whole time, building our dynamic and interesting mechanical planet.
Now, biology exists. It’s wet and messy and impossible to understand (we haven’t invented the microscope yet). That doesn’t mean biological study is not worth doing.
> I am failing to see how the inherent probabilistic nature of the technology can be fully overcome.
This is common in image generation pipelines because if you find an image you really like, you can store the seed and then reproduce it with small tweaks, otherwise - to quote Borges - “Look at it well. You will never see it again" User-facing deterministic pipelines do exist for generative AI.
I know you make this argument in your post, but that's really the answer if you want repeatable results. For a classifier or a detector, determinism is a requirement, but for an LLM non-determinism desirable property because it feels like a more natural conversation. The downside is it's extremely difficult to replicate a response without pointing the model to an earlier conversation.
And specifically for the RPG case, don't you want non-determinism? You don't want the model spinning up the same identical person if you say "Generate me an NPC character sheet for an innkeeper". This was a complaint that people had in the past, that models would regurgitate the same scenarios or the same jokes.
Where I suspect DMs run into trouble is not randomness, but lack of self-consistency in worldbuilding. Say you generate an NPC and then refer back to them later and the model gets some details wrong. You could compare to a system like Dwarf Fortress where everything down to the genealogy and faction relationships are rigidly generated.
Setting aside that we're living in a universe that's full of (practically) deterministic processes built over probabilistic components (and which behave sufficiently reliably without any human in the loop), I think the specific failure mode you're citing is that there aren't enough gates and constraints applied to the processes you've seen.
LLMs can contribute quite reliably given very narrow prompts and short horizons (keeping turns low and context brief). If you chain a bunch of these narrow contributions together and define guardrails (structured outputs, online evals, other-llm-as-judge/jury, etc...) you can produce a very repeatable workflow that reliably delivers to defined service levels.
The obvious issue being - you've got to define the workflow and implement all the guardrails, not hope that the LLM will infer them during a session or a one-shot prompt.
I think we need to disqualify humans as well. Their brains have been shown to operate on probabilistic chemical interactions and even quantum effects.
That doesn’t disqualify humans. It highlights the difference I am talking about.
Those chemical interactions and quantum effects lead to emergent properties like judgment, experience, context, accountability, and an understanding of consequences. Those are not properties that LLMs possess, regardless of how useful their output can be.
That is not to say that, in the future, LLMs won’t be used as part of other systems that add some of those properties. But that is not what we have today, or what can be seen in the foreseeable near future.
> Those are not properties that LLMs possess, regardless of how useful their output can be.
What makes you say that? Other than the usual "I'm a human, and humans must be very special, so when something that's not a human does X, it's either not real X, or X wasn't important in the first place".
It highlights, in my eyes, that "critical flaws" of LLMs are the same exact flaws that humans routinely suffer from. Sometimes LLMs have it worse, but sometimes they have it better too.
LLMs do improve release to release though. Humans are more of a mixed bag.
My understanding is that the quantum effects has 0 impact, see https://en.wikipedia.org/wiki/Orchestrated_objective_reducti.... It's currently fringe/unproven science.
> Be Anthropic
> Optimize your bottom line for token spending so you collect $$$$
> Release Ultracode feature that optimize for Token Spending (a.k.a Dynamic Workflows)
> Tokenmaxxing achieved + 529 Overloaded unsustainable APIs everywhere
Actual 90d uptime: 97.6838% (calculated by Codex from live data)
So, only one 9 for 10x vibes.I want uptime modulo in my timezone/work hours. I don't give a shit about any 9's earned while I'm sleeping.
Thanks, never thought about that. Definitely makes sense for situations where you don't have 24/7 requirement for a service.
These stupid SLAs in the SaaS contracts should be reframed like this: 99.9% during working hours, not 99.9% overall. Would also give the SaaS vendor leeway to only guarantee 90% availability outside working hours, and then do their maintenance tasks in those windows.
Sleeping less nowadays...
It started failing two days ago, when it suddenly couldn't access gmail threads reliably. Then it started popping up warnings that I was over quota when I wasn't. It even let me use Fable briefly, or pretended to. Meanwhile search finally started working, so there's that.
This video, wow: https://www.threads.com/@founder__growth/post/DZz_9Ikj3Wx
Out of desperation, I moved to ChatGPT and it's working better than I remember. All these companies are playing games under load, under failure. No wonder we can't agree on what's good for what.
I had to log in to github and review a PR by hand just now. I felt like a savage again!
status.claude.com looks like a holiday christmas ornaments
Our team calls this "full three pepper blend" ;)
https://anthropicisdown.com/
Looks like Mike and Ike candy.
The ridiculous marketing message about their oh-so-good-we-cant-release-them models is just the cherry on top.
Speaking of LLM looping techniques, Claude seems to be having elevated error rate on a /loop as well.
So can we hire back those Oracle workers to write some code now?
The rainbow has to keep being a rainbow.
ClaudeCode still has a 99.27 % uptime
ClaudeCowork has 99.52 % uptime
ClaudeForGovernment has 99.93 % uptime
I must be unlucky because I'm in that .73% way more than .73% of the time.
That's how outages on very popular systems work - there's no downtime when most engineers are sleeping, since they're not under load.
A lot of deployments are also being done by engineers, performing the deploys while awake. Thus increasing the risk of outages while I'm awake. :(
There was a post on here the other day explaining this exact phenomenon: https://brooker.co.za/blog/2026/06/19/waiting.html
I signed up for paid plan on Claude just 3 hours ago for the first time and was scratching my head on how that thing gets praised so much if I can't even send a question half of the time....
That's just exceptionally unfortunate timing. Anthropic has been getting better at uptime, but they still have the occasional issue.
Yeah it's one of those situations in which you reluctantly check for downtime as a last resort, only to find out you indeed just had a bad luck. Which is good, because I thought it was a beginner's brain type of friction.
Wonder if in the future that public holidays will = AI services being turned off by gov. killswitch, to encourage people to actually take time off.
If the government wanted people to take holidays off they could just legislate that people can't work on those days. I doubt there is any political will to do this, though.
Perhaps I am taking this idea a bit too seriously but I imagine that it might not work because of VPN's.
but VPN's can be detected and perhaps already are by these AI companies and it can lead to the ban/restriction of an account so not many people would prefer to use VPN.
Then, theoretically speaking, I suppose that it might be possible to perhaps toggle off these AI companies for enterprises or licenses of dev's
Though I imagine that it would mean taking an ID and having a special dev tag so as to not remove the general purpose chat bots that these sites still operate.
I do imagine that it might be really interesting to have a single day where AI esp closed source is/are turned off and see how that pans out but looks like till then claude is sprinkling its downtime throughout any part of the day/month randomly with their downtimes.
Getting consistent "API Error: 500 Internal server error" messages in Claude Code right now (10:20 AM EST)
Their completion endpoint[*] is returning 503 with a `fault filter abort` response
[*] https://claude.ai/api/organizations/<ORG_ID>/chat_conversations/<CONV_ID>/completion
for me it's
API Error: 529 Overloaded. This is a server-side issue, usually temporary — try again in a moment. If it persists, check https://status.claude.com.
529 and not 429?
429 is if you have been rate limited. 529 is for server overloaded and can't process more.
Has anyone noticed how changing the viewport changes the uptime percentage?
It's changing the number of days it's looking back from 90 days to 60 days on smaller viewports - the uptime reflects that.
They dynamically pick the number of days to display based on viewport size. Mobile = 30 days, tablet = 60 days etc.
I have two sessions going. One is fine, one keeps timing out. Both Opus 4.8 in Claude code in terminal. Must have them routed to different to different infra that isn’t equally impacted.
What a coincidence, OpenAI is also down according to Downdetector.
They claim to be fully resolved as of half an hour ago, but it's still not working for me.
Restarted my claude session, by killing my terminal, it worked
Is it that time again?
My claude status teams webhook says unicode character U+274C , usually on downtimes we get a U+1F7E1... let's see how this goes
Since this keeps happening often enough not to bring up that much new discussion...
Today is the Latvian holiday of Jāņi, to mark the passage of the summer solstice: https://en.wikipedia.org/wiki/J%C4%81%C5%86i
Grab yourselves some beer or beverage of choice and some cheese (we usually have caraway cheese), alongside skewered meat and get some rest!
I mean, what else am I going to do while Claude is down, write code manually, like they did in the 90s or something?
Good opportunity to do some planning work.
I do that with Opus in plan mode..
Mine went down mid-session and it just shows a JSON error lol, waiting for it to come back up to continue..
I always associate downtime like this with a new model rollout. Maybe we are getting Fable back.
I was going to say the modern day equivalent of Github is down, but it's always down.
Perhaps they are adding security controls to bring Fable back online? One can hope.
Th is really not good advertisement for Claude-Oriented Programming
Good thing we have GLM-5.2
And we're back. Nothing to see here...
Ohno we're not. Hokey Cokey time. 529 overloaded. Of course. Maybe a beer. It is hot after all.
Good break, time to catch up with the code
saw this comment on Reddit,
"it's look like when the lights turning off, we return to socialize lol"
Is there any indication these errors are related to Anthropic-written code as opposed to operational issues from the fastest-growing infra buildout ever?
Layer-wise, the app is pretty far removed from request routing to GPU pools.
This is almost certainly a software issue, though. Even if it's due to scaling, they still built a system that failed catastrophically rather than degrading gracefully.
Sure. But could it be k8s config? Could it be Nvidia Bright Cluster? Could it be load balancing?
I'm not saying Anthropic isn't to blame for a system that is literally approaching one-nine uptime; they certainly are. I am saying that jumping to the "it must be vibe coding's fault" is an emotional confirmation-bias belief, not an evidence-based belief.
I'd expect that they're also managing their k8s config and other infra using LLMs (it's actually quite good at this, at least for my simple homelab use-cases).
> failed catastrophically rather than degrading gracefully
You mean like returning 529s and operating with reduced QoS?
Right. If this were truly a pure scaling issue, I’d expect the interface would offer an archive.is-esque “Claude is at capacity; your prompt is #XXX/YYY in the queue; estimated time remaining: ZZZ seconds”
Instead, the whole system just shits the bed, catastrophically.
But such messages would suggest that Claude has engineered limits, which isn't what the market wants to hear. Completely falling over and being unavailable is just another Tuesday on the internet, will be forgotten by the weekend.
> being unavailable is just another Tuesday on the internet, will be forgotten by the weekend.
This is true when you have like one failure a year, but Anthropic is starting to look a lot like github lately when it comes to uptime.
After a certain point the reputation for unreliability starts sticking to you, especially when you position yourself as an indispensable tool for completing work people need done.
Yeah to those of us who are on the know, but maybe it can be spun as "Claude adoption breaks the internet" to the consumer
Except these models are not run prompt-to-prompt. The infra has to hold the entire context.
I'm not sure if that's really an Anthropic problem you're pointing to vs a problem that their infra layer handles (Amazon, Google, whatever hyperscaler). i.e, they might be scaling quickly but they are running on top of established infrastructure.
I wonder how they fix things when Claude is down.
I would bet that they have inference setup for internal use on a separate system from the customer-facing production environment. The same way telemetry infrastructure needs to be run separate from normal production systems, so you aren't "blind" when you need it most.
Based on this outage: not very well.
This is ( or will be in the future ) a surprisingly relevant issue
maybe they ask a secondary agentic system to fix it. will that be the future of “redundancy”?
"Gemini, fix my Claude infra"
lol probably use their dev or qa Claude environment to fix prod
On the other hand we are also willing to buy it, so reliability is arguably not as valued a good as people assumed.
Some of us are unsubscribing, what with the coming face scans/enshittification/downtime/throttling...
He is a salesman at this point and is not talking to you. He is talking to the investors who want to vibe code loops to waste tokens on building slop to get rid of you.
Goes to show how fake this industry has become when VC dollars have flooded it.
Somehow it is fine to vibe code infrastructure or security because someone (with a clear vested interest) wants you to spend more tokens at their casino because that is how they "win" at the casino (which they work at).
Except in reality, this part of software is critical and irresponsible to 'write loops" and we all know that he doesn't believe what he is saying.
It's very very clear they're eating their own dog food, in a product space built on tech that didn't really exist publicly 5 years ago, to the success of billions, that people increasingly depend on. Maybe I'm an optimist, but I can't fathom the intense negativity or perspective of failure here.
Don't use it. Maybe wait a few more years. If it's not valuable/useful, then not using it, while everything matures, will not be a problem.
> If even Anthropic, the company with the world's best agentic vibecoders...
But that's really not what they have. They have AI experts who are creating incredible LLMs.
Everything else is more than meh: Claude Code is really bad. Such a turd would never have gained any traction if it wasn't for the LLMs behind it.
I use LLMs to code daily (Claude Code still, mind you, for I didn't take the time to switch yet) and these modesl are both amazing and pathetic.
If you don't verify everything they output, they do the absolute craziest thing imaginable.
One example is I got an Anthropic model notice a "pattern" in range bound integer values. I had them range bound between, e.g., 0xCAFE0000 and 0xCAFEFFFF. And at some point a comparison/validation was needed and instead of doing an integer comparison the Anthropic model went ballistic: instead of doing an integer comparison it converted the numbers to a string, then started doing substring matching on "0xCAFE" and went even more "expert" by verifying at which position the match was happening. All that while explaining why it couldn't possibly fail.
Why did it do that? Very likely because, in a comment, it saw "0xCAFE..." as a string. And the thing saw a pattern.
Can you believe it? There's a pattern. So it must light up connections. We've got a pattern!
Now amount of kludge, hidden pre-processing, hidden post-processing is fixing the "quality" of the code produced by something that, instead of doing an integer comparison, converts things to string and then does substring searches and indexes computation.
There's no fixing that.
Yesterday: had to use three guard clauses before pushing data... Two of the three "logic gates" (as the model would explain they were, which is kinda right) he got right. The third one: same thing... It was planning to go ballistic, introduce countless lines of code, insane abstractions, to make a test that was solved with a one line timestamp comparison.
It's because it does things like that that the people who explain that they don't code anymore are delusional if they think this gives, as of today, quality code.
It's like that other dude who was happy to produce 37 K LOC per day and counting.
> ... it really says something about the quality of the world's best agentically produced code
Oh it is totally shit code. But if you monitor everything and vet everything they do, it's helpful.
I find these LLMs way more helpful at finding the source of bugs (not fixing them: finding them, which is 90% of the job anyway) and at acting like rubber-ducks then at writing code.
Claude Code sucks. Claude Code CLI sucks. Their only "solutions" to all problems is to create VMs, headless browsers, and resort to incredible hacks (the infamous "game loop" that modifies the characters output by the LLM is just shameful) etc. to try to hide the misery. It's miserable kludges everywhere.
And the only reason these miserable kludges are not entirely falling apart is because they rest on the shoulders of actual giants: projects like Linux, QEMU, etc. that were not vibe-coded.
It's sad to have useful tools (the models) and to make such poor use of them.
I'm pretty sure that, in the end, it's just like open-source powering the entire world by now: we'll have open-source projects like Pi and then newer ones that are going to come out and fix the mess we have now. And they're not going to be 100% vibe-coded by people whose jobs is "to write loops".
Meh, this is the "must be the veganism" fallacy: if someone knows you're vegan, then any ailment you might have, no matter how ubiquitous in the population, must be somehow due to your vegan diet and no more details are required.
Except now it's the "AI did it" fallacy where if you know a company uses AI, even infra scaling issues must be due to AI, and if you had just used less or no AI, you would have been spared even though that has never been true.
The usual response to this goes something like "well they made claims that AI is good" therefore anything short of perfection supposedly debunks the claim.
This is not like that.
This is literally they saying they are letting their LLM run wild(ish) and seeing the status.claude.com we can see the result.
This is a case where the outcome is the direct result of the engineering practices like the ones they describe.
PS: Yes I use Claude, Coded, Amp and Cursor agents every day so I am not saying here LLMs are not valuable.
LE: They did not made claims that "AI is good" they made claims that developers/computer engineers are not needed anymore in the near future. Thats is a stronger claim and has a direct relation with a product they have which needs computer engineering (yes infra counts too) and which seems to be down more than we expect as a good quality bar.
You just said "it's not the 'must be veganism' thing, it's the 'must be veganism thing'"
Unless you have inside knowledge of their infra ops and management tools, it is just guessing and blaming veganism. For all we know it could be tools from Nvidia or anyone else failing under massive load.
It could be the veganism. Some things are. Leaping to it as the only possible explanation for every ailment is exactly the fallacy.
No. We dont need metaphors like that with veganism (which touches ideologies also) when talking about engineering and a company that promotes out loud that engineering is done.
I have not stated anything. I just replied to a metaphor which is not needed cause here we talk about engineering problems handled by engineers in a tech company. I give you something else where this line of thought could be wrong: culture beats (and destroys) engineering practices unless regulated by law. In this case yes this is not because of LLMs but because of company culture.
Still hard to know where the line draws because Anthropic talks about solving computer science for good as in humans need not apply.
If you're really committed to the "no difference between datacenter hardware engineering and claude code harness engineering, they must all use the same practices, anything true for one is true for the other" bit, fair enough. It seems fairly ideological to me.
It feels to me that you are really trying hard to brainstorm root causes for their failures.
Could be a hardware issue, could be a datacenter issue. Could be anything. So it could also be a software issue right?
This is not ideology. This is talking about root causes and I replied to someone that started saying this is not because of them promoting and using LLMs to the maximum. Could be it is not because of that. But it could be because of LLMs.
Keeping a company accountable when they try to sell a service that will replace engineers is not ideology. Ideology will be to not use any company that uses LLMs. But pointing out the disconnect between the public discourse and the status.claude.com is a simple idea.
Can you tell me that all those red lines there are infra?
But it is like that. You have zero insight into the infrastructure issue. And the person quoted above is a Claude Code developer. So because this guy uses Claude generously to build Claude Code, then Anthropic's API scaling issues must necessarily be caused by his agent loops even though scaling issues plague every tech company, no less often pre-AI.
The issue is that it's a thought-terminating cliche, and it would be nice to have one place on the internet that isn't just who can post one the fastest with the most glee to the giddy seal-clapping of the audience.
Engineering practices or best practices are much more than writing code.
So not sure what we are debating here: I see first hand companies jumping full on using LLM for _everything_ for the last 6 months (of course Anthropic longer) and without guardrails and good engineering practices the number of incidents, downtime is increasing.
Look at status.claude.com - Anthropic could at any point come out and say all those are due to third party providers.
I am also not saying here Anthropic is worse than other scaleups. But they do something different: they come in front of us and tell us they have better engineering practices.
> Anthropic could at any point come out and say all those are due to third party providers.
Why can't it be simply the case that Anthropic is struggling by their own accord? Infra scaling isn't a solved problem, much less with new, complicated, ever-changing, stateful LLM requests.
Pretty much every API-service-centric company I've worked at was in some constant state of either triaging or thinking about infrastructure health, often due to the familiar cascading problems of a necessarily distributed system.
But now with the AI scapegoat, we rewrite history to pretend us humans solved infra scaling, so any issues today must be caused by AI and any related superstitions we want to tack on.
> Why can't it be simply the case that Anthropic is struggling by their own accord?
They can and it is normal. I have said it is normal for scaleups specifically at a similarity growth rate.
What we (or at least I) critique here is coming out in the world and announcing that coding is done while having a product that has a status page full with red stripes. Yes, could be infrastructure, could be third party integrations could be a lot. But a lot of what is there is software. And yes, some parts is hardware. Unless the root cause is culture. In that case as I mentioned in another comment there I give them that: LLMs cannot solve culture.
Again the difference here is that the other scale-ups with similar _scaling_ issues are not talking about how we should all just use LLMs for everything and that learning to code is not required anymore.
So I am not saying the real issue is not infra or integration with third parties. What I am pointing at is: "don't talk that you don't need engineering while you - yourself - have engineering problems that need engineering solutions and still have not solve them".
Also you are getting out of your way to brainstorm possible root causes that will let them get away with this cognitive dissonance (or is there a better them in communication). Let them do the explanation and defend their position as they are the ones attacking the computer science engineering.
> Again the difference here is that the other scale-ups with similar _scaling_ issues are not talking about how we should all just use LLMs for everything and that learning to code is not required anymore.
> Let them do the explanation and defend their position as they are the ones attacking the computer science engineering.
This once again boils back down into: because they make claims about LLMs being good, I get to make any claim I want, and if if they didn't want me to make my claim, they shouldn't have made theirs.
It seems reactionary rather than earnest.
You've accused me and someone else of "brainstorming" reasons why they might have infra scaling issues, but I'm not. I'm pointing out that everyone has them especially pre-AI, and all of those reasons are on the table, not less likely. You have done the opposite: committed to a suspicion. That is the end result of the thought-terminating cliche.
Another data point: GitHub is extremely insistent its employees maximally use AI for internal development [0], and we’ve concomitantly seen its reliability fall off a cliff in the last year or so.
[0] https://github.com/resources/insights/ai-powered-workforce-p...
>GitHub is extremely insistent its employees maximally use AI for internal development
Or it could be that GitHub saw a 14x increase in commit volume last year[0], and we've concomitantly seen its reliability fall of a cliff in the last year or so. Given that Microsoft is leasing additional space on AWS(!)[1] to handle the additional commit volume, my personal money is on commit volume growth being a bigger issue than internal use of AI.
Internal use of AI may have been an issue. Commit volume growth may have been an issue. Unless one has direct knowledge of their infrastructure issues, claiming to know is quite literally making exactly the "they are vegan, their illness must be caused by their veganism" argument the GP commenter was talking about.
[0]https://daringfireball.net/linked/2026/05/04/commits-on-gith...
[1]https://www.businessinsider.com/microsoft-github-amazon-ai-c...
There’s a difference between having normal levels of difficulty and bad luck, and having people blame those on the wrong thing, vs having extraordinarily miserable quality and having people find the obvious difference. Potentially yes, they might have terrible wiring in their office or a crippling fondness for vim. But if I were their PR department I’d be talking about that if it was the problem.
If you go around bragging that you use AI for everything as part of your marketing plan, then don't be surprised that people blame you heavy AI usage when you have a problem.
Ahem...
"Vegans and vegetarians may have higher stroke risk" - https://www.bbc.com/news/health-49579820
"Vegans had a 43% higher risk of fractures overall compared to nonvegetarians, as well as higher risks of hip, leg, and vertebral fractures." - https://sniglobal.org/plant-based-diets-and-fracture-risk/
"The Impact of a Vegan Diet on Many Aspects of Health: The Overlooked Side of Veganism" - https://www.cureus.com/articles/138315-the-impact-of-a-vegan...
"..people who followed a vegan diet had noticeably low levels of iodine in their bodies, an element that is essential for growth, bones, and brain function. In addition, vegans had lower bone health scores..." - https://www.bfr.bund.de/en/press-release/vegan-vegetarian-be...
There are a lot of nutritional blind spots in vegan diets. It is a diet that requires exceptional planning and intentionality to be at a baseline of health similar to a balanced omnivorous diet.
So indeed, the "it must be veganism" is not an unfounded concern when health complications arise, in a very similar way to "it must be the AI" is a valid concern when software issues arise.
This isn't really the place for this, nor does it matter to my analogy.
But I was more getting at, say, staying out of the sun or being skinnyfat as a vegan, and suddenly you look "sickly"/"frail" when you'd be given the grace of looking like most people otherwise.
A similar analogy would be someone saying "well, of course you do" if you have any malady while having been vaccinated. My point being to bring up the thought terminating cliche of it compared to doing the necessary further analysis to link the malady with the suspected cause.
---
> "Vegans and vegetarians may have higher stroke risk"
It was a lump vegetarian + vegan group with a weak CI bounded at 1.02 for 3/1000 cases over a decade. The same group also had a more robust benefit of less heart disease than meat eaters. The stroke outcomes aren't replicated in other cohorts either, afaik. But the heart disease benefits are.
> "Vegans had a 43% higher risk of fractures overall compared to nonvegetarians, as well as higher risks of hip, leg, and vertebral fractures."
The study used a single baseline questionnaire for 17+ years and looked at vegans with correctable nutrition deficiencies to see +15/1000 hip fractures over 10 years. I'll grant that a poorly planned diet, especially 30 years ago with less nutritional understanding, has worse health outcomes. Just like I wouldn't use the average American's diet to lambast an omnivore diet (compared to, say, the "Mediterranean" diet).
> "vegans had lower iodine, bone health scores" (RBVD study)
On bones: p=0.02 in 72 people with 5% less QUS score in their heel bone (not DXA nor bone density tested). No body weight mediation nor data about health outcomes like fractures, osteoporosis, and no time dimension since it was just a snapshot (cross-sectional).
On iodine: It's a surrogate biomarker from a single pee test. Study didn't look at iodine-related health outcomes like thyroid dysfunction, goiter, or clinical consequences.
---
Protip: in the olden days we used to be able to read and write code ourselves. Worth trying while Claude is down! You might have fun and learn something!
Appstoreconnect too
Who is GLM 5.2?
I'll do you one better! Why is GLM 5.2?
Where is GLM 5.2?
Smells like someone's gassing Mythos back up.
Always good because people will look for and try alternatives.
Are you implying using their brains?
Imagine a future where Anthropic holds your company hostage because no one can code properly anymore by hand and demands paying 200% higher price for the usage.
What can your company do?
>> What can your company do?
Hire some Developers?
Developers who can code without LLMs will go extinct in couple years and there will be legends about them, you should at least have some decent open weight model as a backup
There is something to be said for how the technology stack keeps growing for businesses and what this might mean for the future.
Thirty years ago, you had an OS and you installed applications. No problem.
Later, you had to build and use apps on the internet, an infrastructure that is susceptible to DDOS attacks, government firewalls, and other security risks. Still fine, sort of.
Now, you not only have to build apps on the internet, you also have use LLMs to build apps to remain competitive with other developers. Future (human) maintainers of your code might not properly understand how it works, and if the providers of the LLMs screw up or go rogue, you are properly fucked.
There is a dependency/technology stack debt that is creating risks that need to be acknowledged.
I'm not sure if I'd want to code without an LLM anymore. That said, there will always be open models.
Wait a second kiddo, I expect to live longer than that.
I don't plan on using LLMs for programming any time soon.
And I know like one guy who does use them. He's not a developer by trade, he just has to write programs sometimes.
What exactly is a developer in a scenario where no one can code?
As long as I'm alive (and not senile) there will always be at least one developer who can code
I'm not using AI coding tools yet, and even if they force me at gunpoint to use them at work no one can force me to in my spare time
I'm not too worried about the case where no one can code anymore because that will be after I'm dead
I guess that means there will be at least two of us.
That wasn't the premise of the question.
My answer to your question is "I don't care, because I'll be dead"
Me?
No one can code. You can code.
Are you no one?
Only the Sith deal in absolutes
Sounds like an absolute...
I would guess that they would want to at the very least 10x their prices. Remember they need to make up for training, marketing, etc.. and make a big chunk of profit on top of that to justify their trillion dollar evaluation
doing 2x 3 times already gets you almost 10x increase
No need to make up speculative futures based on a company only giving one model to their employees. I use Codex, Antigravity, Claude and GLM-5.2 interchangeably. Any sensible employer will do the same.
> Any sensible employer will do the same.
Hard to do when each individual provider wants to lock your company into multiyear enterprise contracts.
You can still have multiple contracts.
the company will switch to a different LLM vendor??
what does someone do when a certain brand coffee maker keeps breaking; they buy a different brand.
Hire developers who will be happy to take merely 100% higher rates?
Use an Anthropic competitor?
Won't it eventually be $1,000 or $5,000 a month? $5k a month would still be 97% less than many developers cost.
How many developers are making 2 million a year?
400k isn’t crazy for the FANG set but it’s still a subset of the developer market and hundreds of thousands of those jobs have been cut in the last few years as they all collectively work to lower SWE pay.
60k a year it needs to be a full irreplaceable part of the infrastructure for I think. There are very few kinds of software that meet that bar right now (certain design tools etc that have no replacement). 12k/year is in the expensive but reasonable for the right tooling category (Matlab etc.).
I don’t know what the future holds. I know the big AI companies are banking on being able to charge for a replacement SWE that works 24/7. Still not convinced these are it yet, as useful as they can be under the right circumstances.
Another day, another Claude outage.
Incredible how we can claim productivity increases when its either Claude or Github shitting the bed every other day. It must even itself out to a net neutral gain in the long term.
I don't understand this comment. At worst, we're just back to the baseline - working without AI help.
Yes, that's what the comment means.
We are back to the baseline. The availability of our tools isn't adding anything in the long term because the productivity increase we get from the tooling is negated by the time we're back to doing it the old fashioned way due to downtime, so there is no claimed productivity increase espoused by the pontificators of the tooling.
This is an argument for returning to living in caves and hunting mammoths for fear that our modern civilization becomes unavailable for a day or two.
I'm down
The bunch of MD files in the codebase is becoming "tech" debt. It's just English prose, sure, but thousands of lines of English prose. Terse. Succinct. Difficult (if not impossible) to maintain manually without LLMs. That's not "baseline"
Developers having a troubled relationship with documentation isn't new.
At some point it won't be true. Same with handwriting, nowadays I feel like a 7 y/o when I need to write something on a piece of paper...
The baseline is forever gone. Good luck convincing people to contribute to StackOverflow v2 after this.
With atrophy to our not-AI ability to do things
I don't buy it. Literacy rates have been increasing even after the invention of text to speech.
> Literacy rates have been increasing
uh
https://news.harvard.edu/gazette/story/2025/09/whats-driving...
https://literacybuffalo.org/2025/01/23/adult-literacy-rates-...
Both articles use 2017 as the turning point date. TTS is a lot older than that. It's not difficult to find data to fit the desired point if you choose a narrow enough time range. Or location selectivity - both of those are just about the United States.
https://ourworldindata.org/grapher/cross-country-literacy-ra...
If that 0.07% downtime was holding me back I wouldn't publicly admit that.
When that downtime happens is way more important than the amount of it. Imagine if your payroll system was down for 8 hours a month, but it just so happened to be the day payroll do their calculations?
Totally. The uptime metrics are deceiving imo. A more useful measure for a productivity tool like Claude Code is uptime during work hours for a given time zone. I strongly suspect at least for the three US time zones, we would be looking at a single nine of uptime for that measure.
'Claude for Government' is the only one with 0.07% downtime, claude.ai has 0.89% downtime and claude code 0.74% - imo, that's a lot of downtime!
over a year, 0.89% is around 3 whole days of downtime
Works out at even more days when you consider working hours. And these downtime events never happen when I'm sleeping, always smack in the middle of the afternoon when I'm working.
Claude is 0.89% downtime. Getting close to one nine.
There aren't many tools that remain useful at that rate.
Business/Office productivity tools can be productive at that rate. Core systems like ERP or arguably CRM can't, but MS Teams is probably already that low, Figma, Canva and several others could absolutely afford to be one nine before it affects their churn materially. I suspect OpenAI and Anthropic make most of their profit on business use cases rather than dev use cases (likely higher revenue but less profit) so this may be what sets the standard of uptime.
Heh, I’m 5x more productive 99 percent of the time. That is still a very, very useful tool.
That's two nines. One nine would be 10% downtime.
So 95% uptime / 5% downtime is two nines?
Gotta keep my 100x developer cred, that 0.07% is everything.
Hey you. Touch grass. Go outside. If a minor downtime of a developer tool triggers you, it means you likely have heavy anxiety. Don’t worry about it and calm down.
Anthropic has massive capability issues due to massive user growth. It happens often when EU and US work hours collide. They have smart people working on it. Don’t waste your energy complaining.
Cheers
It’s 36°C outside, I’d rather stay inside.
>> It’s 36°C outside
Yeah AI Data Centers do that....
I really wish people wouldn't pretend these actually matter compared to, say, the proliferation of personal internal combustion cars, or shipping using bunker fuel.
Goddamit, like losing the ability for coding without any Internet wasn't enough, now I have to forget how to code without Claude?
ps. if you say you still capable of developing software without the Internet, you're lying. Perhaps, to your own self.
I'm perfectly capable of programming without internet or AI but I would admit it would take longer and in the modern world we live in it's often not economical to do so. After programming for over 20 years you start to get in that flow automatically at least you used to do so. I don't know if people starting out to program will be able to, but most experienced developers will feel this way I assume.
> I'm perfectly capable of programming without internet
Me too, but let's be honest, I'm not talking about "Hello world!" experiments, I'm talking about developing usable software. I'm pretty sure, you won't be patching a Linux kernel driver on your own machine without googling stuff.
I've learned to code years before the Internet, but we've had it for so long, I'm honestly not sure anymore if I'm truly capable of building [real] stuff while offline. And I can't just ignore it, there's a feeling now, that with AI advancements, I may soon no longer be able to code efficiently without any AI.
It depends on the language but agreed. If I didn't have internet or AI access, I'd still be able to pull out manpages or dig into source code.
I wouldn't like it and it'd be slower, but I still understand my environment in sufficient depth to work without external info if I absolutely have to. Even with AI, once in a while I ask it to just give me some hints instead of solving something for me, so I'm forced to do the work.
Utter nonsense. If you can't figure out how to run your dev stack on your own computer, you're not worthy of calling yourself a software engineer.
L take. My last 2 companies had their environments spun up in cloud instances.
"Running your dev stack" is not the same as "developing [usable] software".
Oh no, I have to write that marketing coordination email myself again!
I hear that 100% of code at Anthropic is coded by Claude, so this was caused by Claude. And also, no one but Claude can fix Claude
>> I hear that 100% of code at Anthropic is coded by Claude, so this was caused by Claude. And also, no one but Claude can fix Claude
Claude is down....
Claude is taking a hydration break.
Don't worry they've got a dude named Claude they keep in the back just for occasions like this
Claude is just a tool. Garbage in, garbage out, blame management
It would be hilarious if they don't know how to fix it because this was built by "running loops calling Claude" and they haven't the faintest idea of the present underlying architecture.
:)
Maybe they have DeepSeek subscription for that occasion?
I request an official statement from Anthropic explaining how they're going to limit outages in the future. Elevated errors almost always means its down for me and I can't be that unlucky statistically speaking. It seems that Anthropic does not have a good grip on the ops side of things.
I'm sure they'll get right on that for ya bud
Thanks for your contribution to this problem. Keen to see what you come up with next!