The blog post has a bunch of charts, which gives it a veneer of objectivity and rigor, but in reality it's just all vibes and conjecture. Meanwhile, recent empirical studies actually point in the opposite direction, showing that AI use increases inequality, not decreases it.
Of course AI increases inequality. It's automated ladder pulling technology.
To become good at something you have to work through the lower rungs and acquire skill. AI does all those lower level jobs, puts the people who need those jobs for experience on the street, and robs us of future experts.
The people who benefit the most are those who are already up at the top of the ladder, investing billions to make the ladder rise faster and faster.
AI has been extremely useful at teaching me things. Granted I needed to already know how to learn and work through the math myself, but when I get stuck it is more helpful than any other resource on the internet.
> To become good at something you have to work through the lower rungs and acquire skill. AI does all those lower level jobs, puts the people who need those jobs for experience on the street, and robs us of future experts.
You can still do that with AI: you give yourself assignments and then use the AI as a resource when you get stuck. As you get better, you ask the AI less and less. The fact that the AI is wrong sometimes is like a test that allows you to evaluate whether you are internalizing the skills or just trusting the AI.
If we ever have AIs which don't hallucinate, I'd want that added back in as a feature.
Honestly not sure it is easier to learn coding today than before. In theory maybe, but in reality 99% of people will use AI as a crutch - half of learning is when you have to struggle a bit with something. If all the answers are always in front of you, it will be harder to learn. I know it would be hard for me to learn if I could just ask for the code all the time.
I've been coding for decades already, but if I need to put something together in an unfamiliar language? I can just ask AI about any stupid noob mistake I make.
It knows every single stupid noob mistake, it knows every "how do I sort an array", and it explains well, with examples. Like StackOverflow on steroids.
The caveat is that you need to WANT to learn. If you don't, then not learning is easier than ever too.
> I've been coding for decades already, but if I need to put something together in an unfamiliar language? I can just ask AI about any stupid noob mistake I make.
So you aren’t still learning foundational concepts or how to think about problems, you are using it as a translation tool. Very different, in my opinion.
I agree with you - I learned to program because I found it fascinating, and wanted to know how my computer worked, not because it was the only option available to me at the time...
There are always people willing to take shortcuts at long-term expense. Frankly I'm fine with the selection pressure changing in our industry. Those who want to learn will still find a way to do it.
Yeah, the graphs make some really big assumptions that don't seem to be backed up anywhere except AI maximalist head canon.
There's also a gap in addressing vibe coded "side projects" that get deployed online as a business. Is the code base super large and complex? No. Is AI capable of taking input from a novice and making something "good enough" in this space? Also no.
The latter remarks rest on very strong assumptions that underestimate the power AI tools offer.
AI tools are great at unblocking and helping their users explore beyond their own understanding. The tokens in are limited to the users' comprehension, but the tokens out are generated from a vast collection of greater comprehension.
For the novice, it's great at unblocking and expanding capabilities. "Good enough" results from novices are tangible. There is no doubt the volume of "good enough" is perceived as very low by many.
For large and complex codebases, unfortunately the effects of tech debt (read: objectively subpar practices) translate into context rot at development time. A properly architected and documented codebase that adheres to common well structured patterns can easily be broken down into small easily digestible contexts. i.e. a fragmented codebase does not scale well with LLMs, because the fragmentation is seeding the context for the model. The model reflects and acts as an amplifier to what it's fed.
> For the novice, it's great at unblocking and expanding capabilities. "Good enough" results from novices are tangible. There is no doubt the volume of "good enough" is perceived as very low by many.
For personal tools or whatever, sure. And the tooling or infrastructure might get there for real projects eventually, but it's not there currently. The prospect of someone naively vibe coding a side business including a payment or authentication system or something that stores PII (all areas developers learn the dangers of through the wisdom gained only by experience) sends shivers down my spine. Even amateur coders trying that stuff the old-fashioned way must read their code and the docs and info on the net and such, and will likely get some sense of the danger. Yesterday I saw someone here recounting a disastrous data breach of their friend's vibe coded side hustle.
The big problem I see here is people not knowing enough to realize that something functioning is almost never a sign that it is “good enough” for many things they might assume it is. Gaining the amount of base knowledge to evaluate things like form security nearly makes the idea of vibe coding useless for anything more than hobby or personal utility projects.
> For large and complex codebases, unfortunately the effects of tech debt (read: objectively subpar practices) translate into context rot at development time. A properly architected and documented codebase that adheres to common well structured patterns can easily be broken down into small easily digestible contexts. i.e. a fragmented codebase does not scale well with LLMs, because the fragmentation is seeding the context for the model. The model reflects and acts as an amplifier to what it's fed.
It seems like you're claiming complex codebases are hard for LLMs because of human skill issues. IME it's rather the opposite - an LLM makes it easier for a human to ramp up on what a messy codebase is actually doing, in a standard request/response model or in terms of looking at one call path (however messy) at a time. The models are well trained on such things and are much faster at deciphering what all the random branches and nested bits and pieces do.
But complex codebases actually usually arise because of changing business requirements, changing market conditions, and iteration on features and offerings. Execution quality of this varies but a "properly architected and documented codebase" is rare in any industry with (a) competitive pressure and (b) tolerance for occasional bugs. LLMs do not make the need to serve those varied business goals go away, nor do they remove the competitive pressure to move rapidly vs gardening your codebase.
And if you're working in an area with extreme quality requirements that have forced you into doing more internal maintenance and better codebase hygiene then you find yourself with very different problems with unleashing LLMs into that code. Most of your time was never spent writing new features anyway, and LLM-driven insight into rare or complex bugs, interactions, and performance still appears quite hit or miss. Sometimes it saves me a bunch of time. Sometimes it goes in entirely wrong directions. Asking it to make major changes, vs just investigate/explain things, has an even lower hit rate.
In a sense I agree. I don't necessarily think that it has to be the case, but I got that same feeling that it was wearing a white lab coat to look like a scientist. I think their honest attempt was to express the relationship as they perceive it.
I think this could still be used as a valuable form of communication if you can clearly express the idea that this is representing a hypothesis rather than a measurement. The simplest would be to label the graphs as "hypothesis", but a subtle yet easily identifiable visual change might be better.
Wavy lines for the axes spring to mind as an idea to express that. I would worry about the ability to express hypotheses about definitive events that happen when a value crosses an axis, though; you'd probably want a straight line for that. Perhaps it would be sufficient to just have wavy lines at the ends of the axes, beyond the point at which the plot appears.
Beyond that, I think the article presumes the flattening of the curve as mastery is achieved. I'm not sure that's a given; perhaps it seems that way because we evaluate proportional improvement, implicitly placing skill on a logarithmic scale.
I'd still consider the post from the author as being done in better faith than the economist links.
I'd like to know what people think, and for them to say it honestly. If they have hard data, they show it and how it confirms their hypothesis. At the other end of the scale is gathering data and only exposing the measurements that imply a hypothesis you are not brave enough to state explicitly.
Read my comment again. The keyword here is "recent". The second link also expands on why it's relevant. It's best to read the whole article, but here's a paragraph that captures the argument:
>The shift in recent economic research supports his observation. Although early studies suggested that lower performers could benefit simply by copying AI outputs, newer studies look at more complex tasks, such as scientific research, running a business and investing money. In these contexts, high performers benefit far more than their lower-performing peers. In some cases, less productive workers see no improvement, or even lose ground.
All of the studies were done 2023-2024 and are not listed in order that they were conducted. The studies showing reduced equality all apply to uncommon tasks like material discovery and debate points, whereas the ones showing increased equality are broader and more commonly applicable, like writing, customer interaction, and coding.
>All of the studies were done 2023-2024 and are not listed in order that they were conducted
Right, the reason why I pointed out "recent" is that it's new evidence that people might not be aware of, given that there were also earlier studies showing AI had the opposite effect on inequality. The "recent" studies also had varied methodology compared to the earlier studies.
>The studies showing reduced equality all apply to uncommon tasks like material discovery and debate points
"Debating points" is uncommon? Maybe not everyone was in the high school debate club, but "debating points" is something that anyone in a leadership position does on a daily basis. You're also conveniently omitting "investment decisions" and "profits and revenue", which basically everyone is trying to optimize. You might be tempted to think "Coding efficiency" represents a high complexity task, but the abstract says the test involved "Recruited software developers were asked to implement an HTTP server in JavaScript as quickly as possible". The same is true of the task used in the "legal analysis" study, which involved drafting contracts or complaints. This seems exactly like the type of cookie cutter tasks that the article describes would become like cashiers and have their wages stagnate. Meanwhile the studies with negative results were far more realistic and measured actual results. Otis et al 2023 measured profits and revenue of actual Kenyan SMBs. Roldan-Mones measured debate performance as judged by humans.
> Right, the reason why I pointed out "recent" is that it's new evidence that people might not be aware of, given that there were also earlier studies showing AI had the opposite effect on inequality.
Okay, well the majority of this "recent" evidence agrees with the pre-existing evidence that inequality is reduced.
> "Debating points" is uncommon?
Yes. That is nobody's job. Maybe every now and then you might need to come up with some arguments to support a position, but that's not what you get paid to do day to day.
> You're also conveniently omitting "investment decisions" and "profits and revenue", which basically everyone is trying to optimize.
Very few people are making investment decisions as part of their day to day job. Hedge funds may experience increasing inequality, but that kinda seems on brand.
On the other hand "profits and revenue" is not a task.
> You might be tempted to think "Coding efficiency" represents a high complexity task, but the abstract says the test involved "Recruited software developers were asked to implement an HTTP server in JavaScript as quickly as possible". The same is true of the task used in the "legal analysis" study, which involved drafting contracts or complaints.
These sound like real tasks that a decent number of people have to do on a regular basis.
> Meanwhile the studies with negative results were far more realistic and measured actual results. Otis et al 2023 measured profits and revenue of actual Kenyan SMBs. Roldan-Mones measured debate performance as judged by humans.
These sound like niche activities that are not widely applicable.
Yup. As a retired mathematician who craves the productivity of an obsessed 28 year old, I've been all in on AI in 2025. I'm now on Claude's $200/month Max plan in order to use Claude Code Opus 4 without restraint. I still hit limits, usually when I run parallel sessions to review a 57 file legacy code base.
For a time I refused to talk with anybody or read anything about AI, because it was all noise that didn't match my hard-earned experience. Recently HN has included some fascinating takes. This isn't one.
I have the opinion that neurodivergents are more successful using AI. This is so easily dismissed as hollow blather, but I have a precise theory backing this opinion.
AI is a giant association engine. Linear encoding (the "King - Man + Woman = Queen" thing) is linear algebra. I taught linear algebra for decades.
As I explained to my optometrist today, if you're trying to balance a plate (define a hyperplane) with three fingers, it works better if your fingers are farther apart.
My whole life people have rolled their eyes when I categorize a situation using analogies that are too far flung for their tolerances.
Now I spend most of my time coding with AI, and it responds very well to my "fingers farther apart" far reaching analogies for what I'm trying to focus on. It's an association engine based on linear algebra, and I have an astounding knack for describing subspaces.
Would you sit on a stool with legs three inches apart?
For a statistician, determining a plane from three approximate points on the plane is far more accurate if the points aren't next to each other.
When we offer examples or associations in a prompt, we experience a similar effect in coaxing a response from AI. This is counter-intuitive.
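To put a number on the plate-balancing intuition, here's a rough numpy sketch of my own (a toy setup, nothing rigorous): fit a plane through three noisy points and see how far the recovered normal tilts from the true one when the points are bunched together versus spread apart.

    import numpy as np

    rng = np.random.default_rng(0)
    true_normal = np.array([0.0, 0.0, 1.0])  # the "plate" is the z = 0 plane

    def fitted_normal(points):
        """Normal of the best-fit plane through the given points."""
        centered = points - points.mean(axis=0)
        # The right-singular vector for the smallest singular value spans the
        # direction of least variance, i.e. the plane's normal.
        _, _, vt = np.linalg.svd(centered)
        n = vt[-1]
        return n if n[2] >= 0 else -n

    def tilt_error_deg(spread, noise=0.01):
        # Three "fingers" touching the plate, either bunched or spread apart,
        # each with a little measurement wobble in z.
        xy = rng.uniform(-spread, spread, size=(3, 2))
        z = rng.normal(0.0, noise, size=3)
        n = fitted_normal(np.column_stack([xy, z]))
        return np.degrees(np.arccos(np.clip(n @ true_normal, -1.0, 1.0)))

    for spread in (0.05, 1.0):  # fingers close together vs. far apart
        errs = [tilt_error_deg(spread) for _ in range(2000)]
        print(f"spread={spread}: median tilt error {np.median(errs):.2f} degrees")

The spread-out fingers recover the plane far more accurately, which is the same effect I'm claiming for far-flung but on-subspace analogies in a prompt.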
I'm fully aware that most of what I post on HN is intended for each future AI training corpus. If what I have to say was already understood I wouldn't say it.
> Now I spend most of my time coding with AI, and it responds very well to my "fingers farther apart" far reaching analogies for what I'm trying to focus on.
If you made analogies based on Warhammer 40k or species of mosquitoes it would have reacted exactly the same.
Thanks for the links. That should be obvious to anyone who believes that $70 billion datacenters (Meta) are needed and the investment will be amortized by subscriptions (in the case of Meta also by enhanced user surveillance).
The means of production are in a small oligopoly, the rest will be redundant or exploitable sharecroppers.
(All this under the assumption that "AI" works, which its proponents affirm in public at least.)
This mirrors insights from Andrew Ng's recent AI startup talk [1].
I recall he mentions in this video that the new advice they are giving to founders is to throw away prototypes when they pivot instead of building onto a core foundation. This is because of the effects described in the article.
He also gives some provisional numbers (see the section "Rapid Prototyping and Engineering" and slides ~10:30) where he suggests prototype development sees a 10x boost compared to a 30-50% improvement for existing production codebases.
This feels vaguely analogous to the switch from "pets" to "livestock" when the industry switched from VMs to containers. Except, the new view is that your codebase is more like livestock and less like a pet. If true (and no doubt this will be a contentious topic to programmers who are excellent "pet" owners) then there may be some advantage in this new coding agent world to getting in on the ground floor and adopting practices that make LLMs productive.
IMO the problem with this pets vs. livestock analogy is that it focuses on the code when the value is really in the writers head. Their understanding and mental model of the code is what matters. AI tools can help with managing the code, helping the writer build their models and express their thoughts, but it has zero impact on where the true value is located.
Great point, but just mentioning (nitpicking?) that I never heard about machines/containers referred to as "livestock", but rather in my milieu it's always "pets" vs "cattle". I now wonder if it's a geographical thing.
Yeah, the CERN talk* [0] coined the "Pets vs. Cattle" analogy, and it was way before VMs were cheap on bare metal. I think the word just evolved as the idea got rooted in the community.
We use the same analogy for the last 20 years or so. Provisioning 150 cattle servers takes 15 minutes or so, and we can provision a pet in a couple of hours, at most.
*: Engine Yard post notes that Microsoft's Bill Baker used the term earlier, though CERN's date (2012) checks out with our effort timeline and how we got started.
There seems to be a pattern of humorous plurals in English where by analogy with ox ~ oxen you get -x ~ -xen: boxen, Unixen, VAXen.
Before you call this pattern silly, consider that the fairly normal plural “Unices” is by analogy with Latin plurals in -x = -c|s ~ -c|ēs, where I’ve expanded -x into -cs to make it clear that the Latin singular comprises a noun stem ending in -c- and a (nominative) singular ending -s, which does exist in Latin but is otherwise completely nonexistent in English. (This is extra funny for Unix < Unics < Multics.) Analogies are the order of the day in this language.
Thanks for pointing this out. I think this is an insightful analogy. We will likely manage generated code in the same way we manage large cloud computing complexes.
This probably does not apply to legacy code that has been in use for several years where the production deployment gives you a higher level of confidence (and a higher risk of regression errors with changes).
Have you blogged about your insights? The https://stillpointlab.com site is very sparse, as is @stillpointlab
I'm currently in build mode. In some sense, my project is the most over complicated blog engine in the history of personal blog engines. I'm literally working on integrating a markdown editor to the project.
Once I have the MVP working, I will be working on publishing as a means to dogfood the tool. So, check back soon!
Mailing list is on the roadmap but doesn't exist just yet.
What you could do: sign in using one of the OAuth methods, go to the user page and then go to the feedback section. Let me know in a message your email and I'll ping you once the blog is setup.
Sorry it is primitive at this stage but I'm prioritizing MVP before marketing.
Oo, the "pets vs. livestock" analogy really works better than the "craftsmen vs. slop-slinger" arguments.
Because using an LLM doesn't mean you devalue well-crafted or understandable results. But it does indicate a significant shift in how you view the code itself. It is more about the emotional attachment to code vs. code as a means to an end.
I don't think it's exactly emotional attachment. It's the likelihood that I'm going to get an escalated support ticket caused by this particular piece of slop/artisanally-crafted functionality.
Not to slip too far into analogy, but that argument feels a bit like a horse-drawn carriage operator saying he can't wait to pick up all of the stranded car operators when their mechanical contraptions break down on the side of the road. But what happened instead was the creation of a brand new job: the mechanic.
I don't have a crystal ball and I can't predict the actual future. But I can see the list of potential futures and I can assign likelihoods to them. And among the potential futures is one where the need for humans to fix the problems created by poor AI coding agents dwindles as the industry completely reshapes itself.
Both can be true. There were probably a significant number of stranded motorists that were rescued by horse-powered conveyance. And eventually cars got more convenient and reliable.
I just wouldn't want to be responsible for servicing a guarantee about the reliability of early cars.
And I'll feel no sense of vindication if I do get that support case. I will probably just sigh and feel a little more tired.
Yes, the whole point that it is true. But only for a short window.
So consider differing perspectives. Like a teenage kid that is hanging around the stables, listening to the veteran coachmen laugh about the new loud, smoky machines. Proudly declaring how they'll be the ones mopping up the mess, picking up the stragglers, cashing it in.
The career advice you give to the kid may be different than the advice you'd give to the coachman. That is the context of my post: Andrew Ng isn't giving you advice, he is giving advice to people at the AI school who hope to be the founders of tomorrow.
And you are probably mistaken if you think the solution to the problems that arise due to LLMs will result in those kids looking at the past. Just like the ultimate solution to car reliability wasn't a return to horses but rather the invention of mechanics, the solution to problems caused by AI may not be the return to some software engineering past that the old veterans still hold dear.
I don't know about what's economically viable, but I like writing code. It might go away or diminish as a viable profession, which might make me a little sad. There are still horse enthusiasts who do horse things for fun.
Things change, and that's ok. I guess I just got lucky so far that this thing I like doing just so happens to align with a valuable skill.
I'm not arguing for or against anything, but I'll miss it if it goes away.
In my world that isn't inherently a bad thing. Granted, I belong to the YAGNI crowd of software engineers who put business before tech architecture. I should probably mention that I don't think this means you should skip on safety and quality where necessary, but I do preach that the point of software is to serve the business as fast as possible. I do this to the extent that I actually think that our BI people, who are most certainly not capable programmers, are good at building programs. They mostly need oversight on external dependencies, but it's actually amazing what they can produce in a very short amount of time.
Obviously their software sucks, and eventually parts of it always escalate into a support ticket which reaches my colleagues and me. It's almost always some form of performance issue; this is in part because we have monthly sessions where they can bring us issues they simply can't get to work. Anyway, I see that as a good thing. It means their software is serving the business and now we need to deal with the issues to make it work even better. Sometimes that is because their code is shit; most times it's because they've reached an actual bottleneck and we need to replace part of their Python with a C/Zig library.
The important part of this is that many of these bottlenecks appear in areas that many software engineering teams I have known wouldn't necessarily have predicted. Meanwhile, a lot of the areas that traditional "best practices" call for better software architecture for work fine for entire software lifecycles while being absolutely horrible AI slop.
I think that is where the emotional attachment is meant to fit in. Being fine with all the slop that never actually matters during a piece of software's lifecycle.
There are some things that you still can't do with LLMs. For example, if you tried to learn chess by having the LLM play against you, you'd quickly find that it isn't able to track a series of moves for very long (usually 5-10 turns; the longest I've seen it last was 18) before it starts making illegal choices. It also generally accepts invalid moves from your side, so you'll never be corrected if you're wrong about how to use a certain piece.
Because it can't actually model these complex problems, it really requires awareness from the user regarding what questions should and shouldn't be asked. An LLM can probably tell you how a knight moves, or how to respond to the London System. It probably can't play a full game of chess with you, and will virtually never be able to advise you on the best move given the state of the board. It probably can give you information about big companies that are well-covered in its training data. It probably can't give you good information about most sub-$1b public companies. But, if you ask, it will give a confident answer.
They're a minefield for most people and use cases, because people aren't aware of how wrong they can be, and the errors take effort and knowledge to notice. It's like walking on a glacier and hoping your next step doesn't plunge through the snow and into a deep, hidden crevasse.
LLMs playing chess isn't a big deal. You can train a model on chess games and it will play at a decent Elo and very rarely make illegal moves (i.e., a 99.8% legal-move rate). There are a few such models around. I think post-training messes with chess ability, and OpenAI et al. just don't really care about that. But LLMs can play chess just fine.
Jeez, that arxiv paper invalidates my assumption that it can't model the game. Great read. Thank you for sharing.
Insane that the model actually does seem to internalize a representation of the state of the board -- rather than just hitting training data with similar move sequences.
...Makes me wish I could get back into a research lab. Been a while since I've stuck to reading a whole paper out of legitimate interest.
(Edit) At the same time, it's still worth noting the accuracy errors and the potential for illegal moves. That's still enough to prevent LLMs from being applied to problem domains with severe consequences, like banking, security, medicine, law, etc.
> people aren't aware of how wrong they can be, and the errors take effort and knowledge to notice.
I have friends who are highly educated professionals (PhDs, MDs) who just assume that AI/LLMs make no mistakes.
They were shocked that it's possible for hallucinations to occur. I wonder if there's a halo effect where the perfect grammar, structure, and confidence of LLM output causes some users to assume expertise?
Computers are always touted as deterministic machines. You can't argue with a compiler, or Excel's formula editor.
AI, in all its glory, is seen as an extension of that. A deterministic thing which is meticulously crafted to provide an undisputed truth, and it can't make mistakes because computers are deterministic machines.
The idea of LLMs being networks with weights plus some randomness is both a vague and too complicated abstraction for most people. Also, companies tend to say this part very quietly, so when people read the fine print, they get shocked.
> I wonder if there's a halo effect where the perfect grammar, structure, and confidence of LLM output causes some users to assume expertise?
I think it's just that LLMs are modeling generative probability distributions of sequences of tokens so well that what they actually are nearly infallible at is producing convincing results. Often times the correct result is the most convincing, but other times what seems most convincing to an LLM just happens to also be most convincing to a human regardless of correctness.
> In computer science, the ELIZA effect is a tendency to project human traits — such as experience, semantic comprehension or empathy — onto rudimentary computer programs having a textual interface. ELIZA was a symbolic AI chatbot developed in 1966 by Joseph Weizenbaum and imitating a psychotherapist. Many early users were convinced of ELIZA's intelligence and understanding, despite its basic text-processing approach and the explanations of its limitations.
It's complete bullshit. There is no way anyone ever thought anything was going on in ELIZA. There were people amazed that "someone could program that", but they had no illusions about what it was; it was obvious after 3 responses.
Don't be so sure. It was 1966, and even at a university, few people had any idea what a computer was capable of. Fast forward to 2025...and actually, few people have any idea what a computer is capable of.
If I wasn't familiar with the latest in computer tech, I would also assume LLMs never make mistakes, after hearing such excited praise for them over the last 3 years.
It is only in the last century or so that statistical methods were invented and applied. It is possible for many people to be very competent at what they are doing and at the same time be totally ignorant of statistics.
There are lies, statistics and goddamn hallucinations.
My experience, speaking over a scale of decades, is that most people, even very smart and well-educated ones, don't know a damn thing about how computers work and aren't interested in learning. What we're seeing now is just one unfortunate consequence of that.
(To be fair, in many cases, I'm not terribly interested in learning the details of their field.)
Have they never used it? The majority of the responses that I can verify are wrong. Sometimes outright nonsense, sometimes believable. Be it general knowledge or something where deeper expertise is required.
I worry that the way the models "speak" to users will cause users to drop their 'filters' about what to trust and not trust.
We are barely talking modern media literacy, and now we have machines that talk like 'trusted' face to face humans, and can be "tuned" to suggest specific products or use any specific tone the owner/operator of the system wants.
It's super obvious even if you try and use something like agent mode for coding, it starts off well but drifts off more and more. I've even had it try and do totally irrelevant things like indent some code using various Claude models.
My favourite example is something that happens quite often even with Opus, where I ask it to change a piece of code, and it does. Then I ask it to write a test for that code, it dutifully writes one. Next, I tell it to run the test, and of course, the test fails. I ask it to fix the test, it tries, but the test fails again. We repeat this dance a couple of times, and then it seemingly forgets the original request entirely. It decides, "Oh, this test is failing because of that new code you added earlier. Let me fix that by removing the new code." Naturally, now the functionality is gone, so it confidently concludes, "Hey, since that feature isn't there anymore, let me remove the test too!"
> you'd quickly find that it isn't able to track a series of moves for very long (usually 5-10 turns; the longest I've seen it last was 18)
In chess, previous moves are irrelevant, and LLMs aren't good at filtering out irrelevant data [1]. For better performance, you should include only the relevant data in the context window: the current state of the board.
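As a rough illustration (a sketch assuming the python-chess package and a made-up opening), you can replay the moves locally and hand the model just the position and its legal moves:

    import chess  # the python-chess package

    # Replay the game so far locally, purely to compute the current position.
    moves_so_far = ["e4", "e5", "Nf3", "Nc6", "Bb5"]  # arbitrary example opening
    board = chess.Board()
    for san in moves_so_far:
        board.push_san(san)

    # Prompt with only the position (FEN) and the legal moves, instead of the
    # full move history the model would otherwise have to re-derive state from.
    prompt = (
        "You are playing Black. Current position (FEN):\n"
        f"{board.fen()}\n"
        f"Legal moves (UCI): {', '.join(m.uci() for m in board.legal_moves)}\n"
        "Reply with exactly one legal move in UCI notation."
    )
    print(prompt)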
Yeah, the chess example is interesting. The best specialised AIs for chess are all clearly better than humans, but our best general AIs are barely able to play legal moves. The ceiling for AI is clearly much higher than current LLMs.
Since agents are good only at greenfield projects, the logical conclusion is that existing codebases have to be prepared such that new features are (opinionated) greenfield projects - let all the wiring dangle out of the wall so the intern just has to plug in the appliance. All the rest has to be done by humans, or the intern will rip open the wall to hang a picture.
Hogwash. If you can't figure out how to do something with project Y from npm, try checking it out from GitHub with WebStorm and asking Junie how to do it -- often you get a good answer right away. If not, you can ask questions that help you understand the code base. Don't understand some data structure which is a maze of Map<String, Objects>(s)? It will scan how it is used and give you draft documentation.
Sure you can't point it to a Jira ticket and get a PR but you certainly can use it as a pair programmer. I wouldn't say it is much faster than working alone but I end up writing more tests and arguing with it over error handling means I do a better job in the end.
> Sure you can't point it to a Jira ticket and get a PR
You absolutely can. This is exactly what SWE-Bench[0] measures, and I've been amazed at how quickly AIs have been climbing those ladders. I personally have been using Warp [1] a lot recently and in quite a lot of low-medium difficulty cases it can one-shot a decent PR. For most of my work I still find that I need to pair with it to get sufficiently good results (and that's why I still prefer it to something cloud-based like Codex [2], but otherwise it's quite good too), and I expect the situation to flip over the coming couple of years.
I've not used it for long enough yet for this to be a strong opinion, but so far I'd say that it is indeed a bit better than Claude Code, as per the results on Terminal Bench[0]. And on a side note, I quite like the fact that I can type shell commands and chat commands interchangeably into the same input and it just knows whether to run it or respond to it (accidentally forgetting the leading exclamation mark has been a recurring mistake for me in Claude Code).
I think agents have a curve where they're kinda bad at bootstrapping a project, very good if used in a small-to-medium-sized existing project and then it goes downhill from there as size increases, slowly.
Something about a brand-new project often makes LLMs drop to "example grade" code, the kind you'd never put in production. (An example: Claude implemented per-task file logging in my prototype project by pushing to an array of log lines, serializing the entire thing to JSON, and rewriting the entire file for every logged event.)
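For what it's worth, the difference looks roughly like this (hypothetical class names, just sketching the pattern described above versus a plain append):

    import json

    # "Example grade": keep every log line in memory and rewrite the whole
    # file as one JSON document on every single event.
    class RewriteEverythingLogger:
        def __init__(self, path):
            self.path = path
            self.lines = []

        def log(self, event):
            self.lines.append(event)
            with open(self.path, "w") as f:  # O(n) work and a full rewrite per event
                json.dump(self.lines, f)

    # What you'd actually want in production: append one JSON line per event.
    class AppendOnlyLogger:
        def __init__(self, path):
            self.path = path

        def log(self, event):
            with open(self.path, "a") as f:  # O(1) work per event
                f.write(json.dumps(event) + "\n")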
An interloper being someone who intrudes or meddles in a situation (inter "between or amid" + loper "to leap or run" - https://en.wiktionary.org/wiki/loper ), an extraloper would be someone who dances or leaps around the outside of a subject or meeting, with similar annoyances.
"inter-" means between, "intra-" means within, "extra-" means outside. "intra-" and "inter" aren't quite synonyms but they definitely aren't opposites of eachother.
Inter- implies relationships between entities, intra- implies relationships within entities.
In any single sentence context they cannot refer to the same relationships, and that which they are not is precisely the domain of the other word: they are true antonyms.
> and that which they are not is precisely the domain of the other word
External relationships are a thing, which are in neither of those domains.
An intramural activity is not the same thing as an intermural activity, but the overwhelming majority of activities which are not intramural activities are not intermural activities either, most are extramural activities.
The learning-with-AI curve should cross back under the learning-without-AI curve towards the higher end, even without "cheating".
The very highest levels of mastery can only come from slow, careful, self-directed learning that someone in a hurry to speedrun the process isn't focusing on.
> This means cheaters will plateau at whatever level the AI can provide
From my experience, the skill of using AI effectively is one of treating the AI with a "growth mindset" rather than a "fixed" one. What I do is that I roleplay as the AI's manager, giving it a task, and as long as I know enough to tell whether its output is "good enough", I can lend it some of my metacognition via prompting to get it to continue working through obstacles until I'm happy with the result.
There are diminishing returns of course, but I found that I can get significantly better quality output than what it gave me initially without having to learn the "how" of the skill myself (i.e. I'm still "cheating"), and only focusing my learning on the boundary of what is hard about the task. By doing this, I feel that over time I become a better manager in that domain, without having to spend the amount of effort to become a practitioner myself.
How do you know it’s significantly better quality if you don’t know any of the “how”? The quality increase seems relative to the garbage you start with. I guess as long as you impress yourself with the result it doesn’t matter if it’s not actually higher quality.
I don't think "quality" has anything like a universal definition, and when people say that they probably mean an alignment with personal taste.
Does it solve the problem? As long as it isn't prohibitively costly in terms of time or resources, then the rest is really just taste. As a user I have no interest whatsoever if your code is "idiomatic" or "modular" or "functional". In other industries "quality" usually means free of defects, but software is unique in that we just expect products to be defective. Your surgeon operates on the wrong knee? The board could revoke the license, and you are getting a windfall settlement. A bridge design fails? Someone is getting sued or even prosecuted. SharePoint gets breached? Well, that's just one of those things, I guess. I'm not really bothered that AI is peeing in the pool that has been a sewer as long as I can remember. At least the AI doesn't bill at an attorney's rate to write a mess that barely works.
This is exactly how I've been seeing it. If you're deeply knowledgeable in a particular domain, let's say compiler optimization, I'm unsure if LLMs will increase your capabilities (your ceiling); however, if you're working in a new domain, LLMs are pretty good at helping you get oriented and thus raising the floor.
The greatest use of LLMs is the ability to get accurate answers to queries in a normalized format without having to wade through UI distraction like ads and social media.
It's the opposite of finding an answer on reddit, insta, tvtropes.
I can't wait for the first distraction-free OS that is a thinking and imagination helper, not a consumption device where I have to block URLs on my router so my kids don't get sucked into a Skinner box.
I love being able to get answers from documentation and work questions without having to wade through some arbitrary UI bs a designer has implemented in adhoc fashion.
I don't find the "AI" answers all that accurate, and in some cases they are bordering on a liability even if way down below all the "AI" slop it says "AI responses may include mistakes".
>It's the opposite of finding an answer on reddit, insta, tvtropes.
Yeah it really is because I can tell when someone doesn't know the topic well on reddit, or other forums, but usually someone does and the answer is there. Unfortunately the "AI" was trained on all of this, and the "AI" is just as likely to spit out the wrong answer as the correct one. That is not an improvement on anything.
> wade through UI distraction like ads and social media
Oh, so you think "AI" is going to be free and clear forever? Enjoy it while it lasts, because these "AI" companies are in way over their heads, they are bleeding money like their aorta is a fire hose, and there will be plenty of ads and social whatever coming to brighten your day soon enough. The free ride won't go on forever - think of it as a "loss leader" to get you hooked.
I agree with the whole first half, but I disagree that LLM usage is doomed to ad-filled shittyness. AI companies may be hemorrhaging money, but that's because their product costs so much to run; it's not like they don't have revenue. The thing that will bring profitability isn't ads, it will be innovations that let current-gen-quality LLMs run at a fraction of the electricity and compute cost.
Will some LLMs have ads? Sure, especially at a free tier. But I bet the option to pay $20/month for ad-free LLM usage will always be there.
Silicon will improve, but not fast enough to calm investors. And better silicon won't change the fact that the current zeitgeist is basically a word guessing game.
$20/month won't get you much if it actually has to cover what it costs to run the "AI", and for what? Answers that are in the ballpark of suspicious and untrustworthy?
Maybe they just need to keep spending until all the people who can tell slop from actual knowledge are all dead and gone.
I suppose it's all a matter of what one is using an LLM for, no?
GPT is great at citing sources for most of my requests -- even if not always prompted to do so. So, in a way, I kind of use LLMs as a search engine/Wikipedia hybrid (used to follow links on Wiki a lot too). I ask it what I want, ask for sources if none are provided, and just follow the sources to verify information. I just prefer the natural language interface over search engines. Plus, results are not cluttered with SEO ads and clickbait rubbish.
Hmm I don't feel like this should be taken as a tenet of AI. I feel a more relevant kernel would be less black and white.
Also I think what you're saying is a direct contradiction of the parent. Below average people can now get average results; in other words: The LLM will boost your capabilities (at least if you're already 'less' capable than average). This is a huge benefit if you are in that camp.
But for other cases too, all you need to know is where your knowledge ends, and that you can't just blindly accept what the AI responds with.
In fact, I find LLMs are often most useful precisely when you don’t know the answer. When you’re trying to fill in conceptual gaps and explore an idea.
Even say during code generation, where you might not fully grasp what’s produced, you can treat the model like pair programming and ask it follow-up questions and dig into what each part does. They're very good at converting "nebulous concept description" into "legitimate standard keyword" so that you can go and find out about said concept that you're unfamiliar with.
Realistically the only time I feel I know more than the LLM is when I am working on something that I am explicitly an expert in, and in which case often find that LLMs provide nuance lacking suggestions that don’t always add much. It takes a lot more filling in context in these situations for it to be beneficial (but still can be).
Take a random example of a nifty bit of engineering: the powerline Ethernet adapter. A curious person might encounter these and wonder how they work. I don't believe an understanding of this technology is very obvious to a layman. Start asking questions and you very quickly come to understand how it embeds bits in the very same signal that transmits power through your house without any interference between the two "types" of signal. It adds data to high frequencies on one end, and filters out the regular power-transmitting frequencies at the other end so that the signal can be converted back into bits for use in the ethernet cable (for a super brief summary). But if I want to really drill into each and every engineering concept, all I need to do is continue the conversation.
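If you want to poke at the filtering idea yourself, here's a toy numpy sketch (nothing like real powerline modulation schemes, just the frequency-separation intuition): a low-frequency "mains" waveform and a high-frequency on/off-keyed carrier share one wire, and a crude high-pass recovers the data side.

    import numpy as np

    fs = 1_000_000                              # 1 MHz sample rate for the toy model
    t = np.arange(0, 0.02, 1 / fs)              # 20 ms of signal (20,000 samples)

    mains = 170 * np.sin(2 * np.pi * 50 * t)    # 50 Hz power waveform (one full cycle here)
    bits = np.random.randint(0, 2, 200)         # the data we want to send
    carrier = 0.5 * np.repeat(bits, len(t) // 200) * np.sin(2 * np.pi * 100_000 * t)

    wire = mains + carrier                      # both signals share the same copper

    # Crude high-pass filter in the frequency domain: drop everything near the
    # mains frequency, keep the high band the data rides on.
    spectrum = np.fft.rfft(wire)
    freqs = np.fft.rfftfreq(len(wire), 1 / fs)
    spectrum[freqs < 10_000] = 0
    recovered = np.fft.irfft(spectrum, n=len(wire))

    # Rough envelope detection per bit period to get the bits back.
    decoded = (np.abs(recovered).reshape(200, -1).mean(axis=1) > 0.1).astype(int)
    print("bit errors:", int(np.sum(decoded != bits)), "out of", len(bits))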
I personally find this loop to be unlike anything I've experienced as far as getting immediate access to an understanding and supplementary material for the exact thing Im wondering about.
Above average people can also use it to get average results. Which can actually be useful. For many tasks and usecases, the good enough threshold can actually be quite low.
I think a good way to see it is "AI is good for prototyping. AI is not good for engineering"
To clarify, I mean that the AI tools can help you get things done really fast but they lack both breadth and depth. You can move fast with them to generate proofs of concept (even around subproblems of large problems), but without breadth they lack the big picture context and without depth they lack the insights that any greybeard (master) has. On the other hand, the "engineering" side is so much more than "things work". It is about everything working in the right way, handling edge cases, being cognizant of context, creating failure modes, and all these other things. You could be the best programmer in the world, but that wouldn't mean you're even a good engineer (in the real world these are coupled as skills learned simultaneously; you could be a perfect leetcoder and not helpful on an actual team, but these skills correlate).
The thing is, there will never be a magic button that a manager can press to engineer a product. The thing is, for a graybeard most of the time isn't spent around implementation, but design. The thing is, to get to mastery you need experience, and that experience requires understanding of nuanced things. Things that are non-obvious. There may be a magic button that allows an engineer to generate all the code for a codebase, but that doesn't replace engineers. (I think this is also a problem in how we've been designing AI code generators. It's as if they're designed for management to magically generate features. The same thing they wish they could do with their engineers. But I think the better tool would be to focus on making a code generator that would generate based on an engineer's description.)
I think Dijkstra's comments apply today just as much as they did then[0]
I was reading some stuff by Michael A. Jackson (Problem Frames Approach) and T.S.E. Maibaum (Mathematical Foundations on Software Engineering) because I also had the impression that too much talk around LLM-assisted programming focuses on program text and annotations / documentation. Thinkers like Donald Schön thought about tacit knowledge-in-action and approached this with design philosophy. When looking at LLM-assisted programming, I call this shaded context.
As you say, software engineering is not only constructing program texts; it's not even only applied math or overly scientific. At least that is my stance. I suspect AI code editors have lots of said tacit knowledge baked in (via the black box itself or its engineers) but we would be better off thinking about this explicitly.
> I suspect AI code editors have lots of said tacit knowledge baked in (via the black box itself or its engineers) but we would be better off thinking about this explicitly.
Until the AI is actually AGI I suspect it'll be better for us to do it. After all, if you don't do the design then you probably don't understand the design. Those details will kill you
In things that I am comparatively good at (e.g., coding), I can see that it helps 'raise the ceiling' as a result of allowing me to complete more of the low level tasks more effectively. But it is true as well that it hasn't raised my personal bar in capability, as far as I can measure.
When it comes to things I am not good at, it has given me the illusion of getting 'up to speed' faster. Perhaps that's a personal ceiling raise?
I think a lot of these upskilling utilities will come down to delivery format. If you use a chat that gives you answers, don't expect to get better at that topic. If you use a tool that forces you to come up with answers yourself and get personalized validation, you might find yourself leveling up.
> When it comes to things I am not good at, it has given me the illusion of getting 'up to speed' faster. Perhaps that's a personal ceiling raise?
Disagree. It's only the illusion of a personal ceiling raise.
---
Example 1:
Alice has a simple basic text only blog. She wants to update the styles on her website, but wants to keep her previous posts.
She does research to learn how to update a page's styles to something more "modern". She updates the homepage, post page, about page. She doesn't know how to update the login page without breaking it because it uses different elements she hasn't seen before.
She does research to learn what the new form elements are, and on the way sees recommendations on how to build login systems. She builds some test pages to learn how to restyle forms and, while she's at it, also learns how to build login systems.
She redesigns her login page.
Alice believes she has raised the ceiling of what she can accomplish.
Alice is correct.
---
Example 2:
Bob has a simple basic text only blog. He wants to update the styles on his website, but wants to keep his previous posts.
He asks the LLM to help him update styles to something more "modern". He updates the homepage, post page, about page, and login page.
The login page doesn't work anymore.
Bob asks the LLM to fix it and after some back and forth it works again.
Bob believes he has raised the ceiling of what he can accomplish.
Bob is incorrect. He has not increased his own knowledge or abilities.
A week later his posts are gone.
---
There are only a few differences between both examples:
1. Alice does not use LLMs, but Bob does.
2. Alice knows how to redesign pages, but Bob does not.
3. Alice knows how login systems work, but Bob does not.
Bob simply asked the LLM to redesign the login page, and it did.
When the page broke, he checked that he was definitely using the right username and password but it still wasn't working. He asked the LLM to change the login page to always work with his username and password. The LLM produced a login form that now always accepted a hard coded username and password. The hardcoded check was taking place on the client where the username and password were now publicly viewable.
Bob didn't ask the LLM to make the form secure, he didn't even know that he had to ask. He didn't know what any of the footguns to avoid were because he didn't even know there were any footguns to avoid in the first place.
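To make the footgun concrete, here's roughly the shape of the problem in Python (hypothetical names, just a sketch): the check Bob shipped lives, secret and all, in code the visitor can read, whereas a real login check happens server-side against a salted hash.

    import hashlib, hmac, os

    # What Bob shipped, in spirit: the check and the secret live in code that
    # is delivered to every visitor, so anyone can read the credentials.
    def insecure_client_side_check(username, password):
        return username == "bob" and password == "hunter2"

    # What a login system actually needs: the server stores only a salted hash
    # and compares digests in constant time; the secret never leaves the server.
    def hash_password(password, salt=None):
        salt = salt or os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
        return salt, digest

    def server_side_check(candidate, salt, stored_digest):
        _, digest = hash_password(candidate, salt)
        return hmac.compare_digest(digest, stored_digest)

    salt, stored = hash_password("correct horse battery staple")
    print(server_side_check("correct horse battery staple", salt, stored))  # True
    print(server_side_check("guess", salt, stored))                         # False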
Both Alice and Bob started from the same place. They both lacked knowledge on how login systems should be built. That knowledge was known because it is documented somewhere, but it was unknown to them. It is a "known unknown".
When Alice learned how to style form elements, she also read links on how forms work, which led her to links on how login systems work. That knowledge for her went from a known unknown to a "known known" (knowledge that is known, that she now also knows).
When Bob asked the LLM to redesign his login page, at no point in time does the knowledge of how login systems work become a "known known" for him. And a week later some bored kid finds the page, right clicks on the form, clicks inspect and sees a username and password to log in with.
Most non-trivial expertise topics are not one-dimensional. You might be at the "ceiling" in some particular sub-niche, while still on the floor on other aspects of the topic.
So even if you follow the article's premise (I do not), it can still potentially 'raise' you wherever you were.
The key seems to be whether you have enough expertise to evaluate or test the outputs. Some others have referred to this as having a good sense of the "known/unknown" matrix for the domain.
The AI will be most helpful for you on the known-unknown / unknown-known axis, not so much in the known-known / unknown-unknown parts. The latter, unfortunately, is where you see the most derailed use of the tech.
AI raises everything - the ceiling is just being more productive. Productivity comes from adequacy and potency of tools. We got a hell of a strong tool in our hands, therefore, the more adequate the usage, the higher the leverage.
Surprised to see this downvoted. It feels true to me. Sure there are definitely novel areas where folks might not benefit but I can see a future where this tool becomes helpful for the vast majority of roles.
AI is going to cause a regression to the most anodyne output across many industries. As humans who had to develop analytical skills, writing skills, etc., we struggle to imagine the undeveloped brains of those who come of age in the zero-intellectual-gravity world of AI. OpenAI's study mode is at best a fig leaf.
edit: this comment was posted tongue-in-cheek after my comment reflecting my actual opinion was downvoted with no rebuttals:
I would say the modern digital world itself has already had the bigger impact on human thinking, at least at work.
It seems with computers we often think and reason far less than without. Everything required thought previously, now we can just copy and paste out word docs for everything. PowerPoints are how key decisions are communicated in most professional settings.
Before modern computers and especially the internet we also had more time for deep thinking and reasoning. The sublimity of deep thought in older books amazes me and it feels like modern authors are just slightly less deep on average.
So then LLMs are, in my view, an incremental change rather than a stepwise change with respect to their effects on human cognition.
In some ways LLMs allow us to return a bit to more humanistic deep thinking. Instead of spending hours looking up minutia on Google, StackOverflow, etc now we can ask our favorite LLM instead. It gives us responses with far less noise.
Unlike with textbooks, we can have dialogues and have it take different perspectives, whereas textbooks only gave you that author's perspective.
Of course, it’s up to individuals to use it well and as a tool to sharpen thinking rather than replace it.
I'm not sure this is my experience so far. What I'm noticing is that my awesome developers have embraced AI as an accelerator, particularly for research and system design, small targeted coding activities with guardrails. My below average developers are having difficulty integrating AI at all in their workflow. If this trend continues the chasm between great and mediocre devs will widen dramatically.
At least the last coding-with-AI chart is still too optimistic, I think. It doesn't reflect how AI coding tools are making developers less productive (instead of more) in non-trivial projects.
I wanted to know how to clone a single folder in a Git repository. Having done this before, I knew that there was some incantation I needed to make to the git CLI to do it, but I couldn't remember what it was.
I'm very anti-AI for a number of reasons, but I've been trying to use it here and there to give it the benefit of the doubt and avoid becoming a _complete_ dinosaur. (I was very anti-vim ages ago when I learned emacs; I spent two weeks with vim and never looked back. I apply this philosophy to almost everything as a result.)
I asked Qwen3-235B (reasoning) via Kagi Assistant how I could do this. It gave me a long block of text back that told me to do the thing I didn't want it to do: mkdir a directory, clone into it, move the directory I wanted into the root of the directory, delete everything else.
When I asked it if it was possible to do this without creating the directory, it, incorrectly, told me that it was not. It used RAG-retrieved content in its chain of thought, for what that's worth.
It took me only 30 seconds or so to find the answer I wanted on StackOverflow. It was the second most popular answer in the thread. (git clone --filter=tree: --depth=0, then git sparse-checkout set --no-cone $FOLDER, found here: https://stackoverflow.com/a/52269934)
I nudged the Assistant a smidge more by asking it if there was a subcommand I could use instead. It, then, suggested "sparse-checkout init", which, according to the man page for this subcommand, is deprecated in favor of "set". (I went to the man page to understand what the "cone" method was and stumbled on that tidbit.)
THIS is the thing that disappoints me so much about LLMs being heralded as the next generation of search. Search engines give you many, many sources to guide you to the correct answer if you're willing to do the work. LLM services tell you what the "answer" is, even if it's wrong. You get potential misinformation back while also turning your brain off and learning less; a classic lose-lose.
ChatGPT, Gemini, and Claude all point me to a plethora of sources for most of my questions that I can click through and read. They're also pretty good at both basic and weird git issues for me. Not perfect, but pretty good.
Also part of the workflow of using AI is accepting that your initial prompts might not get the right answer. It's important to scan the answer like you did and use intuition to know 'this isn't right', then try again. Just like we learned how to type in good search queries, we'll also learn how to write good prompts. Sands will shift frequently at first, with prompt strategies that worked well yesterday requiring a different strategy tomorrow, but eventually it will stabilize like search query strategies did.
_I_ know that the answer provided by the prompt isn't quite right because I have enough experience with the Git CLI to recognize that.
Someone who doesn't use the Git CLI at all and is relying on an LLM to do it will not know that. There's also no reason for them to search beyond the LLM or use the LLM to go deeper because the answer is "good enough."
That's the point of what I'm trying to make. You don't know what you don't know.
Trying different paths that might go down dead ends is part of the learning process. LLMs short-circuit that. This is fine if you think that learning isn't valuable in (in this case) software development. I think it is.
<soapbox>
More specifically, I think that this will, in the long term, create a pyramidal economy where engineers like you and I who learned "the old way" will reap most of the rewards while everyone else coming into the industry will fight for scraps.
I suppose this is fine if you think that this is just the natural order of things. I do not.
Tech is one of, if not the only, career path(s) that could give almost anyone a very high quality of life (at least in the US) without gatekeeping people behind the school they attend (i.e. being born into the right social strata, basically), many years of additional education, and even more years of grinding the way law, medicine and consulting do.
I'm very saddened to see this going away while those of us in the old guard cheer its destruction (because our jobs will probably be safe regardless).
</soapbox>
I also disagree with the claim that the LLM gives you a "plethora" of sources. The LLM-backed search I used gave me three [^0]. A regular search on the same topic gave me more than 15. [^1]
Yes, the 15 it gives me are all over the quality map, but I have much more information at my disposal to find the answer I'm looking for. It also doesn't purport to be "the answer," like LLMs tend to do.
I wonder: the graphs treat learning with and without AI as two different paths. But obviously people can switch between learning methods or abandon one of them.
Then again, I wonder how many people go from learning about a topic using LLMs to then leaving them behind to continue the old school way. I think the early spoils of LLM usage could poison your motivation to engage with the topic on your own later on.
I learn about different subjects mixing traditional resources and AI.
I can watch a video about the subject, when I want to go deeper, I go to LLMs, throw a bunch of questions at it, because thanks to the videos I now know what to ask. Then the AI responses tell me what I need to understand deeper, so I pick a book that addresses those subjects. Then as I read the book and I don’t understand something, or I have some questions that I want the answer for immediately, I consult ChatGPT (or any other tool I want to try). At different points in the journey, I find something I could build myself to deepen my understanding. I google open source implementations, read them, ask LLMs again, I watch summary videos, and work my way through the problem.
LLMs serve as a “much better StackOverflow / Google”.
I use a similar approach. I tried experimenting with going into a topic with no prior knowledge, and it kind of fumbles; I highly recommend having an overview first.
Once you know the basics, LLMs are really good for deepening the knowledge, but using only them is quite challenging. As a complementary tool I find them excellent.
AI will be both a floor raiser and a ceiling raiser. There is a practical limit to how many domains one person or team can be expert in, and AI does/will have very strong levels of expertise/competency across a large number of domains. It will thus offer significant level-ups in areas where cross-domain synthesis is crucial, or where the limits of human working memory and pattern recognition make cross-domain synthesis unlikely to occur.
AI also enables much more efficient early stage idea validation, the point at which ideas/projects are the least anchored in established theory/technique. Thus AI will be a great aid in idea generation and early stage refinement, which is where most novel approaches stall or sit on a shelf as a hobby project because the progenitor doesn't have enough spare time to work through it.
Wouldn't it be both by this definition? It raises the bar for people who maybe have a lower IQ ("mastery"), but people who can use AI can then do more than ever before, raising the ceiling as well.
The blog still assumes that AI does not affect Mastery. I think it does.
All the AI junk, like the agents in service centers that you need to outplay in order to get in touch with a human: we as consumers are accepting this new status quo.
We will accept products that sometimes can do crazy stuff because of hallucinations. Why? Ultimate capitalism consumerism sheepism, some other ism.
So AI (and whether it is correlation or causation, I don't know) also corresponds with a lower level of Mastery.
Oh man, I love this take. It's how I've been selling what I do when I speak with a specific segment of my audience: "My goal isn't to make the best realtors better, it's to make the worst realtors acceptable".
And my client is often the brokerage; they just want their agents to produce commissions so they make a cut. They know their top producers probably won't get much from what I offer, but we all see that their worst performers could easily double their business.
People should be worried because right now AI is on an exponential growth trajectory and no-one knows when it will level off into an s-curve. AI is starting to get close to good enough. If it becomes twice as good in seven months then what?
What's the basis for your claim that it is on an exponential growth trajectory? That's not the way it feels to me as a fairly heavy user, it feels more like an asymptotic approach to expert human level performance where each new model gets a bit closer but is not yet reaching it, at least in areas where I am expert enough to judge. Improvements since the original ChatGPT don't feel exponential to me.
This also tracks with my experience. Of course, technical progress never looks smooth through the steep part of the s-curve, more a sequence of jagged stair-steps (each their own little s-curve in miniature). We might only be at the top of a stair. But my feeling is that we're exhausting the form-factor of LLMs. If something new and impressive comes along it'll be shaped different and fill a different niche.
People don't consider that there are real physical/thermodynamic constraints on intelligence. It's easy to imagine some skynet scenario, but all evidence suggests that it takes significant increases in energy consumption to increase intelligence.
Even in nature this is clear. Humans are a great example: cooked food predates homo sapiens and it is largely considered to be a pre-requisite for having human level intelligence because of the enormous energy demands of our brains. And nature has given us wildly more efficient brains in almost every possible way. The human brain runs on about 20 watts of power, my RTX uses 450 watts at full capacity.
The idea of "runaway" super intelligence has baked in some very extreme assumptions about the nature of thermodynamics and intelligence, that are largely just hand waved away.
On top of that, AI hasn't changed in a notable way for me personally in a year. The difference between 2022 and 2023 was wild, between 2023 and 2024 changed some of my workflows, 2024 to today largely is just more options around which tooling I used and how these tools can be combined, but nothing really at a fundamental level feels improved for me.
I was worried about that a couple of years ago, when there was a lot of hope that deeper reasoning skills and hallucination avoidance would simply arrive as emergent properties of a large enough model.
More recently, it seems like that's not the case. Larger models sometimes even hallucinate more [0]. I think the entire sector is suffering from a Dunning Kruger effect -- making an LLM is difficult, and they managed to get something incredible working in a much shorter timeframe than anyone really expected back in the early 2010s. But that led to overconfidence and hype, and I think there will be a much longer tail in terms of future improvements than the industry would like to admit.
Even the more advanced reasoning models will struggle to play a valid game of chess, much less win one, despite having plenty of chess games in their training data [1]. I think that, combined with the trouble of hallucinations, hints at where the limitations of the technology really are.
Hopefully LLMs will scare society into planning how to handle mass automation of thinking and logic, before a more powerful technology that can really do it arrives.
The RAG technique uses a smaller model and an external knowledge base that's queried based on the prompt. The technique allows small models to outperform far larger ones in terms of hallucinations, at the cost of performance. That is, to eliminate hallucinations, we should alter how the model works, not increase its scale: https://highlearningrate.substack.com/p/solving-hallucinatio....
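To make the mechanism concrete, here's a toy sketch of the retrieve-then-generate flow (the keyword-overlap retriever and the example documents are made up for illustration; a real system would use embeddings and then hand the assembled prompt to the smaller model):

    # Minimal sketch of the RAG idea: retrieve supporting passages first, then
    # have the (smaller) model answer only from that retrieved context.
    # The retriever here is a toy keyword-overlap ranker, not a real one.

    def retrieve(query, knowledge_base, k=2):
        words = set(query.lower().split())
        return sorted(
            knowledge_base,
            key=lambda doc: len(words & set(doc.lower().split())),
            reverse=True,
        )[:k]

    def build_prompt(query, passages):
        context = "\n".join(f"- {p}" for p in passages)
        return ("Answer using only the context below; if the answer isn't there, say so.\n"
                f"{context}\nQuestion: {query}")

    docs = [
        "The sparse-checkout command limits the working tree to chosen paths.",
        "A partial clone with --filter=blob:none downloads blobs on demand.",
    ]
    passages = retrieve("How do I check out a single folder?", docs)
    print(build_prompt("How do I check out a single folder?", passages))
    # The assembled prompt is what actually gets sent to the language model.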
Pruned models, with fewer parameters, generally have a lower hallucination risk: https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00695.... "Our analysis suggests that pruned models tend to generate summaries that have a greater lexical overlap with the source document, offering a possible explanation for the lower hallucination risk."
At the same time, all of this should be contrasted with the "Bitter Lesson" (https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson...). IMO, making a larger LLM does indeed produce a generally superior LLM. It produces more trained responses to a wider set of inputs. However, it does not change that it's an LLM, so fundamental traits of LLMs - like hallucinations - remain.
How are you measuring this improvement factor? We have numerous benchmarks for LLMs and they are all saturating. We are rapidly approaching AGI by that measure, and headed towards ASI. They still won't be "human" but they will be able to do everything humans can, and more.
Only for the people already affluent enough to afford the ever-more expensive subscriptions. Those most in need of a floor-raising don’t have the disposable income to take a bet on AI.
It's very easy to sign up for an API account and pay per-call, or even nothing. Free offerings out there are great (Gemini, OpenRouter...) and a few are even suitable for agentic development.
Either you are the item being sold or you are paying for the service.
Nothing is free, and I for one prefer a subscription model, if only as a change from the ad model.
I am sure we will see the worst of all worlds, but for now, for this moment in history, subscription is better than ads.
Let’s also never have ads in GenAi tools. The kind of invasive intent level influence these things can achieve, will make our current situation look like a paradise
I'd never buy anything as overt as an advertisement in an AI tool. I just want to buy influence. Just coincidentally use my product as the example. Just suggest my preferred technology when asked a few % more often than my competitors. I'd never want someone to observe me pulling the strings
Wage suppression. The capital ownership class knows exactly how to weaponize the hype as an excuse now. It's also curious how software engineers working on it are cheerfully building their potential future replacements and long-term insecurity.
I'd argue that AI reduces the distance between the floor and the ceiling, only both the floor and ceiling move -- the floor moves up, the ceiling downwards.
Just using AI makes the floor move up, while over-reliance on it (a very personal metric) pushes the ceiling downwards.
Unlike the telephone (telephones excited a certain class of people into believing that world-wide enlightenment was on their doorstep), LLMs don't just reduce reliance on visual tells and mannerisms, they reduce reliance on thinking itself. And that's a very dangerous slope to go down on. What will happen to the next generation when their parents supply substandard socially-computed results of their mental work (aka language)? Culture will decay and societal norms will veer towards anti-civilizational trends. And that's exactly what we're witnessing these days. The things that were commonplace are now rare and sometimes mythic.
Everyone has the same number of hours and days and years. Some people master some difficult, arcane field while others while it away in front of the television. LLMs make it easier for the television-watchers to experience "entertainment nirvana" while enticing the smart, hard-workers to give up their toil and engage "just a little" rest, which due to the insidious nature of AI-based entertainment, meshes more readily with their more receptive minds.
I was thinking about this sentiment on my long car drive today.
It feels like when you need to paint walls in your house. If you've never done it before, you'll probably reach for tape to make sure you don't ruin the ceiling and floors. The tape is a tool for amateur wall painters to get decent results somewhat efficiently compared to if they didn't use it. If you're an actually good wall painter, tape only slows you down. You'll go faster without the "help".
You'll find many people lack the willpower and confidence to even get on the floor though. If it weren't for that they'd already know a programming language and be selling something.
I mean, it makes sense that if the AI is trained on human-created things it can never actually do better than that. It can't bust through the ceiling of what it was trained on. And at the same time, AI gives that power to people who just aren't very smart or good at something.
Imagine how useful it would be if we could just add a button to show our approval or disapproval of a piece of content without having to type true or false in the comment section. Let's call it the upvote or downvote button.
> Please don't complain that a submission is inappropriate. If a story is spam or off-topic, flag it. Don't feed egregious comments by replying; flag them instead. If you flag, please don't also comment that you did.
I would agree, and have personally enjoyed the article. I just assumed that the person who wrote "false" might have considered it to be spam (perhaps in the broader sense), and if they did, flagging is considered to be the proper way of showing their disagreement.
There are a lot of stories that aren't worthy of flagging, but they're just low-quality.
And I've seen people abuse flagging because they treat it like downvoting. Flagging removes a story from the front page entirely, whereas ideally downvoting would simply deprioritize it, i.e. a bunch of downvotes would move it from #5 rank to #60 or something, not get rid of it entirely.
I've definitely had to e-mail the mods a number of times to restore a flagged story that was entirely appropriate, but which a few users simply didn't like. It would be much better to have a downvote button, and reserve flagging for actual spam and inappropriate content.
I have seen a lot of speculation about my "false" comment on this story. I think the story is bullshit: AI 100% raises the ceiling and doesn't help incompetent people that much. You see it over and over where incompetent people use AI and submit legal docs with hallucinations or have egregious security holes in their vibe coded projects, but I have seen highly skilled folks get month-long projects done in a day with higher quality, more tests, and more features than they would have in the past. Explaining all that didn't seem necessary since the whole thing is just not true.
The blog post has a bunch of charts, which gives it a veneer of objectivity and rigor, but in reality it's just all vibes and conjecture. Meanwhile recent empirical studies actually point in the opposite direction, showing that AI use increases inequality, not decrease it.
https://www.economist.com/content-assets/images/20250215_FNC...
https://www.economist.com/finance-and-economics/2025/02/13/h...
Of course AI increases inequality. It's automated ladder pulling technology.
To become good at something you have to work through the lower rungs and acquire skill. AI does all those lower level jobs, puts the people who need those jobs for experience on the street, and robs us of future experts.
The people who benefit the most are those who are already up on top of the ladder investing billions to make the ladder raise faster and faster.
AI has been extremely useful at teaching me things. Granted I needed to already know how to learn and work through the math myself, but when I get stuck it is more helpful than any other resource on the internet.
> To become good at something you have to work through the lower rungs and acquire skill. AI does all those lower level jobs, puts the people who need those jobs for experience on the street, and robs us of future experts.
You can still do that with AI, you give yourself assignments and then use the AI as a resource when you get stuck. As you get better you ask the AI less and less. The fact that the AI is wrong sometimes is like test that allows you to evaluate if you are internalizing the skills or just trusting the AI.
If we ever have AIs which don't hallucinate, I'd want that added back in as a feature.
When you have an unfair system, every technology advancement will benefit the few more than the many.
So of course AI falls into this realm.
Whether ladder raising is benefitting people now or later or by how much - I don't know.
But I share your concerns that:
AI doing the lesser tasks of [whatever] ->
fewer (maybe no) humans doing those tasks ->
fewer experienced humans to further the state of the art ->
automation-but-stagnation.
But tragedy of the commons says I have to teach my kid to use AI!
You could just teach them to be gardeners or carpenters
They would still need to use AI to run their work with higher profit margin ;)
It's the trajectory of automation for the past few decades. Automate many jobs out of existence, and add a much smaller set of higher-skill jobs.
Centuries, surely? "In the year of eighteen and two, peg and awl..."
AI can teach you the lower rungs more effectively than what existed before.
Honestly not sure it is easier to learn coding today than before. In theory maybe but in reality 99% of people will use AI as a crutch - half or learning is when you have to struggle a bit with something. If all the answers are always in front of you it will be harder to learn. I know it would be hard for me to learn if I could just ask for the code all the time.
It is, but it requires discipline.
I've been coding for 15 years but I find I'm able to learn new languages and concepts faster by asking questions to ChatGPT.
It takes discipline. I have to turn off cursor tab when doing coding exercises. I have to take the time to ask questions and follow-up questions.
But yes I worry it's too easy to use AI as a crutch
It's much, much, much easier.
I've been coding for decades already, but if I need to put something together in an unfamiliar language? I can just ask AI about any stupid noob mistake I make.
It knows every single stupid noob mistake, it knows every "how do I sort an array", and it explains well, with examples. Like StackOverflow on steroids.
The caveat is that you need to WANT to learn. If you don't, then not learning is easier than ever too.
> I've been coding for decades already, but if I need to put something together in an unfamiliar language? I can just ask AI about any stupid noob mistake I make.
So you aren’t still learning foundational concepts or how to think about problems, you are using it as a translation tool. Very different, in my opinion.
And yet it's not used that way in the vast majority of cases. Most people don't want to learn. They want to get a result quickly, and move on.
There is a difference between pulling up a ladder and people choosing not to climb it.
I agree with you - I learned to program because I found it fascinating, and wanted to know how my computer worked, not because it was the only option available to me at the time...
There are always people willing to take shortcuts at long-term expense. Frankly I'm fine with the selection pressure changing in our industry. Those who want to learn will still find a way to do it.
It’s a very small difference. People would rather line up for the elevator than take the stairs. That’s just human nature.
Yeah, the graphs make some really big assumptions that don't seem to be backed up anywhere except AI maximalist head canon.
There's also a gap in addressing vibe coded "side projects" that get deployed online as a business. Is the code base super large and complex? No. Is AI capable of taking input from a novice and making something "good enough" in this space? Also no.
The later remarks are very strong assumptions underestimating the power AI tools offer.
AI tools are great at unblocking and helping their users explore beyond their own understanding. The tokens in are limited to the users' comprehension, but the tokens out are generated from a vast collection of greater comprehension.
For the novice, it's great at unblocking and expanding capabilities. "Good enough" results from novices are tangible. There is no doubt the volume of "good enough" is perceived as very low by many.
For large and complex codebases, unfortunately the effects of tech debt (read: objectively subpar practices) translate into context rot at development time. A properly architected and documented codebase that adheres to common well structured patterns can easily be broken down into small easily digestible contexts. i.e. a fragmented codebase does not scale well with LLMs, because the fragmentation is seeding the context for the model. The model reflects and acts as an amplifier to what it's fed.
> For the novice, it's great at unblocking and expanding capabilities. "Good enough" results from novices are tangible. There is no doubt the volume of "good enough" is perceived as very low by many.
For personal tools or whatever, sure. And the tooling or infrastructure might get there for real projects eventually, but it's not there currently. The prospect of someone naively vibe coding a side business including a payment or authentication system or something that stores PII (all areas developers learn the dangers of only through experience) sends shivers down my spine. Even amateur coders trying that stuff the old-fashioned way must read their code, the docs, and info on the net, and will likely get some sense of the danger. Yesterday I saw someone here recounting a disastrous data breach of their friend's vibe coded side hustle.
The big problem I see here is people not knowing enough to realize that something functioning is almost never a sign that it is “good enough” for many things they might assume it is. Gaining the amount of base knowledge to evaluate things like form security nearly makes the idea of vibe coding useless for anything more than hobby or personal utility projects.
> For large and complex codebases, unfortunately the effects of tech debt (read: objectively subpar practices) translate into context rot at development time. A properly architected and documented codebase that adheres to common well structured patterns can easily be broken down into small easily digestible contexts. i.e. a fragmented codebase does not scale well with LLMs, because the fragmentation is seeding the context for the model. The model reflects and acts as an amplifier to what it's fed.
It seems like you're claiming complex codebases are hard for LLMs because of human skill issues. IME it's rather the opposite - an LLM makes it easier for a human to ramp up on what a messy codebase is actually doing, in a standard request/response model or in terms of looking at one call path (however messy) at a time. The models are well trained on such things and are much faster at deciphering what all the random branches and nested bits and pieces do.
But complex codebases actually usually arise because of changing business requirements, changing market conditions, and iteration on features and offerings. Execution quality of this varies but a "properly architected and documented codebase" is rare in any industry with (a) competitive pressure and (b) tolerance for occasional bugs. LLMs do not make the need to serve those varied business goals go away, nor do they remove the competitive pressure to move rapidly vs gardening your codebase.
And if you're working in an area with extreme quality requirements that have forced you into doing more internal maintenance and better codebase hygiene then you find yourself with very different problems with unleashing LLMs into that code. Most of your time was never spent writing new features anyway, and LLM-driven insight into rare or complex bugs, interactions, and performance still appears quite hit or miss. Sometimes it saves me a bunch of time. Sometimes it goes in entirely wrong directions. Asking it to make major changes, vs just investigate/explain things, has an even lower hit rate.
I'm stating that a lack of codebase hygiene introduces context rot and substantially reduces the efficacy of working with an LLM.
Too wide a surface area in one context also causes efficiency issues. With a lack of definition in the context, you'll get lower quality results.
Do keep in mind the code being read and written is intrinsically added to context.
In a sense I agree. I don't necessarily think that it has to be the case, but I got that same feeling of that it was wearing a white lab coat to be a scientist. I think their honest attempt was to express the relationship of how they perceive things.
I think this could still be used as a valuable form of communication if you can clearly express the idea that this is representing a hypothesis rather than a measurement. The simplest would be to label the graphs as "hypothesis", but a subtle yet easily identifiable visual change might be better.
Wavy lines for the axes spring to mind as an idea to express that. I would worry about the ability to express hypotheses about definitive events that happen when a value crosses an axis, though; you'd probably want a straight line for that. Perhaps it would be sufficient to just have wavy lines at the ends of the axes, beyond the point at which the plot appears.
Beyond that, I think the article presumes the flattening of the curve as mastery is achieved. I'm not sure that's a given; perhaps it seems that way because we evaluate proportional improvement, implicitly placing skill on a logarithmic scale.
I'd still consider the post from the author as being done in better faith than The Economist links.
I'd like to know what people think, and for them to say that honestly. If they have hard data, they show it and how it confirms their hypothesis. At the other end of the scale is gathering data and only exposing the measurements that imply a hypothesis you are not brave enough to state explicitly.
The graphic has four studies that show increased inequality and six that show reduced inequality.
> The graphic has four studies that show increased inequality
Three, since Toner-Rodgers 2024 currently seems to be a total fabrication.
https://archive.is/Ql1lQ
Read my comment again: the keyword here is "recent". The second link also expands on why it's relevant. It's best to read the whole article, but here's a paragraph that captures the argument:
>The shift in recent economic research supports his observation. Although early studies suggested that lower performers could benefit simply by copying AI outputs, newer studies look at more complex tasks, such as scientific research, running a business and investing money. In these contexts, high performers benefit far more than their lower-performing peers. In some cases, less productive workers see no improvement, or even lose ground.
All of the studies were done 2023-2024 and are not listed in order that they were conducted. The studies showing reduced equality all apply to uncommon tasks like material discovery and debate points, whereas the ones showing increased equality are broader and more commonly applicable, like writing, customer interaction, and coding.
>All of the studies were done 2023-2024 and are not listed in order that they were conducted
Right, the reason why I pointed out "recent" is that it's new evidence that people might not be aware of, given that there were also earlier studies showing AI had the opposite effect on inequality. The "recent" studies also had varied methodology compared to the earlier studies.
>The studies showing reduced equality all apply to uncommon tasks like material discovery and debate points
"Debating points" is uncommon? Maybe not everyone was in the high school debate club, but "debating points" is something that anyone in a leadership position does on a daily basis. You're also conveniently omitting "investment decisions" and "profits and revenue", which basically everyone is trying to optimize. You might be tempted to think "Coding efficiency" represents a high complexity task, but the abstract says the test involved "Recruited software developers were asked to implement an HTTP server in JavaScript as quickly as possible". The same is true of the task used in the "legal analysis" study, which involved drafting contracts or complaints. This seems exactly like the type of cookie cutter tasks that the article describes would become like cashiers and have their wages stagnate. Meanwhile the studies with negative results were far more realistic and measured actual results. Otis et al 2023 measured profits and revenue of actual Kenyan SMBs. Roldan-Mones measured debate performance as judged by humans.
> Right, the reason why I pointed out "recent" is that it's new evidence that people might not be aware of, given that there were also earlier studies showing AI had the opposite effect on inequality.
Okay, well the majority of this "recent" evidence agrees with the pre-existing evidence that inequality is reduced.
> "Debating points" is uncommon?
Yes. That is nobody's job. Maybe every now and then you might need to come up with some arguments to support a position, but that's not what you get paid to do day to day.
> You're also conveniently omitting "investment decisions" and "profits and revenue", which basically everyone is trying to optimize.
Very few people are making investment decisions as part of their day to day job. Hedge funds may experience increasing inequality, but that kinda seems on brand.
On the other hand "profits and revenue" is not a task.
> You might be tempted to think "Coding efficiency" represents a high complexity task, but the abstract says the test involved "Recruited software developers were asked to implement an HTTP server in JavaScript as quickly as possible". The same is true of the task used in the "legal analysis" study, which involved drafting contracts or complaints.
These sound like real tasks that a decent number of people have to do on a regular basis.
> Meanwhile the studies with negative results were far more realistic and measured actual results. Otis et al 2023 measured profits and revenue of actual Kenyan SMBs. Roldan-Mones measured debate performance as judged by humans.
These sound like niche activities that are not widely applicable.
Yup. As a retired mathematician who craves the productivity of an obsessed 28 year old, I've been all in on AI in 2025. I'm now on Claude's $200/month Max plan in order to use Claude Code Opus 4 without restraint. I still hit limits, usually when I run parallel sessions to review a 57 file legacy code base.
For a time I refused to talk with anybody or read anything about AI, because it was all noise that didn't match my hard-earned experience. Recently HN has included some fascinating takes. This isn't one.
I have the opinion that neurodivergents are more successful using AI. This is so easily dismissed as hollow blather, but I have a precise theory backing this opinion.
AI is a giant association engine. Linear encoding (the "King - Man + Woman = Queen" thing) is linear algebra. I taught linear algebra for decades.
As I explained to my optometrist today, if you're trying to balance a plate (define a hyperplane) with three fingers, it works better if your fingers are farther apart.
My whole life people have rolled their eyes when I categorize a situation using analogies that are too far flung for their tolerances.
Now I spend most of my time coding with AI, and it responds very well to my "fingers farther apart" far reaching analogies for what I'm trying to focus on. It's an association engine based on linear algebra, and I have an astounding knack for describing subspaces.
AI is raising the ceiling, not the floor.
Can you explain your finger analogy a little more? What do the fingers represent?
Would you sit on a stool with legs three inches apart?
For a statistician, determining a plane from three approximate points on the plane is far more accurate if the points aren't next to each other.
When we offer examples or associations in a prompt, we experience a similar effect in coaxing a response from AI. This is counter-intuitive.
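A small, made-up numerical illustration of that point: fit the plane z = x + y through three slightly noisy samples, once with the sample points bunched together and once spread apart, and watch how much the recovered coefficients swing.

    import numpy as np

    # Fit z = a*x + b*y + c through three noisy samples of the true plane z = x + y.
    # The same measurement noise hurts far more when the support points are close together.
    rng = np.random.default_rng(0)

    def fit_plane(points):
        # Exact fit through three points: solve [x, y, 1] @ [a, b, c] = z.
        design = np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])
        return np.linalg.solve(design, points[:, 2])

    def sample(xy_coords, noise=0.01):
        xy = np.array(xy_coords, dtype=float)
        z = xy[:, 0] + xy[:, 1] + rng.normal(0.0, noise, len(xy))
        return np.column_stack([xy, z])

    clustered = sample([(0.00, 0.00), (0.01, 0.00), (0.00, 0.01)])
    spread = sample([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])

    print("clustered fit (a, b, c):", fit_plane(clustered))  # typically far from (1, 1, 0)
    print("spread fit    (a, b, c):", fit_plane(spread))     # close to (1, 1, 0)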
I'm fully aware that most of what I post on HN is intended for each future AI training corpus. If what I have to say was already understood I wouldn't say it.
> Now I spend most of my time coding with AI, and it responds very well to my "fingers farther apart" far reaching analogies for what I'm trying to focus on.
If you made analogies based on Warhammer 40k or species of mosquitoes it would have reacted exactly the same.
Thanks for the links. That should be obvious to anyone who believes that $70 billion datacenters (Meta) are needed and the investment will be amortized by subscriptions (in the case of Meta also by enhanced user surveillance).
The means of production are in a small oligopoly, the rest will be redundant or exploitable sharecroppers.
(All this under the assumption that "AI" works, which its proponents affirm in public at least.)
> inequality
It's free for everyone with a phone or a laptop.
This mirrors insights from Andrew Ng's recent AI startup talk [1].
I recall he mentions in this video that the new advice they are giving to founders is to throw away prototypes when they pivot instead of building onto a core foundation. This is because of the effects described in the article.
He also gives some provisional numbers (see the section "Rapid Prototyping and Engineering" and slides ~10:30) where he suggests prototype development sees a 10x boost compared to a 30-50% improvement for existing production codebases.
This feels vaguely analogous to the switch from "pets" to "livestock" when the industry switched from VMs to containers. Except, the new view is that your codebase is more like livestock and less like a pet. If true (and no doubt this will be a contentious topic to programmers who are excellent "pet" owners) then there may be some advantage in this new coding agent world to getting in on the ground floor and adopting practices that make LLMs productive.
1. https://www.youtube.com/watch?v=RNJCfif1dPY
IMO the problem with this pets vs. livestock analogy is that it focuses on the code when the value is really in the writer's head. Their understanding and mental model of the code is what matters. AI tools can help with managing the code, helping the writer build their models and express their thoughts, but it has zero impact on where the true value is located.
Great point, but just mentioning (nitpicking?) that I never heard about machines/containers referred to as "livestock", but rather in my milieu it's always "pets" vs "cattle". I now wonder if it's a geographical thing.
Yeah, the CERN talk* [0] coined the Pets vs. Cattle analogy, and it was way before VMs were cheap on bare metal. I think the word just evolved as the idea got rooted in the community.
We use the same analogy for the last 20 years or so. Provisioning 150 cattle servers take 15 minutes or so, and we can provision a pet in a couple of hours, at most.
[0]: https://www.engineyard.com/blog/pets-vs-cattle/
*: Engine Yard post notes that Microsoft's Bill Baker used the term earlier, though CERN's date (2012) checks out with our effort timeline and how we got started.
Randy Bias also claims authorship https://cloudscaling.com/blog/cloud-computing/the-history-of...
this tweet by Tim Bell seems to indicate shared credit with Bill Baker and Randy Bias
https://x.com/noggin143/status/354666097691205633
@randybias @dberkholz CERN's presentation of pets and cattle was derived from Randy's (and Bill Baker's previously).
I didn't mean to dispute who said it first, but wanted to say that we took the terms from CERN, and we got them around the time of their talk.
First time I heard it was from Adrian Cockcroft in... I think 2012, he def was talking about it a lot in 2013/2014, looks like he got it from Bill. https://se-radio.net/2014/12/episode-216-adrian-cockcroft-on...
Boxen? (Oxen)
AFAIK, Boxen is a permutation of Boxes, not Oxen.
There seems to be a pattern of humorous plurals in English where by analogy with ox ~ oxen you get -x ~ -xen: boxen, Unixen, VAXen.
Before you call this pattern silly, consider that the fairly normal plural “Unices” is by analogy with Latin plurals in -x = -c|s ~ -c|ēs, where I’ve expanded -x into -cs to make it clear that the Latin singular comprises a noun stem ending in -c- and a (nominative) singular ending -s, which does exist in Latin but is otherwise completely nonexistent in English. (This is extra funny for Unix < Unics < Multics.) Analogies are the order of the day in this language.
Yeah. After reading your comment, I thought "maybe the Xen hypervisor is named because of this phenomenon". "xen" just means "many" in that context.
Also, probably because of approaching graybeard territory, thinking about boxen of VAXen running UNIXen makes me feel warm and fuzzy. :D
Thanks for pointing this out. I think this is an insightful analogy. We will likely manage generated code in the same way we manage large cloud computing complexes.
This probably does not apply to legacy code that has been in use for several years where the production deployment gives you a higher level of confidence (and a higher risk of regression errors with changes).
Have you blogged about your insights, the https://stillpointlab.com site is very sparse as is @stillpointlab
I'm currently in build mode. In some sense, my project is the most over complicated blog engine in the history of personal blog engines. I'm literally working on integrating a markdown editor to the project.
Once I have the MVP working, I will be working on publishing as a means to dogfood the tool. So, check back soon!
Is there a mailing list I can sign up for to be notified? The check-back-soon protocol reminds me of my youth.
Mailing list is on the roadmap but doesn't exist just yet.
What you could do: sign in using one of the OAuth methods, go to the user page and then go to the feedback section. Let me know your email in a message and I'll ping you once the blog is set up.
Sorry it is primitive at this stage but I'm prioritizing MVP before marketing.
Oo, the "pets vs. livestock" analogy really works better than the "craftsmen vs. slop-slinger" arguments.
Because using an LLM doesn't mean you devalue well-crafted or understandable results. But it does indicate a significant shift in how you view the code itself. It is more about the emotional attachment to code vs. code as a means to an end.
I don't think it's exactly emotional attachment. It's the likelihood that I'm going to get an escalated support ticket caused by this particular piece of slop/artisanally-crafted functionality.
Not to slip too far into analogy, but that argument feels a bit like a horse-drawn carriage operator saying he can't wait to pick up all of the stranded car operators when their mechanical contraptions break down on the side of the road. But what happened instead was the creation of a brand new job: the mechanic.
I don't have a crystal ball and I can't predict the actual future. But I can see the list of potential futures and I can assign likelihoods to them. And among the potential futures is one where the need for humans to fix the problems created by poor AI coding agents dwindles as the industry completely reshapes itself.
Both can be true. There were probably a significant number of stranded motorists that were rescued by horse-powered conveyance. And eventually cars got more convenient and reliable.
I just wouldn't want to be responsible for servicing a guarantee about the reliability of early cars.
And I'll feel no sense of vindication if I do get that support case. I will probably just sigh and feel a little more tired.
Yes, the whole point is that it is true, but only for a short window.
So consider differing perspectives. Like a teenage kid that is hanging around the stables, listening to the veteran coachmen laugh about the new loud, smoky machines. Proudly declaring how they'll be the ones mopping up the mess, picking up the stragglers, cashing it in.
The career advice you give to the kid may be different than the advice you'd give to the coachman. That is the context of my post: Andrew Ng isn't giving you advice, he is giving advice to people at the AI school who hope to be the founders of tomorrow.
And you are probably mistaken if you think the solution to the problems that arise due to LLMs will result in those kids looking at the past. Just like the ultimate solution to car reliability wasn't a return to horses but rather the invention of mechanics, the solution to problems caused by AI may not be the return to some software engineering past that the old veterans still hold dear.
I don't know about what's economically viable, but I like writing code. It might go away or diminish as a viable profession, which might make me a little sad. There are still horse enthusiasts who do horse things for fun.
Things change, and that's ok. I guess I just got lucky so far that this thing I like doing just so happens to align with a valuable skill.
I'm not arguing for or against anything, but I'll miss it if it goes away.
In my world that isn't inherently a bad thing. Granted, I belong to the YAGNI crowd of software engineers who put business before tech architecture. I should probably mention that I don't think this means you should skimp on safety and quality where necessary, but I do preach that the point of software is to serve the business as fast as possible. I take this to the extent that I actually think our BI people, who are most certainly not capable programmers, are good at building programs. They mostly need oversight on external dependencies, but it's actually amazing what they can produce in a very short amount of time.
Obviously their software sucks, and eventually parts of it always escalate into a support ticket which reaches my colleagues and me. It's almost always some form of performance issue, in part because we have monthly sessions where they can bring us issues they simply can't get to work. Anyway, I see that as a good thing. It means their software is serving the business, and now we need to deal with the issues to make it work even better. Sometimes that is because their code is shit; most times it's because they've reached an actual bottleneck and we need to replace part of their Python with a C/Zig library.
The important part of this is that many of these bottlenecks appear in areas that many software engineering teams I have known wouldn't necessarily have predicted. Meanwhile, a lot of the areas that traditional "best practices" call for better software architecture in work fine for entire software lifecycles while being absolutely horrible AI slop.
I think that is where the emotional attachment is meant to fit in: being fine with all the slop that never actually matters during a piece of software's lifecycle.
There are some things that you still can't do with LLMs. For example, if you tried to learn chess by having the LLM play against you, you'd quickly find that it isn't able to track a series of moves for very long (usually 5-10 turns; the longest I've seen it last was 18) before it starts making illegal choices. It also generally accepts invalid moves from your side, so you'll never be corrected if you're wrong about how to use a certain piece.
Because it can't actually model these complex problems, it really requires awareness from the user regarding what questions should and shouldn't be asked. An LLM can probably tell you how a knight moves, or how to respond to the London System. It probably can't play a full game of chess with you, and will virtually never be able to advise you on the best move given the state of the board. It probably can give you information about big companies that are well-covered in its training data. It probably can't give you good information about most sub-$1b public companies. But, if you ask, it will give a confident answer.
They're a minefield for most people and use cases, because people aren't aware of how wrong they can be, and the errors take effort and knowledge to notice. It's like walking on a glacier and hoping your next step doesn't plunge through the snow and into a deep, hidden crevasse.
LLMs playing chess isn't a big deal. You can train a model on chess games and it will play at a decent Elo and very rarely make illegal moves (i.e. a 99.8% legal-move rate). There are a few such models around. I think post-training messes with chess ability, and OpenAI et al. just don't really care about that. But LLMs can play chess just fine.
[0] https://arxiv.org/pdf/2403.15498v2
[1] https://github.com/adamkarvonen/chess_gpt_eval
Jeez, that arxiv paper invalidates my assumption that it can't model the game. Great read. Thank you for sharing.
Insane that the model actually does seem to internalize a representation of the state of the board -- rather than just hitting training data with similar move sequences.
...Makes me wish I could get back into a research lab. Been a while since I've stuck to reading a whole paper out of legitimate interest.
(Edit) At the same time, it's still worth noting the accuracy errors and the potential for illegal moves. That's still enough to prevent LLMs from being applied to problem domains with severe consequences, like banking, security, medicine, law, etc.
> people aren't aware of how wrong they can be, and the errors take effort and knowledge to notice.
I have friends who are highly educated professionals (PhDs, MDs) who just assume that AI\LLMs make no mistakes.
They were shocked that it's possible for hallucinations to occur. I wonder if there's a halo effect where the perfect grammar, structure, and confidence of LLM output causes some users to assume expertise?
Computers are always touted as deterministic machines. You can't argue with a compiler, or Excel's formula editor.
AI, in all its glory, is seen as an extension of that. A deterministic thing which is meticulously crafted to provide an undisputed truth, and it can't make mistakes because computers are deterministic machines.
The idea of LLMs being networks with weights plus some randomness is both a vague and too complicated abstraction for most people. Also, companies tend to say this part very quietly, so when people read the fine print, they get shocked.
> I wonder if there's a halo effect where the perfect grammar, structure, and confidence of LLM output causes some users to assume expertise?
I think it's just that LLMs are modeling generative probability distributions of sequences of tokens so well that what they actually are nearly infallible at is producing convincing results. Often times the correct result is the most convincing, but other times what seems most convincing to an LLM just happens to also be most convincing to a human regardless of correctness.
https://en.wikipedia.org/wiki/ELIZA_effect
> In computer science, the ELIZA effect is a tendency to project human traits — such as experience, semantic comprehension or empathy — onto rudimentary computer programs having a textual interface. ELIZA was a symbolic AI chatbot developed in 1966 by Joseph Weizenbaum and imitating a psychotherapist. Many early users were convinced of ELIZA's intelligence and understanding, despite its basic text-processing approach and the explanations of its limitations.
It's complete bullshit. There is no way anyone ever thought anything was going on in ELIZA. There were people amazed that "someone could program that" but they had no illusions about what it was; it was obvious after 3 responses.
Don't be so sure. It was 1966, and even at a university, few people had any idea what a computer was capable of. Fast forward to 2025...and actually, few people have any idea what a computer is capable of.
If I wasn't familiar with the latest in computer tech, I would also assume LLMs never make mistakes, after hearing such excited praise for them over the last 3 years.
It is only in the last century or so that statistical methods were invented and applied. It is possible for many people to be very competent at what they are doing and at the same time be totally ignorant of statistics.
There are lies, statistics and goddamn hallucinations.
My experience, speaking over a scale of decades, is that most people, even very smart and well-educated ones, don't know a damn thing about how computers work and aren't interested in learning. What we're seeing now is just one unfortunate consequence of that.
(To be fair, in many cases, I'm not terribly interested in learning the details of their field.)
Have they never used it? The majority of the responses that I can verify are wrong: sometimes outright nonsense, sometimes believable. Be it general knowledge or something where deeper expertise is required.
I worry that the way the models "Speak" to users, will cause users to drop their 'filters' about what to trust and not trust.
We are barely talking modern media literacy, and now we have machines that talk like 'trusted' face to face humans, and can be "tuned" to suggest specific products or use any specific tone the owner/operator of the system wants.
> I have friends who are highly educated professionals (PhDs, MDs) who just assume that AI\LLMs make no mistakes.
Highly educated professionals in my experience are often very bad at applied epistemology -- they have no idea what they do and don't know.
It's super obvious even if you try to use something like agent mode for coding: it starts off well but drifts off more and more. I've even had it try to do totally irrelevant things, like indenting some code, using various Claude models.
My favourite example is something that happens quite often even with Opus, where I ask it to change a piece of code, and it does. Then I ask it to write a test for that code, it dutifully writes one. Next, I tell it to run the test, and of course, the test fails. I ask it to fix the test, it tries, but the test fails again. We repeat this dance a couple of times, and then it seemingly forgets the original request entirely. It decides, "Oh, this test is failing because of that new code you added earlier. Let me fix that by removing the new code." Naturally, now the functionality is gone, so it confidently concludes, "Hey, since that feature isn't there anymore, let me remove the test too!"
"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." - Claude, probably
> you'd quickly find that it isn't able to track a series of moves for very long (usually 5-10 turns; the longest I've seen it last was 18)
In chess, previous moves are irrelevant, and LLMs aren't good at filtering out irrelevant data [1]. For better performance, you should include only the relevant data in the context window: the current state of the board.
[1] https://news.ycombinator.com/item?id=44724238
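A sketch of what that looks like in practice (Python with the python-chess library; ask_llm is a stand-in for whatever model call you use): keep the authoritative board outside the chat, hand the model only the current FEN, and reject illegal replies.

    import chess

    def ask_llm(prompt):
        # Stand-in for a real model call; it would return a move in SAN, e.g. "Nf3".
        raise NotImplementedError

    board = chess.Board()  # the engine of record lives here, not in the chat history

    def llm_move(board, retries=3):
        prompt = (f"You are playing chess as {'White' if board.turn else 'Black'}.\n"
                  f"Current position (FEN): {board.fen()}\n"
                  "Reply with a single legal move in SAN.")
        for _ in range(retries):
            reply = ask_llm(prompt).strip()
            try:
                move = board.parse_san(reply)  # raises if the move is illegal or malformed
            except ValueError:
                continue  # re-ask instead of silently accepting a bad move
            board.push(move)
            return move
        raise RuntimeError("model kept suggesting illegal moves")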
Yeah, the chess example is interesting. The best specialised AIs for chess are all clearly better than humans, but our best general AIs are barely able to play legal moves. The ceiling for AI is clearly much higher than current LLMs.
Large Language Models aren't general AIs. Its in the name.
They are being marketed as such…
Since agents are good only at greenfield projects, the logical conclusion is that existing codebases have to be prepared such that new features are (opinionated) greenfield projects - let all the wiring dangle out of the wall so the intern just has to plug in the appliance. All the rest has to be done by humans, or the intern will rip open the wall to hang a picture.
Hogwash. If you can't figure out how to do something with project Y from npm, try checking it out from GitHub with WebStorm and asking Junie how to do it -- often you get a good answer right away. If not, you can ask questions that help you understand the code base. Don't understand some data structure that is a maze of Map<String, Object>s? It will scan how it is used and give you draft documentation.
Sure you can't point it to a Jira ticket and get a PR but you certainly can use it as a pair programmer. I wouldn't say it is much faster than working alone but I end up writing more tests and arguing with it over error handling means I do a better job in the end.
> Sure you can't point it to a Jira ticket and get a PR
You absolutely can. This is exactly what SWE-Bench[0] measures, and I've been amazed at how quickly AIs have been climbing those ladders. I personally have been using Warp [1] a lot recently and in quite a lot of low-medium difficulty cases it can one-shot a decent PR. For most of my work I still find that I need to pair with it to get sufficiently good results (and that's why I still prefer it to something cloud-based like Codex [2], but otherwise it's quite good too), and I expect the situation to flip over the coming couple of years.
[0] https://www.swebench.com/
[1] https://www.warp.dev/
[2] https://openai.com/index/introducing-codex/
How does Warp compare to others you have tried?
I've not used it for long enough yet for this to be a strong opinion, but so far I'd say that it is indeed a bit better than Claude Code, as per the results on Terminal Bench[0]. And on a side note, I quite like the fact that I can type shell commands and chat commands interchangeably into the same input and it just knows whether to run it or respond to it (accidentally forgetting the leading exclamation mark has been a recurring mistake for me in Claude Code).
[0] https://www.tbench.ai/
What you describe is not using agents at all, which my comment was aimed at if you read the first sentence again.
Junie is marketed as an "agent" and it definitely works harder than the JetBrains AI assistant.
They’re not. They’re good at many things and bad at many things. The more I use them the more I’m confused about which is which.
They are called slot machines for a reason.
I think agents have a curve where they're kinda bad at bootstrapping a project, very good if used in a small-to-medium-sized existing project and then it goes downhill from there as size increases, slowly.
Something about a brand-new project often makes LLMs drop to "example grade" code, the kind you'd never put in production. (An example: Claude implemented per-task file logging in my prototype project by pushing to an array of log lines, serializing the entire thing to JSON and rewriting the entire file, for every logged event.)
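For contrast, the boring append-only version of that kind of logger is tiny. A sketch in Python (one JSON record per line, so each event is a single append rather than a full rewrite; the file name is made up):

    import json, time

    class TaskLogger:
        # Append one JSON object per line; never re-serialize the whole history.
        def __init__(self, path):
            self.path = path

        def log(self, event, **fields):
            record = {"ts": time.time(), "event": event, **fields}
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(json.dumps(record) + "\n")

    logger = TaskLogger("task-123.log.jsonl")  # hypothetical per-task file name
    logger.log("started", task="rebuild index")
    logger.log("finished", task="rebuild index", ok=True)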
AI is an interpolator, not an extrapolator.
Very concise, thank you for sharing this insight.
I read this as interloper. What's an extraloper?
An interloper being someone who intrudes or meddles in a situation (inter- "between or amid" + loper "to leap or run", https://en.wiktionary.org/wiki/loper ), an extraloper would be someone who dances or leaps around the outside of a subject or meeting, with similar annoyances.
Are you sure the extraloper doesn't just run away on tangents?
They run on secants towards the outside.
Wouldn't that be an exloper?
Edit: Geometrically, I agree an extraloper could run on secants (or radii) but they're not allowed to strike a chord.
But getting back to generative text, I feel that "tangent" is more appropriate ;-)
Opposite of "inter-" is "intra-".
Intraloper, weirdly enough, is a word in use.
"inter-" means between, "intra-" means within, "extra-" means outside. "intra-" and "inter" aren't quite synonyms but they definitely aren't opposites of eachother.
Inter- implies relationships between entities, intra- implies relationships within entities.
In any single sentence context they cannot refer to the same relationships, and that which they are not is precisely the domain of the other word: they are true antonyms.
> and that which they are not is precisely the domain of the other word
External relationships are a thing, which are in neither of those domains.
An intramural activity is not the same thing as an intermural activity, but the overwhelming majority of activities which are not intramural activities are not intermural activities either, most are extramural activities.
> External relationships are a thing, which are in neither of those domains.
They are a subset of inter-, pretty obviously.
So we have also have the word "extra", but oddly the word "exter" is left out.
I'm exter mad about that.
The learning-with-AI curve should cross back under the learning-without-AI curve towards the higher end, even without "cheating".
The very highest levels of mastery can only come from slow, careful, self-directed learning that someone in a hurry to speedrun the process isn't focusing on.
I agree with most of TFA but not this:
> This means cheaters will plateau at whatever level the AI can provide
From my experience, the skill of using AI effectively is one of treating the AI with a "growth mindset" rather than a "fixed" one. What I do is roleplay as the AI's manager, giving it a task, and as long as I know enough to tell whether its output is "good enough", I can lend it some of my metacognition via prompting to get it to continue working through obstacles until I'm happy with the result.
There are diminishing returns of course, but I found that I can get significantly better quality output than what it gave me initially without having to learn the "how" of the skill myself (i.e. I'm still "cheating"), and only focusing my learning on the boundary of what is hard about the task. By doing this, I feel that over time I become a better manager in that domain, without having to spend the amount of effort to become a practitioner myself.
How do you know it’s significantly better quality if you don’t know any of the “how”? The quality increase seems relative to the garbage you start with. I guess as long as you impress yourself with the result it doesn’t matter if it’s not actually higher quality.
I don't think "quality" has anything like a universal definition, and when people say that they probably mean an alignment with personal taste.
Does it solve the problem? As long as it isn't prohibitively costly in terms of time or resources, then the rest is really just taste. As a user I have no interest whatsoever if your code is "idiomatic" or "modular" or "functional". In other industries "quality" usually means free of defects, but software is unique in that we just expect products to be defective. Your surgeon operates on the wrong knee? The board could revoke the license, and you are getting a windfall settlement. A bridge design fails? Someone is getting sued or even prosecuted. SharePoint gets breached? Well, that's just one of those things, I guess. I'm not really bothered that AI is peeing in the pool that has been a sewer as long as I can remember. At least the AI doesn't bill at an attorney's rate to write a mess that barely works.
I wouldn’t classify what you’re doing as “cheating”!
This is exactly how I've been seeing it. If you're deeply knowledgeable in a particular domain, let's say compiler optimization, I'm unsure if LLMs will increase your capabilities (your ceiling); however, if you're working in a new domain, LLMs are pretty good at helping you get oriented and thus raising the floor.
The greatest use of LLMs is the ability to get accurate answers to queries in a normalized format without having to wade through UI distraction like ads and social media.
It's the opposite of finding an answer on reddit, insta, tvtropes.
I can't wait for the first distraction free OS that is a thinking and imagination helper and not a consumption device where I have to block urls on my router so my kids don't get sucked into a skinners box.
I love being able to get answers from documentation and work questions without having to wade through some arbitrary UI bs a designer has implemented in adhoc fashion.
I don't find the "AI" answers all that accurate, and in some cases they are bordering on a liability even if way down below all the "AI" slop it says "AI responses may include mistakes".
>It's the opposite of finding an answer on reddit, insta, tvtropes.
Yeah it really is because I can tell when someone doesn't know the topic well on reddit, or other forums, but usually someone does and the answer is there. Unfortunately the "AI" was trained on all of this, and the "AI" is just as likely to spit out the wrong answer as the correct one. That is not an improvement on anything.
> wade through UI distraction like ads and social media
Oh, so you think "AI" is going to be free and clear forever? Enjoy it while it lasts, because these "AI" companies are in way over their heads, they are bleeding money like their aorta is a fire hose, and there will be plenty of ads and social whatever coming to brighten your day soon enough. The free ride won't go on forever - think of it as a "loss leader" to get you hooked.
I agree with the whole first half, but I disagree that LLM usage is doomed to ad-filled shittiness. AI companies may be hemorrhaging money, but that's because their product costs so much to run; it's not like they don't have revenue. The thing that will bring profitability isn't ads, it will be innovations that let current-gen-quality LLMs run at a fraction of the current compute and electricity cost.
Will some LLMs have ads? Sure, especially at a free tier. But I bet the option to pay $20/month for ad-free LLM usage will always be there.
Silicon will improve, but not fast enough to calm investors. And better silicon won't change the fact that the current zeitgeist is basically a word guessing game.
$20 a month won't get you much once you're paying what it actually costs to run the "AI", and for what? Answers that are in the ballpark of suspicious and untrustworthy?
Maybe they just need to keep spending until all the people who can tell slop from actual knowledge are all dead and gone.
"accurate"
This tracks for other areas of AI I am more familiar with.
Below average people can use AI to get average results.
This is in line with another quip about AI: You need to know more than the LLM in order to gain any benefit from it.
I am not certain that is entirely true.
I suppose it's all a matter of what one is using an LLM for, no?
GPT is great at citing sources for most of my requests -- even if not always prompted to do so. So, in a way, I kind of use LLMs as a search engine/Wikipedia hybrid (used to follow links on Wiki a lot too). I ask it what I want, ask for sources if none are provided, and just follow the sources to verify information. I just prefer the natural language interface over search engines. Plus, results are not cluttered with SEO ads and clickbait rubbish.
Hmm I don't feel like this should be taken as a tenet of AI. I feel a more relevant kernel would be less black and white.
Also I think what you're saying is a direct contradiction of the parent. Below average people can now get average results; in other words: The LLM will boost your capabilities (at least if you're already 'less' capable than average). This is a huge benefit if you are in that camp.
But for other cases too, all you need to know is where your knowledge ends, and that you can't just blindly accept what the AI responds with. In fact, I find LLMs are often most useful precisely when you don’t know the answer. When you’re trying to fill in conceptual gaps and explore an idea.
Even say during code generation, where you might not fully grasp what’s produced, you can treat the model like pair programming and ask it follow-up questions and dig into what each part does. They're very good at converting "nebulous concept description" into "legitimate standard keyword" so that you can go and find out about said concept that you're unfamiliar with.
Realistically, the only time I feel I know more than the LLM is when I am working on something that I am explicitly an expert in, in which case I often find that LLMs provide nuance-lacking suggestions that don't add much. It takes a lot more filling in of context in these situations for it to be beneficial (but it still can be).
Take a random example of a nifty bit of engineering: the powerline ethernet adapter. A curious person might encounter these and wonder how they work. I don't believe an understanding of this technology is very obvious to a layman. Start asking questions and you very quickly come to understand how it embeds bits in the very same signal that transmits power through your house without any interference between the two "types" of signal. It adds data at high frequencies on one end, and filters out the regular power-transmitting frequencies at the other end so that the signal can be converted back into bits for use in the ethernet cable (for a super brief summary). But if I want to really drill into each and every engineering concept, all I need to do is continue the conversation.
I personally find this loop to be unlike anything I've experienced as far as getting immediate access to an understanding and supplementary material for the exact thing I'm wondering about.
Above average people can also use it to get average results. Which can actually be useful. For many tasks and usecases, the good enough threshold can actually be quite low.
That explains why people here are against it, because everyone is above average I guess.
I'm not against it. I wonder where in the distribution it puts me.
At the "Someone willing to waste their time with slop" end?
>Below average people can use AI to get average results.
But that would shift the average up.
I think a good way to see it is "AI is good for prototyping. AI is not good for engineering"
To clarify, I mean that the AI tools can help you get things done really fast but they lack both breadth and depth. You can move fast with them to generate proofs of concept (even around subproblems to large problems), but without breadth they lack the big picture context and without depth they lack the insights that any greybeard (master) has. On the other hand, the "engineering" side is so much more than "things work". It is about everything working in the right way, handling edge cases, being cognizant of context, creating failure modes, and all these other things. You could be the best programmer in the world, but that wouldn't mean you're even a good engineer (in real world these are coupled as skills learned simultaneously. You could be a perfect leetcoder and not helpful on an actual team, but these skills correlate).
The thing is, there will never be a magic button that a manager can press to engineer a product. The thing is, for a greybeard most of the time isn't spent on implementation, but on design. The thing is, to get to mastery you need experience, and that experience requires understanding of nuanced things. Things that are non-obvious. There may be a magic button that allows an engineer to generate all the code for a codebase, but that doesn't replace engineers. (I think this is also a problem in how we've been designing AI code generators. It's as if they're designed for management to magically generate features, the same thing they wish they could do with their engineers. But I think the better tool would be to focus on making a code generator that generates based on an engineer's description.)
I think Dijkstra's comments apply today just as much as they did then[0]
[0] On the foolishness of "natural language programming" https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...
I was reading some stuff by Michael A. Jackson (Problem Frames Approach) and T.S.E. Maibaum (Mathematical Foundations of Software Engineering) because I also had the impression that too much talk around LLM-assisted programming focuses on program text and annotations / documentation. Thinkers like Donald Schön thought about tacit knowledge-in-action and approached this with design philosophy. When looking at LLM-assisted programming, I call this shaded context.
As you say, software engineering is not only constructing program texts; it's not even only applied math or overly scientific. At least that is my stance. I suspect AI code editors have lots of said tacit knowledge baked in (via the black box itself or its engineers), but we would be better off thinking about this explicitly.
100% agree
In things that I am comparatively good at (e.g., coding), I can see that it helps 'raise the ceiling' as a result of allowing me to complete more of the low level tasks more effectively. But it is true as well that it hasn't raised my personal bar in capability, as far as I can measure.
When it comes to things I am not good at, it has given me the illusion of getting 'up to speed' faster. Perhaps that's a personal ceiling raise?
I think a lot of these upskilling utilities will come down to delivery format. If you use a chat that gives you answers, don't expect to get better at that topic. If you use a tool that forces you to come up with answers yourself and get personalized validation, you might find yourself leveling up.
> When it comes to things I am not good at, it has given me the illusion of getting 'up to speed' faster. Perhaps that's a personal ceiling raise?
Disagree. It's only the illusion of a personal ceiling raise.
---
Example 1:
Alice has a simple, basic, text-only blog. She wants to update the styles on her website, but wants to keep her previous posts.
She does research to learn how to update a page's styles to something more "modern". She updates the homepage, post page, about page. She doesn't know how to update the login page without breaking it because it uses different elements she hasn't seen before.
She does research to learn what the new form elements are, and along the way sees recommendations on how to build login systems. She builds some test pages to learn how to restyle forms and, while she's at it, also learns how to build login systems.
She redesigns her login page.
Alice believes she has raised the ceiling of what she can accomplish.
Alice is correct.
---
Example 2:
Bob has a simple, basic, text-only blog. He wants to update the styles on his website, but wants to keep his previous posts.
He asks the LLM to help him update styles to something more "modern". He updates the homepage, post page, about page, and login page.
The login page doesn't work anymore.
Bob asks the LLM to fix it and after some back and forth it works again.
Bob believes he has raised the ceiling of what he can accomplish.
Bob is incorrect. He has not increased his own knowledge or abilities.
A week later his posts are gone.
---
There are only a few differences between both examples:
1. Alice does not use LLMs, but Bob does.
2. Alice knows how to redesign pages, but Bob does not.
3. Alice knows how login systems work, but Bob does not.
Bob simply asked the LLM to redesign the login page, and it did.
When the page broke, he checked that he was definitely using the right username and password but it still wasn't working. He asked the LLM to change the login page to always work with his username and password. The LLM produced a login form that now always accepted a hard coded username and password. The hardcoded check was taking place on the client where the username and password were now publicly viewable.
Bob didn't ask the LLM to make the form secure, he didn't even know that he had to ask. He didn't know what any of the footguns to avoid were because he didn't even know there were any footguns to avoid in the first place.
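For illustration, the broken "fix" in Bob's story amounts to something like the following hypothetical snippet (the credentials and names here are invented), where the check happens entirely in the browser:

```typescript
// Hypothetical client-side "login" of the kind described above: the credentials
// ship to every visitor as part of the page's JavaScript.
const USERNAME = "bob";      // readable by anyone via "view source" or dev tools
const PASSWORD = "hunter2";  // ditto -- the footgun Bob never knew existed

function handleLogin(user: string, pass: string): boolean {
  // Comparing on the client "works" for Bob, but a visitor can read the values
  // or bypass the check entirely; nothing on the server verifies anything.
  return user === USERNAME && pass === PASSWORD;
}
```

A real login would send the credentials to the server, compare them there against salted hashes, and hand back a session; none of that exists in the version Bob shipped.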
Both Alice and Bob started from the same place. They both lacked knowledge on how login systems should be built. That knowledge was known because it is documented somewhere, but it was unknown to them. It is a "known unknown".
When Alice learned how to style form elements, she also read links on how forms work, which led her to links on how login systems work. That knowledge went for her from a "known unknown" to a "known known" (knowledge that is known, that she now also knows).
When Bob asked the LLM to redesign his login page, at no point in time does the knowledge of how login systems work become a "known known" for him. And a week later some bored kid finds the page, right clicks on the form, clicks inspect and sees a username and password to log in with.
Most non-trivial expertise topics are not one-dimensional. You might be at the "ceiling" in some particular sub-niche, while still on the floor on other aspects of the topic.
So even if you follow the article's premise (I do not), it can still potentially 'raise' you from wherever you were.
The key seems to be whether you have enough expertise to evaluate or test the outputs. Some others have referred to this as having a good sense of the "known/unknown" matrix for the domain.
The AI will be most helpful for you on the known-unknown / unknown-known axis, not so much in the known-known / unknown-unknown parts. The latter, unfortunately, is where you see the most derailed use of the tech.
AI raises everything - the ceiling is just being more productive. Productivity comes from adequacy and potency of tools. We got a hell of a strong tool in our hands, therefore, the more adequate the usage, the higher the leverage.
Surprised to see this downvoted. It feels true to me. Sure there are definitely novel areas where folks might not benefit but I can see a future where this tool becomes helpful for the vast majority of roles.
AI is going to cause a regression to the most anodyne output across many industries. As humans who had to develop analytical skills, writing skills, etc., we struggle to imagine the undeveloped brains of those who come of age in the zero-intellectual-gravity world of AI. OpenAI's study mode is at best a fig leaf.
edit: this comment was posted tongue-in-cheek after my comment reflecting my actual opinion was downvoted with no rebuttals:
https://news.ycombinator.com/item?id=44749957
I would say the modern digital world itself has already had the bigger impact on human thinking, at least at work.
It seems with computers we often think and reason far less than without. Everything required thought previously; now we can just copy and paste Word docs for everything. PowerPoints are how key decisions are communicated in most professional settings.
Before modern computers and especially the internet we also had more time for deep thinking and reasoning. The sublimity of deep thought in older books amazes me and it feels like modern authors are just slightly less deep on average.
So LLMs are, in my view, an incremental change rather than a stepwise change with respect to their effects on human cognition.
In some ways LLMs allow us to return a bit to more humanistic deep thinking. Instead of spending hours looking up minutia on Google, StackOverflow, etc now we can ask our favorite LLM instead. It gives us responses with far less noise.
Unlike with textbooks, we can have dialogues and have it take different perspectives, whereas textbooks only gave you that author's perspective.
Of course, it’s up to individuals to use it well and as a tool to sharpen thinking rather than replace it.
I'm not sure this is my experience so far. What I'm noticing is that my awesome developers have embraced AI as an accelerator, particularly for research and system design, small targeted coding activities with guardrails. My below average developers are having difficulty integrating AI at all in their workflow. If this trend continues the chasm between great and mediocre devs will widen dramatically.
In one sense it's a floor-lowerer, since it lowers the floor on how clueless you can be and still produce something loosely describably as software.
At least the last coding-with-AI chart is still too optimistic, I think. It doesn't reflect how AI coding tools are making developers less productive (instead of more) in non-trivial projects.
So speaking of "mastery":
I wanted to know how to clone a single folder in a Git repository. Having done this before, I knew that there was some incantation I needed to make to the git CLI to do it, but I couldn't remember what it was.
I'm very anti-AI for a number of reasons, but I've been trying to use it here and there to give it the benefit of the doubt and avoid becoming a _complete_ dinosaur. (I was very anti-vim ages ago when I learned emacs; I spent two weeks with vim and never looked back. I apply this philosophy to almost everything as a result.)
I asked Qwen3-235B (reasoning) via Kagi Assistant how I could do this. It gave me a long block of text back that told me to do the thing I didn't want it to do: mkdir a directory, clone into it, move the directory I wanted into the root of the directory, delete everything else.
When I asked it if it was possible to do this without creating the directory, it, incorrectly, told me that it was not. It used RAG-retrieved content in its chain of thought, for what that's worth.
It took me only 30 seconds or so to find the answer I wanted on StackOverflow. It was the second most popular answer in the thread. (git clone --filter=tree: --depth=0, then git sparse-checkout set --no-cone $FOLDER, found here: https://stackoverflow.com/a/52269934)
I nudged the Assistant a smidge more by asking it if there was a subcommand I could use instead. It, then, suggested "sparse-checkout init", which, according to the man page for this subcommand, is deprecated in favor of "set". (I went to the man page to understand what the "cone" method was and stumbled on that tidbit.)
THIS is the thing that disappoints me so much about LLMs being heralded as the next generation of search. Search engines give you many, many sources to guide you to the correct answer if you're willing to do the work. LLM services tell you what the "answer" is, even if it's wrong. You get potential misinformation back while also turning your brain off and learning less; a classic lose-lose.
ChatGPT, Gemini, and Claude all point me to a plethora of sources for most of my questions that I can click through and read. They're also pretty good at both basic and weird git issues for me. Not perfect, but pretty good.
Also part of the workflow of using AI is accepting that your initial prompts might not get the right answer. It's important to scan the answer like you did and use intuition to know 'this isn't right', then try again. Just like we learned how to type in good search queries, we'll also learn how to write good prompts. Sands will shift frequently at first, with prompt strategies that worked well yesterday requiring a different strategy tomorrow, but eventually it will stabilize like search query strategies did.
_I_ know that the answer it provided isn't quite right because I have enough experience with the Git CLI to spot it.
Someone who doesn't use the Git CLI at all and is relying on an LLM to do it will not know that. There's also no reason for them to search beyond the LLM or use the LLM to go deeper because the answer is "good enough."
That's the point of what I'm trying to make. You don't know what you don't know.
Trying different paths that might go down dead ends is part of the learning process. LLMs short-circuit that. This is fine if you think that learning isn't valuable in (in this case) software development. I think it is.
<soapbox>
More specifically, I think that this will, in the long term, create a pyramidal economy where engineers like you and I who learned "the old way" will reap most of the rewards while everyone else coming into the industry will fight for scraps.
I suppose this is fine if you think that this is just the natural order of things. I do not.
Tech is one of, if not the only, career path(s) that could give almost anyone a very high quality of life (at least in the US) without gatekeeping people behind the school they attend (i.e. being born into the right social strata, basically), many years of additional education and even more years of grinding like law, medicine and consulting do.
I'm very saddened to see this going away while us in the old guard cheer its destruction (because our jobs will probably be safe regardless).
</soapbox>
I also disagree with the claim that the LLM gives you a "plethora" of sources. The assistant thread I used gave me three [^0]. A regular search on the same topic gave me more than 15. [^1]
Yes, the 15 it gives me are all over the quality map, but I have much more information at my disposal to find the answer I'm looking for. It also doesn't purport to be "the answer," like LLMs tend to do.
[^0] https://kagi.com/assistant/839b0239-e240-4fcb-b5da-c5f819a0f...
[^1] https://kagi.com/search?q=git+clone+single+folder
Really liked this article.
I wonder: the graphs treat learning with and without AI as two different paths. But obviously people can switch between learning methods or abandon one of them.
Then again, I wonder how many people go from learning about a topic using LLMs to then leaving them behind to continue the old school way. I think the early spoils of LLM usage could poison your motivation to engage with the topic on your own later on.
I learn about different subjects mixing traditional resources and AI.
I can watch a video about the subject, when I want to go deeper, I go to LLMs, throw a bunch of questions at it, because thanks to the videos I now know what to ask. Then the AI responses tell me what I need to understand deeper, so I pick a book that addresses those subjects. Then as I read the book and I don’t understand something, or I have some questions that I want the answer for immediately, I consult ChatGPT (or any other tool I want to try). At different points in the journey, I find something I could build myself to deepen my understanding. I google open source implementations, read them, ask LLMs again, I watch summary videos, and work my way through the problem.
LLMs serve as a “much better StackOverflow / Google”.
I use a similar approach. I experimented with going into a topic with no prior knowledge and it kind of fumbles; I highly recommend having an overview first.
Once you know the basics, LLMs are really good for deepening that knowledge, though using only them is quite challenging. As a complementary tool I find them excellent.
I think all of this is true, but the shape of the chart changes as AI gets better.
Think of how a similar chart for chess/go/starcraft-playing proficiency has changed over the years.
There will come a time when the hardest work is being done by AI. Will that be three years from now or thirty? We don't know yet, but it will come.
The second derivative of floor raiser is ceiling raising.
AI will be both a floor and a ceiling raiser, since there is a practical limit to how many domains one person or team can be expert in, and AI does/will have very strong levels of expertise/competency across a large number of domains and will thus offer significant level-ups in areas where cross-domain synthesis is crucial or where the limits of human working memory and pattern recognition make cross-domain synthesis unlikely to occur.
AI also enables much more efficient early stage idea validation, the point at which ideas/projects are the least anchored in established theory/technique. Thus AI will be a great aid in idea generation and early stage refinement, which is where most novel approaches stall or sit on a shelf as a hobby project because the progenitor doesn't have enough spare time to work through it.
Wouldn't it be both by this definition? It raises the bar for people who maybe have a lower IQ ("mastery"), but people who can use AI can then do more than ever before, raising the ceiling as well.
Wouldn't "more" in this house metaphor be like expanding the floor rather than raising the ceiling?
The blog still assumes that AI does not affect Mastery. I think it does.
All the AI junk, like agents in service centers that you need to outplay in order to get in touch with a human: we as consumers are accepting this new status quo. We will accept products that sometimes do crazy stuff because of hallucinations. Why? Ultimate capitalism, consumerism, sheepism, some other ism.
So AI (and whether it is correlation or causation, I don't know) also corresponds with a lower level of Mastery.
Oh man, I love this take. It's how I've been selling what I do when I speak with a specific segment of my audience: "My goal isn't to make the best realtors better, it's to make the worst realtors acceptable".
And my client is often the brokerage; they just want their agents to produce commissions so they make a cut. They know their top producers probably won't get much from what I offer, but we all see that their worst performers could easily double their business.
AI is not a floor raiser
It is a false confidence generator
It seems most suitable as an autonomous political speech writer and used car salesman coach.
Mixing this with a metaphor from earlier: giving a child a credit card is also a floor raiser.
People should be worried because right now AI is on an exponential growth trajectory and no-one knows when it will level off into an s-curve. AI is starting to get close to good enough. If it becomes twice as good in seven months then what?
What's the basis for your claim that it is on an exponential growth trajectory? That's not the way it feels to me as a fairly heavy user, it feels more like an asymptotic approach to expert human level performance where each new model gets a bit closer but is not yet reaching it, at least in areas where I am expert enough to judge. Improvements since the original ChatGPT don't feel exponential to me.
This also tracks with my experience. Of course, technical progress never looks smooth through the steep part of the s-curve, more a sequence of jagged stair-steps (each their own little s-curve in miniature). We might only be at the top of a stair. But my feeling is that we're exhausting the form-factor of LLMs. If something new and impressive comes along it'll be shaped different and fill a different niche.
People don't consider that there are real physical/thermodynamic constraints on intelligence. It's easy to imagine some skynet scenario, but all evidence suggests that it takes significant increases in energy consumption to increase intelligence.
Even in nature this is clear. Humans are a great example: cooked food predates homo sapiens and it is largely considered to be a pre-requisite for having human level intelligence because of the enormous energy demands of our brains. And nature has given us wildly more efficient brains in almost every possible way. The human brain runs on about 20 watts of power, my RTX uses 450 watts at full capacity.
The idea of "runaway" super intelligence has baked in some very extreme assumptions about the nature of thermodynamics and intelligence, that are largely just hand waved away.
On top of that, AI hasn't changed in a notable way for me personally in a year. The difference between 2022 and 2023 was wild, between 2023 and 2024 changed some of my workflows, 2024 to today largely is just more options around which tooling I used and how these tools can be combined, but nothing really at a fundamental level feels improved for me.
I was worried about that a couple of years ago, when there was a lot of hope that deeper reasoning skills and hallucination avoidance would simply arrive as emergent properties of a large enough model.
More recently, it seems like that's not the case. Larger models sometimes even hallucinate more [0]. I think the entire sector is suffering from a Dunning Kruger effect -- making an LLM is difficult, and they managed to get something incredible working in a much shorter timeframe than anyone really expected back in the early 2010s. But that led to overconfidence and hype, and I think there will be a much longer tail in terms of future improvements than the industry would like to admit.
Even the more advanced reasoning models will struggle to play a valid game of chess, much less win one, despite having plenty of chess games in their training data [1]. I think that, combined with the trouble of hallucinations, hints at where the limitations of the technology really are.
Hopefully LLMs will scare society into planning how to handle mass automation of thinking and logic, before a more powerful technology that can really do it arrives.
[0]: https://techcrunch.com/2025/04/18/openais-new-reasoning-ai-m...
[1]: https://dev.to/maximsaplin/can-llms-play-chess-ive-tested-13...
Really? I find newer models hallucinate less, and I think they have room for improvement with better training.
I believe hallucinations are partly an artifact of imperfect model training, and thus can be ameliorated with better technique.
Yes, really!
Smaller models may hallucinate less: https://www.intel.com/content/www/us/en/developer/articles/t...
The RAG technique uses a smaller model and an external knowledge base that's queried based on the prompt. The technique allows small models to outperform far larger ones in terms of hallucinations, at the cost of performance. That is, to eliminate hallucinations, we should alter how the model works, not increase its scale: https://highlearningrate.substack.com/p/solving-hallucinatio....
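As a rough sketch of the retrieve-then-generate shape being described (the toy keyword retriever and the `callModel` stub below are placeholders, not any particular library's API):

```typescript
// Toy knowledge base; a real system would store document chunks plus embeddings.
const docs: string[] = [
  "git sparse-checkout limits the working tree to a chosen subset of a repository.",
  "Powerline adapters modulate data onto high frequencies riding on mains wiring.",
];

// Toy retriever: rank documents by how many query words they share.
function retrieve(query: string, k = 1): string[] {
  const words = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  return docs
    .map(d => ({ d, score: d.toLowerCase().split(/\W+/).filter(w => words.has(w)).length }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(x => x.d);
}

// Stand-in for the (smaller) model call; only the prompt construction matters here.
async function callModel(prompt: string): Promise<string> {
  return `stubbed model output for a ${prompt.length}-character prompt`;
}

// RAG in miniature: ground the generation in retrieved text instead of asking
// the model to answer from its weights alone.
async function answer(question: string): Promise<string> {
  const context = retrieve(question).join("\n");
  return callModel(`Answer using only this context:\n${context}\n\nQuestion: ${question}`);
}
```

The point of the pattern, as the links above argue, is that the answer is constrained by the retrieved text, which is what cuts down hallucination even with a smaller model.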
Pruned models, with fewer parameters, generally have a lower hallucination risk: https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00695.... "Our analysis suggests that pruned models tend to generate summaries that have a greater lexical overlap with the source document, offering a possible explanation for the lower hallucination risk."
At the same time, all of this should be contrasted with the "Bitter Lesson" (https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson...). IMO, making a larger LLMs does indeed produce a generally superior LLM. It produces more trained responses to a wider set of inputs. However, it does not change that it's an LLM, so fundamental traits of LLMs - like hallucinations - remain.
Let's look:
GPT-1 June 2018
GPT-2 February 2019
GPT-3 June 2020
GPT-4 March 2023
Claude tells me this is the rough improvement of each:
GPT-1 to 2: 5-10x
GPT-2 to 3: 10-20x
GPT 3 to 4: 2-4x
Now it's been 2.5 years since 4.
Are you expecting 5 to be 2-4x better, or 10-20x better?
How are you measuring this improvement factor? We have numerous benchmarks for LLMs and they are all saturating. We are rapidly approaching AGI by that measure, and headed towards ASI. They still won't be "human" but they will be able to do everything humans can, and more.
Only the first two mastery-time graphs make sense.
Only for the people already affluent enough to afford the ever-more expensive subscriptions. Those most in need of a floor-raising don’t have the disposable income to take a bet on AI.
It's very easy to sign up for an API account and pay per-call, or even nothing. Free offerings out there are great (Gemini, OpenRouter...) and a few are even suitable for agentic development.
And how long until they raise the prices?
API prices have been moving in a downward direction, not upward.
At this point they are entirely backed by debt. Once the stream of free money drains though…
Either you are the item being sold or you are paying for the service.
Nothing is free, and I for one prefer a subscription model, if only as a change from the ad model.
I am sure we will see the worst of all worlds, but for now, for this moment in history, subscription is better than ads.
Let’s also never have ads in GenAi tools. The kind of invasive intent level influence these things can achieve, will make our current situation look like a paradise
I'd never buy anything as overt as an advertisement in an AI tool. I just want to buy influence. Just coincidentally use my product as the example. Just suggest my preferred technology when asked a few % more often than my competitors. I'd never want someone to observe me pulling the strings
Normally even if you pay you're still the product anyway. See buying smartphones for example… you pay a lot but you're the product.
It’s definitely about wage stagnation.
Wage suppression. The capital ownership class knows exactly how to weaponize the hype as an excuse now. It's also curious how software engineers working on it are cheerfully building their potential future replacements and long-term insecurity.
AI isn't a pit. AI is a ladder.
A ladder that doesn't reach the ceiling and sometimes ends up in imaginary universes.
Yeah, like in Nethack, while being blind and stepping on a cockatrice.
AI is chairs.
I feel like nobody remembers that facebook ad (Facebook is chairs), but it's seared into my own memory.
https://www.youtube.com/watch?v=SSzoDPptYNA&t=53s
I'd argue that AI reduces the distance between the floor and the ceiling, only both the floor and ceiling move -- the floor moves up, the ceiling downwards. Just using AI makes the floor move up, while over-reliance on it (a very personal metric) pushes the ceiling downwards.
Unlike the telephone (telephones excited a certain class of people into believing that world-wide enlightenment was on their doorstep), LLMs don't just reduce reliance on visual tells and mannerisms, they reduce reliance on thinking itself. And that's a very dangerous slope to go down on. What will happen to the next generation when their parents supply substandard socially-computed results of their mental work (aka language)? Culture will decay and societal norms will veer towards anti-civilizational trends. And that's exactly what we're witnessing these days. The things that were commonplace are now rare and sometimes mythic.
Everyone has the same number of hours and days and years. Some people master some difficult, arcane field while others while it away in front of the television. LLMs make it easier for the television-watchers to experience "entertainment nirvana" while enticing the smart, hard-workers to give up their toil and engage "just a little" rest, which due to the insidious nature of AI-based entertainment, meshes more readily with their more receptive minds.
AI is a wall raiser.
AI is a floor destroyer not a ceiling destroyer. Hang on for dear life!! :P
I guess the AI glazing has infiltrated everywhere
At the very least
I was thinking about this sentiment on my long car drive today.
It feels like when you need to paint walls in your house. If you've never done it before, you'll probably reach for tape to make sure you don't ruin the ceiling and floors. The tape is a tool for amateur wall painters to get decent results somewhat efficiently compared to if they didn't use it. If you're an actually good wall painter, tape only slows you down. You'll go faster without the "help".
You'll find many people lack the willpower and confidence to even get on the floor though. If it weren't for that they'd already know a programming language and be selling something.
AI is a shovel capable of breaking through the bottom of the barrel.
OP doesn't understand that almost everything is neither at the floor nor the ceiling.
I mean, it makes sense that if the AI is trained on human-created things it can never actually do better than that. It can't bust through the ceiling of what it was trained on. And at the same time, AI gives that power to people who just aren't very smart or good at something.
Imagine how useful it would be if we could just add a button to show our approval or disapproval of a piece of content without having to type true or false in the comment section. Let's call it an upvote or downvote button.
/s
I agree it wasn't a helpful comment.
On the other hand, I don't know what this mythical downvote button for stories is you describe. I've certainly never seen it.
Would be nice if HN actually had that.
It's what "flag" is for:
> Please don't complain that a submission is inappropriate. If a story is spam or off-topic, flag it. Don't feed egregious comments by replying; flag them instead. If you flag, please don't also comment that you did.
https://news.ycombinator.com/newsguidelines.html
This article is neither spam nor off topic.
I would agree, and have personally enjoyed the article. I just assumed that the person who wrote "false" might have considered it to be spam (perhaps in the broader sense), and if they did, flagging is considered to be the proper way of showing their disagreement.
No it's not.
There are a lot of stories that aren't worthy of flagging, but they're just low-quality.
And I've seen people abuse flagging because they treat it like downvoting. Flagging removes a story from the front page entirely, whereas ideally downvoting would simply deprioritize it, i.e. a bunch of downvotes would move it from #5 rank to #60 or something, not get rid of it entirely.
I've definitely had to e-mail the mods a number of times to restore a flagged story that was entirely appropriate, but which a few users simply didn't like. It would be much better to have a downvote button, and reserve flagging for actual spam and inappropriate content.
I have seen a lot of speculation about my "false" comment on this story. I think the story is bullshit: AI 100% raises the ceiling and doesn't help incompetent people that much. You see it over and over where incompetent people use AI and submit legal docs with hallucinations or have egregious security holes in their vibe-coded projects, but I have seen highly skilled folks get month-long projects done in a day with higher quality, more tests, and more features than they would have in the past. Explaining all that didn't seem necessary since the whole thing is just not true.