Given the timing, this is very likely a submarine article. Or as the kids call it these days: sponcon.
https://www.paulgraham.com/submarine.html
It is not a sponsored article, and he writes one of these every time a new model releases. Why would a professor at Wharton need to write sponsored Substack articles?
"I don't care who the IRS sends I am not paying taxes!"
> It worked for nine and a half hours.
> Again, it wasn’t perfect. As an expert, I was able to spot some errors and omissions (some as a result of the design I had asked for) that I had the AI correct
That's the bit that stuck out to me - that's longer than I would expect to work on a problem in a day or even expect to go back & fix the output of something that has a core reward loop of hours.
My customers are currently clamoring to push down my agent response times from 85 seconds down to below the 20s mark.
At the same time, it is very dissonant to see the industry heading towards hour+ long workflows with an agent.
Work duration is also not that valuable a measure; you're usually better off defining the process yourself in code and having that delegate chunks of work to the models. The only real issue there is that it's harder to take advantage of the providers' subscription discounts, but on the other hand it's easier to do your own model routing, and I've seen no way for the normal chatbots to maintain coherence on streams of work measured in days and weeks.
In Claude's defense (and I cannot believe I'm defending it), I know no single dev who could create what it did (Concord), from a 19-page design document, in 9.5 working hours.
We're gonna go back to the days where our bosses ask why we're just sitting around, but instead of saying "compiling," we'll just say, "waiting for Claude."
For the rare uninitiated:
https://xkcd.com/303/
My Opus 4.8 regularly works for 10+ minutes on a single non-trivial coding request.
Your Opus 4.8? Is it now usual to refer to LLMs like that?
I just can't stand this type of fawning language.
Mollick runs the Generative AI Lab at Wharton, with all the corporate sponsors.
He is a professor but sadly also an AI shill. He should switch to advertising washing powder.
So...no engagement with the substance? Not even to explain why it is that this is not a useful description or test of capabilities? Ok.
I would like to see it do something useful, like converting pytorch to golang.
Why not get a plan from Anthropic and get that done yourself? It's probably going to cost you about as much as a coffee.
Hot damn - is that the floor of what you consider useful?
This newfangled car thing is useless. It can't even properly shoe a horse.
Instead of attacking the author, please respond to the content of the article. That is the HN way, and it leads to more substantive and interesting discussions.
Would it be possible for Mythos to make the space bar scroll the pages on your website properly?
I think Qwen 3.7-Plus is better at reasoning than Mythos, and I've used both for quite a while.
More Mythos Marketing.