4 points | by prmph a day ago
2 comments
It's garbage.
The question itself is problematic. It creates the illusion that local LLM development can compete with huge datacenters; it cannot.
Well-Educated Human Brain > Data center LLMs > Local LLMs
That's how things are likely to stay for a while.
>I like Claude Code for what it is, but I want an agentic coding setup that provides much stronger security and privacy guarantees.
Very fair. These big AI companies aren't earning billions off $25/month subs; those subs mostly lose them money.
>What is the state of the art right now regarding running local LLMs and connecting local agents to them?
Best of the best, as far as I'm aware, is Qwen3 Coder run in your choice of agentic coder. It's up there with cloud strength in coding.
BUT it's 480B parameters. The Q4_K_M quant alone is 290 GB. You're talking $50,000, a rackmount chassis, and 30-amp electrical service going into that beast. 10x 32GB CUDA cards is yikes.
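Rough back-of-the-envelope if you want to sanity-check that figure (a sketch; the ~4.85 bits/weight average for Q4_K_M is my assumption, and KV cache/runtime overhead aren't counted):

    # Weights-only size estimate for the 480B model.
    # Assumption: Q4_K_M averages ~4.85 bits per weight (mixed 4/6-bit quant).
    params = 480e9
    bits_per_weight = 4.85
    gb = params * bits_per_weight / 8 / 1e9
    print(f"~{gb:.0f} GB just for weights")  # prints ~291 GB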
Here's what was literally just released hours ago:
https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct
I'm currently downloading the Unsloth IQ4_NL quant, which should run well.
This will run on a 24GB card with a good context size and good speed. If you have a 32GB card, even better.
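If you want to try it the simple way, here's a minimal sketch using llama-cpp-python (the GGUF filename is a guess; use the actual file from the Unsloth repo):

    # Minimal local-inference sketch with llama-cpp-python
    # (pip install llama-cpp-python, built with CUDA support).
    from llama_cpp import Llama

    llm = Llama(
        model_path="Qwen3-Coder-30B-A3B-Instruct-IQ4_NL.gguf",  # hypothetical filename
        n_gpu_layers=-1,  # offload every layer to the GPU
        n_ctx=32768,      # generous context; shrink if VRAM runs out
    )
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Write a binary search in Python."}]
    )
    print(out["choices"][0]["message"]["content"])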
What I've been using lately:
https://mistral.ai/news/devstral
On a 24GB card this is great. Just look at the benchmarks; it's absolutely usable.
It's meant to be used in OpenHands, and it functions completely once you get the settings right, like a low temperature.
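The generic way to wire that up is to point an OpenAI-compatible client at whatever is serving the model locally (llama.cpp server, vLLM, Ollama). A sketch, where the base URL, port, and model name are all assumptions to match your own server, not OpenHands' own config format:

    # Sketch: querying a locally served Devstral over an
    # OpenAI-compatible endpoint. Base URL and model name are
    # assumptions; match whatever your local server exposes.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
    resp = client.chat.completions.create(
        model="devstral",  # whatever name your server registers
        temperature=0.1,   # the low temperature mentioned above
        messages=[{"role": "user", "content": "Fix the failing test in utils.py"}],
    )
    print(resp.choices[0].message.content)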