I'm still on the fence about agent frameworks, they have their place, and it depends on the nature of the agent: e.g. "Low latency, return a good enough response in 3 seconds, vs. working for 3 hours on a problem."
BUT, if you boil it down, an agent really is context building, making an LLM call, executing requested tool calls, parsing the final model output, returning it to some frontend. There's extensions like memory, async tool calls, etc, but not THAT complicated from a traditional software engineering perspective.
Everyone seems to want to build their agent framework. But if you're tasked with building an agent, I've found it much easier and more maintainable to just build 1:1 code for THAT agent: most of the abstractions you get from an agent framework purely get in the way and obfuscate core agent logic.
You end up being forced to use the abstractions chosen by the agent framework, which sometimes are a mismatch for what you're actually trying to do.
I’ve said something similar about dozens of frontend frameworks. It’s massive abstraction and convolution for some future payoff that’s obviously never going to happen.
But sometimes people just need something to do, or something fun to play with, and “the next guy” rarely matters that much… so who cares that you’ve saddled them with the result of your paid playtime?
For me the heart of an agentic system is NOT using agents (except when you really have to). Components of a working system include:
- Pipelines/recipes to describe multi-step flows (deterministic, agentic and HiTL steps), loops, conditionals, exit-on's for max loop iteractions, etc
- The logistics to actually run the model and HiTL steps reliably across multiple agent worker pools
- Management and delivery (and security/governance and permissioning) of thick skills with code to do as much as possible
- Context management so the right agents have the right context for the right sessions at the right time
- Project management - ability to store and access tickets, dependencies, track progress, restart stuck ticket claims, etc
- Transcript saving, memory features and dreaming/compounding capabilities so the agents continue to learn from each session
- o11y for understanding whats happening, tracking costs and usages, etc
- Evals and auto-tuning of prompts so you can go cross model provider and also lock to a model version so you can do an ROI on each model version upgrade
- Sandboxes for running the actual model sessions
Don't need to get it all from one vendor, but that feels to me like the toolkit and for most use cases I'd argue:
- Don't limit yourself to a single model provider (anthropic, openai, etc)
- Own your context
- Own your compounding
> Context management so the right agents have the right context for the right sessions at the right time
I'm going to do a show HN tomorrow that explains how you can give your agents years of experience. The basic idea is, you would commit in your repo or download manifests (JSON files) that can be converted to "Brains" (SQLite databases). Each brain can have its own properties.
For example, I provide a "code intent" analyzer (instructions for AI) that says when analyzing a file, extract this metadata. For the code intent analyzer, I have the AI extract a single sentence purpose for the file. So if you execute:
gsc rg cache --db code-intent --fields purpose
you get all matches for 'cache' plus the matching file's purpose like "Modify file to update caching strategy". This is how the agent can tell if the file is talking about cache vs. whether this file is what you should change if you want to update the caching strategy.
So for what you described, you can have a brain for different stages of a task. It can be as simple as, in the planning stage, make sure you do this if you need to touch this file.
I am working on a rust-blast-radius brain that uses `syn` + AI generated metadata to help you understand "what if I changed this file, what would be affected". With the rust-blast-radius brain, the AI can summarize the types of files that will be affected without having to open the file based on what has been changed or discussed.
So you can have a rule like, if I make changes to a Rust file, make sure to do a blast radius analysis so we don't forget to consider something.
(I'm not the guy but) That's funny, I had the same idea the other day. Keeping summaries of files. Haven't tested that yet.
Another thing I've been thinking is how, most parts of a file are not relevant to the whole system.
Like there are parts where they intersect, and those seem to be the most important ones for capturing the big picture. You wanna be able to see the entire "skeleton".
So I thought the summary maybe shouldn't be English but it should be a subset of the code — the subset that's relevant to the rest of the program.
In your chat with AI, include the above file and let it know what your requirements are and I can create the analyzer and include it.
You can also think of my tool as data prepping tool. So if you have a clear prompt the AI can review the file during analysis and remove all unnecessary code so the extracted metadata will the stripped text which you can use search against.
Obscuring core logic is the most egregious part of most agent frameworks. One needs a clear view of what, exactly, is being sent to the underlying language model, and what's coming back. Everything in an 'agentic' application is realized as a sequence of tokens or a call to a provider eventually. It should be clear and obvious from ~all layers of the app what that's going to look like.
Most framework vendors don’t have an incentive to make things less obscure. The agent framework is free/open source and they make money primarily from selling observability products for agents. Even if they don’t intentionally obscure things, they just don’t have the motivation to optimize that part.
Pi is a nice multi-agent wrapper. I use it to wrap my OpenAI max plan calls and my API calls. It takes care of some of the agent plumbing - still need sandbox, orchestrator, compounding, context, evals, etc but it's a nice component.
Any particular plugins you'd recommend? For orchestration I gave pi-subagents a try, and didn't care for it, ended up with hung agents sticking around forever and I wasn't even doing anything terribly fancy. Claude's subagent control is annoying and clumsy, but it works.
The advantage of frameworks isn't that they make it easier to write the actual agent, it's tooling + observability + ...
Even Langchain, for all the (deserved) criticism it gets made this very clear very early: It might be easy/easier to write your own chatbot from the ground up, but what happens if you have to add observability/tracing? Being able to just add one environment variable and instantly have a UI where i can nicely go through all of my traces with basically 0 additional effort is something a hand rolled solution just can't really compete with
This only becomes relevant if your execution graph is complex/big enough. Otherwise, all it takes is less than 30 minutes to add telemetry to all needed points. Doing manually also gives you better control on what you really want to track (to save costs).
Yep.. same. I build my own agents... all use-case specific. Keeps the code super minimal, and avoid unnecessary complexity. I have tried a few of these, but nop.. no help.. only more work (and issues).
I'm somewhat in agreement. I like building 1:1 code for that specific agent.
Where I'm starting to question this is maintainability. When I come up with a new technique or way of doing something in my new agent, how can I update an older agent. Do I want to update the older agent?
But, I get what you're talking about w.r.t. building for the exact problem at hand. For example, I'm guessing that Apache Burr has support for a plugin-able vector RAG system (or at least it will if it doesn't now). That's great, but I want my RAG system to add documents to the context and keep them as part of an updated system prompt with some very specific tweaks that happen as part of that process. This is a bespoke way of working with an existing concept (RAG) that doesn't lend itself to using any specific framework.
In my use-case, bespoke is the way to go. But then I'm still stuck with having to make engineering choices for updating older agents. So, I see your point.
I like to think of it as "AI prompting algorithms". Like instead of just this prompt gets this result it's A prompt then B prompt the C prompt gets a result.
And just like when people were trying to figure out which sorting algorithm made the most sense, we are all just trying to figure out which prompt algorithms with which models lead to good results.
The most interesting evolution of my agent workflow over the past year is that I've dropped all the language gimmicks: I used to use particular emoji to mark parts of code as hints to the AI, I structured planning docs in very rigid language for different types of instructions, and generally optimizing my language for machine consumption. That's all gone now: all my comments, directives, plans and so on are just plain and clear English, nothing more. They're direct and unceremonious, but no longer bullet points that were nearly caveman-speak. It just works better that way.
Right I think this is why we made it unopinioated to a fault. Burr doesn't really do these things rather it just provides an orchestration framework. So it's pure BYO functions, classes, components, etc...
Couldn't agree more - tried to convince a business that doubling down on OpenClaw wasn't going to solve problems except for some 0-1 stuff, and that almost immediately they'd run into roadblocks because most of the product wouldn't serve their use case.
4 months of mostly spinning their wheels later they launched a really lackluster OC product that's effectively DOA.
OpenClaw is an application, not a harness. Yes, it contains a harness, but it is a complete product.
When building an agentic workflow there are enough primitives that rewriting them from scratch every time makes zero sense.
What is a tool? How does the LLM understand the tool? Formatting a native function into a serializable input/output pattern makes sense to generalize and that does not need to exist repeated in everyones application code.
We use libraries to interact with the APIs themselves; nobody would say writing a spec-compliant API client was poor practice. Agentic harnesses are just one layer above: I need to call the API and I need to do it with certain expected conventions.
One, obviously yes OC contains a lot more than a harness, but my point was that it was too much for their use case and constrained their choices, not enabled them, and that choosing the right layer of abstraction is important.
There's good indirection/abstraction and there's ones that do not serve your use case, eg what was obviously day one regarding Langchain.
Burr just helps you, the engineer, to really control the primitives. Then adds some cool features you don't have to think about -- like observability :)
100% agreed, the "this is what an agent looks like to write" is the wrong pitch for a new agent framework.
The better pitch would be, "this is how easy observability, guardrails, monitoring, deployment, evals, versioning, A/B testing are with our framework." What the agent code looks like is somewhat incidental.
Anyone have something they genuinely like for all of this? For now I'm rolling my own, but I can't believe I won't find a better OSS alternative soon...
Nvidia Openshell solves most of the hard problems I've run into while building stuff in this space.
Observability is, for my purposes, solved by a given framework supporting OpenTelemetry.
Guardrails is where I've gotten the most value of openshell being a neat package. Agent workload scope is written as policy in openshell, and capability is backed by openshell handling all execution.
Monitoring/deployment/versioning is helped as well, depending on how agents/runners are slotted into the system. Deployment namely is quite well supported- openshell has kube/helm bits that are experimental atm, but seem like a logical approach imho.
Evals and a/b testing isnt something ive explored in depth, considering that agents with composable tool sets + frontier models are beyond my expectations already.
The agent isn't the hard part - it's the orchestration, skills, research systems, adversarial reviews, dreaming/compounding, context management and all the rest. Plus all the annoying hygiene tools to "poke the agent that got a clear prompt and decided to just sit there and wait for no good reason" and "delete the remote branches that the prompt told all the agents to delete but some of them forgot to":)
Take a simple workflow. You have a query it goes to a classifier. The classifier determines what workflow it should route the request to.
Then you have a general workflow that has a set of skills (prompts) and tools. And that could be recursive.
So if you do something like "rename this file" you have to build up a workflow like:
[classifier]
what's the workflow -> rename
[rename workflow]
list files (tool call)
figure out relevant predicate (LLM)
convert predicate into a filter query give the context of the files (LLM)
figure out what you want the new name to be (LLM)
create the request body and hit the tool
approval workflow
formatting
It's a lot to manage and orchestrate and that's just one simple example. You'd like want to use the same building blocks to delete a file or move it. Even to know the right concepts is difficult as we're a bit deluded on whats going on in the background of these modern AI apps like Claude and GPT that do a lot of this stuff for you
Agents are a way to de-bloat the context. The way LLMs function, you absolutely need to find the sweet spot for a given task, and if the primary LLM has to go through a bunch of failures to find a working function, those failures are better contained in an agent and disposed of.
Obviously, you could have a different LLM like a "angel" that prunes a primary agent of the context it doesn't need, but I think the realistic KV cache problem is will determine the optimal structure: you want the work do be done in the most efficience KV cache (context-reuse) as much as possible.
There's definitely more to it than just spawning agents.
Yes, Python has decorators, but they're best used as "filters" that apply to functions or methods. Cache this, serialize the output of this function always, prepare this function to be used as a tool by an agentic harness. Not registration, not flow control. You may disagree but someone has to say it; FastAPI influenced the modern use of decorators far too much in the wrong direction.
Builder patterns are a Rust convention, because Rust has no named keyword arguments. A Python function already exposes a named contract. There is very little reason to ever to sequentially pass configuration parameters in chained method calls. If you need to add state that doesn't exist yet to a constructor or factory, that is not a builder pattern. That is registration. The one place where builder patterns should be tolerated is query builders. They iteratively build on a concept and having the additional "slot" for metadata (method name plus keyword arguments) is genuinely useful. Using methods which accept single parameter instead of keyword arguments is incorrect.
So decorators here specifically attach metadata to make a function a reusable component. Builder makes a workflow. In Hamilton it's all decorators because it's purely declarative construction (sans reusability, really).
Doesn't look any different than doing the same in C# or Java to me, it is kind of pointless in Python, the one thing the pattern gives you is building a class in such a way that you the developer know exactly what's what, so its really a developer ergonomics thing is how it looks like to me.
How does this compare to https://strandsagents.com/ ? I'm interested in tools in this space, right now I'm not attached to one, but Bedrock + Serverless on Agent Core feels like the "easy guided path" though I don't like the platform lock-in
I’ve been playing with this stack and left wondering if Strands provides any secret sauce with Agent Core. So far it doesn’t feel that way and sometimes they even feel at odds with each other.
On a tangent, can anyone recommend good coding agent orchestration tools or platform? Something to launch, manage and monitor codex or claude agents in multiple machines
Ideally self-hostable/open source
I know claude code has a lot of that internally built in already, but it’s claude-only
I was looking at their docs and Burr has agent cookbooks to get started with this, and it can handle multi-machine workflows. Is this not what you were looking for? I am not sure how it integrates and uses skills etc, but it seems like it should work to me.
Why wouldn't it? The ASF has a long history of incubating new FOSS projects. Some graduate and become household names. Others fail and end up in the attic. The ASF can provide organisational support and generally fosters good communities.
My point was this is a crowded market now, why would they pick a platform that is not known? I did search HN and this platform was only shown once 2 years ago, and from their releases, they are still 0.42 after two years.
It might sounded that I’m against the move, but I’m just curious as what apache found in the platform to get incubated
Cause I submitted it. Learning the Apache process and cranking on other things has been a slow process. But we've got some momentum and beginning more regular releases.
Burr is named after Aaron Burr, founding father, third VP of the United States, and murderer/arch-nemesis of Alexander Hamilton. What's the connection with Hamilton? This is DAGWorks' second open-source library release after the Hamilton library We imagine a world in which Burr and Hamilton lived in harmony and saw through their differences to better the union. We originally built Burr as a harness to handle state between executions of Hamilton DAGs (because DAGs don't have cycles), but realized that it has a wide array of applications and decided to release it more broadly.
Right it was a bit of a joke. Originally stefan and I presented frameworks when we were at stitch fix -- stefan called his "hamilton" and I called mine "burr". His was better for the use-case. But then we wanted to build something for state machines as opposed to DAGs, so we called it Burr. I wanted the git tagline to be "make your agents go burr..."
I think the marketing copy probably needs to focus on differentiating features vs any myriad of agent frameworks. I took one look at the sample and immediately said "This is literally just langgraph with a builder pattern"
The best agent framework is Pi (pi.dev). It is minimal and doesn't assume a use case, runs fine interactively or non-interactively, has an active community building with it and supports everything you need to build whatever kind of agent you want with plugins.
I searched the docs for authentication and mcp (one of the protocols which, among other things, handles some pieces of authentication/authorization) but didn't see any results.
I just create a MVP chatbot for a client that has a Django app. I took the route to no frameworks. Claude/codex wrote the agent loop, the tools, the streaming..it’s working well for the MVP, we’ll see
I'm still on the fence about agent frameworks, they have their place, and it depends on the nature of the agent: e.g. "Low latency, return a good enough response in 3 seconds, vs. working for 3 hours on a problem."
BUT, if you boil it down, an agent really is context building, making an LLM call, executing requested tool calls, parsing the final model output, returning it to some frontend. There's extensions like memory, async tool calls, etc, but not THAT complicated from a traditional software engineering perspective.
Everyone seems to want to build their agent framework. But if you're tasked with building an agent, I've found it much easier and more maintainable to just build 1:1 code for THAT agent: most of the abstractions you get from an agent framework purely get in the way and obfuscate core agent logic.
You end up being forced to use the abstractions chosen by the agent framework, which sometimes are a mismatch for what you're actually trying to do.
I’ve said something similar about dozens of frontend frameworks. It’s massive abstraction and convolution for some future payoff that’s obviously never going to happen.
But sometimes people just need something to do, or something fun to play with, and “the next guy” rarely matters that much… so who cares that you’ve saddled them with the result of your paid playtime?
For me the heart of an agentic system is NOT using agents (except when you really have to). Components of a working system include: - Pipelines/recipes to describe multi-step flows (deterministic, agentic and HiTL steps), loops, conditionals, exit-on's for max loop iteractions, etc - The logistics to actually run the model and HiTL steps reliably across multiple agent worker pools - Management and delivery (and security/governance and permissioning) of thick skills with code to do as much as possible - Context management so the right agents have the right context for the right sessions at the right time - Project management - ability to store and access tickets, dependencies, track progress, restart stuck ticket claims, etc - Transcript saving, memory features and dreaming/compounding capabilities so the agents continue to learn from each session - o11y for understanding whats happening, tracking costs and usages, etc - Evals and auto-tuning of prompts so you can go cross model provider and also lock to a model version so you can do an ROI on each model version upgrade - Sandboxes for running the actual model sessions
Don't need to get it all from one vendor, but that feels to me like the toolkit and for most use cases I'd argue: - Don't limit yourself to a single model provider (anthropic, openai, etc) - Own your context - Own your compounding
Can you comment more on
> Context management so the right agents have the right context for the right sessions at the right time
I'm going to do a show HN tomorrow that explains how you can give your agents years of experience. The basic idea is, you would commit in your repo or download manifests (JSON files) that can be converted to "Brains" (SQLite databases). Each brain can have its own properties.
For example, I provide a "code intent" analyzer (instructions for AI) that says when analyzing a file, extract this metadata. For the code intent analyzer, I have the AI extract a single sentence purpose for the file. So if you execute:
gsc rg cache --db code-intent --fields purpose
you get all matches for 'cache' plus the matching file's purpose like "Modify file to update caching strategy". This is how the agent can tell if the file is talking about cache vs. whether this file is what you should change if you want to update the caching strategy.
So for what you described, you can have a brain for different stages of a task. It can be as simple as, in the planning stage, make sure you do this if you need to touch this file.
I am working on a rust-blast-radius brain that uses `syn` + AI generated metadata to help you understand "what if I changed this file, what would be affected". With the rust-blast-radius brain, the AI can summarize the types of files that will be affected without having to open the file based on what has been changed or discussed.
So you can have a rule like, if I make changes to a Rust file, make sure to do a blast radius analysis so we don't forget to consider something.
Does this align with what you are looking for?
(I'm not the guy but) That's funny, I had the same idea the other day. Keeping summaries of files. Haven't tested that yet.
Another thing I've been thinking is how, most parts of a file are not relevant to the whole system.
Like there are parts where they intersect, and those seem to be the most important ones for capturing the big picture. You wanna be able to see the entire "skeleton".
So I thought the summary maybe shouldn't be English but it should be a subset of the code — the subset that's relevant to the rest of the program.
`grep import` gets you 90% of the way there.
If you include the following:
https://github.com/gitsense/chat/blob/main/base-state/analyz...
In your chat with AI, include the above file and let it know what your requirements are and I can create the analyzer and include it.
You can also think of my tool as data prepping tool. So if you have a clear prompt the AI can review the file during analysis and remove all unnecessary code so the extracted metadata will the stripped text which you can use search against.
Obscuring core logic is the most egregious part of most agent frameworks. One needs a clear view of what, exactly, is being sent to the underlying language model, and what's coming back. Everything in an 'agentic' application is realized as a sequence of tokens or a call to a provider eventually. It should be clear and obvious from ~all layers of the app what that's going to look like.
Most framework vendors don’t have an incentive to make things less obscure. The agent framework is free/open source and they make money primarily from selling observability products for agents. Even if they don’t intentionally obscure things, they just don’t have the motivation to optimize that part.
Have a look at pi.
Whst about pi do you like?
Pi is a nice multi-agent wrapper. I use it to wrap my OpenAI max plan calls and my API calls. It takes care of some of the agent plumbing - still need sandbox, orchestrator, compounding, context, evals, etc but it's a nice component.
Any particular plugins you'd recommend? For orchestration I gave pi-subagents a try, and didn't care for it, ended up with hung agents sticking around forever and I wasn't even doing anything terribly fancy. Claude's subagent control is annoying and clumsy, but it works.
The advantage of frameworks isn't that they make it easier to write the actual agent, it's tooling + observability + ... Even Langchain, for all the (deserved) criticism it gets made this very clear very early: It might be easy/easier to write your own chatbot from the ground up, but what happens if you have to add observability/tracing? Being able to just add one environment variable and instantly have a UI where i can nicely go through all of my traces with basically 0 additional effort is something a hand rolled solution just can't really compete with
This only becomes relevant if your execution graph is complex/big enough. Otherwise, all it takes is less than 30 minutes to add telemetry to all needed points. Doing manually also gives you better control on what you really want to track (to save costs).
Yep.. same. I build my own agents... all use-case specific. Keeps the code super minimal, and avoid unnecessary complexity. I have tried a few of these, but nop.. no help.. only more work (and issues).
I'm somewhat in agreement. I like building 1:1 code for that specific agent.
Where I'm starting to question this is maintainability. When I come up with a new technique or way of doing something in my new agent, how can I update an older agent. Do I want to update the older agent?
But, I get what you're talking about w.r.t. building for the exact problem at hand. For example, I'm guessing that Apache Burr has support for a plugin-able vector RAG system (or at least it will if it doesn't now). That's great, but I want my RAG system to add documents to the context and keep them as part of an updated system prompt with some very specific tweaks that happen as part of that process. This is a bespoke way of working with an existing concept (RAG) that doesn't lend itself to using any specific framework.
In my use-case, bespoke is the way to go. But then I'm still stuck with having to make engineering choices for updating older agents. So, I see your point.
Anthropic seems to agree with you as more recent Claude updates have it just building task specific harnesses as needed.
I like to think of it as "AI prompting algorithms". Like instead of just this prompt gets this result it's A prompt then B prompt the C prompt gets a result.
And just like when people were trying to figure out which sorting algorithm made the most sense, we are all just trying to figure out which prompt algorithms with which models lead to good results.
The most interesting evolution of my agent workflow over the past year is that I've dropped all the language gimmicks: I used to use particular emoji to mark parts of code as hints to the AI, I structured planning docs in very rigid language for different types of instructions, and generally optimizing my language for machine consumption. That's all gone now: all my comments, directives, plans and so on are just plain and clear English, nothing more. They're direct and unceremonious, but no longer bullet points that were nearly caveman-speak. It just works better that way.
Right I think this is why we made it unopinioated to a fault. Burr doesn't really do these things rather it just provides an orchestration framework. So it's pure BYO functions, classes, components, etc...
Couldn't agree more - tried to convince a business that doubling down on OpenClaw wasn't going to solve problems except for some 0-1 stuff, and that almost immediately they'd run into roadblocks because most of the product wouldn't serve their use case.
4 months of mostly spinning their wheels later they launched a really lackluster OC product that's effectively DOA.
OpenClaw is an application, not a harness. Yes, it contains a harness, but it is a complete product.
When building an agentic workflow there are enough primitives that rewriting them from scratch every time makes zero sense.
What is a tool? How does the LLM understand the tool? Formatting a native function into a serializable input/output pattern makes sense to generalize and that does not need to exist repeated in everyones application code.
We use libraries to interact with the APIs themselves; nobody would say writing a spec-compliant API client was poor practice. Agentic harnesses are just one layer above: I need to call the API and I need to do it with certain expected conventions.
Sure, but that seems liked two different points.
One, obviously yes OC contains a lot more than a harness, but my point was that it was too much for their use case and constrained their choices, not enabled them, and that choosing the right layer of abstraction is important.
There's good indirection/abstraction and there's ones that do not serve your use case, eg what was obviously day one regarding Langchain.
yep - here's something cool an end user wrote with Burr - https://github.com/msradam/phoebe
Burr just helps you, the engineer, to really control the primitives. Then adds some cool features you don't have to think about -- like observability :)
my job rn is just building agents
the hard part about building agents isnt the framework it's discovery, context, traditional engineering, handling the last mile
there are some invariants like the loop, tools, observability, guardrails, monitors etc...
100% agreed, the "this is what an agent looks like to write" is the wrong pitch for a new agent framework.
The better pitch would be, "this is how easy observability, guardrails, monitoring, deployment, evals, versioning, A/B testing are with our framework." What the agent code looks like is somewhat incidental.
This this this!
Anyone have something they genuinely like for all of this? For now I'm rolling my own, but I can't believe I won't find a better OSS alternative soon...
Nvidia Openshell solves most of the hard problems I've run into while building stuff in this space.
Observability is, for my purposes, solved by a given framework supporting OpenTelemetry.
Guardrails is where I've gotten the most value of openshell being a neat package. Agent workload scope is written as policy in openshell, and capability is backed by openshell handling all execution.
Monitoring/deployment/versioning is helped as well, depending on how agents/runners are slotted into the system. Deployment namely is quite well supported- openshell has kube/helm bits that are experimental atm, but seem like a logical approach imho.
Evals and a/b testing isnt something ive explored in depth, considering that agents with composable tool sets + frontier models are beyond my expectations already.
We need the equivalent of the MEAN / LAMP stackronym for agents.
It’s painfully obvious that you can just open your coding harness and… tell it you’d like to make an agent. They’re simple to write.
The agent isn't the hard part - it's the orchestration, skills, research systems, adversarial reviews, dreaming/compounding, context management and all the rest. Plus all the annoying hygiene tools to "poke the agent that got a clear prompt and decided to just sit there and wait for no good reason" and "delete the remote branches that the prompt told all the agents to delete but some of them forgot to":)
This is where it's nice to have some guardrails -- coding agents work etremely well with limitations.
Take a simple workflow. You have a query it goes to a classifier. The classifier determines what workflow it should route the request to.
Then you have a general workflow that has a set of skills (prompts) and tools. And that could be recursive.
So if you do something like "rename this file" you have to build up a workflow like:
[classifier]
what's the workflow -> rename
[rename workflow]
list files (tool call)
figure out relevant predicate (LLM)
convert predicate into a filter query give the context of the files (LLM)
figure out what you want the new name to be (LLM)
create the request body and hit the tool
approval workflow
formatting
It's a lot to manage and orchestrate and that's just one simple example. You'd like want to use the same building blocks to delete a file or move it. Even to know the right concepts is difficult as we're a bit deluded on whats going on in the background of these modern AI apps like Claude and GPT that do a lot of this stuff for you
100% this
you dont need a framework
Agents are a way to de-bloat the context. The way LLMs function, you absolutely need to find the sweet spot for a given task, and if the primary LLM has to go through a bunch of failures to find a working function, those failures are better contained in an agent and disposed of.
Obviously, you could have a different LLM like a "angel" that prunes a primary agent of the context it doesn't need, but I think the realistic KV cache problem is will determine the optimal structure: you want the work do be done in the most efficience KV cache (context-reuse) as much as possible.
There's definitely more to it than just spawning agents.
A builder pattern and decorators.
Yes, Python has decorators, but they're best used as "filters" that apply to functions or methods. Cache this, serialize the output of this function always, prepare this function to be used as a tool by an agentic harness. Not registration, not flow control. You may disagree but someone has to say it; FastAPI influenced the modern use of decorators far too much in the wrong direction.
Builder patterns are a Rust convention, because Rust has no named keyword arguments. A Python function already exposes a named contract. There is very little reason to ever to sequentially pass configuration parameters in chained method calls. If you need to add state that doesn't exist yet to a constructor or factory, that is not a builder pattern. That is registration. The one place where builder patterns should be tolerated is query builders. They iteratively build on a concept and having the additional "slot" for metadata (method name plus keyword arguments) is genuinely useful. Using methods which accept single parameter instead of keyword arguments is incorrect.
So decorators here specifically attach metadata to make a function a reusable component. Builder makes a workflow. In Hamilton it's all decorators because it's purely declarative construction (sans reusability, really).
Builder pattern isn't only used in Rust, but I agree it's hideous to use in Python.
Doesn't look any different than doing the same in C# or Java to me, it is kind of pointless in Python, the one thing the pattern gives you is building a class in such a way that you the developer know exactly what's what, so its really a developer ergonomics thing is how it looks like to me.
Fair point. I should have said "popularized in the modern software vernacular by Rust".
I think of Java immediately when I hear Builder Pattern, and I think anyone who has ever touched Java does as well.
How does this compare to https://strandsagents.com/ ? I'm interested in tools in this space, right now I'm not attached to one, but Bedrock + Serverless on Agent Core feels like the "easy guided path" though I don't like the platform lock-in
Curious about other experiences.
I’ve been playing with this stack and left wondering if Strands provides any secret sauce with Agent Core. So far it doesn’t feel that way and sometimes they even feel at odds with each other.
I've been working with jido https://jido.run and would definitely recommend it
On a tangent, can anyone recommend good coding agent orchestration tools or platform? Something to launch, manage and monitor codex or claude agents in multiple machines
Ideally self-hostable/open source
I know claude code has a lot of that internally built in already, but it’s claude-only
I was looking at their docs and Burr has agent cookbooks to get started with this, and it can handle multi-machine workflows. Is this not what you were looking for? I am not sure how it integrates and uses skills etc, but it seems like it should work to me.
https://burr.apache.org/docs/examples/agents/
First time I hear about Burr, curious why it was incubated in Apache.
Why wouldn't it? The ASF has a long history of incubating new FOSS projects. Some graduate and become household names. Others fail and end up in the attic. The ASF can provide organisational support and generally fosters good communities.
My point was this is a crowded market now, why would they pick a platform that is not known? I did search HN and this platform was only shown once 2 years ago, and from their releases, they are still 0.42 after two years.
It might sounded that I’m against the move, but I’m just curious as what apache found in the platform to get incubated
Cause I submitted it. Learning the Apache process and cranking on other things has been a slow process. But we've got some momentum and beginning more regular releases.
I couldn't find an explicit reference for the naming, but for anyone wondering there is a Hamilton example: https://github.com/apache/burr/tree/main/examples/multi-agen...
Burr is named after Aaron Burr, founding father, third VP of the United States, and murderer/arch-nemesis of Alexander Hamilton. What's the connection with Hamilton? This is DAGWorks' second open-source library release after the Hamilton library We imagine a world in which Burr and Hamilton lived in harmony and saw through their differences to better the union. We originally built Burr as a harness to handle state between executions of Hamilton DAGs (because DAGs don't have cycles), but realized that it has a wide array of applications and decided to release it more broadly.
https://pypi.org/project/burr/
Right it was a bit of a joke. Originally stefan and I presented frameworks when we were at stitch fix -- stefan called his "hamilton" and I called mine "burr". His was better for the use-case. But then we wanted to build something for state machines as opposed to DAGs, so we called it Burr. I wanted the git tagline to be "make your agents go burr..."
I think the marketing copy probably needs to focus on differentiating features vs any myriad of agent frameworks. I took one look at the sample and immediately said "This is literally just langgraph with a builder pattern"
Right -- possible it's slightly out of date https://github.com/apache/burr#-comparison-against-common-fr.... Good point on differentiating.
Wow, such a un-apache-y homepage I've ever seen, vs. the canonical one: https://httpd.apache.org/ (And wow, they still keep releasing it!)
Ha! We went all out on the modern one (user contribution!).
Here's another ancient relic of a website design. They even used a free template from back in the day.
https://tcl.apache.org/rivet/
The best agent framework is Pi (pi.dev). It is minimal and doesn't assume a use case, runs fine interactively or non-interactively, has an active community building with it and supports everything you need to build whatever kind of agent you want with plugins.
After trying a few I like NanoBot more than Pi. Also popular, pretty clean code, I found fewer bugs than I did in Pi.
How are agents authenticated?
I searched the docs for authentication and mcp (one of the protocols which, among other things, handles some pieces of authentication/authorization) but didn't see any results.
What did I miss?
Is this comparable to https://dspy.ai/ ?
no. more lower level.
I just create a MVP chatbot for a client that has a Django app. I took the route to no frameworks. Claude/codex wrote the agent loop, the tools, the streaming..it’s working well for the MVP, we’ll see
vibe coded landing page
reddit user testimonial
framework is for state machines
why man..
Don't ask the why, ask the how. How did they get acceptance into an incubation stage with what you just mentioned?
How does one know that a website is "vibe coded"? Any good indicators?
https://vorpus.github.io/performativeUI/
so far I'm seeing: GradientText, Animated button, EyebrowPill, Aurora background, MockIDE, LogoRow, SlippyWords, StatCounter, CommunityBadge
also: "No DSL, no YAML — just Python functions and decorators."
'It's not X, its Y' but with an added em dash is crazy work.
The flair is a big give away. View source. Look for the SVGs.
The vibe coded landing page (at least in its look) is really degrading Apache foundation image imo.
user contributed :)
One of the co-creators/maintainers here! Will try to answer Qs over the day.
Claude Opus really loves this template when building websites. It's very funny how many times I've seen it for recent launches.
And it lags my desktop every-time, I hate it. It's the default bootstrap theme all over again but instead with SVG's.