This is super cool… It's interesting to see it build. I can't tell if the agent will run indefinitely, but it's been going for 7 or 8 minutes now, constantly tweaking its composition.
I find this approach to be more appealing than AI models that generate fully baked songs as waveforms. Give me something I can open in Logic and keep tweaking…
Yeah, planning to add wavesurfer.js (https://www.npmjs.com/package/wavesurfer.js) support soon. Do you recommend any other library for that?
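For context, the basic integration I have in mind is tiny (rough sketch; the container id and stem path are placeholders):

```typescript
import WaveSurfer from "wavesurfer.js";

// Minimal sketch of rendering an exported stem as a waveform
// (container id and stem path are placeholders).
const ws = WaveSurfer.create({
  container: "#waveform",
  waveColor: "#999",
  progressColor: "#555",
});
ws.load("/stems/drums.wav");
ws.on("ready", () => ws.play());
```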
You can export the track stems already; I'll add MIDI export soon :) Thanks for the feedback!
Very neat. I built an agentic web DAW last year. Models were pretty crap at producing anything good, but that's changing rapidly.
I forked Anthropic's MCP at the time to use it in the browser, but it was just too much trouble and I wanted to wait for something like WebMCP to appear before fiddling with it more.
Planning on dusting off the DAW and releasing it very soon.
After playing around with it all day over the past few days, I've come to the conclusion that the system prompt goes a long way! We've incorporated music theory and many other instructions into the system prompt for it to be able to come up with something like this.
With that, I'm definitely looking forward to models producing good music the analytical way, not by pattern finding as we see in specialized audio/music-gen models.
Yep, I totally agree that context engineering is everything here, but the jump in model quality in just the last 4 months has been insane. They're way better at this now.
In the case of my DAW, I went even more fundamental: I created a node-based visual UI and gave the agent the ability to program new modules using the Web Audio API, as well as a selection of stock instruments and effects to choose from. Modules are editable after instantiation, and the UI for each module is generated automatically from its parameters, inputs, and outputs. The agent could spawn modules and wire them up, do sound design, that sort of thing.
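To give the idea, a module boiled down to roughly this shape (illustrative sketch, not the actual code; the interface names and the low-pass example are made up):

```typescript
// Illustrative sketch only: a module declares its params, inputs, and
// outputs, and the host auto-generates UI from that metadata. The agent
// writes the create() body against the Web Audio API.
interface ParamSpec {
  name: string;
  min: number;
  max: number;
  default: number;
}

interface ModuleSpec {
  name: string;
  params: ParamSpec[];
  inputs: string[];
  outputs: string[];
  create(ctx: AudioContext): {
    node: AudioNode;
    setParam(name: string, value: number): void;
  };
}

// The kind of module the agent might generate: a resonant low-pass filter.
const lowpass: ModuleSpec = {
  name: "Resonant LPF",
  params: [
    { name: "cutoff", min: 20, max: 20000, default: 800 },
    { name: "resonance", min: 0.1, max: 20, default: 1 },
  ],
  inputs: ["in"],
  outputs: ["out"],
  create(ctx) {
    const filter = ctx.createBiquadFilter();
    filter.type = "lowpass";
    return {
      node: filter,
      setParam(name, value) {
        if (name === "cutoff") filter.frequency.value = value;
        if (name === "resonance") filter.Q.value = value;
      },
    };
  },
};
```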
I've also recently tried out Gemini 3.1 Pro on audio, and you should give it a spin if you haven't yet. It's the first model I've seen that can really talk about music in terms of frequency and time with great accuracy. It can break down songs by instrumentation, composition, sound design, arrangement, etc.
Its philosophical take on the music itself isn't always great, but it is precise, and at a high level you can see where things are headed. Some of its advice was definitely valid and actionable. I want to plug it into my DAW or the Ableton MCP and see what happens; it might actually be able to do real sound design. What I want is to not just ask for a melody, but to be able to say things like "let's throw a Reese bass in there" or "sidechain everything under the kick" and have the model know what I'm talking about. So not just music theory, but sound design as well.
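To be concrete about what that vocabulary maps to: a Reese bass is basically two detuned saws into a low-pass filter, which in Web Audio terms is only a few lines (rough illustrative sketch):

```typescript
// Rough sketch: a Reese bass in the Web Audio API (two detuned sawtooth
// oscillators summed into a low-pass filter).
function reeseBass(ctx: AudioContext, freq = 55): AudioNode {
  const filter = ctx.createBiquadFilter();
  filter.type = "lowpass";
  filter.frequency.value = 400;

  for (const cents of [-15, 15]) {  // slight detune -> the slow phasing "growl"
    const osc = ctx.createOscillator();
    osc.type = "sawtooth";
    osc.frequency.value = freq;
    osc.detune.value = cents;
    osc.connect(filter);
    osc.start();
  }
  return filter; // patch this into the mix bus / further processing
}
```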
I'd love to chat about this more somewhere and cross-pollinate ideas if you're up for it, email's in my bio.
mailed you!