There might be something to play around with here.
Haven’t done my research yet; I was just listening to a podcast before bed and one of the guests from iZotope mentioned it as inspiration for what they do, though opposite to their goals. Magenta is an AI research lab at Google specializing in trying to get AI to make art, while iZotope wants AI to enhance the work of human artists. Looks really good if you use Ableton.
Interesting stuff. Gonna give it a spin at some point to see how badly my organic artistic deficiencies compare to random stuff composed by first-gen consumer-grade AI lol… (Also have to say that I wasn’t impressed by the automatic iZotope features so far, but I didn’t upgrade to the latest version.)
I still haven’t had a chance to look into it. Sound demos were definitely underwhelming. It’s kinda doing what the Hartmann Neuron synth was doing back in the early 2000s, but worse. Or maybe the same. You probably haven’t heard of the Hartmann Neuron for a reason.
And the iZotope stuff is alright for pop and rock probably, but not super well tuned for DnB or IDM or house. All it knows is “this is a synthesizer and in a normal mix it would sound good to do this,” so it does it. Then you toss another synthesizer at it and it does the same thing. It can’t fathom that one of these is a lead, one is a bass layer, then there’s another bass layer and a separate lead effects bus and so on. The AI part is mostly in identifying the different instruments, which it then takes to a lookup table of settings. Kinda like how Excel does calculus (it doesn’t; it just looks up common statistics solutions without actually integrating the area under the probability curve).
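Just to illustrate what I mean by a lookup table of settings, here’s a toy Python sketch. All the labels and numbers are made up; it’s not how iZotope actually implements anything, just the shape of the idea:

```python
# Toy sketch of "classify, then look up a preset" mixing.
# Labels and settings are hypothetical, not iZotope's actual data or code.

PRESETS = {
    "synthesizer": {"eq_high_shelf_db": 2.0, "comp_ratio": 3.0, "pan": 0.0},
    "vocal":       {"eq_high_shelf_db": 3.0, "comp_ratio": 4.0, "pan": 0.0},
    "bass":        {"eq_high_shelf_db": -1.0, "comp_ratio": 5.0, "pan": 0.0},
}

def suggest_settings(detected_label: str) -> dict:
    """Return canned settings for whatever the classifier says the track is.
    There is no notion of the track's role in this particular mix (lead vs.
    bass layer vs. FX bus), which is exactly the limitation described above."""
    return PRESETS.get(detected_label, PRESETS["synthesizer"])

# Two different synth tracks get identical treatment:
print(suggest_settings("synthesizer"))
print(suggest_settings("synthesizer"))
```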
This looks interesting. Going to play around with it this weekend, see if I can write a song with it collaboratively. Amazon is doing this project a disservice elsewhere by trying to sell a MIDI keyboard to go with it that has an Amazon logo on it, but that’s dumb; you probably already have one around, and it has a piano roll if you don’t.
Here’s a tool that produces all kinds of genre-specific music in audio format (already not bad, though without choruses or similar song structures): https://openai.com/blog/jukebox/
Boy, the AI sure has come a long way in 5 years. So now there are some interesting use cases on the more fully generative side of things. In one such case, there was a song by an actual artist which the label didn’t release because he name-dropped a kid in the song (listen to this and you’ll see). So someone recreated the song with AI and modified it slightly. Still not a label release AFAIK, but an interesting way to recover what would otherwise have been lost music.
In a not too dissimilar vein, the opportunity for parody is un-fucking-paralleled now.
I don’t think anyone is fooled into thinking that’s a human artist, but with the vibe and humour, it’s close enough to ignore some of the problems and just laugh for a bit.
Interestingly though, I think there’s still an element of musicianship here, because if you write a shit song about something that’s not interesting, then there’s still no saving it. For instance, my brother showed me this one and I turned it off in 30 seconds. It’s just not a good song unless you like hearing the word “fuck” harmonized a lot.
So AI fake records in retro styles… the next hit genre? I doubt it, but I could see stuff like this being the soundtrack to a genre of memes. And that one about the meth lab is funny enough that I’ve listened to it more than once.
Good points and funny tracks; I can definitely see the meme and parody potential. The meth lab song is pretty awesome.
Not sure yet if general usage takes off like it did in image generation - maybe not, since it takes longer to listen to a track than it does to glance at a picture… Also, I think listening to music is, for many people, more tied to lifestyle. Very interesting developments in any case.
Also interesting for me to hear that transient-heavy stuff such as drums and staccato sounds is often worse than everything else - that will probably change over time, and there might be some potential for “upscalers”…
I have no idea how these are generating the actual sound. I could see a few ways. To be clear up front: I get that what the front end looks like to us as users is that we ask for a certain song and we get it, but how that happens…
The first is to just generate the whole stereo master in one pass, with the vocals and everything on it. Basically, the AI is just trying to approximate a copy of a finished song with whatever lyrics the songwriter asks for. I think that might be what’s happening in some of the worse cases.
I think a better way to handle it would be to generate each part separately and then do a basic sum of them as a mix. That way, the AI can have a bit more control over the separate parts and can maybe hide some of the glitchier bits behind something that sounds better or draws more attention (similar to perceptual encoding in MP3s).
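Roughly what I mean by a basic sum as a mix, as a minimal sketch (assuming each generated part comes back as a float array at the same length and sample rate; the stem names and gain numbers are arbitrary):

```python
import numpy as np

def mix_stems(stems: dict, gains: dict) -> np.ndarray:
    """Sum separately generated parts into one mix.
    stems: name -> float32 array of shape (num_samples, 2), same length and rate.
    gains: name -> linear gain applied before summing (defaults to 1.0)."""
    mix = sum(gains.get(name, 1.0) * audio for name, audio in stems.items())
    peak = np.max(np.abs(mix))
    if peak > 1.0:
        mix = mix / peak  # crude peak normalization so the sum doesn't clip
    return mix.astype(np.float32)

# e.g. mix_stems({"drums": drums, "bass": bass, "vocals": vocals},
#                {"drums": 0.9, "bass": 0.8, "vocals": 1.0})
```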
For my part, the most obvious thing that’s off to me is the voices, in pretty much all these cases. There’s always something a bit off with the delivery. It might only be in one line of a song somewhere, but it will be something that lands out of tune or rhythm, something that makes me go “damn, does this guy ever take a breath?” or the like. And that’s assuming there aren’t other obvious artifacts in the vocal. The older, more Motown-style doo-wop stuff lands a bit better, I think, because the vocals end up so bandpassed that it hides a lot of the technical glitches, and the staccato delivery in time with the music kind of hides the pacing issues.
But yeah, I hear you on the staccato sounds being a bit off; I think part of it is the rhythm in general. I haven’t really sat down and listened for this, but I’d be surprised if the AI is doing a good job with natural swing, which might be making everything sound subconsciously “off,” because you’d expect a little swing in country and more in Motown.
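For reference, swing in the sequencer sense is just pushing every other subdivision late by some fraction; here’s a quick illustrative sketch of that on note onsets (purely to show the idea, nothing to do with how the generators actually work internally):

```python
def apply_swing(onsets_beats, swing=0.33, subdivision=0.5):
    """Push every off-beat subdivision late by a fraction of the subdivision.
    swing=0 is straight; around 0.33 puts off-beat eighths at the triplet
    position, i.e. the classic shuffle feel."""
    swung = []
    for t in onsets_beats:
        index = round(t / subdivision)
        if index % 2 == 1:  # off-beat eighth note
            t += swing * subdivision
        swung.append(t)
    return swung

# Straight eighth notes over one bar of 4/4:
straight = [i * 0.5 for i in range(8)]
print(apply_swing(straight))  # off-beats land later -> shuffled feel
```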
That’s an interesting thought. I assume that everything is generated as a whole, since the models are probably trained on complete songs, and generating single parts would mean additional steps. Training for the various layer types would require a lot of multitrack stems as input, and the mixdown phase might be another problem. The current generation of layered sound might even mask some quality problems, at least for some content. But for future models, this might be the way forward at some point. And it would open up new uses for musicians who only need single instruments for a track… No idea if there already are models that do single parts/instruments, though; development of new tools is crazy fast these days…
Yeah, I agree that the vocals aren’t perfect for sure, but the improvement in that area compared to earlier models is huge imho. I’m more impressed by the current vocal quality than by the drum transient quality, but that might be my background in DnB lol…
Interesting thought. Not completely sure about it, but I think ML should be able to handle swing without huge problems if trained on swing-heavy tracks; it’s definitely possible the results sound a tiny bit off, though… I was thinking more about the pure sound quality of fast transients in staccato-type instrument parts and drum hits; somehow that’s what stuck out most negatively to me when listening to AI stuff… Maybe because I was just expecting a bit more in that regard, while I didn’t assume the vocals would be 100% perfect…