There might be something to play around with here.
Haven’t done my research yet, just listening to a podcast before bed, and one of the guests from iZotope mentioned it as inspiration for what they do, though opposite to their goals. Magenta is an AI research lab at Google specializing in trying to get AI to do art, while iZotope wants AI to enhance the work of human artists. Looks really good if you use Ableton.
Interesting stuff. Gonna give it a spin at some point to see how much my organic artistic deficiencies suck as compared to random stuff composed by first-gen consumer-grade AI lol… (Also have to say that I wasn’t impressed by the automatic iZotope features so far, but I didn’t upgrade to the latest version.)
I still haven’t had a chance to look into it. Sound demos were definitely underwhelming. It’s kinda doing what the Hartmann Neuron synth was doing back in the early 2000s, but worse. Or maybe the same. You probably haven’t heard of the Hartmann Neuron for a reason.
And the iZotope stuff is alright for pop and rock probably, but not super well tuned for dnb or idm or house. All it knows is “this is a synthesizer and in a normal mix it would sound good to do this,” so it does it. Then you toss another synthesizer at it and it does the same thing. It can’t fathom that one of these is a lead, one is a bass layer, then there’s another bass layer and a separate lead effects bus and so on. But the AI is mostly in the ability to identify the different instruments, which it then takes to a lookup table of settings. Kinda like how Excel “does” calculus (it doesn’t; it just looks up common statistical solutions without actually integrating the area under the probability curve).
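Just to spell out what I mean by “lookup table” (totally my own guess at the mechanics, not anything iZotope has published, and every name here is made up), the non-AI half could be as dumb as this:

```python
# Hypothetical sketch of a "classify the track, then look up settings" assistant.
# None of these names are real; the classifier is a stub standing in for the ML part.

PRESETS = {
    "synth": {"high_pass_hz": 80,  "comp_ratio": 2.0, "gain_db": -3.0},
    "vocal": {"high_pass_hz": 100, "comp_ratio": 3.0, "gain_db": 0.0},
    "kick":  {"high_pass_hz": 30,  "comp_ratio": 4.0, "gain_db": -1.0},
}

def classify(track_audio) -> str:
    """Stand-in for the trained classifier: guess what instrument this track is."""
    return "synth"

def suggest_settings(track_audio) -> dict:
    # The "AI" picks a label, then everything else is a plain table lookup,
    # with no idea whether this synth is the lead or the third bass layer.
    label = classify(track_audio)
    return PRESETS[label]

print(suggest_settings(None))  # -> the generic "synth" preset, every time
```

The classifier itself might be fancy, but once you’re past it, every synth gets the same treatment no matter what role it plays in your mix.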
This looks interesting. Going to play around with it this weekend, see if I can write a song with it collaboratively. Amazon is horribly defiling this project elsewhere by trying to sell a MIDI keyboard to go with it that has an Amazon logo on it, but that’s dumb; you probably already have one around, and it has a piano roll if you don’t.
Here’s a tool that produces all kinds of genre-specific music in audio format (already not bad, but without choruses and similar structures): https://openai.com/blog/jukebox/
Boy, the AI sure has come a long way in 5 years. So now there are some interesting use cases on the more fully generative side of things. One such case: there was a song by an actual artist which the label didn’t release because he name-dropped a kid in the song (listen to this and you’ll see). So, someone recreated the song with AI and modified it slightly. Still not a label release AFAIK, but an interesting way to recover what would otherwise have been lost music.
In a not too dissimilar vein, the opportunity for parody is un-fucking-paralleled now.
I don’t think anyone is fooled into thinking that’s a human artist, but with the vibe and humour, it’s close enough to ignore some of the problems and just laugh for a bit.
Interestingly though, I think there’s still an element of musicianship here, because if you write a shit song about something that’s not interesting, then there’s still no saving it. For instance my brother showed me this one and I turned it off in 30 seconds. It’s just not a good song unless you like hearing the word “fuck” harmonized a lot.
So AI fake records in retro styles… the next hit genre? I doubt it, but I could see stuff like this being the soundtrack to a genre of memes. And that one about the methlab is funny enough that I’ve listened to it more than once.
Good points and funny tracks, I can definitely see the meme and parody potential. The methlab song is pretty awesome.
Not sure yet if the general usage takes off like it did in image generation - maybe not, since it takes longer to listen to a track than it does to glance at a picture… also I think listening to music is, for many people, more lifestyle-related. Very interesting developments in any case.
Also interesting for me to hear that transient-heavy stuff such as drums and staccato sounds is often worse than everything else - that will probably change over time, and there might be some potential for “upscalers”…
I have no idea how these are generating the actual sound. I could see a few ways. To be clear up front, I get that what the front end looks like to us as users is “we ask for a certain song and we get it” - but how that happens…
The first is to just generate the whole stereo master in one pass, with the vocals and everything on it. Basically, the AI is just trying to approximate a copy of a finished song with whatever lyrics the songwriter asks for. I think that might be what’s happening in some of the worse cases.
I think a better way to handle it would be to generate each part separately and then do a basic sum of them as a mix. That way, the AI can have a bit more control over the separate parts and can maybe hide some of the glitchier bits behind something that sounds better or draws more attention (similar to perceptual coding in MP3s).
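To make that second approach concrete (pure guesswork on my part about how any of these services actually work internally), the “sum of parts” step is basically just this:

```python
import numpy as np

SR = 44100  # sample rate

def generate_stem(part: str, seconds: float) -> np.ndarray:
    """Stand-in for a per-part generator; here it's just a labeled sine tone."""
    freq = {"vocals": 440.0, "bass": 55.0, "drums": 220.0}.get(part, 110.0)
    t = np.linspace(0, seconds, int(SR * seconds), endpoint=False)
    return 0.2 * np.sin(2 * np.pi * freq * t)

def mix(stems: dict, gains_db: dict) -> np.ndarray:
    """Apply a per-stem gain (in dB) and sum everything into one mix buffer."""
    out = np.zeros_like(next(iter(stems.values())))
    for name, audio in stems.items():
        out += audio * (10 ** (gains_db.get(name, 0.0) / 20))
    peak = np.max(np.abs(out))
    return out / peak if peak > 1.0 else out  # crude safety normalization

stems = {p: generate_stem(p, 2.0) for p in ("vocals", "bass", "drums")}
master = mix(stems, {"vocals": 0.0, "bass": -3.0, "drums": -2.0})
```

The point being that once you have separate stems, you get per-part control (and could pull down or mask the glitchy ones) before the final sum.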
For my part, the most obvious thing that’s off to me is the voices, in pretty much all these cases. There’s always something a bit off with the delivery, and it might only be in one line of a song somewhere, but it will be something that lands out of tune or out of rhythm, something that makes me go “damn, does this guy ever take a breath?” or the like. And that’s assuming there aren’t other obvious artifacts in the vocal. The older, more Motown-style doo-wop stuff lands a bit better, I think, because the vocals end up so bandpassed that it hides a lot of the technical glitches, and the staccato delivery in time with the music kind of hides the pacing issues.
But yeah, I hear you on the staccato sounds being a bit off; I think part of it is the rhythm in general. I haven’t really sat down and listened for this, but I’d be surprised if the AI is doing a good job with natural swing, which might be making everything sound subconsciously “off,” because you’d expect a little swing in country and more in Motown.
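In case anyone wants a concrete picture of what I mean by swing: the off-beat subdivisions get pushed late by some ratio instead of sitting dead on the grid. A toy version of that idea (my own sketch, nothing to do with how these models actually handle rhythm):

```python
def apply_swing(onsets_beats, swing=0.6, subdivision=0.5):
    """Delay every off-beat subdivision so a straight grid 'swings'.

    swing=0.5 is straight time; ~0.66 is a classic triplet-ish shuffle.
    """
    swung = []
    for onset in onsets_beats:
        pair_start = (onset // (2 * subdivision)) * (2 * subdivision)
        offset = onset - pair_start
        if abs(offset - subdivision) < 1e-9:  # the off-beat of the pair
            onset = pair_start + 2 * subdivision * swing
        swung.append(onset)
    return swung

straight_hats = [i * 0.5 for i in range(8)]    # straight eighths, in beats
print(apply_swing(straight_hats, swing=0.66))  # off-beats now land late
```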
That’s an interesting thought. I assume that everything is generated in one pass, since the models are probably trained on complete songs and generating single parts would add extra steps. Training for the various layer types would require a lot of multitrack stems as input, and the mixdown phase might be another problem. The current generation of layered sounds might even mask some quality problems, at least for some content. But for future models, this might be the way to go at some point. And it would open up new uses for musicians who only need single instruments for a track… No idea if there already are models that do single parts/instruments, though - development of new tools is crazy fast these days…
Yeah, I agree that the vocals aren’t perfect for sure, but the improvement in that area compared to earlier models is huge imho. I’m more impressed by the current vocal quality than by the drum transient quality, but that might be my background in DnB lol…
Interesting thought. Not completely sure about it, but I think ML should be able to handle it without huge problems if trained on swing-type tracks, though it’s definitely possible that the results might sound a tiny bit off… I was thinking more about the pure sound quality of fast transients in staccato-type instrument parts and drum hits; somehow that’s what stuck out most negatively to me when listening to AI stuff… Maybe because I was just expecting a bit more in that regard, while I didn’t assume vocals would be 100% perfect…
Making whole songs with AI is shit, and anybody who profits off of it should take a good look at themselves. That said, using individually generated parts, like vocals or FX etc., is actually pretty cool. I know a lot of hardcore musicians are using AI TTS for pre-drops/raps. Aside from that, robots shouldn’t be doing creative arts - they should be doing our taxes for us.
This evolution of AI is just the natural progression of the obsession with convenience at any cost.
I kind of see these entirely-AI tracks being taken seriously as devaluing music that at least has passion and hard work put into it.
Luckily, I haven’t really seen that too much yet. But I do find the AI slop parody songs that have flooded the web recently just really annoying. Then again, I despise AI slop in general.
I have to agree with you. However, just to play devil’s advocate and spark conversation: what is the difference between text-prompting an AI to make a good song (I don’t think it would be easy!) and generative synthesis? By generative synthesis I mean all forms, such as connecting a hardware chain (i.e. Eurorack or other), setting up patches and hitting record and leaving it, or it could be in the form of a Max patch or similar in software.
I think it’s good to ponder these topics and figure out what exactly is being creative and what is not. For instance, the connecting of the hardware and making the patches is the creative part, and you are not actually “playing” the music (unless you intervene in some way during the recording). Same can be said for text prompting. It takes a creative mind to describe, in detail, what you hear in your head so that the AI spits out what you want and I highly doubt it ever happens in a single text prompt. There will always be iterations and edits.
In both cases, it takes some form of skill to make music that has meaning and intent to the artist, but on the flip side, both cases can be exploited for quick gains. But…garbage in, garbage out.
Well, to put it simply, the difference between gen synth and AI is that AI is trained over and over with specific parameters to create the PERFECT product every time, while generative synthesis is a bit more free-flowing and/or unique, and creates variation. AI is really bad because somebody can just insert a few keywords and an AI model will stick them together in an appealing way. It doesn’t take any skill, and it’s just spamming until you get something good, because even writing a whole-ass novel for a prompt will give you a generic result regardless. I have to say, it’s fun for getting song ideas etc., but I’d NEVER use it for an actual piece.
Gen synth actually takes some user input and customization to get a final result, right? Therefore it’s individual to creators and actually does take a little bit of time and effort. Even if you can just leave it to do the work for you, there is no defined “model” or anything stupid (so the work becomes unique, again) and people can expand on it the way THEY want. Things like Suno are changed over time by a team of developers, so…
Idk, you’re either a musician or you’re an AI prompter.
Honestly, isn’t it just a matter of HOW you use these technologies? You can just get some generative patch, record it and publish the result, or you can dive in deep, build and finetune your own patches, micro-edit the results and so on…
Same for AI: you can use a prompt written by someone else in something like Suno, or you can dive into something like TensorFlow or PyTorch, write your own code, finetune a trained model, or even train your own model. Generally, I kinda have mixed feelings about both generative stuff and AI… I love generative patches and use modular synths for sound design (for both music and sound effects for games), but I don’t think I’ve ever published a track done completely with generative patches. Not convinced by the AI sound quality yet, so I haven’t done much with that, but if it can spit out some high-quality tracks that I love from a few prompts at some point, that might change. I’m not sure yet if I’d be less motivated to listen to them just because I know they’re done by AI. I really have to wait until the quality is there. But I don’t think I like generative music less because I know it’s generative.
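Just to illustrate the “finetune a trained model” end of that spectrum, this is the generic shape of it in PyTorch (with a toy stand-in network and dummy data, not any real music model - in practice you’d load actual pretrained weights and your own audio features):

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained model; in reality you'd load real weights here.
backbone = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
head = nn.Linear(32, 10)  # new task-specific layer to finetune

# "Finetuning": freeze the pretrained part, only train the new head.
for p in backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Dummy data standing in for your own features/labels.
x = torch.randn(256, 128)
y = torch.randint(0, 10, (256,))

for epoch in range(5):
    optimizer.zero_grad()
    logits = head(backbone(x))
    loss = loss_fn(logits, y)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```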
Also, there are a few different questions to consider imho:
Input: Artistic creativity/input/integrity/whatever → What did the artist/musician contribute to the final creation? It’s probably less creative to spend 1 minute writing a prompt than 5 days writing a song. But is it less creative to spend 2 days coding/finetuning some AI model and maybe editing the result than it is to spend 2 days making some generative patch?
Output: Mostly quality → Does it sound good? Does it sound like there is some creative spark in there, some inspiration, a story worth telling, a feeling in need of expression? AI might beat us there at some point.
For me, the big question at the moment is how much knowing about the artificial creation influences the experience. In other words: ceteris paribus, if you didn’t know who made the song, would you enjoy listening to an AI creation any less than to something created by a human?