AI image generation tutorial?


#1

Hey all. @TvMcC recently posted some cool AI art in another thread, which made me wonder if there's any interest in a tutorial on this tech from users who have never tried AI image generation.

It’s pretty fun and you can get some cool stuff out of it. It’s great for album art, among other things.

There are several ways to go about it: there are web tools (paid and free), as well as local builds that you can run on your own PC (assuming a decent GPU).

I have set up several instances on my PC over time, so if there is any interest from folks in learning how to do this, let me know here and I can throw a video together.


#2

I need to learn. I’m not quite sure if it’s my text prompts that aren’t good enough, or if the AI image generation tool I’m using is just really bad. I have used Canva, which I think uses DALL-E? And I have used Craiyon. I’m pretty sure that Craiyon just matches premade AI images to your text prompt and doesn’t actually customise anything. Looking for the best free tool.


#3

I feel like DALL-E has fallen behind quite a bit; there are much better generation tools now.


#4

If you have time to do that, I’d certainly be interested :slight_smile:

I’ve created a few surreal images with AI, which were fun, and considered paying for a solid tool… as long as it isn’t a subscription (because subscriptions can fuck off far & away as far as I am concerned). I have no idea which one is best or even what’s available as I haven’t done any real research on the topic.


#6

I downloaded some frontend for SD someone made on Itch.io and it took something like 30 minutes just to cough up a 500x500 image. I think I might need a substantial upgrade before trying again :smiley:

There’s also this really cool program I wanted to try (Ultraforge, it’s basically Filter Forge’s older brother) that has SD built inside of it and a visual programming language built on top for image manipulation, but my laptop was no match for the specs and I doubt my new PC is any better.

Even though I’m not a huge fan of AI or generating art that I didn’t have a hand in creating, I wouldn’t turn down the opportunity to demo Ultraforge in all its glory, or even do my own generation from time to time locally. But alas, I think it’ll be a few years before I can actually afford a machine that will cover the basics. But I don’t mind waiting, either.

I almost wouldn’t mind making some API calls from within Processing or something, in order to use the data as starter material for new projects (plucking and sampling for color palettes can be damn cool, and automating the process could be even cooler!). But I’m assuming most of these are paid, so I didn’t really look around for something that would let me probe the shit out of it for free, like some of those stock image websites I use do. But hey, maybe I just need to look around harder.
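
Something like this is what I have in mind, sketched in Python with Pillow rather than Processing (the idea ports over easily). To be clear, this is just a hedged sketch of the palette-sampling half; `generated.png` is a stand-in for whatever an image generator spits out:

```python
# Pluck a rough color palette out of a generated image.
# "generated.png" is a placeholder for any saved output image.
from PIL import Image

def dominant_palette(path, n_colors=6):
    img = Image.open(path).convert("RGB")
    # Quantize down to n_colors (median cut by default), then read the
    # reduced palette back out as (R, G, B) tuples.
    quantized = img.quantize(colors=n_colors)
    flat = quantized.getpalette()[: n_colors * 3]
    return [tuple(flat[i : i + 3]) for i in range(0, len(flat), 3)]

if __name__ == "__main__":
    for rgb in dominant_palette("generated.png"):
        print("#%02x%02x%02x" % rgb)  # hex codes, ready for a sketchbook
```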


#7

I need help! So I started playing around in Playground. I ended up with a couple of images that I kind of like but would like to change. I’ve been trying to use the inpainting on the canvas to change the parts that I don’t like; however, it doesn’t do a very good job. It appears that when you do an inpaint, it does not consider the entire image for context. Where do I go from here?

Alternatively, I would like to be able to take a crop from another image and use that to generate the inpainted portion of the image. But it doesn’t seem to be able to do this; you can only generate an entirely new image from another image and give that image a certain weight. @TvMcC Any suggestions?

I am also finding that the scaling of objects in relation to other objects seems to be quite hit and miss. Mostly miss.


#8

Disclaimer: Always take what I have to say with a grain of salt, and then chuck that into the ocean to dissolve!

@Manton
Let’s see if we can all come up with some solutions, and bear in mind that some of what I’m going to suggest I haven’t put into practice just yet.

Lurv Canva, but I haven’t used it in ages, and when I saw it as part of Playground, I thought that would be neat to play with. I started using it for a bit, then had to shut it down to go work on my primary form of art: music! While I do firmly believe we should be able to use Canva + Playground to do what you want to accomplish, I’ve also read in various internet rants that it isn’t possible in that service, nor is it possible in Midjourney. Some have said they’ve been able to do something similar in HitPaw, but once again I haven’t had time to explore it.

Some of it is truly all about spending time with it and just trial and error… probably more error. Unless one is a working professional artist who is already familiar with all of the precise lingo/terms and so on, it’s like using ChatGPT to write hit songs (which I’ve watched a pro do): you need to sit down with it and correct its mistakes when it gets inversions wrong, doesn’t recognize the proper stanza, and so on. Keep chatting with it until it refines the idea to what you’re looking for, and then maybe ya have a Michael Jackson hit?

Here are a few suggestions I’ve used with any of these AI generator thingamaBOBs:
(Type of art) (fractal, oil, baroque, impressionism, cubism) of [subject], (style) (folk, tattoo, graffiti, etc.), [theme/material/element], (tone, e.g. glowing, vivid, intricate), (color references for any object or primary source), (resolution, e.g. 8K), HD, digital art --v 4 --q 2

Another example:
Design a techno-organic extraterrestrial in a futuristic sci-fi professional masterpiece portrait, fractal art, similar to art by Danny Flynn, Julien Allegret, Charlie Vicetto, and Paul Griffitts. (Add a few other variables or details, like colors or shapes for background patterns.)

Using exclude (negative prompts) can also help, with terms like:
bad proportions, beyond borders, cropped, branding, craft, duplicate, grains, grainy, gross proportions, improper scale, low quality, low resolution, misshapen, mistake, outside the picture, unfocused, watermark
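
Since that template is really just string assembly, here’s a throwaway sketch of the idea in Python. All the slot values below are made-up examples, and the `--v 4 --q 2` flags only mean anything to Midjourney, so drop them for other tools:

```python
# Toy prompt builder for the template above. Every value below is a
# made-up example; --v/--q are Midjourney-only flags.
def build_prompt(art_type, subject, style, theme, tone, colors,
                 resolution, flags=""):
    parts = [f"{art_type} of {subject}", style, theme, tone, colors,
             resolution, "HD", "digital art"]
    prompt = ", ".join(p for p in parts if p)
    return f"{prompt} {flags}".strip()

NEGATIVE = ("bad proportions, cropped, watermark, grainy, low quality, "
            "low resolution, misshapen, unfocused")

print(build_prompt("fractal art", "a techno-organic extraterrestrial",
                   "sci-fi portrait", "chrome and moss",
                   "glowing, vivid, intricate", "teal and amber", "8K",
                   flags="--v 4 --q 2"))
```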

Hoping this rant helps, maybe inspires others to chime in (so we can all get the best out of the bots!!!), or maybe brings the haters out to say, “This idiot don’t know what he is talking about, this is how you do it: (insert arsehole’s solution)”

best of


#9

I think part of my problem is that I don’t know art, so I can’t conjure text input such as “make an image in the style of (artist)”.

I listened to a couple of tutorials on Playground, which I think have cleared some things up for me.

  • Hard-set the engine to Stable Diffusion or Playground 2.5
  • Use keywords you would expect to find on images found on Google etc.
  • Playground was trained on images of a specific aspect ratio, 5x7 or something? (I’ll have to double-check.) So it’s best to keep the generative side to these proportions, otherwise results may vary.
  • Use negative keywords
  • Use the seed function to keep what you have and adjust only small pieces of the image (see the sketch after this list)
  • Once you have an image that is close to what you want, feed it into “image-to-image” for even better results
  • Once you have what you want, use Creative Upscale to make it look pro.
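
Playground hides all of this behind its UI, but the seed trick is scriptable if you ever run Stable Diffusion locally through the Automatic1111 webUI. A rough sketch, assuming the webUI was launched with the `--api` flag on its default port; the endpoint and payload keys are my best recollection of that API, so double-check against the `/docs` page it serves:

```python
# "Keep the seed, tweak the prompt" against a local A1111 webUI (--api).
import base64, json, requests

URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"

# First pass: seed -1 means random; we're just hunting for a composition.
first = requests.post(URL, json={"prompt": "cozy reading nook, window light",
                                 "seed": -1, "steps": 25}).json()
seed = json.loads(first["info"])["seed"]  # the seed it actually used

# Second pass: same seed, small prompt change -> mostly the same image.
second = requests.post(URL, json={"prompt": "cozy reading nook, candlelight",
                                  "seed": seed, "steps": 25}).json()
with open("variation.png", "wb") as f:
    f.write(base64.b64decode(second["images"][0]))
```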

#10

These are some great tips, thanks so much for sharing :heavy_heart_exclamation::heavy_heart_exclamation::heavy_heart_exclamation:


#11

Ok, here’s a video I put together going over the installation and basic use of the Automatic1111 webUI for Stable Diffusion.

Sorry about the low quality and some occasional audio issues. I think the video will take some time to process beyond 360p, but I’m sure you can get the gist of it, especially if you follow along. I cover the install and basic use of just the txt2img functionality. I didn’t want to make it too long by going into img2img, inpainting, sketching, extensions, or anything like that. I can always do another video on those topics if there’s interest.

The last half of this is me just playing around and going over some tips, tricks, and some of my favorite models and embeddings. Feel free to watch as much or as little as is helpful to you.

Here are some important links for getting started:

A1111 GitHub repo:

Places to download models:
Hugging Face:
https://huggingface.co/models
Civitai:

Also, just a quick clarification on a comment I made about Git: you will need Git for this installation; it’s not optional for this use. I just meant that you don’t really need it for anything else unless you already know what it is and how to use it, so don’t stress about the install.
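
One more note on models: once you’ve dropped downloads from those links into the models folder, the webUI can also be poked from a script when it’s launched with `--api`. A hedged sketch for listing and switching checkpoints; the endpoint and key names are my best recollection, and the webUI serves its own `/docs` page you can check them against:

```python
# List the checkpoints a local A1111 webUI (--api) can see, then switch.
import requests

BASE = "http://127.0.0.1:7860"

models = requests.get(f"{BASE}/sdapi/v1/sd-models").json()
for m in models:
    print(m["title"])  # e.g. "somemodel.safetensors [hash]"

# Switching the active checkpoint is just a settings change:
requests.post(f"{BASE}/sdapi/v1/options",
              json={"sd_model_checkpoint": models[0]["title"]})
```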


#12

Damn, looks like the video link is broken.

Thank you for making this. I look forward to following along.


#13

When using image-to-image, what do you write in the text prompt? Do you still write the entire script of what you want, or does the AI figure out what is there and go from there? For example, if I have an image of a room with a window, but I want to change what’s outside the window, how much information do I need to feed it? I’m having a lot of trouble trying to get a silhouette of a person to show outside a window. No matter how many different ways I write the text prompt, the AI seems to ignore the instruction every single time. Most of the time it doesn’t show any person, but when it does, they are inside the room. It’s very annoying!


#14

Updated the video link. Let me know if it still doesn’t work.


#15

It really just depends on what I want to change about an image. It does figure out what is there, yes, but your text also has influence. How much influence? It depends on a bunch of stuff: model, CFG scale, denoising strength, your prompt and its weights. So it really just depends.

If I have a particular image in mind, I try to get as close as I can with txt2img before moving on to any img2img. My usual workflow is to get close with txt2img, then do any inpainting, and finally run an img2img pass at a low denoising strength, usually somewhere between 0.3 and 0.6.
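
For anyone following along on the local webUI, that final low-denoise pass is scriptable too. A rough sketch against the API you get with `--api`; `room.png` stands in for whatever txt2img produced, and the field names are my best recollection, so check the `/docs` page:

```python
# Final img2img pass at low denoising strength, per the workflow above.
import base64, requests

with open("room.png", "rb") as f:  # placeholder for your txt2img output
    init_image = base64.b64encode(f.read()).decode()

resp = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json={
    "init_images": [init_image],
    "prompt": "cozy room, view of the window, person silhouette outside",
    "denoising_strength": 0.4,  # low: keep composition, clean up details
}).json()

with open("room_refined.png", "wb") as f:
    f.write(base64.b64decode(resp["images"][0]))
```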

That sounds like a tricky one, but I’m sure there’s a way it can be done. Here are some things I would try:

I tried to play around with Playground, and it seems like it doesn’t have nearly the control of the local webUI that I use, so my suggestions may not overlap well with what you have available.

  1. If you have a decent base image of your room and window, I would first try inpainting (see the sketch after this list). When I do inpainting, I only give the prompt what I want to see in the inpainted area, not what the whole image should contain. For instance, I would paint the window and then say something like “1man, (silhouette:1.3), dark figure, behind glass, behind window” in the prompt, and maybe give the negative something like “in front of glass, skin, eyes, mouth, face, nose, detailed head”. Something like that?

  2. In inpainting, play with the weights of different parts of the prompt to see what difference it makes. Also play around with the denoising strength: try it first with a really high strength, so it cares very little about the initial image, then slowly adjust your prompt and the strength back down until you get something more cohesive.

  3. Try to make a new image of just a silhouette behind a window; forget the rest of the room for now. Do this at a small resolution to start. Once you find a prompt that does what you want in that regard, you can take the same prompt and expand upon it by adding new elements slowly. Alternatively, you can increase the image size after the fact in img2img and have it fill in a new room around it. To do this, start with a low denoising strength, so that you kinda just get noise and color that sorta looks like it stretched the initial image, and then inpaint all of that new noise around the original image and give it a prompt to paint in a room around it.
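
Here’s the sketch promised in point 1, doing the same masked inpaint through the local webUI’s API. Again this assumes `--api`; `mask.png` is hypothetical (white over the window, black elsewhere), and the field names are my best recollection of that API:

```python
# Inpaint only the masked window area, prompting just for that region.
import base64, requests

def b64(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

resp = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json={
    "init_images": [b64("room.png")],          # placeholder base image
    "mask": b64("mask.png"),                   # white = repaint this area
    "prompt": "1man, (silhouette:1.3), dark figure, behind glass, behind window",
    "negative_prompt": "in front of glass, skin, eyes, mouth, face, nose",
    "denoising_strength": 0.75,    # high: care little about what was there
    "inpainting_fill": 1,          # 1 = seed the masked area from the original
    "inpaint_full_res": True,      # work at full res on just the masked area
}).json()

with open("room_inpainted.png", "wb") as f:
    f.write(base64.b64decode(resp["images"][0]))
```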

Hopefully those made sense. I’ll probably record another video (maybe tonight) and demo some of these techniques. That may be more helpful than trying to decipher my gibberish text.

For the record, I’m trying this out in Playground and am having a tough time myself. I’ve gotten it a few times, but it’s hard to get it to be consistent.


#16

This has been somewhat consistent:

Positive prompt: in a room, view of the window, (person silhouette peering in through the window:1.3), looking inside, facing viewer, (person outside the window:1.6)

Negative: (person inside the room:2)… {other negatives}


#17

Thanks for writing all that. I will give it a go. FYI, the image below is the only one out of over 100 where it has drawn a person outside the window, except they’re facing the wrong direction lol

Sometimes when I write that you can only see the head and torso, it flags it as inappropriate and refuses. I assume that’s because it’s trained not to do gruesome/gore images, even though I’ve stipulated that it’s just a shadow/silhouette.


#18

The video you posted is a really great tutorial, man. Thank you. I haven’t installed Stable Diffusion on my PC, but all of this helps in Playground somewhat. I’m not sure my 1080 Ti would be any good for it. Also, I’m a little unsure whether the (parentheses:value) weighting parameters work in Playground? I tried copying the prompt you posted and it didn’t return anything usable. So it could be something in the Playground settings that was incorrect. I’m pretty sure I had Stable Diffusion selected as the engine.


#19

Looks like a 1080 Ti has 11GB of VRAM? Is that correct? It might be a little slower than my image generation, but I would imagine it shouldn’t be too much of an issue. I’d give it a shot if you have some time. If anything, I’d be curious to know how it goes if you do. Looks like there are some folks on Reddit who use even lower-grade cards, like a 1060, and they’re still having good success.

Makes me think I might buy an old 10-series card and build a dedicated SD server.

I actually wrote a Discord bot so that my friends can use my local SD instance on Discord. It works a lot like Midjourney does. It would be nice to have a dedicated machine to run my processes (I have a Discord bot, a self-hosted cloud, a Plex server, and a few other things) instead of using my gaming computer, since I have to shut all of that off every time I want to play a game.
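
In case anyone’s curious, the bot is mostly just glue between discord.py and the webUI’s API. A stripped-down sketch of the idea; the token is a placeholder, and the SD endpoint is the same assumed `--api` one as before:

```python
# Minimal discord.py bot that forwards "!draw <prompt>" to a local
# A1111 webUI (--api) and posts the result back to the channel.
import base64, io, requests
import discord
from discord.ext import commands

intents = discord.Intents.default()
intents.message_content = True  # required to read prefix commands
bot = commands.Bot(command_prefix="!", intents=intents)

@bot.command()
async def draw(ctx, *, prompt: str):
    # Blocking HTTP call; fine for a sketch, not for a busy server.
    resp = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img",
                         json={"prompt": prompt, "steps": 25}).json()
    png = base64.b64decode(resp["images"][0])
    await ctx.send(file=discord.File(io.BytesIO(png), filename="result.png"))

bot.run("DISCORD_TOKEN")  # placeholder token
```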


#20

@makeuswhole I’m not sure if you put two and two together, but that cover art for Retro 1 was supposed to have the person on the outside of the window peering in. I went through about 150 iterations and images. None of them gave me a person outside the room; most of the time it completely ignored the request, and the other times it placed the person inside the room. I asked for a car outside and it gave me a car outside, several times. So the AI clearly understands the concept of inside and outside; I guess it thinks cars belong outside and humans belong inside. What a depressing state lol.


#21

Yes, that is correct. Ok good to know.