Chris McCormick

What You Can Reasonably Expect from Stable Diffusion

I think AI art inspires awe in us because:

  1. It’s incredibly imaginative.
  2. The artistic technique is masterful.
  3. It was created by an AI.

Unfortunately, what it so masterfully generates also tends to be very incoherent.

In my experience, you should probably decline any offers for a ride in a stable diffusion fighter jet… 😜

Stable Diffusion Fighter Jet

When browsing libraries of generated imagery, I've found that the most popular (and successful) subject seems to be portraits.

Symmetrical Portraits

(From here, here, and here.)

A head-on perspective, with a lot of symmetry, seems to work best. There are strong examples of other perspectives as well, just less common:


(From here, here, and here)

If you start going below the shoulders, though, many of your generations will be “ruined” by unrealistic posing of limbs or incorrect proportions.

Most critically, though, Stable Diffusion v1.5 seems, for all practical purposes, incapable of generating hands and fingers, or the correct interaction between a person's hands and an object.

White Knights

(From here)

Holding Objects

(From here, here, and here)

That's pretty discouraging, because most of the time when I have an idea for something I want to generate, it's a scene: a subject in a setting, interacting with one or more objects. For example, a "blacksmith forging a sword in his workshop"…

Blacksmith working

(From here)

It seems clear to me that you’re just not going to get a polished image straight out of the model on something like this (no matter how much you play with the prompt or settings).

But there’s hope! There are a number of fancy techniques out there that we can try, such as:

  • We can use inpainting to regenerate a specific object in the scene.
  • With outpainting, we can first generate a subject that we like, and then expand outward to create the setting.
  • Providing a starting image can help us dictate the layout of the scene.
  • Compositing can help us blend in replacements for parts of the image.
  • Fine-tuning can help it generate a particular subject or style more reliably.
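To make the compositing idea a bit more concrete, here is a minimal sketch, assuming both images are same-sized NumPy `uint8` RGB arrays (for example, an original generation and a regenerated version of the same scene). The helper names `feathered_mask` and `composite` are hypothetical, not part of any Stable Diffusion library. A soft ("feathered") mask is used so the seam between the original and the replacement fades out instead of showing a hard edge; inpainting tools similarly take a mask marking the region to replace.

```python
import numpy as np

def feathered_mask(height, width, top, left, box_h, box_w, feather=8):
    """Return an H x W float mask in [0, 1]: 1 inside the box, fading
    linearly to 0 over `feather` pixels outside it (feather must be > 0)."""
    ys = np.arange(height)[:, None]  # column of row indices, shape (H, 1)
    xs = np.arange(width)[None, :]   # row of column indices, shape (1, W)
    # Per-axis distance from each pixel to the box (0 when inside the box's span).
    dy = np.maximum(np.maximum(top - ys, ys - (top + box_h - 1)), 0)
    dx = np.maximum(np.maximum(left - xs, xs - (left + box_w - 1)), 0)
    # Chebyshev (max-axis) distance to the box, broadcast to shape (H, W).
    dist = np.maximum(dy, dx)
    return np.clip(1.0 - dist / feather, 0.0, 1.0)

def composite(base, patch, mask):
    """Alpha-blend `patch` over `base`, weighting each pixel by `mask`."""
    alpha = mask[..., None]  # add a channel axis so it broadcasts over RGB
    out = alpha * patch.astype(np.float64) + (1.0 - alpha) * base.astype(np.float64)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

# Usage: blend the center of `patch` into `base` with an 8-pixel soft edge.
base = np.zeros((64, 64, 3), dtype=np.uint8)       # stand-in for the original image
patch = np.full((64, 64, 3), 255, dtype=np.uint8)  # stand-in for the replacement
mask = feathered_mask(64, 64, top=16, left=16, box_h=32, box_w=32, feather=8)
result = composite(base, patch, mask)
```

The `feather` width is a judgment call: wider feathering hides seams better but lets more of the replacement bleed into its surroundings.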

And, while I don't want to become a full-on graphic artist or Photoshop expert, I'd be willing to pick up a few of those tricks here or there to get what I want. 😊

I've kicked off a new YouTube series to explore all of the above! I've started out by providing an introduction to the absolute basics of AI art generation, since there's still plenty to learn about prompt design and the purpose of the different settings before getting to the more advanced techniques I listed.

I also plan to publish blog posts here and there to cover specific topics. Stay tuned!