Wondering how many kinds of image generation models are out there? Right now, the most common type of model used for creating images is called a Diffusion Model.
In simple terms, a Diffusion Model works like this:
- Imagine a beautiful, clear photograph. Now imagine adding tiny, random specks of “noise” to that photo, over and over, until it’s just an unrecognizable mess of static.
- The Diffusion Model is an AI that has been trained to do the opposite. It knows how to start with a messy, noisy image and, step by step, remove the noise until a clean image is left.
- When you give it a prompt (like “a cat wearing a crown”), it starts with pure digital “noise” and then uses its training to “denoise” that static until a brand-new, unique image of a cat with a crown appears.
This “denoising” process is what gives Diffusion Models their ability to create such detailed, high-quality images from scratch. The most popular models you hear about, like Stable Diffusion and the recent versions of DALL-E, are based on this technology.
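To make the noising and denoising idea a bit more concrete, here is a minimal sketch in plain Python. It is a toy under stated assumptions, not how Stable Diffusion or DALL-E is actually implemented: the “image” is just a short list of numbers, the linear noise schedule and its values are arbitrary choices, and `toy_noise_predictor` is a hypothetical stand-in for the trained neural network. Because this toy “dataset” contains exactly one image, the stand-in can work out the noise directly instead of learning it.

```python
import math
import random


def make_schedule(steps=50, beta_start=1e-4, beta_end=0.2):
    """Linear noise schedule (values are illustrative assumptions)."""
    betas = [beta_start + (beta_end - beta_start) * t / (steps - 1)
             for t in range(steps)]
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta          # fraction of the original signal left
        alpha_bars.append(running)
    return betas, alpha_bars


def add_noise(image, alpha_bar, rng):
    """Forward process: blend the clean image with Gaussian noise."""
    keep = math.sqrt(alpha_bar)
    spread = math.sqrt(1.0 - alpha_bar)
    return [keep * p + spread * rng.gauss(0.0, 1.0) for p in image]


def toy_noise_predictor(noisy, alpha_bar, known_image):
    """Hypothetical stand-in for a trained network.

    A real model learns to guess the noise from millions of images.
    Here the only image in the 'dataset' is known_image, so the noise
    can be solved for exactly.
    """
    keep = math.sqrt(alpha_bar)
    spread = math.sqrt(1.0 - alpha_bar)
    return [(x - keep * p) / spread for x, p in zip(noisy, known_image)]


def generate(known_image, steps=50, seed=0):
    """Reverse process: start from pure noise and denoise step by step."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(steps)
    x = [rng.gauss(0.0, 1.0) for _ in known_image]   # pure static
    for t in reversed(range(steps)):
        eps = toy_noise_predictor(x, alpha_bars[t], known_image)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Remove the predicted noise for this step.
        x = [(xi - coef * e) / math.sqrt(1.0 - betas[t])
             for xi, e in zip(x, eps)]
        if t > 0:
            # Add back a little fresh noise on every step except the last.
            x = [xi + math.sqrt(betas[t]) * rng.gauss(0.0, 1.0) for xi in x]
    return x
```

Calling `generate([0.2, 0.8, 0.5, 0.1])` starts from random numbers and ends up very close to the original four values, because the stand-in predictor has effectively memorized that one image. A real diffusion model is trained on a huge variety of images and guided by your prompt, which is why it produces something new rather than a copy.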
For a while, another type of model called a GAN (Generative Adversarial Network) was also popular.
In simple terms, a GAN works like a competition:
- You have two AI models that are trained against each other. One is called the “Generator” and the other is the “Discriminator.”
- The Generator tries to create a fake image (e.g., a fake human face).
- The Discriminator is shown both real human faces and the fake ones from the Generator. Its job is to figure out which are real and which are fake.
- The two AIs play a continuous game of cat-and-mouse. The Generator keeps trying to create more and more realistic fakes to fool the Discriminator, and the Discriminator gets better and better at spotting fakes. This constant competition is how they both get incredibly good at their jobs.
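The competition described above can be sketched in a few lines of plain Python. This is a deliberately tiny toy, not a real image GAN: instead of human faces, the “real data” is just numbers clustered around 4.0, the Generator and Discriminator are each a one-line formula, and every name and setting here (`train`, the learning rate, the batch size) is an assumption made up for illustration.

```python
import math
import random


def sigmoid(v):
    v = max(-60.0, min(60.0, v))       # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-v))


def discriminator(x, d):
    """Estimated probability that sample x is real."""
    return sigmoid(d["w"] * x + d["c"])


def generator(z, g):
    """Turn random noise z into a fake sample."""
    return g["a"] * z + g["b"]


def discriminator_step(real, fake, d, lr):
    """Nudge the Discriminator to score real samples high and fakes low."""
    grad_w = grad_c = 0.0
    for x in real:
        err = 1.0 - discriminator(x, d)    # push D(real) toward 1
        grad_w += err * x
        grad_c += err
    for x in fake:
        err = -discriminator(x, d)         # push D(fake) toward 0
        grad_w += err * x
        grad_c += err
    n = len(real) + len(fake)
    d["w"] += lr * grad_w / n
    d["c"] += lr * grad_c / n


def generator_step(noise, g, d, lr):
    """Nudge the Generator so its fakes fool the Discriminator more often."""
    grad_a = grad_b = 0.0
    for z in noise:
        score = discriminator(generator(z, g), d)
        err = (1.0 - score) * d["w"]       # direction that raises D(fake)
        grad_a += err * z
        grad_b += err
    g["a"] += lr * grad_a / len(noise)
    g["b"] += lr * grad_b / len(noise)


def train(steps=200, batch=32, lr=0.05, seed=0):
    """Alternate the two updates: the cat-and-mouse game."""
    rng = random.Random(seed)
    g = {"a": 1.0, "b": 0.0}               # Generator starts far from the data
    d = {"w": 0.0, "c": 0.0}               # Discriminator starts undecided
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [generator(z, g) for z in noise]
        discriminator_step(real, fake, d, lr)
        generator_step(noise, g, d, lr)
    return g, d
```

Running `g, d = train()` returns the final settings of both players. The Generator’s fakes get pushed toward the real data because that is the direction that fools the Discriminator, though even a toy like this may wobble back and forth rather than settle neatly, which hints at why real GANs have a reputation for being tricky to train.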
While GANs were groundbreaking, Diffusion Models are generally considered to be better at creating a wider variety of styles and more detailed, high-quality images, which is why they have become the dominant technology.
What models do other platforms use?
Stability AI and Leonardo AI have developed their own models, but many other platforms license or build on top of a handful of popular models:
- OpenAI’s DALL-E 3: This is one of the most powerful and widely used models, and it’s built on diffusion model technology. It has powered image generation in ChatGPT and Microsoft’s Copilot.
- Google’s Imagen / Gemini: Google has its own powerful diffusion models, which it uses to power image generation within its Gemini assistant. These are trained on Google’s vast datasets to produce high-quality results.
- Midjourney: Midjourney has its own proprietary models, which are also a type of diffusion model. They are famous for their unique, highly artistic, and cinematic style that stands out from other models.
- Adobe Firefly: Adobe built its own models specifically for creative professionals. The unique thing about Firefly’s models is that they are trained on a large dataset of licensed images and public domain content, which helps with copyright and ethical concerns.
So, while the companies themselves might have their own “flavor” or secret sauce, the underlying technology for most of the top-tier image generators today is a Diffusion Model.
