Date: 2025-03-25

GPT-4o image generation is now available in ChatGPT. The new image generation model, which replaces DALL-E 3, is most notable for its accurate text rendering, improved "binding" capabilities, and ease of use.

Unlike traditional diffusion image generation methodology, which "paints" details on top of random noise, GPT-4o utilizes a top-to-bottom, side-to-side autoregressive system. It's slower than diffusion, but the benefits of autoregression are as clear as day. GPT-4o is capable of spitting out images with perfectly legible text—something that AI models like DALL-E 3 have continually failed to achieve.
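For a rough sense of what "top-to-bottom, side-to-side" means in practice, here is a toy sketch. It is not OpenAI's method: the `generate_raster` function is a hypothetical stand-in, and the "model" is just random jitter around the neighboring values, where a real system would use a neural network. It only shows the generation order, with each cell depending on cells that were already produced.

```python
import random


def generate_raster(width, height, seed=0):
    """Toy autoregressive sampler (hypothetical, for illustration only).

    Fills a grayscale grid one cell at a time, top to bottom and left to
    right. Each value depends only on values that were generated earlier.
    """
    rng = random.Random(seed)
    image = [[None] * width for _ in range(height)]
    order = []  # records the (x, y) order in which cells were produced

    for y in range(height):
        for x in range(width):
            # Context: the already-generated left and upper neighbors.
            left = image[y][x - 1] if x > 0 else None
            up = image[y - 1][x] if y > 0 else None
            context = [v for v in (left, up) if v is not None]

            # Stand-in for a learned model: jitter around the context mean.
            base = sum(context) / len(context) if context else 128
            value = max(0, min(255, round(base + rng.randint(-16, 16))))

            image[y][x] = value
            order.append((x, y))

    return image, order


image, order = generate_raster(4, 3)
print(order[:5])  # first row left to right, then the start of the second row
```

Because every cell waits on the ones before it, the work can't be spread across the whole canvas at once, which is one plausible reason this style of generation feels slower.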

Not only that, but you can specify textual content for generated images. Write out a prompt like "give me a photorealistic image of a girl writing on a whiteboard with messy handwriting," tell the AI whatever words you want to see on the whiteboard, and it'll give you something fairly accurate. And, perhaps more importantly, the model is quite good at writing 2D stylized text for restaurant menus, advertisements, or other items that may be useful to businesses or hobbyists.

The autoregressive approach also seems to help with "binding," which is a fancy way of saying that the AI doesn't get confused by prompts that contain multiple subjects. If you ask DALL-E 3 to draw a red circle, a blue triangle, a green heart, a pink star, and a purple square, it may trip over itself and spit out the wrong shapes or colors. GPT-4o, on the other hand, can accurately handle up to 20 different objects.
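One simple way to think about binding is as a scorecard: list the color-and-shape pairs you asked for, list the pairs that actually showed up, and count the matches. The sketch below does exactly that. The `binding_accuracy` helper and the "detected" list are made up for illustration; in reality you'd need a person or a vision model to say what's in the image.

```python
def binding_accuracy(requested, detected):
    """Fraction of requested (attribute, object) pairs found in the image.

    Hypothetical helper: `detected` would come from a human reviewer or a
    separate vision model, neither of which is shown here.
    """
    if not requested:
        return 1.0
    detected_set = set(detected)
    hits = sum(1 for pair in requested if pair in detected_set)
    return hits / len(requested)


requested = [
    ("red", "circle"),
    ("blue", "triangle"),
    ("green", "heart"),
    ("pink", "star"),
    ("purple", "square"),
]

# Made-up result for an image that swapped the first two colors.
detected = [
    ("blue", "circle"),
    ("red", "triangle"),
    ("green", "heart"),
    ("pink", "star"),
    ("purple", "square"),
]

print(binding_accuracy(requested, detected))  # 0.6
```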

When paired with the model's text rendering capabilities, improved binding clearly creates some interesting opportunities for corporate art or advertising, though it's also just a generally useful thing that makes image generation easier to use.

Of course, GPT-4o image generation is just "better" than DALL-E 3. Photorealistic images look more true to life, digital art looks less soupy or grainy, and new inferencing techniques reduce the need to type out long, complicated prompts. The model also boasts improved "character consistency," meaning that a character or object generated in one prompt can be accurately carried over to subsequent prompts—if you tell the AI to reuse a cyborg cat that it created, it won't change the color of the cat, and so on.

OpenAI admits that its new image generation model is imperfect. It still struggles with hallucinations, mathematical representations (like charts or graphs), multilingual text, and more. Even so, it's clearly an improvement over the company's previous image generation models.

OpenAI says that GPT-4o image generation contains safeguards to prevent misuse, plus advanced watermarking techniques to help people differentiate AI-generated content from real, human-made stuff. But I'll go out on a limb and assume that these safeguards can, with effort, be circumvented. And OpenAI is still using C2PA watermarking, which is just metadata. It takes very little effort to remove this metadata from an image—C2PA is ineffective at preventing the spread of misinformation.
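To see why metadata-only labeling is fragile, it helps to look at how an image file is laid out. The sketch below builds two tiny PNG files from the same pixels, one with an extra text chunk and one without, and confirms that the pixel data is byte-for-byte identical. This is an illustration under assumptions: the plain `tEXt` chunk is only a stand-in for provenance metadata, the actual C2PA manifest format is not reproduced here, and the helper names are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Serialize one PNG chunk: length, type, data, CRC."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def build_png(pixels_rgb, width, height, extra_chunks=()):
    """Build a tiny 8-bit RGB PNG; extra_chunks are (type, data) pairs."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    row_bytes = width * 3
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(
        b"\x00" + pixels_rgb[y * row_bytes:(y + 1) * row_bytes]
        for y in range(height)
    )
    out = PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
    for chunk_type, data in extra_chunks:
        out += make_chunk(chunk_type, data)
    out += make_chunk(b"IDAT", zlib.compress(raw))
    out += make_chunk(b"IEND", b"")
    return out


def read_chunks(png_bytes):
    """Return the file's chunks as a list of (type, data) pairs."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    chunks = []
    while pos < len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        chunk_type = png_bytes[pos + 4:pos + 8]
        chunks.append((chunk_type, png_bytes[pos + 8:pos + 8 + length]))
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return chunks


def pixel_payload(png_bytes):
    """Concatenate the compressed pixel data (IDAT chunks) only."""
    return b"".join(d for t, d in read_chunks(png_bytes) if t == b"IDAT")


# A 2x2 image: red, green, blue, white.
pixels = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255])
plain = build_png(pixels, 2, 2)
# Stand-in metadata; real provenance data uses its own container format.
tagged = build_png(pixels, 2, 2, [(b"tEXt", b"Comment\x00provenance placeholder")])

print([t for t, _ in read_chunks(tagged)])
print(pixel_payload(plain) == pixel_payload(tagged))  # True
```

The label lives in its own compartment next to the pixels, not inside them. Any tool that saves only the pixels, such as a screenshot, will produce a file with no label at all.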

The new GPT-4o image generator won't alleviate concerns about copyright or fair use, either. It was trained on a mix of "publicly available" data and licensed data, according to a statement provided to The Wall Street Journal. AI companies are known to brazenly defy basic copyright law, and OpenAI does not share its training data with the public, so feel free to draw your own conclusions on this matter. (For what it's worth, OpenAI does care about copyright when its work is stolen.)

GPT-4o image generation is available today. Open ChatGPT in your browser, ask the AI to generate an image, and enjoy. Note that the rollout is not complete, so some users may still encounter the old DALL-E 3 model. The best way to tell the difference is to observe how a generated image loads. DALL-E 3 loads images with a spinning wheel, while GPT-4o images load with a pleasant top-down side-to-side flatbed scanner-ish animation.

All ChatGPT users can access GPT-4o image generation, including free users. However, free users face usage limits, just as they did when using DALL-E 3. By the way, DALL-E 3 will remain available in custom GPTs for those who want to use it.

Source: OpenAI