MidJourney V5.2 vs. V6 Alpha: We Test the Differences

MidJourney has launched the Alpha version of its V6 model, and it promises many improvements over V5.2. We look at what's better on paper and test it against the older model.

What’s New in V6?

Undoubtedly a lot has happened under the hood with V6, but MidJourney highlighted the key features in an official Discord thread. Note that you’ll have to be a member of the MidJourney Discord to view the post in question. These are the most important changes:

More accurate prompt adherence.
Longer prompts.
Improved coherence and model knowledge.
Improved image prompting and remixing.
Minor text drawing ability (similar to the new DALL-E model).
Improved upscalers, with both ‘subtle’ and ‘creative’ modes.

In short, V6 brings MidJourney more in line with the impressive new capabilities of rival tool DALL-E 3, but here we're interested in how much better it is than V5.2, which was the default model at the time of writing.

If you're a MidJourney subscriber and want to try the new V6 Alpha, type /settings into Discord, send the command, and then choose V6 from the model menu that appears.

Prompt Adherence

The first thing I want to test is how well the new model adheres to the prompt. In the past, MidJourney treated details in a prompt more like vague suggestions than instructions, so here's a prompt with very detailed instructions.

Draw a marketplace in a futuristic city. To the left of the frame, is a woman with a shopping basket in her left arm. To the right is a street market stall. A robot is behind the stall selling fruit. The robot is purple, and he is holding an apple in his outstretched right arm.

For each model I’ve chosen the image that most closely matches my prompt. Here’s the best V5.2 came up with.

An AI-generated image of a marketplace in a futuristic city. — Sydney Louw Butler/How-To Geek/MidJourney

Here is the best that V6 came up with.

AI-generated image of a marketplace in a futuristic city. — Sydney Louw Butler/How-To Geek/MidJourney

While V5.2 generally includes all the elements I asked for, they aren't arranged correctly relative to the frame or to each other. The only real mistake V6 made here is putting the apple in the robot's left arm and the shopping basket in the woman's right arm. Perhaps most importantly, all the images generated by V6 are much more coherent than those made by V5.2, which lack any sense of framing or balance and feel mashed together.

Putting Text Into Images

Like DALL-E 3, MidJourney V6 can properly integrate text into an image. All you have to do is enclose the text in quotation marks in your prompt. Here's the prompt we used:

A fabric flag with the words "How To Geek" on it.

I'm including all four attempts from each model to show that V6 isn't perfect at this yet, but none of the V5.2 images come anywhere close to getting the text right.

Four AI-generated flags in each quadrant of the image, with garbled text. — Sydney Louw Butler/How-To Geek/MidJourney

With V6, however, three of the four images from the first attempt got the text right, and you can clearly see the text is properly integrated into the image rather than simply overlaid.

Four AI-generated flags that have the words How-To Geek on them. — Sydney Louw Butler/How-To Geek/MidJourney

Artistic Quality

While we can test fairly objectively how well V6 follows a prompt or integrates text, artistic quality is much harder to nail down. In my comparison of MidJourney models V1 to V5.2, it was clear that with every new model the AI was becoming more "imaginative," for lack of a better word. Composition and detail also improved drastically, and honestly, V5.2 still came out on top when it came to artistic flair, as I noted when I compared MidJourney to DALL-E 3.

So I'll leave this to your judgment. Here are a few pairs of images, with V5.2 on the left and V6 on the right.

Two side-by-side AI-generated images of idyllic elven villages featuring whimsical houses and lush greenery. — Sydney Louw Butler/How-To Geek/MidJourney

Prompt: An epic and beatiful fantasy scene of an elvish village where the elves are going about their business. Make it an oil painting

Two side-by-side images of a futuristic street scene with aliens, robots, and humans all living in the same city. — Sydney Louw Butler/How-To Geek/MidJourney

Prompt: A futuristic street scene with aliens, robots, and humans all living in the same city. Make it in the style of a digital speed painting.

Two side-by-side AI-generated nature photographs of mountains seen from the beach, with a large moon visible in the sky. — Sydney Louw Butler/How-To Geek/MidJourney

Prompt: A nature photograph of mountains as seen from the beach, with a large visible moon in the sky.

It’s Just an Alpha (For Now)

It's important to keep in mind that MidJourney V6 was not finished at the time of writing. It's a new model trained from scratch, drawing on the lessons learned from previous models, and it still lacks some of the useful extras found in V5.2, such as the ability to pan an image.

What is clear is that with V6 you can throw all the MidJourney prompt engineering tricks you know out the window, but V5.2 is still perfectly capable of creating stunning, usable images. At this stage, there's no harm in trying the V6 Alpha to see if it gives better results with your prompts, but keep V5.2 close at hand too.
