Using AI is quickly becoming part of daily life for many people, and it’s transforming industries, including ours: the creative arts.
There’s a generational shift happening in how we perceive the role of technology in the creative fields. And while there are some genuine and understandable concerns about this, it’s still a good idea for creatives and business people like us to become familiar with what these tools offer.
We tried our hand at Midjourney, a generative artificial intelligence program and service. Like OpenAI’s DALL-E and Stability AI’s Stable Diffusion, Midjourney generates images from natural-language descriptions called ‘prompts’. It’s a groundbreaking AI-powered image-generation tool and an especially noteworthy trend within the digital art scene.
How Midjourney AI image generation works
Users interact with Midjourney through commands such as ‘/imagine’ or ‘/describe’, followed by descriptive text, or ‘prompts’, to generate an image. Prompts need to be clear and concise. It helps to provide context, any constraints and specific styles, for example ‘surreal’, ‘charming’, ‘vintage’ or ‘photo-realistic’. It’s also useful to reference lighting (e.g. ‘cinematic’) and colours and tone (e.g. ‘warm tones’). The more specific you are, the better the results.
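As a purely illustrative example (not one of our own prompts), a command following that structure might look like: ‘/imagine prompt: a busy street market at dusk, photo-realistic, cinematic lighting, warm tones’.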
Once you submit a prompt through the Discord command, the bot interprets it and generates a grid of four images, usually within a minute. You can then select your preferred image and refine it further using various in-app options.
The U1, U2, U3 and U4 buttons are “Upscale” tools that produce a larger version of the corresponding image in the grid.
The V1, V2, V3 and V4 buttons generate new variations of the corresponding image in the grid. The “Vary (Region)” feature allows you to modify specific areas of an image.
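For instance, if the second image in the grid is closest to what you’re after, U2 upscales that image, while V2 produces a fresh set of variations based on it.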
Midjourney can generate anything from product photos and landscape or portrait photography to 3D renders and illustrations, and you can experiment with a range of artistic styles, from photo-realism to abstract art.
Limitations
Midjourney currently lacks anything resembling a user-friendly experience or interface. You can only use it via Discord, an instant-messaging and social platform. Communication between users and the Midjourney bot takes place in virtual communities called “servers”, which can be quite tedious to navigate.
Midjourney violates several key usability heuristics such as:
- ‘Match Between the System and the Real World’ – information doesn’t appear in a natural and logical manner. The U1–U4 and V1–V4 buttons are confusing; they should ideally sit beneath the image they act on, reducing the cognitive load on users. Midjourney also forces users to interact in system jargon, prefixing prompts with ‘/’ commands and so on. The design should speak the users’ language, not the other way around.
- ‘User Control and Freedom’ – Midjourney makes generated images publicly visible by default. This won’t work for many people, and it takes control and privacy away from the user.
- ‘Flexibility and Efficiency of Use’ – on Discord, the user is flooded with images, prompts and responses from people all over the world using Midjourney. It’s easy to get lost and lose track of your own images, forcing you to scroll endlessly and waste time, especially as generating images is an iterative, evolving process.
- ‘Error Prevention and Recovery from Errors’ – a product is never good when users can’t prevent or recover from mistakes. It’s quite hard to amend a prompt: once you’ve entered it you can’t change it, so you have to re-type the whole prompt with your amendments.
Midjourney is still in its nascent stages, and it will only get better with time. The company is working on a new platform and interface, which we hope will be an improvement on what’s currently available.
We tested AI-generated images
We wanted to see what we could generate, then analyse the results to understand the merits and limitations of both the platform and the images it produces. A lot of the work Corporation Pop does is for music festivals, so to test it we got Midjourney to churn out these images using multiple iterations of the same prompt. We chose ‘Music festival in the UK, nighttime, photograph’ as the test prompt, as we primarily wanted to generate photographs of festivals and festival-goers in the UK.
At first glance, the images seem very impressive. It would have taken a lot of time, money and effort to generate these manually.
On closer inspection, however, Midjourney seemed to work best when generating crowds of people or ‘silhouettes’. It also did a decent job of generating close-ups or portraits of people.
But despite its impressive capabilities, it messed up when generating groups of people, or multiple people in a single frame.
In particular, the platform really struggles with hands and feet. Midjourney’s algorithms handle most other body parts pretty well, but with hands, feet and sometimes even eyes, the results are often less than satisfactory. It sounds like this is an area the developers are looking to improve as the platform continues to develop.
A powerful tool
Midjourney is a powerful tool for generating AI-based images, but it’s not for everyone and it’s not suitable for everything. We now know its limitations and can use it accordingly.
It can definitely help by quickly churning out concept images, renders and so on, letting us visualise ideas and decide whether to keep or discard them.
Despite its limitations and usability issues, it’s still an incredibly useful addition to our creative toolkit.
AI image generation is rapidly evolving, and it’s incredible to see how far it has come. As more people make use of it, developers get more data to train and refine their models. We think it’s only going to get better from here.