Realtime generative AI art is here thanks to LCM-LoRA



Generative AI art has quickly emerged as one of the most interesting and popular applications of the new technology, with models such as Stable Diffusion and Midjourney claiming millions of users, not to mention OpenAI’s move to bundle its DALL-E 3 image generation model directly into its popular ChatGPT service earlier this fall. Simply by typing in a description and waiting a few short moments, users can see an image from their imagination rendered on screen by AI algorithms trained to do exactly that.

Yet the fact that the user has to wait those "few short moments," anywhere from a second or two to several minutes, for the AI to generate their image is not ideal for our fast-paced, instant-gratification modern world.

That's why this week, the online AI art community is collectively freaking out about a new machine learning technique that finally brings generative AI art creation into realtime: LCM-LoRA, short for "Latent Consistency Model Low-Rank Adaptation," developed by researchers at the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University in China and the AI code sharing platform HuggingFace, and described in a paper published on the open-access preprint research site arXiv.org.

What does this mean, in a practical sense? Well, take a look at some of the videos shared by AI artists on X and LinkedIn below, and you’ll get an idea.

Essentially, thanks to the LCM-LoRA technique, users can now move their cursors, paint simple, almost stick-figure-like drawings, or drop just a few shapes onto a canvas, alongside descriptive text, and AI art creation applications such as Krea.AI and Fal.AI will automatically render new generated art instantaneously. The imagery is swapped out in fractions of a second as the user moves their shapes or paints new lines on the digital canvas.

You can try it for yourself here at Fal.AI (provided it stays up under increased use).

The technique works not only for flat, 2D images but for 3D assets as well, meaning artists could theoretically create immersive environments nearly instantly for use in mixed reality (AR/VR/XR), computer and video games, and other experiences. It could also be used in films, drastically speeding up production and reducing its costs.

“Everything is going to change,” commented one startup founder and former Google AI engineer on LinkedIn, about LCM-LoRA, a sentiment echoed by many in the AI arts community.

“A whole new era of generative AI is about to be unleashed,” commented another user on X.

University of Pennsylvania Wharton School of Business professor Ethan Mollick, one of the most active and vocal influencers and proponents of generative AI, opined that “we are going to see a lot of new user experiences soon,” thanks to the technique.

What is LCM-LoRA and how does it work?

The early demos of LCM-LoRA integrations into apps are undeniably captivating and suggest, to this VentureBeat author and AI artist, a new watershed moment for generative AI in the visual arts.

But what is the technological advancement at the heart of LCM-LoRA and can it scale across apps and different uses, as the early users imply?

According to the paper describing the technique published by researchers at IIIS Tsinghua University and HuggingFace, LCM-LoRA is ultimately a “universal training-free acceleration module that can be directly plugged into various Stable Diffusion fine-tuned models or SD LoRAs.”

That's a mouthful for anyone outside the machine learning community, but in plainer English, it's essentially an algorithm that speeds up the process of turning text or source imagery into new AI-generated artwork using the popular open-source Stable Diffusion AI model and its fine-tuned, or altered, variants.

LCM-LoRA does this by reducing the number of "required sampling steps," that is, the iterative passes the AI model must make to transform the source text or image, whether a description or a stick figure, into a higher-quality, more detailed image based on what the Stable Diffusion model has learned from millions of images.

This means LCM-LoRA allows Stable Diffusion models to work faster and with fewer computational resources, so they don't take up as much working memory or as many compute cycles on a person's computer. This is what enables them to produce eye-popping results in realtime.
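For readers who want a sense of what "plugging in" the module looks like in practice, here is a minimal sketch using Hugging Face's diffusers library; the model and weight repository names (stabilityai/stable-diffusion-xl-base-1.0 and latent-consistency/lcm-lora-sdxl) are those published alongside the technique, while the prompt and exact settings are illustrative assumptions rather than a definitive recipe.

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Load a standard Stable Diffusion XL pipeline (fp16 on GPU for speed).
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Swap in the LCM scheduler and plug the LCM-LoRA weights into the model.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

# With the acceleration module attached, only a handful of sampling steps
# are needed instead of the usual 25-50, and guidance is kept low.
image = pipe(
    prompt="a watercolor painting of a fox in a snowy forest",
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
image.save("fox.png")
```

On a recent GPU, a four-step generation like this typically completes in a fraction of the time a standard run takes, which is what makes the live-canvas experiences described above feasible.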

The fact that it is "universal" means it can be plugged into a variety of apps that rely on Stable Diffusion or its variants to generate imagery. Whether it can be extended beyond Stable Diffusion, to proprietary models like OpenAI's DALL-E 3 or Midjourney, remains to be seen.

We've reached out to one of the LCM-LoRA paper authors and will update this piece with more information when we hear back.


