Genie 2 from DeepMind can create interactive environments that resemble video games

  • 04-December-2024

A model developed by DeepMind, Google's AI research group, can produce an "endless" number of playable 3D worlds.

The model, called Genie 2, is the successor to DeepMind’s Genie, which was launched earlier this year. It can create an interactive, real-time scene from a single image and text description (e.g., “A cute humanoid robot in the woods”). This makes it comparable to models being developed by Israeli firm Decart and Fei-Fei Li's company, World Labs.

According to DeepMind, Genie 2 can create a "vast diversity of rich 3D worlds," including ones where users can perform actions like jumping and swimming with a mouse or keyboard. Because it was trained on videos, the model can simulate object interactions, animations, lighting, physics, reflections, and the behavior of "NPCs."

Many of Genie 2's simulations resemble AAA video games, possibly because the model's training data includes playthroughs of well-known games. However, for competitive or other reasons, DeepMind, like many AI labs, has not disclosed many specifics about how it sources its data.

The IP implications are a matter of concern. DeepMind, a Google company, has unrestricted access to YouTube, and Google has previously said that its ToS permits it to use YouTube videos for model training. But is Genie 2 essentially making unauthorized copies of the video games it "watched"? That is for the courts to decide.

According to DeepMind, Genie 2 can produce consistent worlds from a variety of viewpoints, such as isometric and first-person views, for up to a minute, though most last 10 to 20 seconds.

“Genie 2 responds intelligently to actions taken by pressing keys on a keyboard, identifying the character and moving it correctly,” DeepMind wrote in a blog post. “For example, our model [can] figure out that arrow keys should move a robot and not trees or clouds.”

Most models like Genie 2, often called "world models," can simulate games and 3D worlds, but they have problems with consistency, artifacts, and hallucinations. For instance, Oasis, Decart's Minecraft simulator, has a low resolution and quickly "forgets" the layout of levels.

Genie 2, on the other hand, can remember parts of a simulated scene that are out of view and render them accurately when they become visible again.

That said, games made with Genie 2 wouldn't be much fun, because your progress would be erased every minute or so. For this reason, DeepMind is presenting the model as more of a research and creative tool, one that can be used to test AI agents and prototype "interactive experiences."

“Thanks to Genie 2’s out-of-distribution generalization capabilities, concept art and drawings can be turned into fully interactive environments,” DeepMind wrote. “And by using Genie 2 to quickly create rich and diverse environments for AI agents, our researchers can generate evaluation tasks that agents have not seen during training.”

Creatives may have mixed feelings, especially those in the video game industry. According to a recent Wired investigation, major companies like Activision Blizzard, which has laid off dozens of employees, are using AI to reduce costs, boost output, and make up for attrition.

However, Google has been investing more and more resources in world model research, which it sees as the next big thing in AI. In October, DeepMind hired Tim Brooks, who was leading development of OpenAI's Sora video generator at the time, to work on world simulators and video generation technologies. And two years ago, the lab poached Tim Rocktäschel from Meta. He is best known for his "open-endedness" experiments with video games like NetHack.
