Generative AI Terrains in Simulations
Unleashing the Power of Generative AI for Terrain Generation in Games
Creating realistic and engaging terrain for video games has always been a challenge for game developers. Traditionally, terrain generation has relied on procedural algorithms and manual artist input to design in-game landscapes. However, recent advancements in artificial intelligence (AI) have opened the door to a new world of possibilities for terrain generation. In this blog post, we'll explore how generative AI is revolutionizing the process of terrain creation, enabling the development of more complex, diverse, and immersive environments for players to explore.
What is Generative AI?
Generative AI refers to a class of artificial intelligence algorithms designed to create new content or data based on existing examples. These algorithms learn patterns and structures from the input data and then generate new instances that exhibit similar properties. Among the most popular generative AI techniques are Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and systems that build on VAEs such as Stable Diffusion (SD). These models have shown remarkable success in generating realistic images, audio, and even 3D models from text prompts: the user simply describes what they are looking for and the model produces impressively convincing results.
Heightmaps
A heightmap is a grayscale image in which each pixel stores an elevation value for the corresponding point on the terrain. Heightmaps are typically created with specialized sculpting tools that let developers paint the height values pixel by pixel until the terrain fits the needs of the game. In the Unity and Unreal terrain systems, you can also use “stamps”, which are essentially reusable brushes for terrain features. Other tools let you rough out the landscape and then procedurally erode it to look more realistic.
The problem with these tools is that they are not very accessible, and it often takes a skilled 3D artist to produce a satisfactory result. As one step toward our goal of democratizing the creation of 3D experiences, we leveraged the amazing work in AI image generation and applied it to creating terrains.
Training AI to Generate Terrain Heightmaps
Starting with a training set of roughly 600 terrains in various styles, we needed to convert them to a format Stable Diffusion could work with. That meant taking the 4K, 16-bit, single-channel terrains and converting them to 512x512 RGB images (8 bits per channel). The conversion lost a lot of detail, but it wasn't clear how well the underlying VAE would cope with these rather unusual images in the first place, so a straightforward conversion made sense as a starting point.
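The conversion step looks roughly like the sketch below, which is our own illustration rather than the exact pipeline we ran: the 16-bit elevations are rescaled to 8 bits, the image is downsampled to 512x512, and the single channel is replicated into RGB. The file paths and function name are placeholders.

```python
import numpy as np
from PIL import Image

def heightmap_to_sd_image(src_path: str, dst_path: str) -> None:
    """Convert a 4K 16-bit single-channel heightmap into a 512x512 8-bit RGB image."""
    raw = np.array(Image.open(src_path))  # e.g. 4096x4096, uint16
    # Squash the 16-bit elevations into 8 bits; this is where most detail is lost.
    gray8 = (raw.astype(np.float32) / 65535.0 * 255.0).astype(np.uint8)
    img = Image.fromarray(gray8, mode="L").resize((512, 512), Image.LANCZOS)
    # Replicate the single channel into RGB so it looks like a normal image to SD.
    img.convert("RGB").save(dst_path)

# Placeholder paths standing in for the ~600 training terrains.
heightmap_to_sd_image("terrain_0001.png", "train/terrain_0001_rgb.png")
```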
Using the Stable Diffusion WebUI running locally on an RTX 3090, we trained a custom embedding on SD-1.5 so that, at generation time, we could instruct Stable Diffusion to generate only terrain images. The process we used for this is called “Textual Inversion”.
For more information on this process, especially using the handy stable-diffusion-webUI project, we recommend checking out their documentation.
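We did the training through the WebUI, but for readers who prefer a scripted workflow, a trained embedding can be used in roughly the way sketched below with Hugging Face's diffusers library. This is a sketch under assumptions: the embedding file name, the placeholder token <terrain-style>, and the prompt are illustrative, not the exact values we used.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the base SD-1.5 model that the embedding was trained against.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach the learned Textual Inversion embedding under a placeholder token.
# "terrain-embedding.pt" and "<terrain-style>" are illustrative names.
pipe.load_textual_inversion("terrain-embedding.pt", token="<terrain-style>")

# Any prompt that includes the token is steered toward heightmap-like output.
image = pipe("<terrain-style>, eroded mountain ridges, grayscale heightmap").images[0]
image.save("generated_heightmap.png")
```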
The beauty of this method is that instead of retraining the network itself, you are simply finding the right set of embedding vectors to represent the concept you want to capture. As a result, the trained embedding takes up only about 25 KB instead of the few gigabytes a full model checkpoint requires.
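The size difference is easy to sanity-check with back-of-the-envelope arithmetic. The numbers below are assumptions rather than measurements from our files: roughly 8 learned vectors for an embedding of this size, and roughly 860 million parameters for the SD-1.5 UNet alone.

```python
# Textual Inversion learns a handful of token vectors, not new model weights.
embedding_dim = 768      # size of each CLIP text-token vector in SD-1.5
num_vectors = 8          # assumed number of learned vectors for the embedding
bytes_per_float = 4      # fp32

embedding_bytes = num_vectors * embedding_dim * bytes_per_float
print(f"embedding: ~{embedding_bytes / 1024:.1f} KB")  # ~24 KB

unet_params = 860_000_000  # rough parameter count of the SD-1.5 UNet alone
print(f"UNet weights: ~{unet_params * bytes_per_float / 1e9:.1f} GB")  # ~3.4 GB
```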