NVIDIA generative AI research turns text to 3D objects

22nd March 2024

NVIDIA

Paige West


NVIDIA's latest foray into generative artificial intelligence has yielded LATTE3D, a text-to-3D AI model that heralds a new era in digital creation.

The model can turn text prompts into detailed 3D representations of objects and animals in seconds.

Developed by the Toronto-based AI lab team led by Sanja Fidler, NVIDIA's Vice President of AI Research, LATTE3D has significantly accelerated the process of generating high-quality 3D visuals. "A year ago, it took an hour for AI models to generate 3D visuals of this quality – and the current state of the art is now around 10 to 12 seconds," Fidler stated. "We can now produce results an order of magnitude faster, putting near-real-time text-to-3D generation within reach for creators across industries."

This speed is made possible by running inference on powerful GPUs such as the NVIDIA RTX A6000, which enables almost instantaneous production of 3D shapes. LATTE3D's output is designed to drop into virtual environments, with applications ranging from video game development and advertising campaigns to design projects and virtual robotics training.

The real innovation lies in how quickly LATTE3D lets creators ideate, generate, and iterate. They can bring their visions to life without starting from scratch or sifting through extensive 3D asset libraries. The model returns several 3D shape options for a single text prompt, so creators can quickly select an object and optimise it. The results can then be exported to graphics software applications or to platforms such as NVIDIA Omniverse, which supports 3D workflows based on Universal Scene Description (OpenUSD).

Although LATTE3D is currently trained on datasets of animals and everyday objects, its architecture can be adapted to other data types. This flexibility opens up possibilities for a range of professionals, from landscape designers looking to populate garden renderings to developers creating realistic environments for training personal assistant robots.

The model was trained on NVIDIA A100 Tensor Core GPUs, with diverse text prompts generated by ChatGPT to improve its handling of user inputs. This approach is intended to help LATTE3D interpret a wide array of descriptions and translate them into 3D representations.