NVIDIA advances physical AI with largest indoor synthetic dataset
NVIDIA contributed the largest-ever indoor synthetic dataset to the Computer Vision and Pattern Recognition (CVPR) conference’s annual AI City Challenge, helping researchers and developers advance solutions for smart cities and industrial automation.
The challenge, which drew over 700 teams from nearly 50 countries, tasks participants with developing AI models that enhance operational efficiency in physical settings such as retail and warehouse environments and intelligent traffic systems.
Teams tested their models on datasets generated using NVIDIA Omniverse, a platform of application programming interfaces (APIs), software development kits (SDKs) and services that enable developers to build Universal Scene Description (OpenUSD)-based applications and workflows.
Creating and simulating digital twins for large spaces
In large indoor spaces like factories and warehouses, daily activities involve a steady stream of people, small vehicles and future autonomous robots. Developers need solutions that can observe and measure activities, optimise operational efficiency, and prioritise human safety in complex, large-scale settings.
Researchers are addressing that need with computer vision models that can perceive and understand the physical world. These models can be used in applications like multi-camera tracking, in which a model tracks multiple entities within a given environment.
To ensure their accuracy, the models must be trained on large, ground-truth datasets for a variety of real-world scenarios. But collecting that data can be a challenging, time-consuming and costly process.
AI researchers are turning to physically based simulations – such as digital twins of the physical world – to enhance AI simulation and training. These virtual environments can help generate synthetic data used to train AI models. Simulation also provides a way to run a multitude of “what-if” scenarios in a safe environment while addressing privacy and AI bias issues.
Synthetic data is important for AI training because it can be generated at scale and expanded as needed. Teams can generate a diverse set of training data by varying parameters such as lighting, object locations, textures and colours.
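The parameter variation described above can be illustrated with a minimal Python sketch. It uses only the standard library and is not the Omniverse Replicator API: the names `SceneParams`, `sample_scene`, `generate_variations` and `TEXTURES`, along with every value range, are hypothetical and chosen purely for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical texture names, used only for this illustration.
TEXTURES = ("concrete", "wood", "metal", "tile")


@dataclass(frozen=True)
class SceneParams:
    """One randomised scene configuration (all fields are illustrative)."""
    light_intensity: float      # arbitrary units
    object_positions: tuple     # one (x, y) pair per object, in metres
    texture: str
    colour_rgb: tuple           # (r, g, b), each 0..255


def sample_scene(rng, num_objects=3, room_size=(20.0, 10.0)):
    """Draw one scene by sampling each parameter independently."""
    positions = tuple(
        (rng.uniform(0.0, room_size[0]), rng.uniform(0.0, room_size[1]))
        for _ in range(num_objects)
    )
    return SceneParams(
        light_intensity=rng.uniform(200.0, 1000.0),
        object_positions=positions,
        texture=rng.choice(TEXTURES),
        colour_rgb=tuple(rng.randint(0, 255) for _ in range(3)),
    )


def generate_variations(seed, count):
    """Return `count` scene configurations; the same seed gives the same set."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(count)]


if __name__ == "__main__":
    for params in generate_variations(seed=0, count=3):
        print(params)
```

Seeding the generator keeps a batch of variations reproducible, which matters when a dataset has to be regenerated or extended later.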
Building synthetic datasets for the AI City Challenge
This year’s AI City Challenge consists of five computer vision challenge tracks, ranging from traffic management to worker safety.
NVIDIA contributed datasets for the first track, Multi-Camera Person Tracking, which saw the highest participation, with over 400 teams. The challenge used a benchmark and the largest synthetic dataset of its kind — comprising 212 hours of 1080p videos at 30 frames per second spanning 90 scenes across six virtual environments, including a warehouse, retail store and hospital.
Created in Omniverse, these scenes simulated nearly 1,000 cameras and featured around 2,500 digital human characters. Omniverse also provided a way for the researchers to generate data of the right size and fidelity to achieve the desired outcomes.
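To give a rough sense of scale, the figures quoted above can be turned into frame counts with a few lines of Python. This is back-of-the-envelope arithmetic under two assumptions not stated in the article: that the 212 hours is the total footage across all cameras, and that 1080p means the standard 1920 x 1080 resolution.

```python
HOURS_OF_VIDEO = 212        # from the article
FPS = 30                    # from the article
NUM_SCENES = 90             # from the article
WIDTH, HEIGHT = 1920, 1080  # assumption: standard 1080p resolution

total_frames = HOURS_OF_VIDEO * 3600 * FPS
pixels_per_frame = WIDTH * HEIGHT
avg_hours_per_scene = HOURS_OF_VIDEO / NUM_SCENES

print(f"{total_frames:,} frames")                                # 22,896,000 frames
print(f"{pixels_per_frame:,} pixels per frame")                  # 2,073,600 pixels per frame
print(f"{avg_hours_per_scene:.2f} hours per scene on average")   # 2.36 hours per scene on average
```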
The benchmarks were created using Omniverse Replicator in NVIDIA Isaac Sim, a reference application that enables developers to design, simulate and train AI for robots, smart spaces or autonomous machines in physically based virtual environments built on NVIDIA Omniverse.
Omniverse Replicator, an SDK for building synthetic data generation pipelines, automated many manual tasks involved in generating quality synthetic data, including domain randomisation, camera placement and calibration, character movement, and semantic labelling of data and ground truth for benchmarking.
Ten institutions and organisations are collaborating with NVIDIA for the AI City Challenge:
- Australian National University, Australia
- Emirates Center for Mobility Research, UAE
- Indian Institute of Technology Kanpur, India
- Iowa State University, U.S.
- Johns Hopkins University, U.S.
- National Yang Ming Chiao Tung University, Taiwan
- Santa Clara University, U.S.
- The United Arab Emirates University, UAE
- University at Albany – SUNY, U.S.
- Woven by Toyota, Japan
Driving the future of generative physical AI
Researchers and companies around the world are developing infrastructure automation and robots powered by physical AI: models that can understand instructions and autonomously perform complex tasks in the real world.
Generative physical AI uses reinforcement learning in simulated environments, where it perceives the world using accurately simulated sensors, performs actions grounded by laws of physics, and receives feedback to reason about the next set of actions.
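The perceive, act and receive-feedback loop described above can be sketched with a toy example. This is not how Isaac Sim or any NVIDIA model is trained. It is a minimal tabular Q-learning sketch, a technique chosen here only for illustration, in which a hypothetical one-dimensional corridor stands in for the simulated environment. All names and constants are assumptions.

```python
import random

N_STATES = 5            # corridor positions 0..4; position 4 is the goal
ACTIONS = (-1, +1)      # move left or move right
GAMMA, ALPHA = 0.9, 0.5 # discount factor and learning rate (illustrative)


def step(state, action):
    """Toy 'physics': move within the corridor; reward 1.0 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=200, seed=0):
    """Learn action values from simulated experience with tabular Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False                    # perceive the starting state
        while not done:
            a = rng.randrange(2)                  # act: explore uniformly at random
            next_state, reward, done = step(state, ACTIONS[a])  # receive feedback
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best learned action for each non-goal position."""
    return [ACTIONS[max(range(2), key=lambda a: q[s][a])]
            for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Exploration here is purely random, which is acceptable because Q-learning learns the greedy policy regardless of how the experience was collected. Real simulators replace the corridor with physically based sensors and dynamics.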
Developers can tap into SDKs and APIs such as the NVIDIA Metropolis developer stack, which includes a multi-camera tracking reference workflow, to add enhanced perception capabilities for factories, warehouses and retail operations. And with the latest release of NVIDIA Isaac Sim, developers can supercharge robotics workflows by simulating and training AI-based robots in physically based virtual spaces before real-world deployment.
Researchers and developers are also combining high-fidelity, physics-based simulation with advanced AI to bridge the gap between simulated training and real-world application. This helps ensure that synthetic training environments closely mimic real-world conditions for more seamless robot deployment.
NVIDIA is taking the accuracy and scale of simulations further with the recently announced NVIDIA Omniverse Cloud Sensor RTX, a new set of microservices that enable physically accurate sensor simulation to accelerate the development of fully autonomous machines.
This technology will allow autonomous systems, whether a factory, vehicle, or robot, to gather essential data to effectively perceive, navigate and interact with the real world. Using these microservices, developers can run large-scale tests on sensor perception within realistic, virtual environments, significantly reducing the time and cost associated with real-world testing.
Showcasing advanced AI with research
At CVPR 2024, NVIDIA Research will present over 50 papers, introducing generative physical AI breakthroughs with potential applications in areas like autonomous vehicle development and robotics.
Papers that used the NVIDIA Omniverse platform to generate synthetic data or digital twins of environments for model simulation, testing and validation include:
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects: FoundationPose is a versatile model for estimating and tracking the 3D position and orientation of objects. It works with any new object immediately if a computer-aided design model or a few reference images are available, using advanced techniques to handle different scenarios effectively.
Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects: This research paper presents a method for creating digital models of objects from two 3D scans, improving accuracy by analysing how movable parts connect and move between positions.
BEHAVIOR Vision Suite: Customisable Dataset Generation via Simulation: The BEHAVIOR Vision Suite generates customisable synthetic data for computer vision research, allowing researchers to adjust settings like lighting and object placement.