EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence

Xinjie Wang¹, Liu Liu¹, Yu Cao², Ruiqi Wu⁵,¹, Wenkang Qin², Dehui Wang⁴,³, Wei Sui³, Zhizhong Su¹
¹Horizon Robotics, ²GigaAI, ³D-Robotics, ⁴Shanghai Jiao Tong University, ⁵VCIP, CS, Nankai University
EmbodiedGen generates interactive 3D worlds with real-world scale and physical realism at low cost.

Framework

EmbodiedGen is a toolkit for generating diverse, interactive 3D worlds composed of generative 3D assets with plausible physics. It leverages generative AI to address the generalization challenges in embodied intelligence research. EmbodiedGen is composed of six key modules: Image-to-3D, Text-to-3D, Texture Generation, Articulated Object Generation, Scene Generation, and Layout Generation.

Results

Image-to-3D

Text-to-3D

"Antique brass key, intricate filigree"

"Rusty old wrench, peeling paint"

"Sleek black drone, red sensors"

"Miniature screwdriver with bright orange handle"

"European style wooden dressing table"

Texture Generation

Articulated Object Generation

Scene Generation

BibTeX

@misc{xinjie2025embodiedgengenerative3dworld,
      title={EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence}, 
      author={Wang Xinjie and Liu Liu and Cao Yu and Wu Ruiqi and Qin Wenkang and Wang Dehui and Sui Wei and Su Zhizhong},
      year={2025},
      eprint={2506.10600},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2506.10600}, 
}