πŸ“ Text-to-3D: Generate 3D Assets from Text

Create physically plausible 3D assets from text descriptions, supporting a wide range of geometry, style, and material details.


⚡ Command-Line Usage

Basic CLI (recommended)

This CLI uses a text-to-image model based on Stable Diffusion 3.5 Medium and accepts English prompts only. You must accept the model license (click “Accept”) before using it.

text3d-cli \
  --prompts "small bronze figurine of a lion" "A globe with wooden base" "wooden table with embroidery" \
  --n_image_retry 1 \
  --n_asset_retry 1 \
  --n_pipe_retry 1 \
  --seed_img 0 \
  --output_root outputs/textto3d
  • --n_image_retry: Number of retries per prompt for text-to-image generation
  • --n_asset_retry: Number of retries for image-to-3D asset generation
  • --n_pipe_retry: Number of retries of the whole pipeline, driven by the end-to-end 3D asset quality check
  • --seed_img: Optional seed for the text-to-image step (0 in the example above); a fixed seed makes the generated images repeatable
  • --output_root: Directory in which the generated assets are saved

For large-scale 3D asset generation, set --n_image_retry=4 --n_asset_retry=3 --n_pipe_retry=2. This is slower, but the automatic checking and retries produce better assets. For more diverse results, omit --seed_img.
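For batch jobs it can help to assemble the command programmatically. The following is a minimal sketch, not part of the project: `build_text3d_command` is a hypothetical helper that only arranges the flags documented above, defaults to the large-scale retry settings, and assumes `--seed_img` takes an integer, as in the example.

```python
import shlex
from typing import List, Optional, Sequence


def build_text3d_command(
    prompts: Sequence[str],
    output_root: str,
    n_image_retry: int = 4,
    n_asset_retry: int = 3,
    n_pipe_retry: int = 2,
    seed_img: Optional[int] = None,
) -> List[str]:
    """Assemble a text3d-cli argument list (hypothetical helper).

    Defaults follow the large-scale settings suggested in the text.
    """
    if not prompts:
        raise ValueError("at least one prompt is required")
    cmd = ["text3d-cli", "--prompts", *prompts]
    cmd += ["--n_image_retry", str(n_image_retry)]
    cmd += ["--n_asset_retry", str(n_asset_retry)]
    cmd += ["--n_pipe_retry", str(n_pipe_retry)]
    # Leaving the seed out follows the advice above for more diverse results.
    if seed_img is not None:
        cmd += ["--seed_img", str(seed_img)]
    cmd += ["--output_root", output_root]
    return cmd


if __name__ == "__main__":
    cmd = build_text3d_command(
        ["small bronze figurine of a lion", "A globe with wooden base"],
        output_root="outputs/textto3d",
    )
    # Print a shell-quoted command line for inspection; nothing is executed.
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch generation once `text3d-cli` is installed and the model license has been accepted.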

Running the text3d-cli example above produces one 3D asset per prompt: "small bronze figurine of a lion", "A globe with wooden base", and "wooden table with embroidery".


Kolors Model CLI (Supports Chinese & English Prompts):

bash embodied_gen/scripts/textto3d.sh \
  --prompts "small bronze figurine of a lion" "A globe with wooden base and latitude and longitude lines" "橙色电动手钻,有磨损细节" \
  --output_root outputs/textto3d_k

Models with more permissive licenses can be found in embodied_gen/models/image_comm_model.py.

The generated results are organized as follows:

outputs/textto3d
├── asset3d
│   ├── sample3d_xx
│   │   └── result
│   │       ├── mesh
│   │       │   ├── material_0.png
│   │       │   ├── material.mtl
│   │       │   ├── sample3d_xx_collision.obj
│   │       │   ├── sample3d_xx.glb
│   │       │   ├── sample3d_xx_gs.ply
│   │       │   └── sample3d_xx.obj
│   │       ├── sample3d_xx.urdf
│   │       └── video.mp4
└── images
    ├── sample3d_xx.png
    └── sample3d_xx_raw.png

  • mesh/ → 3D geometry and texture files for the asset, including the visual mesh, the collision mesh, and the 3DGS representation
  • *.urdf → Simulator-ready URDF that references the collision and visual meshes
  • video.mp4 → Preview video of the generated 3D asset
  • images/sample3d_xx.png → Foreground-extracted image used for the image-to-3D step
  • images/sample3d_xx_raw.png → Original image generated by the text-to-image step
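After a large run, the layout above can be walked to find incomplete assets. The sketch below is illustrative only: `collect_assets` and `missing_files` are hypothetical helpers, and it assumes that every file is named after its `sample3d_xx` directory exactly as in the tree. The visual, collision, and 3DGS labels are inferred from the file names.

```python
from pathlib import Path
from typing import Dict, List


def collect_assets(output_root: str) -> Dict[str, Dict[str, Path]]:
    """Map each asset name to the files expected by the documented layout."""
    root = Path(output_root)
    asset_root = root / "asset3d"
    assets: Dict[str, Dict[str, Path]] = {}
    if not asset_root.is_dir():
        return assets
    for asset_dir in sorted(asset_root.iterdir()):
        if not asset_dir.is_dir():
            continue
        name = asset_dir.name  # e.g. "sample3d_xx"
        result = asset_dir / "result"
        mesh = result / "mesh"
        assets[name] = {
            "urdf": result / f"{name}.urdf",
            "video": result / "video.mp4",
            "visual_obj": mesh / f"{name}.obj",
            "visual_glb": mesh / f"{name}.glb",
            "collision_obj": mesh / f"{name}_collision.obj",
            "gs_ply": mesh / f"{name}_gs.ply",
            "image": root / "images" / f"{name}.png",
            "image_raw": root / "images" / f"{name}_raw.png",
        }
    return assets


def missing_files(assets: Dict[str, Dict[str, Path]]) -> Dict[str, List[str]]:
    """Return, per asset, the keys whose files do not exist on disk."""
    return {
        name: [key for key, path in files.items() if not path.is_file()]
        for name, files in assets.items()
    }


if __name__ == "__main__":
    report = missing_files(collect_assets("outputs/textto3d"))
    for name, missing in report.items():
        status = "complete" if not missing else "missing: " + ", ".join(missing)
        print(f"{name}: {status}")
```

Assets reported as incomplete are candidates for regeneration with higher retry counts.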
