📝 Text-to-3D — Generate 3D Assets from Text

Create physically plausible 3D assets from text descriptions, supporting a wide range of geometry, style, and material details.

⚡ Command-Line Usage

Basic CLI(recommend)

Text-to-image model based on Stable Diffusion 3.5 Medium， English prompts only. Usage requires agreement to the model license (click “Accept”).

text3d-cli \
  --prompts "small bronze figurine of a lion" "A globe with wooden base" "wooden table with embroidery" \
  --n_image_retry 1 \
  --n_asset_retry 1 \
  --n_pipe_retry 1 \
  --seed_img 0 \
  --output_root outputs/textto3d

--n_image_retry: Number of retries per prompt for text-to-image generation
--n_asset_retry: Retry attempts for image-to-3D assets generation
--n_pipe_retry: Pipeline retry for end-to-end 3D asset quality check
--seed_img: Optional initial seed image for style guidance
--output_root: Directory to save generated assets

For large-scale 3D asset generation, set --n_image_retry=4 --n_asset_retry=3 --n_pipe_retry=2, slower but better, via automatic checking and retries. For more diverse results, omit --seed_img.

You will get the following results:

"small bronze figurine of a lion"

"A globe with wooden base"

"wooden table with embroidery"

Kolors Model CLI (Supports Chinese & English Prompts):

bash embodied_gen/scripts/textto3d.sh \
    --prompts "A globe with wooden base and latitude and longitude lines" "橙色电动手钻，有磨损细节" \
    --output_root outputs/textto3d_k

Models with more permissive licenses can be found in embodied_gen/models/image_comm_model.py.

Choosing the 3D Backend

Three 3D generation backends are supported via --image3d_model (case-insensitive):

SAM3D (default) — text → image → 3D, local SAM3D model
TRELLIS — text → image → 3D, local TRELLIS model
HUNYUAN3D — Tencent Hunyuan3D Pro text-to-3D API; skips the text-to-image stage entirely and generates 3D directly from the prompt

Using the Hunyuan3D Cloud Backend

Hunyuan3D Pro takes the prompt directly to a 3D mesh (no GPU model loaded locally; one job ≈ 3 minutes; Tencent Cloud is billed per submit). Set up credentials once:

export TENCENT_SECRET_ID='your-secret-id'
export TENCENT_SECRET_KEY='your-secret-key'
text3d-cli --image3d_model HUNYUAN3D \
  --prompts "small bronze figurine of a lion" \
  --output_root outputs/textto3d_hunyuan

The generated results are organized as follows:

outputs/textto3d
├── asset3d
│   ├── sample3d_xx
│   │   └── result
│   │       ├── mesh
│   │       │   ├── material_0.png
│   │       │   ├── material.mtl
│   │       │   ├── sample3d_xx_collision.obj
│   │       │   ├── sample3d_xx.glb
│   │       │   ├── sample3d_xx_gs.ply
│   │       │   └── sample3d_xx.obj
│   │       ├── sample3d_xx.urdf
│   │       └── video.mp4
└── images
    ├── sample3d_xx.png
    ├── sample3d_xx_raw.png

mesh/ → 3D geometry and texture files for the asset, including visual mesh, collision mesh and 3DGS
*.urdf → Simulator-ready URDF including collision and visual meshes
video.mp4 → Preview video of the generated 3D asset
images/sample3d_xx.png → Foreground-extracted image used for image-to-3D step
images/sample3d_xx_raw.png → Original generated image from the text-to-image step

Getting Started

You can also try Text-to-3D instantly online via our Hugging Face Space — no installation required.
Explore EmbodiedGen generated sim-ready Assets Gallery.
For instructions on using the generated asset in any simulator, see Any Simulators Tutorial.