Utilities API
General-purpose utility functions, configuration, and helper classes.
embodied_gen.utils.config
embodied_gen.utils.log
embodied_gen.utils.enum
AssetType
dataclass
AssetType()
Bases: str, Enum
Enumeration for asset types.
Supported types:

- MJCF: MuJoCo XML format.
- USD: Universal Scene Description format.
- URDF: Unified Robot Description Format.
- MESH: Mesh file format.
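As a string-valued enum, members compare equal to plain strings. The member values below are assumptions for illustration (lowercase format names, consistent with the SimAssetMapper example later on this page); the actual values are defined in embodied_gen/utils/enum.py.

```python
from enum import Enum

# Illustrative stand-in for AssetType; actual member values live in enum.py.
class AssetTypeSketch(str, Enum):
    MJCF = "mjcf"
    USD = "usd"
    URDF = "urdf"
    MESH = "mesh"

# str-based enums compare equal to their raw string values.
print(AssetTypeSketch.USD == "usd")  # True
```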
LayoutInfo
dataclass
LayoutInfo(tree: dict[str, list], relation: dict[str, str | list[str]], objs_desc: dict[str, str] = dict(), objs_mapping: dict[str, str] = dict(), assets: dict[str, str] = dict(), quality: dict[str, str] = dict(), position: dict[str, list[float]] = dict())
Bases: DataClassJsonMixin
Data structure for layout information in a 3D scene.
Attributes:

| Name | Type | Description |
|---|---|---|
| tree | dict[str, list] | Hierarchical structure of scene objects. |
| relation | dict[str, str \| list[str]] | Spatial relations between objects. |
| objs_desc | dict[str, str] | Descriptions of objects. |
| objs_mapping | dict[str, str] | Mapping from object names to categories. |
| assets | dict[str, str] | Asset file paths for objects. |
| quality | dict[str, str] | Quality information for assets. |
| position | dict[str, list[float]] | Position coordinates for objects. |
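A minimal sketch of building data with this shape, using a plain-dataclass stand-in (the real LayoutInfo additionally mixes in DataClassJsonMixin, which provides to_json()/from_json()); the field contents here are invented for illustration.

```python
from dataclasses import dataclass, field, asdict

# Stand-in with the same fields as LayoutInfo; the real class also inherits
# DataClassJsonMixin for JSON (de)serialization.
@dataclass
class LayoutInfoSketch:
    tree: dict = field(default_factory=dict)
    relation: dict = field(default_factory=dict)
    objs_desc: dict = field(default_factory=dict)
    objs_mapping: dict = field(default_factory=dict)
    assets: dict = field(default_factory=dict)
    quality: dict = field(default_factory=dict)
    position: dict = field(default_factory=dict)

layout = LayoutInfoSketch(
    tree={"table": ["mug"]},
    relation={"mug": "on"},
    position={"mug": [0.1, 0.2, 0.75]},
)
print(asdict(layout)["position"]["mug"])  # [0.1, 0.2, 0.75]
```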
RenderItems
dataclass
RenderItems()
Bases: str, Enum
Enumeration of render item types for 3D scenes.
Attributes:

| Name | Description |
|---|---|
| IMAGE | Color image. |
| ALPHA | Mask image. |
| VIEW_NORMAL | View-space normal image. |
| GLOBAL_NORMAL | World-space normal image. |
| POSITION_MAP | Position map image. |
| DEPTH | Depth image. |
| ALBEDO | Albedo image. |
| DIFFUSE | Diffuse image. |
RobotItemEnum
dataclass
RobotItemEnum()
Bases: str, Enum
Enumeration of supported robot types.
Attributes:

| Name | Description |
|---|---|
| FRANKA | Franka robot. |
| UR5 | UR5 robot. |
| PIPER | Piper robot. |
Scene3DItemEnum
dataclass
Scene3DItemEnum()
Bases: str, Enum
Enumeration of 3D scene item categories.
Attributes:

| Name | Description |
|---|---|
| BACKGROUND | Background objects. |
| CONTEXT | Contextual objects. |
| ROBOT | Robot entity. |
| MANIPULATED_OBJS | Objects manipulated by the robot. |
| DISTRACTOR_OBJS | Distractor objects. |
| OTHERS | Other objects. |

Methods:

| Name | Description |
|---|---|
| object_list | Returns a list of objects in the scene. |
| object_mapping | Returns a mapping from object to category. |
object_list
classmethod
object_list(layout_relation: dict) -> list
Returns a list of objects in the scene.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| layout_relation | dict | Dictionary mapping categories to objects. | required |

Returns:

| Type | Description |
|---|---|
| list | List of objects in the scene. |

Source code in embodied_gen/utils/enum.py
object_mapping
classmethod
object_mapping(layout_relation)
Returns a mapping from object to category.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| layout_relation | | Dictionary mapping categories to objects. | required |

Returns:

| Type | Description |
|---|---|
| | Dictionary mapping object names to their category. |

Source code in embodied_gen/utils/enum.py
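The mapping is the inverse of the category-to-objects relation. A self-contained sketch of the documented behavior (the actual implementation, including how nested values are handled, is in enum.py):

```python
# Invert a {category: object(s)} relation into {object: category}.
def object_mapping_sketch(layout_relation: dict) -> dict:
    mapping = {}
    for category, objs in layout_relation.items():
        objs = objs if isinstance(objs, list) else [objs]
        for obj in objs:
            mapping[obj] = category
    return mapping

relation = {"manipulated_objs": ["mug"], "robot": "franka"}
print(object_mapping_sketch(relation))  # {'mug': 'manipulated_objs', 'franka': 'robot'}
```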
SimAssetMapper
Maps simulator names to asset types.
Provides a mapping from simulator names to their corresponding asset type.
Example
from embodied_gen.utils.enum import SimAssetMapper
asset_type = SimAssetMapper["isaacsim"]
print(asset_type) # Output: 'usd'
Methods:

| Name | Description |
|---|---|
| __class_getitem__ | Returns the asset type for a given simulator name. |
__class_getitem__
classmethod
__class_getitem__(key: str)
Returns the asset type for a given simulator name.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| key | str | Name of the simulator. | required |

Returns:

| Type | Description |
|---|---|
| | AssetType corresponding to the simulator. |

Raises:

| Type | Description |
|---|---|
| KeyError | If the simulator name is not recognized. |

Source code in embodied_gen/utils/enum.py
SpatialRelationEnum
dataclass
SpatialRelationEnum()
Bases: str, Enum
Enumeration of spatial relations for objects in a scene.
Attributes:

| Name | Description |
|---|---|
| ON | Objects on a surface (e.g., a table). |
| IN | Objects in a container or room. |
| INSIDE | Objects inside a shelf or rack. |
| FLOOR | Objects on the floor. |
embodied_gen.utils.geometry
all_corners_inside
all_corners_inside(hull: Path, box: list, threshold: int = 3) -> bool
Checks if at least threshold corners of a box are inside a hull.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| hull | Path | Convex hull path. | required |
| box | list | Box coordinates [x1, x2, y1, y2]. | required |
| threshold | int | Minimum number of corners inside. | 3 |

Returns:

| Name | Type | Description |
|---|---|---|
| bool | bool | True if enough corners are inside. |

Source code in embodied_gen/utils/geometry.py
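The check reduces to testing the box's four XY corners against a matplotlib Path. A self-contained sketch of the documented behavior (the function name here is illustrative, not the library's):

```python
from matplotlib.path import Path

def corners_inside(hull: Path, box: list, threshold: int = 3) -> bool:
    # box is [x1, x2, y1, y2]; count how many of its four corners fall in the hull.
    x1, x2, y1, y2 = box
    corners = [(x1, y1), (x1, y2), (x2, y1), (x2, y2)]
    inside = sum(hull.contains_point(c) for c in corners)
    return inside >= threshold

# Unit-square hull; a small box near the center lies fully inside.
hull = Path([(0, 0), (1, 0), (1, 1), (0, 1)])
print(corners_inside(hull, [0.2, 0.8, 0.2, 0.8]))  # True
```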
bfs_placement
bfs_placement(layout_file: str, floor_margin: float = 0, beside_margin: float = 0.1, max_attempts: int = 3000, init_rpy: tuple = (1.5708, 0.0, 0.0), rotate_objs: bool = True, rotate_bg: bool = True, rotate_context: bool = True, limit_reach_range: tuple[float, float] | None = (0.2, 0.85), max_orient_diff: float | None = 60, robot_dim: float = 0.12, seed: int = None) -> LayoutInfo
Places objects in a scene layout using BFS traversal.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| layout_file | str | Path to the layout JSON file. | required |
| floor_margin | float | Z-offset for objects placed on the floor. | 0 |
| beside_margin | float | Minimum margin for objects placed beside their parent, used when 'on' placement fails. | 0.1 |
| max_attempts | int | Maximum attempts to find a non-overlapping placement. | 3000 |
| init_rpy | tuple | Initial rotation (roll, pitch, yaw). | (1.5708, 0.0, 0.0) |
| rotate_objs | bool | Whether to randomly rotate objects. | True |
| rotate_bg | bool | Whether to randomly rotate the background. | True |
| rotate_context | bool | Whether to randomly rotate the context asset. | True |
| limit_reach_range | tuple[float, float] \| None | If set, enforce that manipulated objects lie within the robot's reach range, in meters. | (0.2, 0.85) |
| max_orient_diff | float \| None | If set, enforce that manipulated objects lie within the robot's orientation range, in degrees. | 60 |
| robot_dim | float | Approximate robot size. | 0.12 |
| seed | int | Random seed for reproducible placement. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| LayoutInfo | LayoutInfo | Layout information with object poses. |

Example

from embodied_gen.utils.geometry import bfs_placement
layout = bfs_placement("scene_layout.json", seed=42)
print(layout.position)

Source code in embodied_gen/utils/geometry.py
check_reachable
check_reachable(base_xyz: ndarray, reach_xyz: ndarray, min_reach: float = 0.25, max_reach: float = 0.85) -> bool
Checks if the target point is within the reachable range.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| base_xyz | ndarray | Base position. | required |
| reach_xyz | ndarray | Target position. | required |
| min_reach | float | Minimum reach distance. | 0.25 |
| max_reach | float | Maximum reach distance. | 0.85 |

Returns:

| Name | Type | Description |
|---|---|---|
| bool | bool | True if reachable, False otherwise. |

Source code in embodied_gen/utils/geometry.py
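As documented, the reach test reduces to a Euclidean-distance band around the base. A self-contained sketch (function name illustrative; the actual check is in geometry.py):

```python
import numpy as np

def reachable(base_xyz, reach_xyz, min_reach=0.25, max_reach=0.85):
    # A point is "reachable" if its distance to the base lies in [min_reach, max_reach].
    dist = np.linalg.norm(np.asarray(reach_xyz) - np.asarray(base_xyz))
    return min_reach <= dist <= max_reach

print(reachable([0, 0, 0], [0.5, 0, 0]))  # True: distance 0.5 is within the band
print(reachable([0, 0, 0], [1.0, 0, 0]))  # False: distance 1.0 exceeds max_reach
```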
compose_mesh_scene
compose_mesh_scene(layout_info: LayoutInfo, out_scene_path: str, with_bg: bool = False) -> None
Composes a mesh scene from layout information and saves to file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| layout_info | LayoutInfo | Layout information. | required |
| out_scene_path | str | Output scene file path. | required |
| with_bg | bool | Include background mesh. | False |

Source code in embodied_gen/utils/geometry.py
compute_axis_rotation_quat
compute_axis_rotation_quat(axis: Literal['x', 'y', 'z'], angle_rad: float) -> list[float]
Computes quaternion for rotation around a given axis.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| axis | Literal['x', 'y', 'z'] | Axis of rotation. | required |
| angle_rad | float | Rotation angle in radians. | required |

Returns:

| Type | Description |
|---|---|
| list[float] | list[float]: Quaternion [x, y, z, w]. |

Source code in embodied_gen/utils/geometry.py
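For a rotation by angle θ about a principal axis, the quaternion is sin(θ/2) on that axis and cos(θ/2) as w. A self-contained sketch of the documented [x, y, z, w] convention (names illustrative):

```python
import math

def axis_rotation_quat(axis: str, angle_rad: float) -> list:
    # Quaternion [x, y, z, w] for rotation of angle_rad around one principal axis.
    half = angle_rad / 2
    s, w = math.sin(half), math.cos(half)
    xyz = {"x": [s, 0.0, 0.0], "y": [0.0, s, 0.0], "z": [0.0, 0.0, s]}[axis]
    return xyz + [w]

# 180-degree rotation around z -> [0, 0, 1, 0] up to floating-point error.
print([round(v, 6) for v in axis_rotation_quat("z", math.pi)])  # [0.0, 0.0, 1.0, 0.0]
```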
compute_convex_hull_path
compute_convex_hull_path(vertices: ndarray, z_threshold: float = 0.05, interp_per_edge: int = 10, margin: float = -0.02, x_axis: int = 0, y_axis: int = 1, z_axis: int = 2) -> Path
Computes a dense convex hull path for the top surface of a mesh.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| vertices | ndarray | Mesh vertices. | required |
| z_threshold | float | Z threshold for the top surface. | 0.05 |
| interp_per_edge | int | Interpolation points per edge. | 10 |
| margin | float | Margin for the polygon buffer. | -0.02 |
| x_axis | int | X axis index. | 0 |
| y_axis | int | Y axis index. | 1 |
| z_axis | int | Z axis index. | 2 |

Returns:

| Name | Type | Description |
|---|---|---|
| Path | Path | Matplotlib path object for the convex hull. |

Source code in embodied_gen/utils/geometry.py
compute_pinhole_intrinsics
compute_pinhole_intrinsics(image_w: int, image_h: int, fov_deg: float) -> np.ndarray
Computes pinhole camera intrinsic matrix from image size and FOV.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image_w | int | Image width. | required |
| image_h | int | Image height. | required |
| fov_deg | float | Field of view in degrees. | required |

Returns:

| Type | Description |
|---|---|
| ndarray | np.ndarray: Intrinsic matrix K. |

Source code in embodied_gen/utils/geometry.py
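A pinhole intrinsic matrix follows from focal length fx = (w/2) / tan(fov/2) and a principal point at the image center. The sketch below assumes a horizontal FOV and square pixels (fx == fy); the convention actually used is defined in geometry.py:

```python
import numpy as np

def pinhole_K(image_w: int, image_h: int, fov_deg: float) -> np.ndarray:
    # Assumes horizontal FOV and square pixels; principal point at the image center.
    fx = (image_w / 2) / np.tan(np.deg2rad(fov_deg) / 2)
    cx, cy = image_w / 2, image_h / 2
    return np.array([[fx, 0, cx], [0, fx, cy], [0, 0, 1]], dtype=np.float64)

K = pinhole_K(640, 480, 90.0)
print(K[0, 0])  # ~320: with a 90-degree FOV, fx is about half the image width
```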
compute_xy_bbox
compute_xy_bbox(vertices: ndarray, col_x: int = 0, col_y: int = 1) -> list[float]
Computes the bounding box in XY plane for given vertices.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| vertices | ndarray | Vertex coordinates. | required |
| col_x | int | Column index for X. | 0 |
| col_y | int | Column index for Y. | 1 |

Returns:

| Type | Description |
|---|---|
| list[float] | list[float]: [min_x, max_x, min_y, max_y] |

Source code in embodied_gen/utils/geometry.py
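The bounding box is just the min/max over the chosen coordinate columns. A self-contained sketch of the documented behavior:

```python
import numpy as np

def xy_bbox(vertices: np.ndarray, col_x: int = 0, col_y: int = 1) -> list:
    # Min/max over the selected X and Y columns of an (N, D) vertex array.
    x, y = vertices[:, col_x], vertices[:, col_y]
    return [float(x.min()), float(x.max()), float(y.min()), float(y.max())]

pts = np.array([[0.0, 1.0, 5.0], [2.0, -1.0, 5.0], [1.0, 0.5, 5.0]])
print(xy_bbox(pts))  # [0.0, 2.0, -1.0, 1.0]
```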
find_parent_node
find_parent_node(node: str, tree: dict) -> str | None
Finds the parent node of a given node in a tree.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| node | str | Node name. | required |
| tree | dict | Tree structure. | required |

Returns:

| Type | Description |
|---|---|
| str \| None | str \| None: Parent node name or None. |

Source code in embodied_gen/utils/geometry.py
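Given the {parent: [children]} tree shape used by LayoutInfo, the parent is the key whose child list contains the node. A minimal sketch of the documented lookup:

```python
# The parent of a node is the key whose child list contains it; None for roots.
def parent_of(node: str, tree: dict):
    for parent, children in tree.items():
        if node in children:
            return parent
    return None

tree = {"table": ["plate", "cup"], "room": ["table"]}
print(parent_of("cup", tree))   # table
print(parent_of("room", tree))  # None
```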
has_iou_conflict
has_iou_conflict(new_box: list[float], placed_boxes: list[list[float]], iou_threshold: float = 0.0) -> bool
Checks for intersection-over-union conflict between boxes.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| new_box | list[float] | New box coordinates. | required |
| placed_boxes | list[list[float]] | List of placed box coordinates. | required |
| iou_threshold | float | IOU threshold. | 0.0 |

Returns:

| Name | Type | Description |
|---|---|---|
| bool | bool | True if conflict exists, False otherwise. |

Source code in embodied_gen/utils/geometry.py
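An illustrative IoU check, assuming the [x1, x2, y1, y2] box layout used elsewhere in this module; the actual implementation is in geometry.py:

```python
def iou(a, b):
    # Intersection-over-union of two axis-aligned [x1, x2, y1, y2] boxes.
    ix = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[2], b[2]))
    inter = ix * iy
    if inter == 0.0:
        return 0.0
    area_a = (a[1] - a[0]) * (a[3] - a[2])
    area_b = (b[1] - b[0]) * (b[3] - b[2])
    return inter / (area_a + area_b - inter)

def iou_conflict(new_box, placed_boxes, iou_threshold=0.0):
    # Conflict if the new box overlaps any placed box beyond the threshold.
    return any(iou(new_box, b) > iou_threshold for b in placed_boxes)

print(iou_conflict([0, 1, 0, 1], [[0.5, 1.5, 0, 1]]))  # True: boxes overlap
print(iou_conflict([0, 1, 0, 1], [[2, 3, 0, 1]]))      # False: boxes are disjoint
```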
matrix_to_pose
matrix_to_pose(matrix: ndarray) -> list[float]
Converts a 4x4 transformation matrix to a pose (x, y, z, qx, qy, qz, qw).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| matrix | ndarray | 4x4 transformation matrix. | required |

Returns:

| Type | Description |
|---|---|
| list[float] | list[float]: Pose as [x, y, z, qx, qy, qz, qw]. |

Source code in embodied_gen/utils/geometry.py
pose_to_matrix
pose_to_matrix(pose: list[float]) -> np.ndarray
Converts pose (x, y, z, qx, qy, qz, qw) to a 4x4 transformation matrix.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| pose | list[float] | Pose as [x, y, z, qx, qy, qz, qw]. | required |

Returns:

| Type | Description |
|---|---|
| ndarray | np.ndarray: 4x4 transformation matrix. |

Source code in embodied_gen/utils/geometry.py
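The conversion places the quaternion's rotation matrix in the top-left 3x3 block and the translation in the last column. An equivalent self-contained construction (not the library's code) using the standard unit-quaternion-to-rotation formula:

```python
import numpy as np

def pose_to_matrix_sketch(pose):
    # pose = [x, y, z, qx, qy, qz, qw]; qx..qw must form a unit quaternion.
    x, y, z, qx, qy, qz, qw = pose
    R = np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
        [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
        [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = [x, y, z]
    return T

# Identity rotation: the result is a pure translation matrix.
T = pose_to_matrix_sketch([1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 1.0])
print(T[:3, 3])  # [1. 2. 3.]
```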
quaternion_multiply
quaternion_multiply(init_quat: list[float], rotate_quat: list[float]) -> list[float]
Multiplies two quaternions.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| init_quat | list[float] | Initial quaternion [x, y, z, w]. | required |
| rotate_quat | list[float] | Rotation quaternion [x, y, z, w]. | required |

Returns:

| Type | Description |
|---|---|
| list[float] | list[float]: Resulting quaternion [x, y, z, w]. |

Source code in embodied_gen/utils/geometry.py
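A self-contained Hamilton product in the [x, y, z, w] ordering this module documents (which argument is applied first is the library's convention; this sketch only illustrates the arithmetic):

```python
def quat_multiply(q1, q2):
    # Hamilton product of two quaternions in [x, y, z, w] order.
    x1, y1, z1, w1 = q1
    x2, y2, z2, w2 = q2
    return [
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
    ]

# Multiplying by the identity quaternion leaves the input unchanged.
print(quat_multiply([0.0, 0.0, 0.7071, 0.7071], [0.0, 0.0, 0.0, 1.0]))
```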
with_seed
with_seed(seed_attr_name: str = 'seed')
Decorator to temporarily set the random seed for reproducibility.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| seed_attr_name | str | Name of the seed argument. | 'seed' |

Returns:

| Name | Description |
|---|---|
| function | Decorator function. |

Source code in embodied_gen/utils/geometry.py
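A minimal sketch of the idea (the real decorator in geometry.py may also seed numpy and differ in details): read the seed from the call's keyword arguments, seed the RNG, and restore the previous state afterwards.

```python
import functools
import random

def with_seed_sketch(seed_attr_name: str = "seed"):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            seed = kwargs.get(seed_attr_name)
            if seed is None:
                return fn(*args, **kwargs)
            state = random.getstate()  # remember the global RNG state
            random.seed(seed)
            try:
                return fn(*args, **kwargs)
            finally:
                random.setstate(state)  # restore it, so callers are unaffected
        return wrapper
    return decorator

@with_seed_sketch()
def sample(seed=None):
    return random.random()

print(sample(seed=42) == sample(seed=42))  # True: same seed, same draw
```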
embodied_gen.utils.gaussian
export_splats
export_splats(means: Tensor, scales: Tensor, quats: Tensor, opacities: Tensor, sh0: Tensor, shN: Tensor, format: Literal['ply'] = 'ply', save_to: Optional[str] = None) -> bytes
Export a Gaussian Splats model to bytes in PLY file format.
Source code in embodied_gen/utils/gaussian.py
restore_scene_scale_and_position
restore_scene_scale_and_position(real_height: float, mesh_path: str, gs_path: str) -> None
Scales a mesh and corresponding GS model to match a given real-world height.
Uses the 1st and 99th percentile of mesh Z-axis to estimate height, applies scaling and vertical alignment, and updates both the mesh and GS model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| real_height | float | Target real-world height along the Z axis. | required |
| mesh_path | str | Path to the input mesh file. | required |
| gs_path | str | Path to the Gaussian Splatting model file. | required |

Source code in embodied_gen/utils/gaussian.py
embodied_gen.utils.gpt_clients
GPTclient
GPTclient(endpoint: str, api_key: str, model_name: str = 'yfb-gpt-4o', api_version: str = None, check_connection: bool = True, verbose: bool = False)
A client to interact with GPT models via OpenAI or Azure API.
Supports text and image prompts, connection checking, and configurable parameters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| endpoint | str | API endpoint URL. | required |
| api_key | str | API key for authentication. | required |
| model_name | str | Model name to use. | 'yfb-gpt-4o' |
| api_version | str | API version (for Azure). | None |
| check_connection | bool | Whether to check the API connection. | True |
| verbose | bool | Enable verbose logging. | False |

Example

export ENDPOINT="https://yfb-openai-sweden.openai.azure.com"
export API_KEY="xxxxxx"
export API_VERSION="2025-03-01-preview"
export MODEL_NAME="yfb-gpt-4o-sweden"
from embodied_gen.utils.gpt_clients import GPT_CLIENT
response = GPT_CLIENT.query("Describe the physics of a falling apple.")
response = GPT_CLIENT.query(
    text_prompt="Describe the content in each image.",
    image_base64=["path/to/image1.png", "path/to/image2.jpg"],
)

Source code in embodied_gen/utils/gpt_clients.py
check_connection
check_connection() -> None
Checks whether the GPT API connection is working.
Raises:

| Type | Description |
|---|---|
| ConnectionError | If the connection fails. |

Source code in embodied_gen/utils/gpt_clients.py
completion_with_backoff
completion_with_backoff(**kwargs)
Performs a chat completion request with retry/backoff.
Source code in embodied_gen/utils/gpt_clients.py
query
query(text_prompt: str, image_base64: Optional[list[str | Image]] = None, system_role: Optional[str] = None, params: Optional[dict] = None) -> Optional[str]
Queries the GPT model with text and optional image prompts.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| text_prompt | str | Main text input. | required |
| image_base64 | Optional[list[str \| Image]] | List of image base64 strings, file paths, or PIL Images. | None |
| system_role | Optional[str] | System-level instructions. | None |
| params | Optional[dict] | Additional GPT parameters. | None |

Returns:

| Type | Description |
|---|---|
| Optional[str] | Optional[str]: Model response content, or None if an error occurred. |

Source code in embodied_gen/utils/gpt_clients.py
embodied_gen.utils.process_media
SceneTreeVisualizer
SceneTreeVisualizer(layout_info: LayoutInfo)
Visualizes a scene tree layout using networkx and matplotlib.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| layout_info | LayoutInfo | Layout information for the scene. | required |

Example

from embodied_gen.utils.process_media import SceneTreeVisualizer
visualizer = SceneTreeVisualizer(layout_info)
visualizer.render(save_path="tree.png")

Source code in embodied_gen/utils/process_media.py
render
render(save_path: str, figsize=(8, 6), dpi=300, title: str = 'Scene 3D Hierarchy Tree')
Renders the scene tree and saves to file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| save_path | str | Path to save the rendered image. | required |
| figsize | tuple | Figure size. | (8, 6) |
| dpi | int | Image DPI. | 300 |
| title | str | Plot image title. | 'Scene 3D Hierarchy Tree' |

Source code in embodied_gen/utils/process_media.py
alpha_blend_rgba
alpha_blend_rgba(fg_image: Union[str, Image, ndarray], bg_image: Union[str, Image, ndarray]) -> Image.Image
Alpha blends a foreground RGBA image over a background RGBA image.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| fg_image | Union[str, Image, ndarray] | Foreground image (str, PIL Image, or ndarray). | required |
| bg_image | Union[str, Image, ndarray] | Background image (str, PIL Image, or ndarray). | required |

Returns:

| Type | Description |
|---|---|
| Image | Image.Image: Alpha-blended RGBA image. |

Example

from embodied_gen.utils.process_media import alpha_blend_rgba
result = alpha_blend_rgba("fg.png", "bg.png")
result.save("blended.png")

Source code in embodied_gen/utils/process_media.py
check_object_edge_truncated
check_object_edge_truncated(mask: ndarray, edge_threshold: int = 5) -> bool
Checks if a binary object mask is truncated at the image edges.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| mask | ndarray | 2D binary mask. | required |
| edge_threshold | int | Edge pixel threshold. | 5 |

Returns:

| Name | Type | Description |
|---|---|---|
| bool | bool | True if the object is fully enclosed, False if truncated. |

Source code in embodied_gen/utils/process_media.py
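The test amounts to asking whether the mask touches any image border within edge_threshold pixels. A self-contained sketch of the documented behavior (name illustrative; the actual implementation is in process_media.py):

```python
import numpy as np

def fully_enclosed(mask: np.ndarray, edge_threshold: int = 5) -> bool:
    # True if no foreground pixel lies within edge_threshold of any image border.
    t = edge_threshold
    border = np.concatenate([
        mask[:t].ravel(), mask[-t:].ravel(),        # top and bottom bands
        mask[:, :t].ravel(), mask[:, -t:].ravel(),  # left and right bands
    ])
    return not border.any()

mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:40, 20:40] = 1       # object well inside the frame
print(fully_enclosed(mask))  # True
mask[0:10, 20:40] = 1        # object now touches the top edge
print(fully_enclosed(mask))  # False
```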
combine_images_to_grid
combine_images_to_grid(images: list[str | Image], cat_row_col: tuple[int, int] = None, target_wh: tuple[int, int] = (512, 512), image_mode: str = 'RGB') -> list[Image.Image]
Combines multiple images into a grid.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| images | list[str \| Image] | List of image paths or PIL Images. | required |
| cat_row_col | tuple[int, int] | Grid rows and columns. | None |
| target_wh | tuple[int, int] | Target image size. | (512, 512) |
| image_mode | str | Image mode. | 'RGB' |

Returns:

| Type | Description |
|---|---|
| list[Image] | list[Image.Image]: List containing the grid image. |

Example

from embodied_gen.utils.process_media import combine_images_to_grid
grid = combine_images_to_grid(["img1.png", "img2.png"])
grid[0].save("grid.png")

Source code in embodied_gen/utils/process_media.py
filter_image_small_connected_components
filter_image_small_connected_components(image: Union[Image, ndarray], area_ratio: float = 10, connectivity: int = 8) -> np.ndarray
Removes small connected components from the alpha channel of an image.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image | Union[Image, ndarray] | Input image. | required |
| area_ratio | float | Minimum area ratio. | 10 |
| connectivity | int | Connectivity for labeling. | 8 |

Returns:

| Type | Description |
|---|---|
| ndarray | np.ndarray: Image with filtered alpha channel. |

Source code in embodied_gen/utils/process_media.py
filter_small_connected_components
filter_small_connected_components(mask: Union[Image, ndarray], area_ratio: float, connectivity: int = 8) -> np.ndarray
Removes small connected components from a binary mask.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| mask | Union[Image, ndarray] | Input mask. | required |
| area_ratio | float | Minimum area ratio for components. | required |
| connectivity | int | Connectivity for labeling. | 8 |

Returns:

| Type | Description |
|---|---|
| ndarray | np.ndarray: Mask with small components removed. |

Source code in embodied_gen/utils/process_media.py
is_image_file
is_image_file(filename: str) -> bool
Checks if a filename is an image file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| filename | str | Filename to check. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| bool | bool | True if image file, False otherwise. |

Source code in embodied_gen/utils/process_media.py
load_scene_dict
load_scene_dict(file_path: str) -> dict
Loads a scene description dictionary from a file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| file_path | str | Path to the scene description file. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| dict | dict | Mapping from scene ID to description. |

Source code in embodied_gen/utils/process_media.py
merge_images_video
merge_images_video(color_images, normal_images, output_path) -> None
Merges color and normal images into a video.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| color_images | list[ndarray] | List of color images. | required |
| normal_images | list[ndarray] | List of normal images. | required |
| output_path | str | Path to save the output video. | required |

Source code in embodied_gen/utils/process_media.py
merge_video_video
merge_video_video(video_path1: str, video_path2: str, output_path: str) -> None
Merges two videos by combining their left and right halves.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| video_path1 | str | Path to the first video. | required |
| video_path2 | str | Path to the second video. | required |
| output_path | str | Path to save the merged video. | required |

Source code in embodied_gen/utils/process_media.py
parse_text_prompts
parse_text_prompts(prompts: list[str]) -> list[str]
Parses text prompts from a list or file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| prompts | list[str] | List of prompts or a file path. | required |

Returns:

| Type | Description |
|---|---|
| list[str] | list[str]: List of parsed prompts. |

Source code in embodied_gen/utils/process_media.py
render_asset3d
render_asset3d(mesh_path: str, output_root: str, distance: float = 5.0, num_images: int = 1, elevation: list[float] = (0.0,), pbr_light_factor: float = 1.2, return_key: str = 'image_color/*', output_subdir: str = 'renders', gen_color_mp4: bool = False, gen_viewnormal_mp4: bool = False, gen_glonormal_mp4: bool = False, no_index_file: bool = False, with_mtl: bool = True) -> list[str]
Renders a 3D mesh asset and returns output image paths.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| mesh_path | str | Path to the mesh file. | required |
| output_root | str | Directory to save outputs. | required |
| distance | float | Camera distance. | 5.0 |
| num_images | int | Number of views to render. | 1 |
| elevation | list[float] | Camera elevation angles. | (0.0,) |
| pbr_light_factor | float | PBR lighting factor. | 1.2 |
| return_key | str | Glob pattern for output images. | 'image_color/*' |
| output_subdir | str | Subdirectory for outputs. | 'renders' |
| gen_color_mp4 | bool | Generate a color MP4 video. | False |
| gen_viewnormal_mp4 | bool | Generate a view-space normal MP4. | False |
| gen_glonormal_mp4 | bool | Generate a global-normal MP4. | False |
| no_index_file | bool | Skip saving the index file. | False |
| with_mtl | bool | Use the mesh material. | True |

Returns:

| Type | Description |
|---|---|
| list[str] | list[str]: List of output image file paths. |

Example

from embodied_gen.utils.process_media import render_asset3d
image_paths = render_asset3d(
    mesh_path="path_to_mesh.obj",
    output_root="path_to_save_dir",
    num_images=6,
    elevation=(30, -30),
    output_subdir="renders",
    no_index_file=True,
)

Source code in embodied_gen/utils/process_media.py
vcat_pil_images
vcat_pil_images(images: list[Image], image_mode: str = 'RGB') -> Image.Image
Vertically concatenates a list of PIL images.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| images | list[Image] | List of images. | required |
| image_mode | str | Image mode. | 'RGB' |

Returns:

| Type | Description |
|---|---|
| Image | Image.Image: Vertically concatenated image. |

Example

from embodied_gen.utils.process_media import vcat_pil_images
img = vcat_pil_images([Image.open("a.png"), Image.open("b.png")])
img.save("vcat.png")

Source code in embodied_gen/utils/process_media.py
embodied_gen.utils.simulation
FrankaPandaGrasper
FrankaPandaGrasper(agent: BaseAgent, control_freq: float, joint_vel_limits: float = 2.0, joint_acc_limits: float = 1.0, finger_length: float = 0.025)
Bases: object
Provides grasp planning and control for Franka Panda robot.
Attributes:

| Name | Type | Description |
|---|---|---|
| agent | BaseAgent | The robot agent. |
| robot | | The robot instance. |
| control_freq | float | Control frequency. |
| control_timestep | float | Control timestep. |
| joint_vel_limits | float | Joint velocity limits. |
| joint_acc_limits | float | Joint acceleration limits. |
| finger_length | float | Length of the gripper fingers. |
| planners | | Motion planners for each environment. |

Initialize the grasper.

Source code in embodied_gen/utils/simulation.py
compute_grasp_action
compute_grasp_action(actor: Entity, reach_target_only: bool = True, offset: tuple[float, float, float] = [0, 0, -0.05], env_idx: int = 0) -> np.ndarray
Compute grasp actions for a target actor.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| actor | Entity | Target actor to grasp. | required |
| reach_target_only | bool | Only reach the target pose if True. | True |
| offset | tuple[float, float, float] | Offset for the reach pose. | [0, 0, -0.05] |
| env_idx | int | Environment index. | 0 |

Returns:

| Type | Description |
|---|---|
| ndarray | np.ndarray: Array of grasp actions. |

Source code in embodied_gen/utils/simulation.py
control_gripper
control_gripper(gripper_state: Literal[-1, 1], n_step: int = 10) -> np.ndarray
Generate gripper control actions.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| gripper_state | Literal[-1, 1] | Desired gripper state. | required |
| n_step | int | Number of steps. | 10 |

Returns:

| Type | Description |
|---|---|
| np.ndarray | Array of gripper actions. |
Source code in embodied_gen/utils/simulation.py
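The idea behind control_gripper is simply to hold a constant command for several control steps so the fingers have time to open or close. A sketch of that behavior (the single-element action layout here is an assumption; the real actions also carry the arm joints):

```python
# Illustrative sketch: repeat a constant gripper command (-1 close,
# 1 open) for n_step control steps. Not the library's implementation.

def gripper_actions(gripper_state, n_step=10):
    assert gripper_state in (-1, 1)
    return [[gripper_state] for _ in range(n_step)]

actions = gripper_actions(-1, n_step=5)
# 5 identical close commands
```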
move_to_pose
move_to_pose(pose: Pose, control_timestep: float, gripper_state: Literal[-1, 1], use_point_cloud: bool = False, n_max_step: int = 100, action_key: str = 'position', env_idx: int = 0) -> np.ndarray
Plan and execute motion to a target pose.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| pose | Pose | Target pose. | required |
| control_timestep | float | Control timestep. | required |
| gripper_state | Literal[-1, 1] | Desired gripper state. | required |
| use_point_cloud | bool | Use point cloud for planning. | False |
| n_max_step | int | Max number of steps. | 100 |
| action_key | str | Key for action in result. | 'position' |
| env_idx | int | Environment index. | 0 |

Returns:

| Type | Description |
|---|---|
| np.ndarray | Array of actions to reach the pose. |
Source code in embodied_gen/utils/simulation.py
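Conceptually, the planner returns a dictionary of waypoints, the trajectory is truncated at n_max_step, and the constant gripper command is appended to each arm action. A hedged sketch of that bookkeeping (the plan dictionary shape mirrors the "position" key used by planners such as mplib, but this is not the library's code):

```python
# Illustrative sketch: turn a planned joint trajectory into actions,
# capped at n_max_step, with the gripper command appended per step.

def build_actions(plan, gripper_state, n_max_step=100, action_key="position"):
    waypoints = plan[action_key][:n_max_step]
    return [list(q) + [gripper_state] for q in waypoints]

fake_plan = {"position": [[0.0] * 7 for _ in range(150)]}
actions = build_actions(fake_plan, gripper_state=1)
# capped at 100 actions; each is 7 joint targets plus 1 gripper value
```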
SapienSceneManager
SapienSceneManager(sim_freq: int, ray_tracing: bool, device: str = 'cuda')
Manages SAPIEN simulation scenes, cameras, and rendering.
This class provides utilities for setting up scenes, adding cameras, stepping simulation, and rendering images.
Attributes:

| Name | Type | Description |
|---|---|---|
| sim_freq | int | Simulation frequency. |
| ray_tracing | bool | Whether to use ray tracing. |
| device | str | Device for simulation. |
| renderer | SapienRenderer | SAPIEN renderer. |
| scene | Scene | Simulation scene. |
| cameras | list | List of camera components. |
| actors | dict | Mapping of actor names to entities. |
For a usage example, see embodied_gen/scripts/simulate_sapien.py.
Initialize the scene manager.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sim_freq | int | Simulation frequency. | required |
| ray_tracing | bool | Enable ray tracing. | required |
| device | str | Device for simulation. | 'cuda' |
Source code in embodied_gen/utils/simulation.py
create_camera
create_camera(cam_name: str, pose: Pose, image_hw: tuple[int, int], fovy_deg: float) -> sapien.render.RenderCameraComponent
Create a camera in the scene.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| cam_name | str | Camera name. | required |
| pose | Pose | Camera pose. | required |
| image_hw | tuple[int, int] | Image resolution (height, width). | required |
| fovy_deg | float | Field of view in degrees. | required |

Returns:

| Type | Description |
|---|---|
| sapien.render.RenderCameraComponent | The created camera. |
Source code in embodied_gen/utils/simulation.py
initialize_circular_cameras
initialize_circular_cameras(num_cameras: int, radius: float, height: float, target_pt: list[float], image_hw: tuple[int, int], fovy_deg: float) -> list[sapien.render.RenderCameraComponent]
Initialize multiple cameras arranged in a circle.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| num_cameras | int | Number of cameras. | required |
| radius | float | Circle radius. | required |
| height | float | Camera height. | required |
| target_pt | list[float] | Target point to look at. | required |
| image_hw | tuple[int, int] | Image resolution. | required |
| fovy_deg | float | Field of view in degrees. | required |

Returns:

| Type | Description |
|---|---|
| list[sapien.render.RenderCameraComponent] | List of cameras. |
Source code in embodied_gen/utils/simulation.py
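The placement math behind a circular camera rig is straightforward: space the cameras evenly around a circle of the given radius at a fixed height, each aimed at target_pt. A minimal sketch of just the position computation (camera construction and look-at orientation are omitted; this is not the library's code):

```python
import math

# Illustrative sketch: evenly spaced camera positions on a circle.
# Camera i sits at angle 2*pi*i/num_cameras, at the given height.

def circular_camera_positions(num_cameras, radius, height):
    positions = []
    for i in range(num_cameras):
        theta = 2 * math.pi * i / num_cameras
        positions.append(
            (radius * math.cos(theta), radius * math.sin(theta), height)
        )
    return positions

pts = circular_camera_positions(4, radius=1.0, height=0.5)
# 4 cameras at 90-degree spacing around the target
```

Each position would then be turned into a pose looking at target_pt and passed to something like create_camera above.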
step_action
step_action(agent: BaseAgent, action: Tensor, cameras: list[RenderCameraComponent], render_keys: list[str], sim_steps_per_control: int = 1) -> dict
Step the simulation and render images from cameras.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| agent | BaseAgent | The robot agent. | required |
| action | Tensor | Action to apply. | required |
| cameras | list | List of camera components. | required |
| render_keys | list[str] | Types of images to render. | required |
| sim_steps_per_control | int | Simulation steps per control. | 1 |

Returns:

| Type | Description |
|---|---|
| dict | Dictionary of rendered frames per camera. |
Source code in embodied_gen/utils/simulation.py
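sim_steps_per_control decouples the physics rate from the control rate: each control action is held while the physics advances several smaller steps. A sketch of the assumed relationship between the two frequencies (an illustration, not the manager's actual internals):

```python
# Illustrative: with sim_freq physics steps per second and control_freq
# actions per second, each action spans sim_freq // control_freq steps.

def steps_per_control(sim_freq, control_freq):
    assert sim_freq % control_freq == 0, "sim_freq should be a multiple"
    return sim_freq // control_freq

n = steps_per_control(sim_freq=120, control_freq=20)
control_timestep = n / 120  # seconds between consecutive control actions
```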
load_actor_from_urdf
load_actor_from_urdf(scene: Scene | ManiSkillScene, file_path: str, pose: Pose | None = None, env_idx: int = None, use_static: bool = False, update_mass: bool = False, scale: float | ndarray = 1.0) -> sapien.pysapien.Entity
Load a SAPIEN actor from a URDF file and add it to the scene.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| scene | Scene \| ManiSkillScene | The simulation scene. | required |
| file_path | str | Path to the URDF file. | required |
| pose | Pose \| None | Initial pose of the actor. | None |
| env_idx | int | Environment index for multi-env setup. | None |
| use_static | bool | Whether the actor is static. | False |
| update_mass | bool | Whether to update the actor's mass from URDF. | False |
| scale | float \| ndarray | Scale factor for the actor. | 1.0 |

Returns:

| Type | Description |
|---|---|
| sapien.pysapien.Entity | The created actor entity. |
Source code in embodied_gen/utils/simulation.py
load_assets_from_layout_file
load_assets_from_layout_file(scene: ManiSkillScene | Scene, layout: str, z_offset: float = 0.0, init_quat: list[float] = [0, 0, 0, 1], env_idx: int = None) -> dict[str, sapien.pysapien.Entity]
Load assets from an EmbodiedGen layout file and create SAPIEN actors in the scene.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| scene | ManiSkillScene \| Scene | The SAPIEN simulation scene. | required |
| layout | str | Path to the EmbodiedGen layout file. | required |
| z_offset | float | Z offset for non-context objects. | 0.0 |
| init_quat | list[float] | Initial quaternion for orientation. | [0, 0, 0, 1] |
| env_idx | int | Environment index. | None |

Returns:

| Type | Description |
|---|---|
| dict[str, sapien.pysapien.Entity] | Mapping from object names to actor entities. |
Source code in embodied_gen/utils/simulation.py
load_mani_skill_robot
load_mani_skill_robot(scene: Scene | ManiSkillScene, layout: LayoutInfo | str, control_freq: int = 20, robot_init_qpos_noise: float = 0.0, control_mode: str = 'pd_joint_pos', backend_str: tuple[str, str] = ('cpu', 'gpu')) -> BaseAgent
Load a ManiSkill robot agent into the scene.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| scene | Scene \| ManiSkillScene | The simulation scene. | required |
| layout | LayoutInfo \| str | Layout info or path to layout file. | required |
| control_freq | int | Control frequency. | 20 |
| robot_init_qpos_noise | float | Noise for initial joint positions. | 0.0 |
| control_mode | str | Robot control mode. | 'pd_joint_pos' |
| backend_str | tuple[str, str] | Simulation/render backend. | ('cpu', 'gpu') |

Returns:

| Type | Description |
|---|---|
| BaseAgent | The loaded robot agent. |
Source code in embodied_gen/utils/simulation.py
render_images
render_images(camera: RenderCameraComponent, render_keys: list[Literal['Color', 'Segmentation', 'Normal', 'Mask', 'Depth', 'Foreground']] = None) -> dict[str, Image.Image]
Render images from a given SAPIEN camera.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| camera | RenderCameraComponent | Camera to render from. | required |
| render_keys | list[str] | Types of images to render. | None |

Returns:

| Type | Description |
|---|---|
| dict[str, Image.Image] | Dictionary of rendered images. |
Source code in embodied_gen/utils/simulation.py