Learning Multi-Stage Pick-and-Place with a Legged Mobile Manipulator

Demo Introduction

  • The demo videos below show the same SLIM policy deployed in diverse real-world scenes.
  • During task execution, the robot was fully autonomous, with no human teleoperation.
  • All videos were captured and are played back in real time, with no speedup.
  • Each video is annotated in the form "red cube → blue basket", which names the target object and target basket in place of the full language instruction (e.g., "drop the red cube into the blue basket").

Indoor: Kitchen

yellow cube → blue basket

blue cube → red basket


Indoor: Lobby


red cube → blue basket

yellow cube → red basket


Indoor: Carpet Room


blue cube → blue basket

green cube → red basket


Outdoor: Lawn


red cube → blue basket

red cube → blue basket


Outdoor: Mulch


red cube → red basket

yellow cube → blue basket


Outdoor: Concrete


yellow cube → red basket

blue cube → red basket


Generalizations

Random Spatial Layout with Visual Distractors

blue cube → red basket

yellow cube → red basket


Emergent Behaviors


Novel Object

behavior description: the policy can grasp an object with a novel shape outside the training distribution.

Re-Grasping

behavior description: the policy can re-grasp the target cube after a human interrupts the task by removing the cube from the gripper and tossing it onto the ground.

Task Chaining

behavior description: the policy can execute the full task three times consecutively without stopping. Each time the target cube (a red cube) is dropped into the basket, another target cube is tossed onto the ground, and the policy executes the full task again starting from that state.


BibTeX

@article{slim,
  title   = {Learning Multi-Stage Pick-and-Place with a Legged Mobile Manipulator},
  author  = {Haichao Zhang and Haonan Yu and Le Zhao and Andrew Choi and Qinxun Bai and Yiqing Yang and Wei Xu},
  journal = {IEEE Robotics and Automation Letters (RA-L)},
  year    = {2025}
}

@article{slim_tech_report,
  title   = {{SLIM}: Sim-to-Real Legged Instructive Manipulation via Long-Horizon Visuomotor Learning},
  author  = {Haichao Zhang and Haonan Yu and Le Zhao and Andrew Choi and Qinxun Bai and Yiqing Yang and Wei Xu},
  eprint  = {2501.09905},
  url     = {https://arxiv.org/abs/2501.09905},
  year    = {2025}
}