• Author(s): Shagun Uppal, Ananye Agarwal, Haoyu Xiong, Kenneth Shaw, Deepak Pathak

This paper addresses the enduring challenge of mobile manipulation, which remains difficult despite recent advances in manipulation and locomotion treated separately. Mobile manipulation systems must perform a variety of long-horizon tasks in unstructured, dynamic environments, which requires coordinating the base and the arm, relying on onboard perception, and integrating all of these components at once. Traditional approaches compose separate modular skills for mobility and manipulation, which leads to compounded errors, delays in decision-making, and a lack of whole-body coordination.

The paper proposes a reactive mobile manipulation framework built around an active visual system that perceives and reacts to the environment in real time. The system coordinates movement and perception in a manner analogous to human whole-body and hand-eye coordination: the manipulator moves to improve what it can see, and uses what it sees to decide how to move. This lets the agent operate effectively in complex, cluttered environments from ego-vision alone, without building detailed maps of the environment.
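To make the described architecture concrete, the Python sketch below shows what such a reactive, map-free control loop might look like. It is illustrative only: `ReactiveWholeBodyPolicy`, the `robot` interface (`get_ego_depth`, `apply_whole_body_command`, etc.), and the assumed 8-dimensional action layout are hypothetical stand-ins, not the authors' actual API. What it encodes from the paper is the structure: a single policy consumes egocentric observations plus proprioception and emits one coordinated base-and-arm action per control step, with no global mapping and no separate navigation and manipulation modules.

```python
import numpy as np


class ReactiveWholeBodyPolicy:
    """Stand-in for a learned policy mapping ego-vision and proprioception
    to whole-body commands (base velocities plus arm joint targets)."""

    def act(self, depth_image: np.ndarray, proprio: np.ndarray) -> np.ndarray:
        # A real system would run a neural network here; zeros keep the
        # loop structure runnable for illustration.
        return np.zeros(8)  # assumed layout: [v_x, v_y, omega, q1..q5]


def control_loop(robot, policy: ReactiveWholeBodyPolicy, hz: int = 30) -> None:
    """One reactive loop for the whole body: perception and action are
    coupled at every step rather than split into plan-then-execute stages.
    `robot` is a hypothetical hardware interface."""
    dt = 1.0 / hz
    while not robot.task_done():
        depth = robot.get_ego_depth()        # onboard egocentric depth image
        proprio = robot.get_joint_state()    # base + arm state vector
        action = policy.act(depth, proprio)  # one action for base AND arm
        robot.apply_whole_body_command(action)
        robot.sleep(dt)                      # hold the control rate
```

Because the same action vector drives both the base and the arm, whole-body coordination falls out of the policy itself rather than from stitching together separate mobility and manipulation skills, which is the failure mode of the modular pipelines criticized above.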