In this paper, we propose a method for creating visuomotor mobile manipulation
solutions for long-horizon activities, leveraging recent advances in simulation
to train vision-based solutions for mobile manipulation. While previous works
have successfully applied this procedure to autonomous visual navigation and
stationary manipulation, applying it to long-horizon visuomotor mobile
manipulation remains an open challenge that demands both perceptual and
compositional generalization across multiple skills. In this work, we develop
Mobile-EMBER (M-EMBER), a factorized method that decomposes a long-horizon
mobile manipulation activity into a repertoire of primitive visual skills,
learns each skill with reinforcement learning, and composes these skills into
the full long-horizon activity. On a mobile manipulation robot, M-EMBER
achieves a 53% success rate on the long-horizon mobile manipulation activity
cleaning_kitchen, which requires successfully planning and executing five
factorized, learned visual skills.
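The factorized composition described above can be illustrated with a minimal sketch. It assumes that each primitive skill can be treated as a callable that returns an updated state and a success flag, and that the plan is a fixed, ordered list of skill names. The names `make_skill`, `execute_plan`, and the five skills below are hypothetical. The toy skills stand in for learned visuomotor policies, and the plan is supplied by hand rather than produced by a planner. The sketch only shows the chaining structure: the long-horizon activity succeeds only if every skill in the sequence succeeds.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a skill maps the current state to (new_state, success).
# In a real system each skill would be a learned visuomotor policy; here skills
# are plain functions so that the composition logic runs on its own.
State = Dict[str, bool]
Skill = Callable[[State], Tuple[State, bool]]


def make_skill(precondition: str, effect: str) -> Skill:
    """Build a toy skill that succeeds only if `precondition` holds (or is empty)."""
    def skill(state: State) -> Tuple[State, bool]:
        if precondition and not state.get(precondition, False):
            return state, False  # precondition unmet: skill fails, state unchanged
        new_state = dict(state)
        new_state[effect] = True  # record the skill's effect
        return new_state, True
    return skill


def execute_plan(plan: List[str], library: Dict[str, Skill],
                 state: State) -> Tuple[State, bool, int]:
    """Run skills in order and stop at the first failure.

    Returns (final_state, success, index), where index is the position of the
    failing skill, or len(plan) if every skill succeeded.
    """
    for i, name in enumerate(plan):
        state, ok = library[name](state)
        if not ok:
            return state, False, i
    return state, True, len(plan)


# Hypothetical five-skill library (illustrative names, not the paper's skill set).
library: Dict[str, Skill] = {
    "navigate": make_skill("", "at_target"),
    "open": make_skill("at_target", "door_open"),
    "pick": make_skill("door_open", "holding"),
    "navigate_back": make_skill("holding", "at_goal"),
    "place": make_skill("at_goal", "placed"),
}
plan = ["navigate", "open", "pick", "navigate_back", "place"]

if __name__ == "__main__":
    final_state, success, index = execute_plan(plan, library, {})
    print(success, index)
```

Stopping at the first failed skill mirrors the point made above: a single unsuccessful skill causes the whole long-horizon activity to fail, which is why each factorized skill must generalize on its own.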