Efficiently Learning Human Preferences for Robot Autonomy

Abstract

Human-robot teams are invaluable for mapping unknown environments, exploring difficult-to-reach areas, and manipulating inaccessible equipment. However, guiding autonomous robots in these dynamic domains requires synthesizing a significant amount of data while balancing competing objectives. Current mission planning methods often involve manually specifying low-level parameters of the mission, such as exact waypoints or control inputs. These methods cannot fully cope with the changing surroundings and limited communications that come with operating in such complex conditions. To address this and reduce the burden on human operators, the field has trended towards ever-increasing levels of autonomy. Providing this long-term autonomy requires more usable, robust collaborative mission planning solutions that leverage the strengths of both the robot and the human operator. In this thesis, we propose two novel methods for improving the collaboration of human-robot teams by enabling the robot to learn an operator's preferences for mission planning. These methods provide the robot with a rich representation of the human's goals while using familiar techniques to speed learning. The first method is trained by making small-scale, iterative improvements to candidate mission plans generated by the robot, similar to the small improvements an operator would make while planning an actual mission. Using a novel coactive learning algorithm, the method learns the operator's preferences from the feature differences between the original and improved mission plans while remaining robust to errors and noise in the operator's corrections. The second method simplifies the operator's feedback by asking survey-style rating and ranking questions about candidate plans. These queries are generated by a Gaussian process (GP) active learner that uses the responses to learn the most preferred region of the mission preference space.
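To make the coactive idea concrete, a minimal sketch of a preference-perceptron-style update is shown below. This is a generic illustration of learning from feature differences between a candidate plan and its operator-improved version, not the thesis's exact (noise-robust) algorithm; the feature names and learning rate are assumptions.

```python
import numpy as np

def coactive_update(w, phi_original, phi_improved, lr=1.0):
    """Generic coactive-learning update (a sketch, not the thesis's
    exact method): shift the preference weight vector toward the
    features of the operator-improved mission plan."""
    return w + lr * (np.asarray(phi_improved) - np.asarray(phi_original))

# Toy usage with three hypothetical mission features:
# [coverage, energy use, risk].
w = np.zeros(3)
phi_orig = np.array([0.4, 0.8, 0.3])   # robot's candidate plan
phi_impr = np.array([0.7, 0.6, 0.2])   # operator's small correction
w = coactive_update(w, phi_orig, phi_impr)
# w now rewards features the operator increased (coverage) and
# penalizes those the operator decreased (energy use, risk).
```

Plans are then scored by the inner product of their features with the learned weights, so repeated small corrections steadily bias planning toward the operator's preferences.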
The ranking query responses provide the GP with general relational information about several points in the preference space, while the rating query responses provide a specific preference about a single point. A custom probit likelihood allows the GP to incorporate the different strengths of each query type into a single preference model. Tests in simulated lake monitoring missions show that these methods can efficiently and accurately learn an operator's preferences. Additionally, a field trial in which an EcoMapper autonomous underwater vehicle monitors the ecology of a lake validates the use of the coactive learning method. These results demonstrate that these techniques can enable a robot to accurately learn a human operator's preferences, then autonomously plan and perform missions that apply those preferences without relying on regular intervention by the operator.
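The way the two query types can feed one model can be sketched as a joint likelihood over a latent utility vector (one entry per candidate plan): ratings constrain the utility at a single point, while rankings constrain only the difference between two points through a probit term. The function below is an illustrative sketch under assumed noise models, not the thesis's custom probit formulation; all names and parameters are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def joint_loglik(f, ratings, rankings, sigma=0.1):
    """Illustrative joint likelihood (an assumption, not the thesis's
    exact probit) over a latent utility vector f.
    ratings:  list of (plan_index, observed_rating) pairs
    rankings: list of (preferred_index, less_preferred_index) pairs"""
    ll = 0.0
    for i, r in ratings:
        # Rating query: the observed rating is the latent utility
        # of that single plan corrupted by Gaussian noise.
        ll += norm.logpdf(r, loc=f[i], scale=sigma)
    for win, lose in rankings:
        # Ranking query: probit probability that the preferred plan's
        # latent utility exceeds the other's under the same noise.
        ll += norm.logcdf((f[win] - f[lose]) / (np.sqrt(2) * sigma))
    return ll
```

In a GP setting, maximizing this likelihood (plus the GP prior) over f yields a posterior utility surface; an active learner can then pick the next rating or ranking query where that posterior is most informative.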
