SOCIALGYM: A Framework for Benchmarking Social Robot Navigation
Safe and socially compliant robot navigation in dynamic human
environments is essential for long-term robot autonomy. However,
it is not feasible to learn and benchmark social navigation behaviors entirely
in the real world, as learning is data-intensive, and it is challenging to make
safety guarantees during training. Therefore, simulation-based benchmarks that
provide abstractions for social navigation are required. A framework for these
benchmarks would need to support a wide variety of learning approaches, be
extensible to the broad range of social navigation scenarios, and abstract away
the perception problem to focus on social navigation explicitly. While there
have been many proposed solutions, including high fidelity 3D simulators and
grid world approximations, no existing solution satisfies all of the
aforementioned properties for learning and evaluating social navigation
behaviors. In this work, we propose SOCIALGYM, a lightweight 2D simulation
environment for robot social navigation designed with extensibility in mind,
and a benchmark scenario built on SOCIALGYM. Further, we present benchmark
results that compare and contrast human-engineered and model-based learning
approaches to a suite of off-the-shelf Learning from Demonstration (LfD) and
Reinforcement Learning (RL) approaches applied to social robot navigation.
These results demonstrate the data efficiency, task performance, social
compliance, and environment transfer capabilities for each of the policies
evaluated to provide a solid grounding for future social navigation research.
Comment: Published in IROS202
SOCIALGYM 2.0: Simulator for Multi-Agent Social Robot Navigation in Shared Human Spaces
We present SocialGym 2, a multi-agent navigation simulator for social robot
research. Our simulator models multiple autonomous agents, replicating
real-world dynamics in complex environments, including doorways, hallways,
intersections, and roundabouts. Unlike traditional simulators that concentrate
on single robots with basic kinematic constraints in open spaces, SocialGym 2
employs multi-agent reinforcement learning (MARL) to develop optimal navigation
policies for multiple robots with diverse, dynamic constraints in complex
environments. Built on the PettingZoo MARL library and Stable Baselines3 API,
SocialGym 2 offers an accessible Python interface that integrates with a
navigation stack through ROS messaging. SocialGym 2 is easy to install,
packaged in a Docker container, and provides the capability to swap and
evaluate different MARL algorithms, as well as customize observation and reward
functions. We also provide scripts to allow users to create their own
environments and have conducted benchmarks using various social navigation
algorithms, reporting a broad range of social navigation metrics. Projected
hosted at: https://amrl.cs.utexas.edu/social_gym/index.html
Comment: Submitted to RSS 202
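Since the abstract highlights a PettingZoo-based Python interface with customizable observation and reward functions, the following is a minimal sketch of how a PettingZoo ParallelEnv can expose that customization point. This is not SocialGym 2's actual API; ToySocialNavEnv, obs_fn, and reward_fn are hypothetical names for illustration.

```python
# A minimal sketch (not SocialGym 2's actual API) of a PettingZoo
# ParallelEnv with pluggable observation and reward functions, the
# customization point the abstract describes. All names here are
# hypothetical stand-ins.
import numpy as np
from gymnasium import spaces
from pettingzoo import ParallelEnv

class ToySocialNavEnv(ParallelEnv):
    metadata = {"name": "toy_social_nav_v0"}

    def __init__(self, obs_fn, reward_fn, n_agents=2):
        self.possible_agents = [f"robot_{i}" for i in range(n_agents)]
        self.obs_fn = obs_fn        # user-supplied observation function
        self.reward_fn = reward_fn  # user-supplied reward function

    def observation_space(self, agent):
        return spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)

    def action_space(self, agent):
        return spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        self.agents = list(self.possible_agents)
        # state per agent: [x, y, vx, vy]
        self.state = {a: np.zeros(4, dtype=np.float32) for a in self.agents}
        obs = {a: self.obs_fn(self.state, a) for a in self.agents}
        return obs, {a: {} for a in self.agents}

    def step(self, actions):
        for a, u in actions.items():
            self.state[a][2:] = u   # commanded velocity
            self.state[a][:2] += u  # trivial kinematic update
        obs = {a: self.obs_fn(self.state, a) for a in self.agents}
        rew = {a: self.reward_fn(self.state, a) for a in self.agents}
        flags = {a: False for a in self.agents}
        return obs, rew, dict(flags), dict(flags), {a: {} for a in self.agents}

# Example: swap in a goal-seeking reward without touching the simulator.
env = ToySocialNavEnv(
    obs_fn=lambda state, a: state[a],
    reward_fn=lambda state, a: -float(np.linalg.norm(state[a][:2] - 1.0)),
)
obs, infos = env.reset(seed=0)
obs, rewards, terms, truncs, infos = env.step(
    {a: env.action_space(a).sample() for a in env.agents})
```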
Programmatic Imitation Learning from Unlabeled and Noisy Demonstrations
Imitation Learning (IL) is a promising paradigm for teaching robots to
perform novel tasks using demonstrations. Most existing approaches for IL
utilize neural networks (NNs); however, these methods suffer from several
well-known limitations: they 1) require large amounts of training data, 2) are
hard to interpret, and 3) are hard to repair and adapt. There is an emerging
interest in programmatic imitation learning (PIL), which offers significant
promise in addressing the above limitations. In PIL, the learned policy is
represented in a programming language, making it amenable to interpretation and
repair. However, state-of-the-art PIL algorithms assume access to action labels
and struggle to learn from noisy real-world demonstrations. In this paper, we
propose PLUNDER, a novel PIL algorithm that integrates a probabilistic program
synthesizer in an iterative Expectation-Maximization (EM) framework to address
these shortcomings. Unlike existing PIL approaches, PLUNDER synthesizes
probabilistic programmatic policies that are particularly well-suited for
modeling the uncertainties inherent in real-world demonstrations. Our approach
leverages an EM loop to simultaneously infer the missing action labels and the
most likely probabilistic policy. We benchmark PLUNDER against several
established IL techniques, and demonstrate its superiority across five
challenging imitation learning tasks under noise. PLUNDER policies achieve 95%
accuracy in matching the given demonstrations, outperforming the next best
baseline by 19%. Additionally, policies generated by PLUNDER successfully
complete the tasks 17% more frequently than the nearest baseline.
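As a reading aid, here is a minimal sketch of the EM structure the abstract describes, not the PLUNDER implementation itself; the synthesizer and label-inference routine are hypothetical caller-supplied stand-ins.

```python
# Sketch of the EM loop described above (not the PLUNDER code).
# `synthesize` stands in for a probabilistic program synthesizer and
# `infer_action_labels` for posterior inference over the missing labels;
# both are hypothetical caller-supplied functions.
def plunder_style_em(demonstrations, synthesize, infer_action_labels,
                     n_iters=20):
    # Start from a policy synthesized without any action labels.
    policy = synthesize(demonstrations, labels=None)
    for _ in range(n_iters):
        # E step: infer the most likely missing action labels given the
        # observations and the current probabilistic programmatic policy.
        labels = infer_action_labels(policy, demonstrations)
        # M step: re-synthesize the probabilistic program that best
        # explains the demonstrations under those inferred labels.
        policy = synthesize(demonstrations, labels=labels)
    return policy
```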
Leveraging program synthesis for robust long-term robot autonomy via interactive learning and adaptation
For autonomous robots to become as pervasive in uncontrolled human environments and our everyday lives as they are on campuses and in warehouses, they need to be deployable by end-users for various tasks. End-user deployment of autonomous robots over the long term requires robust behaviors that can leverage fundamental robot capabilities to achieve diverse goals while subject to various domains and user preferences. Achieving this goal requires a system for designing and adapting behaviors that is intuitive, data-efficient, easy to integrate, and can handle changes in user-imposed requirements over long deployments.

State-of-the-art approaches to designing robot behaviors broadly fall into three categories: reinforcement learning, inverse reinforcement learning, and learning from demonstration. State-of-the-art approaches for these techniques widely leverage deep neural networks (DNNs) as function approximators to represent the complete behavior, an optimal reward function, or both a value function and the behavior in Actor-Critic approaches. DNNs are a powerful tool for function approximation and have been the catalyst for significant successes across a wide range of learning applications. While DNN-based approaches are broadly applicable, they suffer from three key weaknesses when used for end-user robot behavior design and adaptation: 1) DNNs are black-box behavior representations and thus are opaque to the user and difficult to understand or verify, 2) learning with DNNs is extremely data-intensive, often requiring that data be collected in simulation, and 3) DNN behaviors are difficult to adapt and sensitive to changing domains or user preferences, such as when transferring from simulation to the real world.

In this thesis, we present approaches that leverage program synthesis as an alternative function approximator for learning from demonstration, approximating both behaviors and reward functions. Program synthesis as a function approximator addresses some limitations of DNN-based approaches by yielding human-readable behavior representations that are amenable to program repair and parameter optimization for adaptation, and that can leverage the well-structured space of programs to learn behaviors in a data-efficient manner. However, due to two primary factors, existing state-of-the-art synthesis approaches are insufficient for learning general robot programs. First, these approaches are not designed to handle non-linear real arithmetic, vector operations, or dimensioned quantities, all commonly found in robot programs. Second, synthesis techniques are primarily limited by their ability to scale with the search space of potential programs, such that synthesis of many reasonably complex behaviors is intractable for existing approaches. To address the goal of end-user-guided robot behavior learning and adaptation, we present Physics Informed Program Synthesis (PIPS) as part of a learning from demonstration and adaptation approach to lifelong robot learning.
Towards this goal, this thesis presents the following contributions: 1) an algorithm for PIPS that addresses limitations of program synthesis for robotics by reasoning about physical quantities, 2) algorithms for LfD leveraging PIPS to learn robot behaviors as human-readable programs, 3) an approach to guiding lifelong robot learning by leveraging the structure of programmatic policies and demonstrations, 4) program repair and synthesis techniques for adapting these learned policies from iterative user guidance, and finally, 5) extensive evaluation results in the social robot navigation domain across simulated and real-world deployments that compare PIPS-based learning to DNN-based and traditional approaches.
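The first contribution, reasoning about physical quantities during synthesis, can be made concrete with a toy example. The sketch below is not PIPS; it only illustrates, under the general idea of dimensional analysis, how tracking units lets a synthesizer prune ill-typed candidate expressions before ever evaluating them against demonstrations.

```python
# A toy illustration (not PIPS itself) of dimensioned-quantity
# reasoning: candidate expressions whose units do not match the
# target quantity are rejected before evaluation.
from dataclasses import dataclass

@dataclass(frozen=True)
class Dim:
    """Exponents of two base dimensions (length, time)."""
    length: int = 0
    time: int = 0

    def __mul__(self, other):
        return Dim(self.length + other.length, self.time + other.time)

    def __truediv__(self, other):
        return Dim(self.length - other.length, self.time - other.time)

LENGTH = Dim(length=1)
TIME = Dim(time=1)
VELOCITY = LENGTH / TIME

def dimension_of(expr, env):
    """Infer the dimension of a tiny expression AST:
    ('var', name) or (op, lhs, rhs) with op in {'*', '/'}."""
    if expr[0] == "var":
        return env[expr[1]]
    op, lhs, rhs = expr
    dl, dr = dimension_of(lhs, env), dimension_of(rhs, env)
    return dl * dr if op == "*" else dl / dr

env = {"distance": LENGTH, "duration": TIME}
# distance / duration has velocity units -> kept as a candidate.
assert dimension_of(("/", ("var", "distance"), ("var", "duration")), env) == VELOCITY
# distance * duration does not -> pruned without evaluation.
assert dimension_of(("*", ("var", "distance"), ("var", "duration")), env) != VELOCITY
```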
STEADY: Simultaneous State Estimation and Dynamics Learning from Indirect Observations
Accurate kinodynamic models play a crucial role in many robotics applications
such as off-road navigation and high-speed driving. Many state-of-the-art
approaches in learning stochastic kinodynamic models, however, require precise
measurements of robot states as labeled input/output examples, which can be
hard to obtain in outdoor settings due to limited sensor capabilities and the
absence of ground truth. In this work, we propose a new technique for learning
neural stochastic kinodynamic models from noisy and indirect observations by
performing simultaneous state estimation and dynamics learning. The proposed
technique iteratively improves the kinodynamic model in an
expectation-maximization loop, where the E step samples posterior state
trajectories using particle filtering, and the M step updates the dynamics to
be more consistent with the sampled trajectories via stochastic gradient
ascent. We evaluate our approach on both simulation and real-world benchmarks
and compare it with several baseline techniques. Our approach not only achieves
significantly higher accuracy but is also more robust to observation noise,
thereby showing promise for boosting the performance of many other robotics
applications.
Comment: Accepted for publication in the Proceedings of IROS 202
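A minimal sketch of that EM structure follows, assuming PyTorch and caller-supplied stand-ins for the particle filter and the likelihood; this is not the STEADY code.

```python
# Sketch of the EM loop described above (not the STEADY implementation).
# `sample_posterior_trajs` (a particle filter) and `log_likelihood`
# (returning a scalar tensor) are hypothetical caller-supplied functions.
import torch

def steady_style_em(model, observations, controls,
                    sample_posterior_trajs, log_likelihood,
                    n_iters=100, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(n_iters):
        # E step: sample state trajectories from the posterior
        # p(states | observations, controls) under the current model.
        with torch.no_grad():
            trajs = sample_posterior_trajs(model, observations, controls)
        # M step: stochastic gradient ascent on the log-likelihood of the
        # sampled trajectories (implemented as descent on its negation).
        opt.zero_grad()
        loss = -log_likelihood(model, trajs, controls)
        loss.backward()
        opt.step()
    return model
```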