4,738 research outputs found
Assessing Transferability from Simulation to Reality for Reinforcement Learning
Learning robot control policies from physics simulations is of great interest
to the robotics community as it may render the learning process faster,
cheaper, and safer by alleviating the need for expensive real-world
experiments. However, the direct transfer of learned behavior from simulation
to reality is a major challenge. Optimizing a policy on a slightly faulty
simulator can easily lead to the maximization of the "Simulation Optimization
Bias" (SOB). In this case, the optimizer exploits modeling errors of the
simulator such that the resulting behavior can potentially damage the robot. We
tackle this challenge by applying domain randomization, i.e., randomizing the
parameters of the physics simulations during learning. We propose an algorithm
called Simulation-based Policy Optimization with Transferability Assessment
(SPOTA) which uses an estimator of the SOB to formulate a stopping criterion
for training. The introduced estimator quantifies the overfitting to the set
of domains experienced during training. Our experimental results on two
different second-order nonlinear systems show that the new simulation-based
policy search algorithm is able to learn a control policy exclusively from a
randomized simulator, which can be applied directly to real systems without any
additional training.
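As a rough sketch of the two ingredients above, the snippet below draws randomized physics parameters around nominal values and estimates an overfitting gap in the spirit of the SOB. The parameter names, the uniform perturbation range, and the train-versus-held-out gap are illustrative assumptions, not the paper's exact estimator (SPOTA constructs an upper bound on the SOB).

```python
import random

# Illustrative physics parameters; the real set depends on the simulator.
NOMINAL = {"mass": 1.0, "friction": 0.5, "motor_gain": 2.0}

def sample_domain(nominal=NOMINAL, rel_range=0.2, rng=random):
    """One randomized physics-parameter set: each parameter is perturbed
    uniformly within +/- rel_range of its nominal value."""
    return {k: v * (1.0 + rng.uniform(-rel_range, rel_range))
            for k, v in nominal.items()}

def sob_estimate(avg_return, train_domains, test_domains):
    """Gap between the policy's average return on the domains seen during
    training and on freshly sampled ones: a crude stand-in for the
    Simulation Optimization Bias the paper estimates."""
    r_train = sum(map(avg_return, train_domains)) / len(train_domains)
    r_test = sum(map(avg_return, test_domains)) / len(test_domains)
    return r_train - r_test
```

A positive gap indicates the optimizer has exploited quirks of the training domains, which is exactly the signal a stopping criterion can act on.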
Teacher-Student Reinforcement Learning for Mapless Navigation using a Planetary Space Rover
We address the challenge of enhancing navigation autonomy for planetary space
rovers using reinforcement learning (RL). The ambition of future space missions
necessitates advanced autonomous navigation capabilities for rovers to meet
mission objectives. RL's potential in robotic autonomy is evident, but its
reliance on simulations poses a challenge. Transferring policies to real-world
scenarios often encounters the "reality gap", disrupting the transition from
virtual to physical environments. The reality gap is exacerbated in the context
of mapless navigation on Mars and Moon-like terrains, where unpredictable
terrains and environmental factors play a significant role. Effective
navigation requires a method attuned to these complexities and real-world data
noise. We introduce a novel two-stage RL approach using offline noisy data. Our
approach employs a teacher-student policy learning paradigm, inspired by the
"learning by cheating" method. The teacher policy is trained in simulation.
Subsequently, the student policy is trained on noisy data, aiming to mimic the
teacher's behaviors while being more robust to real-world uncertainties. Our
policies are transferred to a custom-designed rover for real-world testing.
Comparative analyses between the teacher and student policies reveal that our
approach offers improved behavioral performance, heightened noise resilience,
and more effective sim-to-real transfer.
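A minimal sketch of the teacher-student stage described above, assuming Gaussian sensor noise and a generic supervised update; the noise model and all function names here are hypothetical, not the paper's implementation.

```python
import random

def add_sensor_noise(obs, sigma=0.05, rng=random):
    """Corrupt a clean simulated observation with Gaussian noise to mimic
    real-world sensing (the noise model is an assumption)."""
    return [x + rng.gauss(0.0, sigma) for x in obs]

def distill(teacher, student_update, observations, epochs=10):
    """Behavior cloning of a privileged teacher: the student is fed noisy
    observations but supervised with the teacher's action on the clean
    ones, in the spirit of "learning by cheating"."""
    for _ in range(epochs):
        for obs in observations:
            target = teacher(obs)                 # privileged, clean input
            student_update(add_sensor_noise(obs), target)
```

The key design choice is that supervision targets come from the clean observation while inputs are noisy, so the student learns to be robust without ever needing privileged information at deployment.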
AutoVRL: A High Fidelity Autonomous Ground Vehicle Simulator for Sim-to-Real Deep Reinforcement Learning
Deep Reinforcement Learning (DRL) enables cognitive Autonomous Ground Vehicle
(AGV) navigation using raw sensor data without a priori maps or GPS, a
necessity in hazardous, information-poor environments such as regions struck
by natural disasters and extraterrestrial planets. The substantial training
time required to learn an optimal DRL policy, which can be days or weeks for
complex tasks, is a major hurdle to real-world implementation in AGV
applications. Training entails repeated collisions with the surrounding
environment over an extended period, dependent on the complexity of the task,
to reinforce positive, application-specific exploratory behavior; such trial
and error is expensive and time-consuming in the real world. Effectively
bridging the simulation-to-real-world gap is therefore a requisite for
successful implementation of DRL in complex AGV applications, enabling the
learning of cost-effective policies.
We present AutoVRL, an open-source high fidelity simulator built upon the
Bullet physics engine utilizing OpenAI Gym and Stable Baselines3 in PyTorch to
train AGV DRL agents for sim-to-real policy transfer. AutoVRL is equipped with
sensor implementations of GPS, IMU, LiDAR and camera, actuators for AGV
control, and realistic environments, with extensibility for new environments
and AGV models. The simulator provides access to state-of-the-art DRL
algorithms, utilizing a Python interface for simple algorithm and environment
customization, and simulation execution.
Comment: © 2023 the authors. This work has been accepted to IFAC
for publication under a Creative Commons License CC-BY-NC-N
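AutoVRL exposes its environments through the OpenAI Gym interface. Purely to illustrate that reset/step contract, the toy environment below uses a 1-D stand-in for vehicle dynamics; the class name, dynamics, and reward are invented for illustration and are not part of AutoVRL.

```python
class GoalSeekEnv:
    """Minimal Gym-style environment skeleton (reset/step API) of the kind
    a simulator like AutoVRL exposes; the physics here is a toy stand-in."""

    def __init__(self, goal=5.0, max_steps=50):
        self.goal, self.max_steps = goal, max_steps

    def reset(self):
        self.pos, self.t = 0.0, 0
        return [self.pos]

    def step(self, action):
        self.pos += max(-1.0, min(1.0, action))   # clipped throttle command
        self.t += 1
        dist = abs(self.goal - self.pos)
        done = dist < 0.5 or self.t >= self.max_steps
        reward = -dist                            # dense distance-shaping reward
        return [self.pos], reward, done, {}
```

Because the interface is standard, any Gym-compatible DRL library (Stable Baselines3 among them) can train against such an environment without modification.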
Sim-to-real transfer and reality gap modeling in model predictive control for autonomous driving
The main challenge for the adoption of autonomous driving is to ensure an adequate level of safety. Considering the almost infinite variability of scenarios that autonomous vehicles would have to face, the use of autonomous driving simulators is becoming of utmost importance. Simulation suites allow the use of automated validation techniques in a wide variety of scenarios, and enable the development of closed-loop validation methods, such as machine learning and reinforcement learning approaches. However, simulation tools suffer from a standing flaw: there is a noticeable gap between simulation conditions and real-world scenarios. Although simulators power most of the research around autonomous driving, across all of its subdomains, they carry an inherent source of error given the stochastic nature of real-world conditions, which cannot be fully replicated in computer environments. This paper proposes a new approach to assess the real-to-sim gap for path-tracking systems. The aim is to narrow down the sources of error between simulation results and real-world conditions, and to evaluate the performance of the simulation suite in the design process by employing the information extracted from gap analysis, which adds a new dimension of development compared with other approaches for autonomous driving. A real-time model predictive controller (MPC) based on adaptive potential fields was developed and validated using the CARLA simulator. Both the path planning and vehicle control systems were tested in real traffic conditions. The error between the simulator and the real data acquisition was evaluated using the Pearson correlation coefficient (PCC) and the max normalized cross-correlation (MNCC). The controller was further evaluated in a sim-to-real transfer process, and was finally tested both in simulation and real traffic conditions.
A comparison against an optimal-control iLQR-based model predictive controller was carried out to further showcase the validity of this approach.
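The two trajectory-similarity scores named above can be sketched in plain Python. The PCC is standard; the exact normalization and lag handling of the paper's MNCC are not given in this abstract, so the version below (best Pearson score over integer shifts) is an assumption.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mncc(x, y, max_lag=10):
    """Best Pearson score after shifting y relative to x by up to max_lag
    samples, tolerating timing offsets between simulated and recorded
    trajectories; constant (zero-variance) slices are skipped."""
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        xs = x[max(0, lag):len(x) + min(0, lag)]
        ys = y[max(0, -lag):len(y) + min(0, -lag)]
        if len(xs) > 1 and len(set(xs)) > 1 and len(set(ys)) > 1:
            best = max(best, pearson(xs, ys))
    return best
```

Allowing a lag in the cross-correlation separates shape mismatch from mere timing mismatch, which matters when simulator latency differs from the real control loop.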
State-of-the-art on research and applications of machine learning in the building life cycle
Fueled by big data, powerful and affordable computing resources, and advanced algorithms, machine learning has been explored and applied to buildings research over the past decades and has demonstrated its potential to enhance building performance. This study systematically surveyed how machine learning has been applied at different stages of the building life cycle. By conducting a literature search on the Web of Knowledge platform, we found 9579 papers in this field and selected 153 papers for an in-depth review. The number of published papers is increasing year by year, with a focus on building design, operation, and control. However, no study was found using machine learning in building commissioning. There are successful pilot studies on fault detection and diagnosis of HVAC equipment and systems, load prediction, energy baseline estimation, load shape clustering, occupancy prediction, and learning occupant behaviors and energy use patterns. None of the existing studies has been adopted broadly by the building industry, due to common challenges including (1) lack of large-scale labeled data to train and validate the model, (2) lack of model transferability, which prevents a model trained on one data-rich building from being used in another building with limited data, (3) lack of strong justification of the costs and benefits of deploying machine learning, and (4) performance that might not be reliable and robust for the stated goals, as a method might work for some buildings but not generalize to others. Findings from the study can inform future machine learning research to improve occupant comfort, energy efficiency, demand flexibility, and resilience of buildings, as well as inspire young researchers in the field to explore multidisciplinary approaches that integrate building science, computing science, data science, and social science.
On quantifying the value of simulation for training and evaluating robotic agents
A common problem in robotics is reproducing results and claims made by researchers. The experiments done in robotics laboratories typically yield results that are specific to a complex setup and difficult or costly to reproduce and validate in other contexts. For this reason, it is arduous to compare the performance and robustness of various robotic controllers. Low-cost reproductions of physical environments are popular but induce a performance reduction when transferred to the target domain. This thesis presents the results of our work toward improving benchmarking in robotics, specifically for autonomous driving.
We build a new platform, the Duckietown Autolabs, which allows researchers to evaluate autonomous driving algorithms in a standardized framework on low-cost hardware. The platform offers a simulated environment for easy access to annotated data and parallel evaluation of driving solutions in customizable environments. We use the platform to analyze the discrepancy between simulation and reality in terms of predictivity and the quality of the data generated. We supply two metrics to quantify the usefulness of a simulation and demonstrate how they can be used to optimize the value of a proxy environment.
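The thesis's two metrics are not spelled out in this abstract. One plausible instantiation of a predictivity metric, sketched below with all names illustrative, is a Spearman rank correlation between controllers' scores in the proxy environment and in the target: a simulator is useful to the extent that it ranks controllers the way reality does.

```python
def ranks(values):
    """Rank position of each value (0 = highest score), ties broken by order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def predictivity(sim_scores, real_scores):
    """Spearman rank correlation between controllers' scores in the proxy
    environment and in the target: 1.0 means the simulator orders the
    controllers exactly as reality does, -1.0 means it inverts them."""
    n = len(sim_scores)
    rs, rr = ranks(sim_scores), ranks(real_scores)
    d2 = sum((a - b) ** 2 for a, b in zip(rs, rr))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))
```

Such a score can then serve as an optimization target when tuning the proxy environment itself.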
Fuzzy Ensembles of Reinforcement Learning Policies for Robotic Systems with Varied Parameters
Reinforcement Learning (RL) is an emerging approach to control many dynamical
systems for which classical control approaches are not applicable or
insufficient. However, the resultant policies may not generalize to variations
in the parameters that the system may exhibit. This paper presents a powerful
yet simple algorithm in which collaboration is facilitated between RL agents
that are trained independently to perform the same task but with different
system parameters. The independence among agents allows the exploitation of
multi-core processing to perform parallel training. Two examples are provided
to demonstrate the effectiveness of the proposed technique. The main
demonstration is performed on a quadrotor slung-load tracking problem in a
real-time experimental setup. It is shown that the developed ensemble
algorithm outperforms the individual policies by reducing the RMSE tracking
error. The robustness of the ensemble is also verified against wind disturbance.
Comment: arXiv admin note: text overlap with arXiv:2311.0501
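One simple way to realize such a fuzzy ensemble, sketched below under the assumptions of a scalar varied parameter and triangular membership functions (the paper's actual fuzzification may differ), is to weight each expert policy by how close the current parameter estimate is to the value that expert was trained on.

```python
def triangular_membership(x, center, width):
    """Triangular fuzzy membership: 1 at the center, falling linearly
    to 0 at center +/- width."""
    return max(0.0, 1.0 - abs(x - center) / width)

def fuzzy_ensemble_action(obs, param_estimate, experts, width=1.0):
    """Blend the actions of independently trained policies.

    `experts` is a list of (trained_parameter_value, policy) pairs; each
    policy's action is weighted by the membership of the current parameter
    estimate in that expert's fuzzy set."""
    weights = [triangular_membership(param_estimate, c, width)
               for c, _ in experts]
    total = sum(weights)
    if total == 0.0:  # estimate outside every support: use the nearest expert
        _, policy = min(experts, key=lambda e: abs(e[0] - param_estimate))
        return policy(obs)
    return sum(w * p(obs) for w, (_, p) in zip(weights, experts)) / total
```

Because each expert is trained independently, the training runs can be distributed across cores, and the blending step above is the only coordination needed at inference time.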