Attention-Privileged Reinforcement Learning
Image-based Reinforcement Learning is known to suffer from poor sample
efficiency and generalisation to unseen visuals such as distractors
(task-independent aspects of the observation space). Visual domain
randomisation encourages transfer by training over visual factors of variation
that may be encountered in the target domain. This increases learning
complexity, can negatively impact learning rate and performance, and requires
knowledge of potential variations during deployment. In this paper, we
introduce Attention-Privileged Reinforcement Learning (APRiL) which uses a
self-supervised attention mechanism to significantly alleviate these drawbacks:
by focusing on task-relevant aspects of the observations, attention provides
robustness to distractors as well as significantly increased learning
efficiency. APRiL trains two attention-augmented actor-critic agents: one
purely based on image observations, available across training and transfer
domains; and one with access to privileged information (such as environment
states) available only during training. Experience is shared between both
agents and their attention mechanisms are aligned. The image-based policy can
then be deployed without access to privileged information. We experimentally
demonstrate accelerated and more robust learning on a diverse set of domains,
leading to improved final performance for environments both within and outside
the training distribution.
Comment: Published at Conference on Robot Learning (CoRL) 202
Sim-to-Real Transfer for Quadrupedal Locomotion via Terrain Transformer
Deep reinforcement learning has recently emerged as an appealing alternative for legged locomotion over multiple terrains by training a policy in physical simulation and then transferring it to the real world (i.e., sim-to-real transfer). Despite considerable progress, the capacity and scalability of traditional neural networks are still limited, which may hinder their applications in more complex environments. In contrast, the Transformer architecture has shown its superiority in a wide range of large-scale sequence modeling tasks, including natural language processing and decision-making problems. In this paper, we propose Terrain Transformer (TERT), a high-capacity Transformer model for quadrupedal locomotion control on various terrains. Furthermore, to better leverage Transformer in sim-to-real scenarios, we present a novel two-stage training framework consisting of an offline pretraining stage and an online correction stage, which can naturally integrate Transformer with privileged training. Extensive experiments in simulation demonstrate that TERT outperforms state-of-the-art baselines on different terrains in terms of return, energy consumption and control smoothness. In further real-world validation, TERT successfully traverses nine challenging terrains, including sand pit and stair down, which strong baselines cannot accomplish.
TraKDis: A Transformer-based Knowledge Distillation Approach for Visual Reinforcement Learning with Application to Cloth Manipulation
Approaching robotic cloth manipulation using reinforcement learning based on
visual feedback is appealing as robot perception and control can be learned
simultaneously. However, major challenges arise from the intricate dynamics
of cloth and the high dimensionality of the corresponding states, which limit
the practicality of the idea. To tackle these issues, we propose TraKDis, a
novel Transformer-based Knowledge Distillation approach that decomposes the
visual reinforcement learning problem into two distinct stages. In the first
stage, a privileged agent is trained, which possesses complete knowledge of the
cloth state information. This privileged agent acts as a teacher, providing
valuable guidance and training signals for subsequent stages. The second stage
involves a knowledge distillation procedure, where the knowledge acquired by
the privileged agent is transferred to a vision-based agent by leveraging
pre-trained state estimation and weight initialization. TraKDis outperforms
state-of-the-art RL techniques, improving performance by 21.9%, 13.8%, and
8.3% in cloth folding tasks in simulation. Furthermore, to validate
robustness, we evaluate the agent in a
noisy environment; the results indicate its ability to handle and adapt to
environmental uncertainties effectively. Real robot experiments are also
conducted to showcase the efficiency of our method in real-world scenarios.
Comment: Accepted for IEEE Robotics and Automation Letters in January 202
Learning models for semantic classification of insufficient plantar pressure images
Establishing a reliable and stable model to predict a target from insufficient labeled samples is feasible and
effective, particularly for a sensor-generated dataset. This paper is inspired by learning algorithms for insufficient
datasets, such as metric-based methods, prototype networks, and meta-learning, and we therefore propose
a transfer model learning method for insufficient datasets. Firstly, two basic models for transfer learning are
introduced, followed by a classification system and calculation criteria. Secondly, a dataset
of plantar pressure for comfort shoe design is acquired and preprocessed through a foot scan system; by
using a pre-trained convolutional neural network employing AlexNet and convolutional neural network (CNN)-based
transfer modeling, the classification accuracy of the plantar pressure images is over 93.5%. Finally,
the proposed method is compared to the current classifiers VGG, ResNet, AlexNet, and pre-trained
CNN. Our work is also compared with known-scaling and shifting (SS) and unknown-plain slot (PS) partition
methods on the public test databases SUN, CUB, AWA1, AWA2, and aPY, with indices of precision (tr, ts, H)
and time (training and evaluation). The proposed method for the plantar pressure classification task shows high
performance in most indices when compared with other methods. The transfer learning-based method can be
applied to other insufficient datasets in sensor imaging fields.
End-to-end Autonomous Driving: Challenges and Frontiers
The autonomous driving community has witnessed a rapid growth in approaches
that embrace an end-to-end algorithm framework, utilizing raw sensor input to
generate vehicle motion plans, instead of concentrating on individual tasks
such as detection and motion prediction. End-to-end systems, in comparison to
modular pipelines, benefit from joint feature optimization for perception and
planning. This field has flourished due to the availability of large-scale
datasets, closed-loop evaluation, and the increasing need for autonomous
driving algorithms to perform effectively in challenging scenarios. In this
survey, we provide a comprehensive analysis of more than 250 papers, covering
the motivation, roadmap, methodology, challenges, and future trends in
end-to-end autonomous driving. We delve into several critical challenges,
including multi-modality, interpretability, causal confusion, robustness, and
world models, amongst others. Additionally, we discuss current advancements in
foundation models and visual pre-training, as well as how to incorporate these
techniques within the end-to-end driving framework. To facilitate future
research, we maintain an active repository that contains up-to-date links to
relevant literature and open-source projects at
https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving
Teacher-Student Reinforcement Learning for Mapless Navigation using a Planetary Space Rover
We address the challenge of enhancing navigation autonomy for planetary space
rovers using reinforcement learning (RL). The ambition of future space missions
necessitates advanced autonomous navigation capabilities for rovers to meet
mission objectives. RL's potential in robotic autonomy is evident, but its
reliance on simulations poses a challenge. Transferring policies to real-world
scenarios often encounters the "reality gap", disrupting the transition from
virtual to physical environments. The reality gap is exacerbated in the context
of mapless navigation on Mars and Moon-like terrains, where unpredictable
terrains and environmental factors play a significant role. Effective
navigation requires a method attuned to these complexities and real-world data
noise. We introduce a novel two-stage RL approach using offline noisy data. Our
approach employs a teacher-student policy learning paradigm, inspired by the
"learning by cheating" method. The teacher policy is trained in simulation.
Subsequently, the student policy is trained on noisy data, aiming to mimic the
teacher's behaviors while being more robust to real-world uncertainties. Our
policies are transferred to a custom-designed rover for real-world testing.
Comparative analyses between the teacher and student policies reveal that our
approach offers improved behavioral performance, heightened noise resilience,
and more effective sim-to-real transfer.
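The teacher-student idea in the abstract above can be illustrated with a toy sketch. It is not the paper's method: the one-parameter linear policies, the Gaussian noise model, and all names (`teacher_policy`, `fit_student_gain`, `imitation_error`) are assumptions made purely for illustration. A teacher acts on clean simulator states, and a student is fitted by least squares to reproduce the teacher's actions from noisy observations only.

```python
import random

TEACHER_GAIN = 2.0  # hypothetical teacher parameter, chosen for illustration


def teacher_policy(state):
    """Toy teacher: acts on the clean (privileged) simulator state."""
    return TEACHER_GAIN * state


def fit_student_gain(noisy_observations, teacher_actions):
    """Closed-form least-squares fit of a student of the form
    action = gain * observation, trained to mimic the teacher's actions."""
    numerator = sum(o * a for o, a in zip(noisy_observations, teacher_actions))
    denominator = sum(o * o for o in noisy_observations)
    return numerator / denominator


def imitation_error(gain, noisy_observations, teacher_actions):
    """Mean squared difference between a linear policy acting on noisy
    observations and the teacher's actions on the clean states."""
    squared = [(gain * o - a) ** 2
               for o, a in zip(noisy_observations, teacher_actions)]
    return sum(squared) / len(squared)


rng = random.Random(0)
states = [rng.gauss(0.0, 1.0) for _ in range(2000)]          # clean states
teacher_actions = [teacher_policy(s) for s in states]        # stage 1 output
noisy_observations = [s + rng.gauss(0.0, 0.5) for s in states]  # assumed noise

# Stage 2: the student only ever sees the noisy observations.
student_gain = fit_student_gain(noisy_observations, teacher_actions)

student_error = imitation_error(student_gain, noisy_observations, teacher_actions)
naive_error = imitation_error(TEACHER_GAIN, noisy_observations, teacher_actions)
```

In this toy setting the fitted student uses a smaller gain than the teacher, because it learns to discount observations it cannot fully trust. By construction of the least-squares fit, its imitation error on the noisy data is no larger than that of running the teacher's gain directly on noisy inputs.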
Learning by Asking Questions
We introduce an interactive learning framework for the development and
testing of intelligent visual systems, called learning-by-asking (LBA). We
explore LBA in the context of the Visual Question Answering (VQA) task. LBA differs
from standard VQA training in that most questions are not observed during
training time, and the learner must ask questions it wants answers to. Thus,
LBA more closely mimics natural learning and has the potential to be more
data-efficient than the traditional VQA setting. We present a model that
performs LBA on the CLEVR dataset, and show that it automatically discovers an
easy-to-hard curriculum when learning interactively from an oracle. Our
LBA-generated data consistently matches or outperforms the CLEVR train data and is
more sample efficient. We also show that our model asks questions that
generalize to state-of-the-art VQA models and to novel test time distributions.