Curious Replay for Model-based Adaptation
Agents must be able to adapt quickly as an environment changes. We find that
existing model-based reinforcement learning agents are unable to do this well,
in part because of how they use past experiences to train their world model.
Here, we present Curious Replay -- a form of prioritized experience replay
tailored to model-based agents through use of a curiosity-based priority
signal. Agents using Curious Replay exhibit improved performance in an
exploration paradigm inspired by animal behavior and on the Crafter benchmark.
DreamerV3 with Curious Replay surpasses state-of-the-art performance on
Crafter, achieving a mean score of 19.4 that substantially improves on the
previous high score of 14.5 by DreamerV3 with uniform replay, while also
maintaining similar performance on the DeepMind Control Suite. Code for Curious
Replay is available at https://github.com/AutonomousAgentsLab/curiousreplay
Comment: Accepted at ICML 2023. Website at
https://sites.google.com/view/curious-repla
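As a concrete illustration of the idea, the sketch below implements a prioritized replay buffer whose priorities mix a count-based novelty term with the world model's training loss, in the spirit of the abstract's description. It is a minimal sketch, not the authors' implementation; the exact priority form, the constants, and the FIFO eviction are assumptions.

```python
import numpy as np

class CuriousReplayBuffer:
    """Minimal prioritized replay with a curiosity-style priority.

    Priorities combine a count-based novelty term (favoring rarely
    replayed transitions) with the world model's loss on the
    transition (favoring experiences the model predicts poorly).
    Constants and the priority form are illustrative only.
    """

    def __init__(self, capacity, alpha=0.7, beta=0.7, c=1e4, eps=0.1):
        self.capacity = capacity
        self.alpha, self.beta, self.c, self.eps = alpha, beta, c, eps
        self.data, self.losses, self.visits = [], [], []

    def add(self, transition, initial_loss=1.0):
        if len(self.data) >= self.capacity:  # simple FIFO eviction
            self.data.pop(0)
            self.losses.pop(0)
            self.visits.pop(0)
        self.data.append(transition)
        self.losses.append(initial_loss)
        self.visits.append(0)

    def priorities(self):
        loss = np.asarray(self.losses)
        visits = np.asarray(self.visits)
        # novelty decays as a transition is replayed; the loss term
        # tracks how surprising the world model still finds it
        return self.c * self.beta ** visits + (loss + self.eps) ** self.alpha

    def sample(self, batch_size):
        p = self.priorities()
        idx = np.random.choice(len(self.data), size=batch_size, p=p / p.sum())
        for i in idx:
            self.visits[i] += 1
        return [self.data[i] for i in idx], idx

    def update_losses(self, idx, new_losses):
        # call after a world-model update with fresh per-sample losses
        for i, l in zip(idx, new_losses):
            self.losses[i] = float(l)
```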
CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning
In open-ended environments, autonomous learning agents must set their own
goals and build their own curriculum through an intrinsically motivated
exploration. They may consider a large diversity of goals, aiming to discover
what is controllable in their environments, and what is not. Because some goals
might prove easy and some impossible, agents must actively select which goal to
practice at any moment, to maximize their overall mastery on the set of
learnable goals. This paper proposes CURIOUS, an algorithm that leverages 1) a
modular Universal Value Function Approximator with hindsight learning to
achieve a diversity of goals of different kinds within a single policy and 2)
an automated curriculum learning mechanism that biases the attention of the
agent towards goals maximizing the absolute learning progress. Agents focus
sequentially on goals of increasing complexity, and focus back on goals that
are being forgotten. Experiments conducted in a new modular-goal robotic
environment show the resulting developmental self-organization of a learning
curriculum, and demonstrate robustness to distracting goals, forgetting, and
changes in body properties.
Comment: Accepted at ICML 201
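The curriculum component can be made concrete with a small sketch: the agent tracks a moving success rate per goal module, estimates absolute learning progress as the change between the older and more recent halves of that window, and samples modules in proportion to it, so that both improving and forgotten goals attract practice. This is a hand-rolled illustration of the idea rather than the authors' code; the window size and the uniform-exploration floor are assumptions.

```python
import random
from collections import deque

class LearningProgressCurriculum:
    """Samples goal modules in proportion to absolute learning progress.

    Competence per module is a moving success rate; learning progress
    is the absolute difference between the recent and older halves of
    the window, so improving and regressing modules both score high.
    """

    def __init__(self, n_modules, window=100, eps=0.2):
        self.results = [deque(maxlen=window) for _ in range(n_modules)]
        self.eps = eps  # floor of uniform sampling to keep exploring

    def report(self, module, success):
        self.results[module].append(float(success))

    def _alp(self, module):
        r = list(self.results[module])
        if len(r) < 4:
            return 0.0
        half = len(r) // 2
        old, recent = r[:half], r[half:]
        return abs(sum(recent) / len(recent) - sum(old) / len(old))

    def select(self):
        n = len(self.results)
        if random.random() < self.eps:
            return random.randrange(n)  # uniform exploration
        alp = [self._alp(m) for m in range(n)]
        if sum(alp) == 0.0:
            return random.randrange(n)
        return random.choices(range(n), weights=alp)[0]
```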
Towards Continual Reinforcement Learning: A Review and Perspectives
In this article, we aim to provide a literature review of different
formulations and approaches to continual reinforcement learning (RL), also
known as lifelong or non-stationary RL. We begin by discussing our perspective
on why RL is a natural fit for studying continual learning. We then provide a
taxonomy of different continual RL formulations and mathematically characterize
the non-stationary dynamics of each setting. We go on to discuss evaluation of
continual RL agents, providing an overview of benchmarks used in the literature
and important metrics for understanding agent performance. Finally, we
highlight open problems and challenges in bridging the gap between the current
state of continual RL and findings in neuroscience. While still in its early
days, the study of continual RL promises to yield better incremental
reinforcement learners that can function in increasingly realistic applications
where non-stationarity plays a vital role. These include applications such as
those in the fields of healthcare, education, logistics, and robotics.
Comment: Preprint, 52 pages, 8 figures
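As an anchor for what "non-stationary dynamics" means here, continual RL settings are commonly formalized as an MDP whose transition and reward functions are indexed by time, so that a policy optimal at one moment need not remain optimal later. The notation below is one common convention, not necessarily the one adopted in this review.

```latex
% A time-indexed MDP: transitions and rewards drift with t.
\[
  \mathcal{M}_t = (\mathcal{S}, \mathcal{A}, P_t, R_t, \gamma),
  \qquad P_t(s' \mid s, a), \quad R_t(s, a)
\]
% The optimal policy is itself time-dependent:
\[
  \pi^*_t \in \arg\max_{\pi}\;
  \mathbb{E}_{\pi, P_t}\!\Big[\textstyle\sum_{k \ge 0} \gamma^{k} R_t(s_k, a_k)\Big]
\]
```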
Data Optimization in Deep Learning: A Survey
Large-scale, high-quality data are considered an essential factor for the
successful application of many deep learning techniques. Meanwhile, numerous
real-world deep learning tasks still have to contend with the lack of
sufficient amounts of high-quality data. Additionally, issues such as model
robustness, fairness, and trustworthiness are also closely related to training
data. Consequently, a huge number of studies in the existing literature have
focused on the data aspect in deep learning tasks. Some typical data
optimization techniques include data augmentation, logit perturbation, sample
weighting, and data condensation. These techniques usually come from different
deep learning divisions and their theoretical inspirations or heuristic
motivations may seem unrelated to each other. This study organizes a wide
range of existing data optimization methodologies for deep learning into a
comprehensive taxonomy. The taxonomy considers several split dimensions, and
deep sub-taxonomies are constructed for each dimension. On the basis of the
taxonomy, connections among the surveyed data optimization methods are drawn
along four aspects, and several promising directions for future work are
identified. The taxonomy and the revealed connections should facilitate a
better understanding of existing methods and inform the design of novel data
optimization techniques. Furthermore, we hope this survey helps establish
data optimization as an independent subdivision of deep learning. A curated,
up-to-date list of
resources related to data optimization in deep learning is available at
\url{https://github.com/YaoRujing/Data-Optimization}
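To make one of the listed families concrete, the sketch below shows a simple form of sample weighting, where each example's contribution to the batch loss is rescaled so that harder examples dominate the gradient. It is a generic illustration of the category, not a method taken from the survey; the softmax weighting scheme and the temperature parameter are assumptions.

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, targets, temperature=1.0):
    """Loss-based sample weighting: harder examples get larger weights.

    Per-example losses are turned into normalized weights via a
    softmax, so the batch gradient focuses on examples the model
    currently gets wrong. A high temperature flattens the weights
    back toward uniform (plain averaging).
    """
    per_example = F.cross_entropy(logits, targets, reduction="none")
    weights = torch.softmax(per_example.detach() / temperature, dim=0)
    return (weights * per_example).sum()

# usage: loss = weighted_cross_entropy(model(x), y); loss.backward()
```

An inverse scheme that down-weights high-loss examples would instead suppress suspected label noise; which direction is appropriate depends on the task.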
DREAM Architecture: a Developmental Approach to Open-Ended Learning in Robotics
Robots are still limited to controlled conditions that the robot designer
knows in enough detail to endow the robot with the appropriate models or
behaviors. Learning algorithms add some flexibility, discovering the
appropriate behavior either from demonstrations or from a reward that guides
exploration in a reinforcement learning algorithm. Reinforcement learning
algorithms rely on definitions of state and action spaces that delimit the
reachable behaviors. Their adaptation capability critically depends on the
representations of these spaces: small, discrete spaces enable fast learning,
while large, continuous spaces are challenging and either require a long
training period or prevent the robot from converging to an appropriate
behavior. Besides the operational cycle of policy execution and the learning
cycle, which works at a slower time scale to acquire new policies, we introduce
the redescription cycle, a third cycle working at an even slower time scale to
generate or adapt the required representations to the robot, its environment
and the task. We introduce the challenges raised by this cycle and we present
DREAM (Deferred Restructuring of Experience in Autonomous Machines), a
developmental cognitive architecture to bootstrap this redescription process
stage by stage, build new state representations with appropriate motivations,
and transfer the acquired knowledge across domains or tasks or even across
robots. We describe results obtained so far with this approach and conclude
with a discussion of the questions it raises for neuroscience.
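The three time scales can be summarized structurally: a fast operational loop executes the current policy, a slower learning loop acquires new policies, and the proposed redescription loop, slower still, rebuilds the representations those policies are defined over. The skeleton below is only a schematic reading of the abstract; every name in it is a hypothetical placeholder rather than part of the DREAM architecture's API.

```python
# Schematic of the three nested time scales described for DREAM.
# All functions here are hypothetical placeholders, not the authors' code.

def dream_loop(agent, env, n_redescriptions, n_learning_rounds, n_steps):
    for _ in range(n_redescriptions):          # redescription cycle (slowest)
        representation = agent.redescribe_experience()
        for _ in range(n_learning_rounds):     # learning cycle (slower)
            policy = agent.learn_policy(representation)
            for _ in range(n_steps):           # operational cycle (fast)
                action = policy.act(env.observe())
                env.step(action)
                agent.record(env.observe())
```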
Domain Generalization in Machine Learning Models for Wireless Communications: Concepts, State-of-the-Art, and Open Issues
Data-driven machine learning (ML) is promoted as one potential technology for
next-generation wireless systems. This has led to a large body of
research work that applies ML techniques to solve problems in different layers
of the wireless transmission link. However, most of these applications rely on
supervised learning which assumes that the source (training) and target (test)
data are independent and identically distributed (i.i.d). This assumption is
often violated in the real world due to domain or distribution shifts between
the source and the target data. Thus, it is important to ensure that these
algorithms generalize to out-of-distribution (OOD) data. In this context,
domain generalization (DG) tackles the OOD-related issues by learning models on
different and distinct source domains/datasets with generalization capabilities
to unseen new domains without additional finetuning. Motivated by the
importance of DG requirements for wireless applications, we present a
comprehensive overview of the recent developments in DG and the different
sources of domain shift. We also summarize the existing DG methods and review
their applications in selected wireless communication problems, and conclude
with insights and open questions.
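A minimal sketch helps fix the setup: a DG method trains on several distinct source domains and is then evaluated on an unseen domain with no finetuning. Below, pooled empirical risk plus a variance penalty across per-domain risks stands in for the broader family of DG objectives; the penalty form is an illustrative assumption, not a method endorsed by the paper.

```python
import torch

def dg_loss(model, domain_batches, criterion, lam=1.0):
    """Trains on several source domains; penalizes risk disagreement.

    domain_batches: list of (inputs, targets) pairs, one batch per
    source domain (at least two). The variance penalty encourages
    solutions whose risk is stable across domains, a common proxy
    for out-of-distribution generalization.
    """
    risks = torch.stack([
        criterion(model(x), y) for x, y in domain_batches
    ])
    return risks.mean() + lam * risks.var()
```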
A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning
Forgetting refers to the loss or deterioration of previously acquired
information or knowledge. While the existing surveys on forgetting have
primarily focused on continual learning, forgetting is a prevalent phenomenon
observed in various other research domains within deep learning. Forgetting
manifests in fields such as generative modeling, due to generator shifts, and
federated learning, due to heterogeneous data distributions across clients.
Addressing forgetting encompasses several challenges, including balancing the
retention of old task knowledge with fast learning of new tasks, managing task
interference among conflicting goals, and preventing privacy leakage.
Moreover, most existing surveys on continual learning implicitly assume that
forgetting is always harmful. In contrast, our survey argues that forgetting is
a double-edged sword and can be beneficial and desirable in certain cases, such
as privacy-preserving scenarios. By exploring forgetting in a broader context,
we aim to present a more nuanced understanding of this phenomenon and highlight
its potential advantages. Through this comprehensive survey, we aspire to
uncover potential solutions by drawing upon ideas and approaches from various
fields that have dealt with forgetting. By examining forgetting beyond its
conventional boundaries, we hope to encourage the development of novel
strategies for mitigating, harnessing, or even embracing forgetting in real
applications. A comprehensive list of papers about forgetting in various
research fields is available at
\url{https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning}
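One standard way to quantify forgetting, common across the fields the survey covers, compares a model's best historical accuracy on each task with its accuracy once training has moved on. The sketch below computes this average forgetting measure; the convention is widespread in the continual learning literature rather than specific to this survey.

```python
import numpy as np

def average_forgetting(acc_matrix):
    """acc_matrix[i][j]: accuracy on task j after training on task i.

    For each earlier task j, forgetting is the drop from its best
    accuracy seen during training to its final accuracy; the average
    over all tasks but the last summarizes how much old knowledge
    was lost.
    """
    acc = np.asarray(acc_matrix, dtype=float)
    T = acc.shape[0]
    drops = [acc[:T - 1, j].max() - acc[T - 1, j] for j in range(T - 1)]
    return float(np.mean(drops))

# e.g. average_forgetting([[0.9, 0.0], [0.6, 0.8]]) -> 0.3
```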