Sample-efficient Real-time Planning with Curiosity Cross-Entropy Method and Contrastive Learning
Model-based reinforcement learning (MBRL) with real-time planning has shown
great potential in locomotion and manipulation control tasks. However, the
existing planning methods, such as the Cross-Entropy Method (CEM), do not scale
well to complex high-dimensional environments. One of the key reasons for
underperformance is the lack of exploration, as these planning methods only aim
to maximize the cumulative extrinsic reward over the planning horizon.
Furthermore, planning inside the compact latent space in the absence of
observations makes it challenging to use curiosity-based intrinsic motivation.
We propose Curiosity CEM (CCEM), an improved version of the CEM algorithm for
encouraging exploration via curiosity. Our method maximizes the sum of
state-action Q-values over the planning horizon, where these Q-values estimate
both future extrinsic and intrinsic reward, thereby encouraging the agent to
reach novel observations. In addition, our model uses contrastive representation
learning to efficiently learn latent representations. Experiments on
image-based continuous control tasks from the DeepMind Control suite show that
CCEM is substantially more sample-efficient than previous MBRL algorithms
and compares favorably with the best model-free RL methods.
Comment: 7 pages, 4 figures
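The abstract's core loop can be sketched as a CEM planner that scores candidate action sequences by their summed Q-values over the horizon. This is a minimal illustration, not the paper's implementation: the quadratic `toy_q_value` stand-in is an assumption, whereas CCEM learns Q-values that estimate extrinsic plus curiosity-based intrinsic reward in a latent space.

```python
import numpy as np

def toy_q_value(state, action):
    # Hypothetical stand-in for a learned Q-function: prefers actions
    # near 0.5 in every dimension (assumption for illustration only).
    return -np.sum((action - 0.5) ** 2)

def cem_plan(state, horizon=5, action_dim=2, n_samples=64, n_elites=8,
             n_iters=10, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # Sample candidate action sequences from the current Gaussian.
        seqs = rng.normal(mean, std, size=(n_samples, horizon, action_dim))
        # Score each sequence by the sum of Q-values over the horizon.
        scores = np.array([sum(toy_q_value(state, a) for a in seq)
                           for seq in seqs])
        # Refit the Gaussian to the highest-scoring (elite) sequences.
        elites = seqs[np.argsort(scores)[-n_elites:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean[0]  # execute only the first action (MPC-style)

first_action = cem_plan(state=None)
```

Replacing the per-step reward sum with Q-values is what lets the curiosity bonus influence planning even beyond the explicit horizon.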
Locally Constrained Representations in Reinforcement Learning
The success of Reinforcement Learning (RL) heavily relies on the ability to
learn robust representations from the observations of the environment. In most
cases, the representations learned purely by the reinforcement learning loss
can differ vastly across states depending on how the value functions change.
However, the representations learned need not be very specific to the task at
hand. Relying only on the RL objective may yield representations that vary
greatly across successive time steps. In addition, since the RL loss has a
changing target, the representations learned would depend on how good the
current values/policies are. Thus, disentangling the representations from the
main task would allow them to focus more on capturing transition dynamics,
which can improve generalization. To this end, we propose locally constrained
representations, where an auxiliary loss forces the state representations to be
predictable by the representations of the neighbouring states. This encourages
the representations to be driven not only by value/policy learning but also by
self-supervised learning, which constrains the representations from changing
too rapidly. We evaluate the proposed method on several known benchmarks and
observe strong performance. Especially in continuous control tasks, our
experiments show a significant advantage over a strong baseline.
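The auxiliary objective described above can be sketched as a prediction loss between neighbouring representations. This is a simplified assumption-laden sketch: the linear predictor matrix here stands in for the learned predictor network, and the encoder producing `reps` is omitted.

```python
import numpy as np

def local_consistency_loss(reps, predictor_w):
    """reps: (T, d) representations of T successive states.
    predictor_w: (d, d) matrix predicting rep[t+1] from rep[t].
    Returns the mean squared prediction error across the trajectory."""
    pred_next = reps[:-1] @ predictor_w   # predict each next representation
    errors = pred_next - reps[1:]         # compare with the actual next rep
    return float(np.mean(errors ** 2))

# With an identity predictor, slowly changing representations incur a
# small loss, while rapidly changing (noisy) ones are penalised heavily.
slow = np.linspace(0, 1, 10).reshape(-1, 1) @ np.ones((1, 4)) * 0.1
fast = np.random.default_rng(0).normal(size=(10, 4))
w = np.eye(4)
```

Minimizing this term alongside the RL loss is what discourages representations from varying greatly across successive time steps.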
Learning Task Informed Abstractions
Current model-based reinforcement learning methods struggle when operating
from complex visual scenes due to their inability to prioritize task-relevant
features. To mitigate this problem, we propose learning Task Informed
Abstractions (TIA) that explicitly separates reward-correlated visual features
from distractors. For learning TIA, we introduce the formalism of Task Informed
MDP (TiMDP) that is realized by training two models that learn visual features
via cooperative reconstruction, but one model is adversarially dissociated from
the reward signal. Empirical evaluation shows that TIA leads to significant
performance gains over state-of-the-art methods on many visual control tasks
where natural and unconstrained visual distractions pose a formidable
challenge.
Comment: 8 pages, 12 figures
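The loss structure of the two-model setup can be sketched as follows. This is an assumption-heavy illustration: the linear heads stand in for TIA's learned networks, and in actual training the adversarial gradient is reversed into the distractor encoder rather than computed as a plain loss.

```python
import numpy as np

def tia_losses(z_task, z_dist, obs, reward, W_rec, w_task, w_dist):
    # Cooperative reconstruction: both features jointly explain the pixels.
    recon = np.concatenate([z_task, z_dist]) @ W_rec
    rec_loss = np.mean((recon - obs) ** 2)
    # Reward head on the task feature: should predict reward well.
    reward_loss = (z_task @ w_task - reward) ** 2
    # Adversarial head on the distractor feature: if this loss is low,
    # z_dist still carries reward information and must be dissociated.
    adv_loss = (z_dist @ w_dist - reward) ** 2
    return float(rec_loss), float(reward_loss), float(adv_loss)
```

The interplay of the three terms is what pushes reward-correlated features into one model and distractors into the other.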
Data-driven robotic manipulation of cloth-like deformable objects : the present, challenges and future prospects
Manipulating cloth-like deformable objects (CDOs) is a long-standing problem in the robotics community. CDOs are flexible (non-rigid) objects that do not exhibit a detectable level of compression strength when two points on the object are pushed towards each other; they include ropes (1D), fabrics (2D) and bags (3D). In general, CDOs’ many degrees of freedom (DoF) introduce severe self-occlusion and complex state–action dynamics, posing significant obstacles to perception and manipulation systems. These challenges exacerbate existing issues of modern robotic control methods such as imitation learning (IL) and reinforcement learning (RL). This review focuses on the application details of data-driven control methods for four major task families in this domain: cloth shaping, knot tying/untying, dressing and bag manipulation. Furthermore, we identify specific inductive biases in these four domains that present challenges for more general IL and RL algorithms.
Advanced deep active learning & data subset selection: unifying principles with information-theory intuitions
At its core, this thesis aims to enhance the practicality of deep learning by improving the label and training efficiency of deep learning models.
To this end, we investigate data subset selection techniques, specifically active learning and active sampling, grounded in information-theoretic principles.
Active learning improves label efficiency, while active sampling enhances training efficiency.
Supervised deep learning models often require extensive training with labeled data. Label acquisition can be expensive and time-consuming, and training large models is resource-intensive, hindering adoption outside academic research and "big tech."
Existing methods for data subset selection in deep learning often rely on heuristics or lack a principled information-theoretic foundation. In contrast, this thesis examines several objectives for data subset selection and their applications within deep learning, striving for a more principled approach inspired by information theory.
We begin by disentangling epistemic and aleatoric uncertainty in single forward-pass deep neural networks, which provides helpful intuitions and insights into different forms of uncertainty and their relevance for data subset selection. We then propose and investigate various approaches for active learning and data subset selection in (Bayesian) deep learning. Finally, we relate various existing and proposed approaches to approximations of information quantities in weight or prediction space.
Underpinning this work is a principled and practical notation for information-theoretic quantities that includes both random variables and observed outcomes. This thesis demonstrates the benefits of working from a unified perspective and highlights the potential impact of our contributions to the practical application of deep learning.
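A representative information-theoretic acquisition score in the spirit of this thesis is BALD: the mutual information between a point's predicted label and the model parameters, estimated from an ensemble of sampled predictive distributions. The sketch below is an illustration under the assumption that such sampled distributions (e.g. from MC dropout or a deep ensemble) are available.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    # Shannon entropy in nats; eps guards against log(0).
    return -np.sum(p * np.log(p + eps), axis=axis)

def bald_score(probs):
    """probs: (n_samples, n_classes) class probabilities from sampled models.
    Returns H[mean p] - mean H[p]: high when models confidently disagree."""
    mean_p = probs.mean(axis=0)
    return float(entropy(mean_p) - entropy(probs, axis=-1).mean())

# Two sampled models that confidently disagree: high epistemic uncertainty,
# so the point is informative to label.
disagree = np.array([[0.9, 0.1], [0.1, 0.9]])
# Two models that agree but are unsure: aleatoric noise, low BALD score.
agree = np.array([[0.5, 0.5], [0.5, 0.5]])
```

This decomposition mirrors the thesis's distinction between epistemic uncertainty (reducible by labels, high BALD) and aleatoric uncertainty (irreducible noise, low BALD).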