Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles
Learning complex robot behaviors through interaction requires structured
exploration. Planning should target interactions with the potential to optimize
long-term performance, while only reducing uncertainty where conducive to this
objective. This paper presents Latent Optimistic Value Exploration (LOVE), a
strategy that enables deep exploration through optimism in the face of
uncertain long-term rewards. We combine latent world models with value function
estimation to predict infinite-horizon returns and recover associated
uncertainty via ensembling. The policy is then trained on an upper confidence
bound (UCB) objective to identify and select the interactions most promising to
improve long-term performance. We apply LOVE to visual robot control tasks in
continuous action spaces and demonstrate, on average, more than 20% improved
sample efficiency in comparison to state-of-the-art methods and other
exploration objectives. In sparse, hard-to-explore environments we achieve an
average improvement of over 30%.
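The UCB objective described above can be made concrete with a minimal sketch: score each candidate action by the ensemble's mean predicted return plus a bonus proportional to the ensemble's disagreement, then act greedily on that score. The function names and the `beta` weight are illustrative, not the paper's implementation:

```python
import statistics

def ucb_action_scores(ensemble_returns, beta=1.0):
    """Score each candidate action with an upper confidence bound:
    mean predicted return plus beta times the ensemble's spread."""
    scores = []
    for returns in ensemble_returns:  # one list of ensemble predictions per action
        mean = statistics.mean(returns)
        std = statistics.pstdev(returns)
        scores.append(mean + beta * std)
    return scores

def select_action(ensemble_returns, beta=1.0):
    """Pick the action index with the highest UCB score."""
    scores = ucb_action_scores(ensemble_returns, beta)
    return max(range(len(scores)), key=scores.__getitem__)
```

With `beta > 0`, an action with a slightly lower mean but high ensemble disagreement can win, which is exactly the optimism that drives deep exploration; with `beta = 0` the rule degenerates to greedy exploitation.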
Uncertainty-Aware Data Aggregation for Deep Imitation Learning
Estimating statistical uncertainties allows autonomous agents to communicate
their confidence during task execution and is important for applications in
safety-critical domains such as autonomous driving. In this work, we present
the uncertainty-aware imitation learning (UAIL) algorithm for improving
end-to-end control systems via data aggregation. UAIL applies Monte Carlo
Dropout to estimate uncertainty in the control output of end-to-end systems,
using states where it is uncertain to selectively acquire new training data. In
contrast to prior data aggregation algorithms that force human experts to visit
sub-optimal states at random, UAIL can anticipate its own mistakes and switch
control to the expert in order to prevent visiting a series of sub-optimal
states. Our experimental results from simulated driving tasks demonstrate that
our proposed uncertainty estimation method can be leveraged to reliably predict
infractions. Our analysis shows that UAIL outperforms existing data aggregation
algorithms on a series of benchmark tasks.
Comment: Accepted to International Conference on Robotics and Automation 201
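The Monte Carlo Dropout idea behind UAIL can be sketched in a few lines: repeat a stochastic forward pass with random dropout masks, and use the spread of the resulting control outputs as an uncertainty estimate that triggers the switch to the expert. The toy linear "network" and the `threshold` value are stand-ins for the paper's end-to-end model:

```python
import random
import statistics

def mc_dropout_predict(weights, features, drop_p=0.5, n_samples=50, rng=None):
    """Approximate predictive uncertainty by repeating a forward pass with
    random dropout masks and collecting the control outputs."""
    rng = rng or random.Random(0)
    outputs = []
    for _ in range(n_samples):
        # Randomly drop each term (a stand-in for dropout in a real network),
        # rescaling survivors so the expected output is unchanged.
        out = sum(w * x for w, x in zip(weights, features)
                  if rng.random() > drop_p) / (1.0 - drop_p)
        outputs.append(out)
    return statistics.mean(outputs), statistics.pstdev(outputs)

def should_switch_to_expert(weights, features, threshold=0.5):
    """Hand control to the expert when the predictive spread is too large."""
    _, std = mc_dropout_predict(weights, features)
    return std > threshold
```

The key design point is that the switch is anticipatory: control is handed over before a series of sub-optimal states is visited, rather than after a mistake has already occurred.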
Constrained Exploration and Recovery from Experience Shaping
We consider the problem of reinforcement learning under safety requirements,
in which an agent is trained to complete a given task, typically formalized as
the maximization of a reward signal over time, while concurrently avoiding
undesirable actions or states, associated to lower rewards, or penalties. The
construction and balancing of different reward components can be difficult in
the presence of multiple objectives, yet is crucial for producing a satisfying
policy. For example, in reaching a target while avoiding obstacles, low
collision penalties can lead to reckless movements while high penalties can
discourage exploration. To circumvent this limitation, we examine the effect of
past actions in terms of safety to estimate which are acceptable or should be
avoided in the future. We then actively reshape the action space of the agent
during reinforcement learning, so that reward-driven exploration is constrained
within safety limits. We propose an algorithm enabling the learning of such
safety constraints in parallel with reinforcement learning and demonstrate its
effectiveness in terms of both task completion and training time.Comment: Code: https://github.com/IBM/constrained-r
Keyframing the Future: Keyframe Discovery for Visual Prediction and Planning
Temporal observations such as videos contain essential information about the
dynamics of the underlying scene, but they are often interleaved with
inessential, predictable details. One way of dealing with this problem is by
focusing on the most informative moments in a sequence. We propose a model that
learns to discover these important events and the times when they occur and
uses them to represent the full sequence. We do so using a hierarchical
Keyframe-Inpainter (KeyIn) model that first generates a video's keyframes and
then inpaints the rest by generating the frames at the intervening times. We
propose a fully differentiable formulation to efficiently learn this procedure.
We show that KeyIn finds informative keyframes in several datasets with
different dynamics and visual properties. KeyIn outperforms other recent
hierarchical predictive models for planning. For more details, please see the
project website at \url{https://sites.google.com/view/keyin}.
Comment: Conference on Learning for Dynamics and Control, 2020. Website: https://sites.google.com/view/keyin/hom
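A crude proxy for the keyframe-discovery idea is to rank time steps by how much the sequence changes and keep the most "surprising" ones; KeyIn instead learns this selection jointly with inpainting, so the sketch below (scalar frames, a simple first-difference score) is only an intuition pump:

```python
def keyframe_indices(frames, k):
    """Rank frames by how much they differ from their predecessor and
    return the k most surprising time steps, in temporal order."""
    changes = [abs(b - a) for a, b in zip(frames, frames[1:])]
    ranked = sorted(range(1, len(frames)),
                    key=lambda i: changes[i - 1], reverse=True)
    return sorted(ranked[:k])
```

The frames between the selected indices are the predictable stretches that a hierarchical model can inpaint cheaply, which is why representing the sequence by its keyframes loses little information.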
AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild
Automated affective computing in the wild setting is a challenging problem in
computer vision. Existing annotated databases of facial expressions in the wild
are small and mostly cover discrete emotions (aka the categorical model). There
are very limited annotated facial databases for affective computing in the
continuous dimensional model (e.g., valence and arousal). To meet this need, we
collected, annotated, and prepared for public distribution a new database of
facial emotions in the wild (called AffectNet). AffectNet contains more than
1,000,000 facial images collected from the Internet by querying three major
search engines with 1,250 emotion-related keywords in six different languages.
About
half of the retrieved images were manually annotated for the presence of seven
discrete facial expressions and the intensity of valence and arousal. AffectNet
is by far the largest database of facial expression, valence, and arousal in
the wild enabling research in automated facial expression recognition in two
different emotion models. Two baseline deep neural networks are used to
classify images in the categorical model and predict the intensity of valence
and arousal. Various evaluation metrics show that our deep neural network
baselines can perform better than conventional machine learning methods and
off-the-shelf facial expression recognition systems.
Comment: IEEE Transactions on Affective Computing, 201
Boosting Cloud Data Analytics using Multi-Objective Optimization
Data analytics in the cloud has become an integral part of enterprise
businesses. Big data analytics systems, however, still lack the ability to take
user performance goals and budgetary constraints for a task, collectively
referred to as task objectives, and automatically configure an analytic job to
achieve these objectives. This paper presents a data analytics optimizer that
can automatically determine a cluster configuration with a suitable number of
cores as well as other system parameters that best meet the task objectives. At
the core of our work is a principled multi-objective optimization (MOO) approach
that computes a Pareto optimal set of job configurations to reveal tradeoffs
between different user objectives, recommends a new job configuration that best
explores such tradeoffs, and employs novel optimizations to enable such
recommendations within a few seconds. We present efficient incremental
algorithms based on the notion of a Progressive Frontier for realizing our MOO
approach and implement them into a Spark-based prototype. Detailed experiments
using benchmark workloads show that our MOO techniques provide a 2-50x speedup
over existing MOO methods, while offering good coverage of the Pareto frontier.
When compared to OtterTune, a state-of-the-art performance tuning system, our
approach recommends configurations that yield a 26%-49% reduction in the running
time of the TPCx-BB benchmark while adapting to different application
preferences on multiple objectives.
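The Pareto-optimal set at the heart of this optimizer has a simple definition that is worth making concrete: a configuration survives if no other configuration is at least as good in every objective and strictly better in one. The naive O(n²) sketch below (objectives treated as costs, smaller is better) is for intuition only; the paper's Progressive Frontier algorithms exist precisely to avoid this brute-force scan:

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective (lower is better)
    and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(configs):
    """Return the configurations not dominated by any other configuration."""
    return [c for c in configs
            if not any(dominates(other, c) for other in configs if other != c)]
```

Each surviving point represents a distinct tradeoff (e.g. latency vs. cost), which is what lets the optimizer surface those tradeoffs to the user instead of collapsing them into one score.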
Learning to Imagine Manipulation Goals for Robot Task Planning
Prospection is an important part of how humans come up with new task plans,
but has not been explored in depth in robotics. Predicting multiple task-level
outcomes is a challenging problem that involves capturing both task semantics
and continuous variability over the state of the world. Ideally, we would
combine the ability of machine learning to leverage big data for learning the
semantics of a task with techniques from task planning that reliably generalize
to new environments. In this work, we propose a method for learning a model
encoding just such a representation for task planning. We learn a neural
network that predicts the most likely outcomes of high-level actions in a given
world state. Our approach creates comprehensible task plans that allow us to predict
changes to the environment many time steps into the future. We demonstrate this
approach via application to a stacking task in a cluttered environment, where
the robot must select between different colored blocks while avoiding
obstacles, in order to perform a task. We also show results on a simple
navigation task. Our algorithm generates realistic image and pose predictions
at multiple points in a given task.
Constrained Structured Regression with Convolutional Neural Networks
Convolutional Neural Networks (CNNs) have recently emerged as the dominant
model in computer vision. If provided with enough training data, they can
predict almost any visual quantity. In a discrete setting, such as classification, CNNs
are not only able to predict a label but often predict a confidence in the form
of a probability distribution over the output space. In continuous regression
tasks, such a probability estimate is often lacking. We present a regression
framework which models the output distribution of neural networks. This output
distribution allows us to infer the most likely labeling following a set of
physical or modeling constraints. These constraints capture the intricate
interplay between different input and output variables, and complement the
output of a CNN. However, they may not hold everywhere. Our setup further
allows learning the confidence with which a constraint holds, in the form of a
distribution over constraint satisfaction. We evaluate our approach on the
problem of intrinsic image decomposition, and show that constrained structured
regression significantly improves on the state of the art.
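One way to make "the most likely labeling under a constraint" concrete is the two-output case with an additive constraint (as in log-domain intrinsic decomposition, where log-reflectance plus log-shading must equal the log-image). Given independent Gaussian network outputs, the constrained maximum-likelihood solution has a standard Lagrange-multiplier closed form; this is a textbook result illustrating the setup, not the paper's exact inference procedure:

```python
def fuse_with_constraint(mu1, v1, mu2, v2, c):
    """Most likely (x1, x2) under independent Gaussian network outputs
    N(mu1, v1) and N(mu2, v2), subject to the hard constraint x1 + x2 = c.
    The constraint residual is distributed across the outputs in proportion
    to their variances, so the more confident prediction moves less."""
    residual = mu1 + mu2 - c
    x1 = mu1 - v1 * residual / (v1 + v2)
    x2 = mu2 - v2 * residual / (v1 + v2)
    return x1, x2
```

This is exactly why modeling the output distribution matters: without the per-output variances there is no principled way to decide which prediction should absorb the constraint violation.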
Active Image Synthesis for Efficient Labeling
The great success achieved by deep neural networks attracts increasing
attention from the manufacturing and healthcare communities. However, the
limited availability of data and high costs of data collection are the major
challenges for the applications in those fields. We propose in this work AISEL,
an active image synthesis method for efficient labeling to improve the
performance of the small-data learning tasks. Specifically, a complementary
AISEL dataset is generated, with labels actively acquired via a physics-based
method to incorporate underlining physical knowledge at hand. An important
component of our AISEL method is the bidirectional generative invertible
network (GIN), which can extract interpretable features from the training
images and generate physically meaningful virtual images. Our AISEL method then
efficiently samples virtual images that not only further exploit the uncertain
regions but also explore the entire image space. We then discuss the
interpretability of GIN both theoretically and experimentally, demonstrating
clear visual improvements over the benchmarks. Finally, we demonstrate the
effectiveness of our AISEL framework on an aortic stenosis application, in
which our method lowers the labeling cost while achieving an improvement in
prediction accuracy.
Data-Efficient Reinforcement Learning in Continuous-State POMDPs
We present a data-efficient reinforcement learning algorithm resistant to
observation noise. Our method extends the highly data-efficient PILCO algorithm
(Deisenroth & Rasmussen, 2011) into partially observed Markov decision
processes (POMDPs) by considering the filtering process during policy
evaluation. PILCO conducts policy search, evaluating each policy by first
predicting an analytic distribution of possible system trajectories. We
additionally predict trajectories w.r.t. a filtering process, achieving
significantly higher performance than combining a filter with a policy
optimised by the original (unfiltered) framework. Our test setup is the
cartpole swing-up task with sensor noise, which involves nonlinear dynamics and
requires nonlinear control.
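The core idea of evaluating a policy through the filtering process can be sketched on a 1-D toy system: the policy acts on the Kalman-filtered belief mean rather than the raw noisy observation, and the policy's quality is the resulting average cost. Everything here (the integrator dynamics, noise levels, cost) is an illustrative stand-in for PILCO's analytic Gaussian-process machinery:

```python
import random
import statistics

def evaluate_policy(gain, steps=50, noise_std=0.5, episodes=200, seed=0):
    """Evaluate a linear policy u = -gain * belief_mean on a noisy 1-D
    integrator, acting on the Kalman-filtered belief rather than the raw
    observation. Returns the average cost (sum of squared states)."""
    rng = random.Random(seed)
    costs = []
    for _ in range(episodes):
        x, mean, var = 1.0, 1.0, 1.0       # true state and Gaussian belief
        total = 0.0
        for _ in range(steps):
            u = -gain * mean                # policy sees the filtered belief
            x = x + u                       # true dynamics
            mean, var = mean + u, var + 0.01        # belief predict step
            y = x + rng.gauss(0.0, noise_std)       # noisy observation
            k = var / (var + noise_std ** 2)        # Kalman gain
            mean, var = mean + k * (y - mean), (1 - k) * var  # belief update
            total += x * x
        costs.append(total)
    return statistics.mean(costs)
```

Optimizing `gain` against this filtered evaluation, rather than against the noise-free dynamics, is the toy analogue of the paper's point: a policy tuned for the filtering process it will actually run with outperforms one tuned on the unfiltered system and combined with a filter afterwards.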