Episodic Learning with Control Lyapunov Functions for Uncertain Robotic Systems
Many modern nonlinear control methods aim to endow systems with guaranteed
properties, such as stability or safety, and have been successfully applied to
the domain of robotics. However, model uncertainty remains a persistent
challenge, weakening theoretical guarantees and causing implementation failures
on physical systems. This paper develops a machine learning framework centered
around Control Lyapunov Functions (CLFs) to adapt to parametric uncertainty and
unmodeled dynamics in general robotic systems. Our proposed method proceeds by
iteratively updating estimates of Lyapunov function derivatives and improving
controllers, ultimately yielding a stabilizing quadratic program model-based
controller. We validate our approach on a planar Segway simulation,
demonstrating substantial performance improvements by iteratively refining a
base model-free controller.
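The CLF-driven quadratic program has a closed-form solution in the min-norm case, which a short sketch can make concrete. The linear double-integrator model, the Lyapunov matrix P, and the decay rate lam below are illustrative assumptions standing in for the paper's Segway dynamics and learned Lyapunov derivative estimates:

```python
import numpy as np

def clf_qp_controller(x, A, B, P, lam=1.0):
    """Pointwise min-norm CLF controller (the QP's closed-form solution).

    Minimises ||u||^2 subject to Vdot(x, u) <= -lam * V(x),
    with V(x) = x' P x for the linear model xdot = A x + B u.
    """
    V = x @ P @ x
    LfV = 2 * x @ P @ (A @ x)        # drift contribution to Vdot
    LgV = 2 * x @ P @ B              # input contribution to Vdot
    a = LfV + lam * V                # constraint reads: a + LgV @ u <= 0
    if a <= 0:
        return np.zeros(B.shape[1])  # constraint already satisfied by u = 0
    denom = LgV @ LgV
    if denom < 1e-12:
        return np.zeros(B.shape[1])  # no control authority over Vdot
    return -(a / denom) * LgV        # minimum-norm feasible input

# Double-integrator stand-in for the robot dynamics
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
P = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive-definite Lyapunov candidate

x = np.array([1.0, 0.0])
dt = 0.01
for _ in range(2000):                     # forward-Euler closed-loop simulation
    u = clf_qp_controller(x, A, B, P)
    x = x + dt * (A @ x + B @ u)
```

Because the constraint enforces Vdot <= -lam * V, the candidate V decays exponentially and the state converges to the origin; in the paper this same QP structure is kept while the uncertain Vdot terms are replaced by learned estimates.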
Bayesian Nonparametric Feature and Policy Learning for Decision-Making
Learning from demonstrations has gained increasing interest in the recent
past, enabling an agent to learn how to make decisions by observing an
experienced teacher. While many approaches have been proposed to solve this
problem, little work focuses on reasoning about the observed
behavior. We assume that, in many practical problems, an agent makes its
decision based on latent features that indicate a certain action. Therefore, we
propose a generative model for the states and actions. Inference reveals the
number of features, the features, and the policies, allowing us to learn and to
analyze the underlying structure of the observed behavior. Further, our
approach enables prediction of actions for new states. Simulations are used to
assess the performance of the algorithm based upon this model. Moreover, the
problem of learning a driver's behavior is investigated, demonstrating the
performance of the proposed model in a real-world scenario.
Task-Driven Dictionary Learning
Modeling data with linear combinations of a few elements from a learned
dictionary has been the focus of much recent research in machine learning,
neuroscience and signal processing. For signals such as natural images that
admit such sparse representations, it is now well established that these models
are well suited to restoration tasks. In this context, learning the dictionary
amounts to solving a large-scale matrix factorization problem, which can be
done efficiently with classical optimization tools. The same approach has also
been used for learning features from data for other purposes, e.g., image
classification, but tuning the dictionary in a supervised way for these tasks
has proven to be more difficult. In this paper, we present a general
formulation for supervised dictionary learning adapted to a wide variety of
tasks, and present an efficient algorithm for solving the corresponding
optimization problem. Experiments on handwritten digit classification, digital
art identification, nonlinear inverse image problems, and compressed sensing
demonstrate that our approach is effective in large-scale settings, and is well
suited to supervised and semi-supervised classification, as well as regression
tasks for data that admit sparse representations.
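The sparse-codes-as-features idea can be made concrete with a simplified two-stage sketch: a fixed random dictionary with ISTA encoding, followed by a ridge classifier on the codes. The paper's contribution is the joint, task-driven optimization of dictionary and classifier; the dictionary size, the sparsity weight lam, and the synthetic two-class data below are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def ista(D, x, lam=0.1, n_iter=100):
    """Sparse code of x in D via ISTA: min_a 0.5*||x - D a||^2 + lam*||a||_1."""
    L = np.linalg.norm(D, 2) ** 2           # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = a - D.T @ (D @ a - x) / L       # gradient step on the data term
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft threshold
    return a

n_features, n_atoms = 20, 50
D = rng.standard_normal((n_features, n_atoms))
D /= np.linalg.norm(D, axis=0)              # unit-norm dictionary atoms

def sample(cls, n):
    """Class-dependent sparse signals: each class uses its own 5 atoms."""
    xs = []
    atoms = range(0, 5) if cls == 0 else range(5, 10)
    for _ in range(n):
        s = np.zeros(n_atoms)
        s[list(atoms)] = rng.uniform(0.5, 1.5, size=5)
        xs.append(D @ s + 0.01 * rng.standard_normal(n_features))
    return xs

X = sample(0, 30) + sample(1, 30)
labels = np.array([0] * 30 + [1] * 30)
codes = np.array([ista(D, x) for x in X])   # sparse codes as features

# Ridge-regularised least-squares classifier on the sparse codes
A = np.hstack([codes, np.ones((len(codes), 1))])
w = np.linalg.solve(A.T @ A + 1e-3 * np.eye(A.shape[1]), A.T @ (2 * labels - 1))
acc = ((A @ w > 0).astype(int) == labels).mean()
```

In the supervised formulation the gradient of the task loss is propagated back through the sparse-coding step so the dictionary itself adapts to the classifier, rather than staying fixed as it does in this sketch.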
Using Monte Carlo Search With Data Aggregation to Improve Robot Soccer Policies
RoboCup soccer competitions are considered among the most challenging
multi-robot adversarial environments, due to their high dynamism and the
partial observability of the environment. In this paper we introduce a method
based on a combination of Monte Carlo search and data aggregation (MCSDA) to
adapt discrete-action soccer policies for a defender robot to the strategy of
the opponent team. By exploiting a simple representation of the domain, a
supervised learning algorithm is trained over an initial collection of data
consisting of several simulations of human expert policies. Monte Carlo policy
rollouts are then generated and aggregated to previous data to improve the
learned policy over multiple epochs and games. The proposed approach has been
extensively tested both on a soccer-dedicated simulator and on real robots.
Using this method, our learning robot soccer team achieves an improvement in
ball interceptions, as well as a reduction in the number of opponents' goals.
Along with this improved performance, the method achieves a more efficient
positioning of the whole team within the field.
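The MCSDA loop (expert demonstrations, then repeated rounds of Monte Carlo action selection and data aggregation with retraining) can be sketched on a toy problem. The 1-D interception game, the hand-coded expert, and the 1-NN learner below are assumptions standing in for the RoboCup state representation and the paper's actual supervised learner:

```python
import random

random.seed(0)

ACTIONS = (-1, 0, 1)   # defender moves left, stays, or moves right

def step(state, a):
    """Toy 1-D interception dynamics: the ball drifts left, the defender moves by a."""
    d, b = state
    return (d + 0.1 * a, b - 0.05)

def intercepted(state):
    d, b = state
    return abs(d - b) < 0.1

def rollout(state, policy, horizon=10):
    """Return 1.0 if the policy intercepts the ball within the horizon."""
    for _ in range(horizon):
        if intercepted(state):
            return 1.0
        state = step(state, policy(state))
    return 0.0

def train(data):
    """Supervised learner over aggregated (state, action) pairs (here: 1-NN)."""
    snapshot = list(data)
    def policy(state):
        (sd, sb), a = min(snapshot,
                          key=lambda p: (p[0][0] - state[0]) ** 2
                                      + (p[0][1] - state[1]) ** 2)
        return a
    return policy

def expert(state):
    """Hand-coded stand-in for the human expert: move toward the ball."""
    d, b = state
    return -1 if b < d - 0.05 else (1 if b > d + 0.05 else 0)

# 1) Initial dataset: simulated expert demonstrations
data = []
for _ in range(20):
    s = (random.uniform(-1, 1), random.uniform(-1, 1))
    for _ in range(10):
        data.append((s, expert(s)))
        s = step(s, expert(s))
policy = train(data)

# 2) MCSDA-style epochs: Monte Carlo action selection, then data aggregation
for epoch in range(2):
    for _ in range(5):                      # games per epoch
        s = (random.uniform(-1, 1), random.uniform(-1, 1))
        for _ in range(10):                 # steps per game
            # pick the action whose Monte Carlo rollouts score best
            best = max(ACTIONS, key=lambda a: sum(
                rollout(step(s, a), policy) for _ in range(2)))
            data.append((s, best))
            s = step(s, best)
    policy = train(data)                    # retrain on the aggregated dataset
```

The key structural point is that the states visited by the current policy, not by the expert, generate the new training labels, which is what lets the learned policy adapt beyond the initial demonstrations.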
New learning modes for sequential decision making
This thesis considers the problem in which a teacher is interested in teaching action policies to computer agents for sequential decision making. The vast majority of policy
learning algorithms offer teachers little flexibility in how policies are taught. In particular,
one of two learning modes is typically considered: 1) Imitation learning, where
the teacher demonstrates explicit action sequences to the learner, and 2) Reinforcement
learning, where the teacher designs a reward function for the learner to autonomously
optimize via practice. This is in sharp contrast to how humans teach other humans,
where many other learning modes are commonly used besides imitation and practice.
This thesis presents novel learning modes for teaching policies to computer agents, with
the eventual aim of allowing human teachers to teach computer agents more naturally
and efficiently.
Our first learning mode is inspired by how humans learn: through rounds of practice
followed by feedback from a teacher. We adopt this mode to create computer agents that
learn from several rounds of autonomous practice followed by critique feedback from a
teacher. Our results show that this mode of policy learning is more effective than pure
reinforcement learning, though important usability issues arise when used with human teachers.
Next we consider a learning mode where the computer agent can actively ask questions
to the teacher, which we call active imitation learning. We provide algorithms
for active imitation learning that are proven to require strictly less interaction with the
teacher than passive imitation learning. We also show empirically that active imitation learning algorithms are much more efficient than traditional passive imitation learning in terms of the amount of interaction with the teacher.
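A minimal sketch of the active-query idea: the learner asks the teacher for labels only at states where its current hypothesis is most uncertain, rather than consuming full demonstrations. The pool-based setup, the linear teacher policy, and the k-NN disagreement measure below are illustrative assumptions, not the thesis's algorithms:

```python
import random

random.seed(1)

def teacher(s):
    """Hidden expert policy the learner can query (assumed linear here)."""
    return 1 if s[0] + s[1] > 0 else 0

def predict(s, data):
    """1-NN action prediction from the labelled states."""
    (x, y), a = min(data, key=lambda p: (p[0][0] - s[0]) ** 2
                                      + (p[0][1] - s[1]) ** 2)
    return a

def uncertainty(s, data, k=3):
    """Disagreement among the k nearest labelled neighbours (0 = unanimous)."""
    if len(data) < k:
        return 1.0
    nearest = sorted(data, key=lambda p: (p[0][0] - s[0]) ** 2
                                       + (p[0][1] - s[1]) ** 2)[:k]
    frac = sum(a for _, a in nearest) / k
    return 2 * min(frac, 1 - frac)

# Pool of unlabelled states; the active learner chooses which ones to ask about
pool = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
labelled = []

for _ in range(20):                          # only 20 queries to the teacher
    s = max(pool, key=lambda s: uncertainty(s, labelled))
    labelled.append((s, teacher(s)))
    pool.remove(s)

# Agreement with the teacher on held-out states
test = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
acc = sum(predict(s, labelled) == teacher(s) for s in test) / len(test)
```

Because queries concentrate near the teacher's decision boundary, the learner spends its limited interaction budget on exactly the states where demonstrations are most informative.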
Lastly, we introduce a novel imitation learning mode that allows a teacher to specify
shaping rewards to a computer agent in addition to demonstrations. Shaping rewards are
additional rewards supplied to an agent for accelerating policy learning via reinforcement
learning. We provide an algorithm to incorporate shaping rewards in imitation learning
and show that it learns from fewer demonstrations than pure imitation learning.
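How shaping rewards can complement sparse demonstrations can be sketched with potential-based shaping on a toy chain: demonstrations label only two states, and the shaping term guides the agent everywhere else. The chain MDP, the potential phi, and the weighting w_demo are invented for illustration and are not the thesis's algorithm:

```python
# Toy chain MDP: states 0..9, goal at the right end
N, GOAL = 10, 9
ACTIONS = (-1, 1)

demos = {2: 1, 3: 1}       # teacher demonstrated "move right" in states 2, 3 only

def phi(s):
    """Potential for shaping: states nearer the goal score higher."""
    return -abs(GOAL - s)

def policy(s, w_demo=10.0):
    """Greedy policy over demonstration agreement plus the shaping term F(s, s')."""
    def score(a):
        ns = min(max(s + a, 0), N - 1)                  # clip to the chain
        demo_bonus = w_demo if demos.get(s) == a else 0.0
        return demo_bonus + (phi(ns) - phi(s))          # potential-based shaping
    return max(ACTIONS, key=score)

s, steps = 0, 0
while s != GOAL and steps < 50:
    s += policy(s)
    steps += 1
```

Even though the demonstrations cover only two of the ten states, the shaping term fills in a sensible preference everywhere else, so the agent reaches the goal; with demonstrations alone, the seven undemonstrated states would leave the policy undetermined.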
We wrap up by presenting a prototype User-Initiated Learning (UIL) system that
allows an end user to demonstrate procedures containing optional steps and instruct the
system to autonomously learn to predict when the optional steps should be executed, and
remind the user if they forget. Our prototype supports user-initiated demonstration and
learning via a natural interface, and has a built-in automated machine learning engine
to automatically train and install a predictor for the requested prediction problem.