Episodic Learning with Control Lyapunov Functions for Uncertain Robotic Systems
Many modern nonlinear control methods aim to endow systems with guaranteed
properties, such as stability or safety, and have been successfully applied to
the domain of robotics. However, model uncertainty remains a persistent
challenge, weakening theoretical guarantees and causing implementation failures
on physical systems. This paper develops a machine learning framework centered
around Control Lyapunov Functions (CLFs) to adapt to parametric uncertainty and
unmodeled dynamics in general robotic systems. Our proposed method proceeds by
iteratively updating estimates of Lyapunov function derivatives and improving
controllers, ultimately yielding a stabilizing quadratic program model-based
controller. We validate our approach on a planar Segway simulation,
demonstrating substantial performance improvements by iteratively refining a
base model-free controller.
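The CLF-driven quadratic program has a closed-form solution in the min-norm case, which a short sketch can make concrete. The linear double-integrator model, the Lyapunov matrix P, and the decay rate lam below are illustrative assumptions standing in for the paper's Segway dynamics and learned Lyapunov derivative estimates:

```python
import numpy as np

def clf_qp_controller(x, A, B, P, lam=1.0):
    """Pointwise min-norm CLF controller (the QP's closed-form solution).

    Minimises ||u||^2 subject to Vdot(x, u) <= -lam * V(x),
    with V(x) = x' P x for the linear model xdot = A x + B u.
    """
    V = x @ P @ x
    LfV = 2 * x @ P @ (A @ x)        # drift contribution to Vdot
    LgV = 2 * x @ P @ B              # input contribution to Vdot
    a = LfV + lam * V                # constraint reads: a + LgV @ u <= 0
    if a <= 0:
        return np.zeros(B.shape[1])  # constraint already satisfied by u = 0
    denom = LgV @ LgV
    if denom < 1e-12:
        return np.zeros(B.shape[1])  # no control authority over Vdot
    return -(a / denom) * LgV        # minimum-norm feasible input

# Double-integrator stand-in for the robot dynamics
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
P = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive-definite Lyapunov candidate

x = np.array([1.0, 0.0])
dt = 0.01
for _ in range(2000):                     # forward-Euler closed-loop simulation
    u = clf_qp_controller(x, A, B, P)
    x = x + dt * (A @ x + B @ u)
```

Because the constraint enforces Vdot <= -lam * V, the candidate V decays exponentially and the state converges to the origin; in the paper this same QP structure is kept while the uncertain Vdot terms are replaced by learned estimates.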
Bayesian Nonparametric Feature and Policy Learning for Decision-Making
Learning from demonstrations has gained increasing interest in the recent
past, enabling an agent to learn how to make decisions by observing an
experienced teacher. While many approaches have been proposed to solve this
problem, little work focuses on reasoning about the observed
behavior. We assume that, in many practical problems, an agent makes its
decision based on latent features that indicate a certain action. Therefore, we
propose a generative model for the states and actions. Inference reveals the
number of features, the features, and the policies, allowing us to learn and to
analyze the underlying structure of the observed behavior. Further, our
approach enables prediction of actions for new states. Simulations are used to
assess the performance of the algorithm based upon this model. Moreover, the
problem of learning a driver's behavior is investigated, demonstrating the
performance of the proposed model in a real-world scenario.
Task-Driven Dictionary Learning
Modeling data with linear combinations of a few elements from a learned
dictionary has been the focus of much recent research in machine learning,
neuroscience and signal processing. For signals such as natural images that
admit such sparse representations, it is now well established that these models
are well suited to restoration tasks. In this context, learning the dictionary
amounts to solving a large-scale matrix factorization problem, which can be
done efficiently with classical optimization tools. The same approach has also
been used for learning features from data for other purposes, e.g., image
classification, but tuning the dictionary in a supervised way for these tasks
has proven to be more difficult. In this paper, we present a general
formulation for supervised dictionary learning adapted to a wide variety of
tasks, and present an efficient algorithm for solving the corresponding
optimization problem. Experiments on handwritten digit classification, digital
art identification, nonlinear inverse image problems, and compressed sensing
demonstrate that our approach is effective in large-scale settings, and is well
suited to supervised and semi-supervised classification, as well as regression
tasks for data that admit sparse representations.
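The sparse-codes-as-features idea can be made concrete with a simplified two-stage sketch: a fixed random dictionary with ISTA encoding, followed by a ridge classifier on the codes. The paper's contribution is the joint, task-driven optimization of dictionary and classifier; the dictionary size, the sparsity weight lam, and the synthetic two-class data below are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def ista(D, x, lam=0.1, n_iter=100):
    """Sparse code of x in D via ISTA: min_a 0.5*||x - D a||^2 + lam*||a||_1."""
    L = np.linalg.norm(D, 2) ** 2           # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = a - D.T @ (D @ a - x) / L       # gradient step on the data term
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft threshold
    return a

n_features, n_atoms = 20, 50
D = rng.standard_normal((n_features, n_atoms))
D /= np.linalg.norm(D, axis=0)              # unit-norm dictionary atoms

def sample(cls, n):
    """Class-dependent sparse signals: each class uses its own 5 atoms."""
    xs = []
    atoms = range(0, 5) if cls == 0 else range(5, 10)
    for _ in range(n):
        s = np.zeros(n_atoms)
        s[list(atoms)] = rng.uniform(0.5, 1.5, size=5)
        xs.append(D @ s + 0.01 * rng.standard_normal(n_features))
    return xs

X = sample(0, 30) + sample(1, 30)
labels = np.array([0] * 30 + [1] * 30)
codes = np.array([ista(D, x) for x in X])   # sparse codes as features

# Ridge-regularised least-squares classifier on the sparse codes
A = np.hstack([codes, np.ones((len(codes), 1))])
w = np.linalg.solve(A.T @ A + 1e-3 * np.eye(A.shape[1]), A.T @ (2 * labels - 1))
acc = ((A @ w > 0).astype(int) == labels).mean()
```

In the supervised formulation the gradient of the task loss is propagated back through the sparse-coding step so the dictionary itself adapts to the classifier, rather than staying fixed as it does in this sketch.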
Using Monte Carlo Search With Data Aggregation to Improve Robot Soccer Policies
RoboCup soccer competitions are considered among the most challenging
multi-robot adversarial environments, due to their high dynamism and the
partial observability of the environment. In this paper we introduce a method
based on a combination of Monte Carlo search and data aggregation (MCSDA) to
adapt discrete-action soccer policies for a defender robot to the strategy of
the opponent team. By exploiting a simple representation of the domain, a
supervised learning algorithm is trained over an initial collection of data
consisting of several simulations of human expert policies. Monte Carlo policy
rollouts are then generated and aggregated to previous data to improve the
learned policy over multiple epochs and games. The proposed approach has been
extensively tested both on a soccer-dedicated simulator and on real robots.
Using this method, our learning robot soccer team achieves an improvement in
ball interceptions, as well as a reduction in the number of opponents' goals.
Along with this improved performance, the method achieves a more efficient
positioning of the whole team within the field.
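The MCSDA loop (expert demonstrations, then repeated rounds of Monte Carlo action selection and data aggregation with retraining) can be sketched on a toy problem. The 1-D interception game, the hand-coded expert, and the 1-NN learner below are assumptions standing in for the RoboCup state representation and the paper's actual supervised learner:

```python
import random

random.seed(0)

ACTIONS = (-1, 0, 1)   # defender moves left, stays, or moves right

def step(state, a):
    """Toy 1-D interception dynamics: the ball drifts left, the defender moves by a."""
    d, b = state
    return (d + 0.1 * a, b - 0.05)

def intercepted(state):
    d, b = state
    return abs(d - b) < 0.1

def rollout(state, policy, horizon=10):
    """Return 1.0 if the policy intercepts the ball within the horizon."""
    for _ in range(horizon):
        if intercepted(state):
            return 1.0
        state = step(state, policy(state))
    return 0.0

def train(data):
    """Supervised learner over aggregated (state, action) pairs (here: 1-NN)."""
    snapshot = list(data)
    def policy(state):
        (sd, sb), a = min(snapshot,
                          key=lambda p: (p[0][0] - state[0]) ** 2
                                      + (p[0][1] - state[1]) ** 2)
        return a
    return policy

def expert(state):
    """Hand-coded stand-in for the human expert: move toward the ball."""
    d, b = state
    return -1 if b < d - 0.05 else (1 if b > d + 0.05 else 0)

# 1) Initial dataset: simulated expert demonstrations
data = []
for _ in range(20):
    s = (random.uniform(-1, 1), random.uniform(-1, 1))
    for _ in range(10):
        data.append((s, expert(s)))
        s = step(s, expert(s))
policy = train(data)

# 2) MCSDA-style epochs: Monte Carlo action selection, then data aggregation
for epoch in range(2):
    for _ in range(5):                      # games per epoch
        s = (random.uniform(-1, 1), random.uniform(-1, 1))
        for _ in range(10):                 # steps per game
            # pick the action whose Monte Carlo rollouts score best
            best = max(ACTIONS, key=lambda a: sum(
                rollout(step(s, a), policy) for _ in range(2)))
            data.append((s, best))
            s = step(s, best)
    policy = train(data)                    # retrain on the aggregated dataset
```

The key structural point is that the states visited by the current policy, not by the expert, generate the new training labels, which is what lets the learned policy adapt beyond the initial demonstrations.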
New learning modes for sequential decision making
This thesis considers the problem in which a teacher is interested in teaching action policies to computer agents for sequential decision making. The vast majority of policy
learning algorithms offer teachers little flexibility in how policies are taught. In particular,
one of two learning modes is typically considered: 1) Imitation learning, where
the teacher demonstrates explicit action sequences to the learner, and 2) Reinforcement
learning, where the teacher designs a reward function for the learner to autonomously
optimize via practice. This is in sharp contrast to how humans teach other humans,
where many other learning modes are commonly used besides imitation and practice.
This thesis presents novel learning modes for teaching policies to computer agents, with
the eventual aim of allowing human teachers to teach computer agents more naturally
and efficiently.
Our first learning mode is inspired by how humans learn: through rounds of practice
followed by feedback from a teacher. We adopt this mode to create computer agents that
learn from several rounds of autonomous practice followed by critique feedback from a
teacher. Our results show that this mode of policy learning is more effective than pure
reinforcement learning, though important usability issues arise when used with human teachers.
Next we consider a learning mode where the computer agent can actively ask questions
to the teacher, which we call active imitation learning. We provide algorithms
for active imitation learning that are proven to require strictly less interaction with the
teacher than passive imitation learning. We also show empirically that active imitation learning algorithms are much more efficient than traditional passive imitation learning in terms of the amount of interaction with the teacher.
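A minimal sketch of the active-query idea: the learner asks the teacher for labels only at states where its current hypothesis is most uncertain, rather than consuming full demonstrations. The pool-based setup, the linear teacher policy, and the k-NN disagreement measure below are illustrative assumptions, not the thesis's algorithms:

```python
import random

random.seed(1)

def teacher(s):
    """Hidden expert policy the learner can query (assumed linear here)."""
    return 1 if s[0] + s[1] > 0 else 0

def predict(s, data):
    """1-NN action prediction from the labelled states."""
    (x, y), a = min(data, key=lambda p: (p[0][0] - s[0]) ** 2
                                      + (p[0][1] - s[1]) ** 2)
    return a

def uncertainty(s, data, k=3):
    """Disagreement among the k nearest labelled neighbours (0 = unanimous)."""
    if len(data) < k:
        return 1.0
    nearest = sorted(data, key=lambda p: (p[0][0] - s[0]) ** 2
                                       + (p[0][1] - s[1]) ** 2)[:k]
    frac = sum(a for _, a in nearest) / k
    return 2 * min(frac, 1 - frac)

# Pool of unlabelled states; the active learner chooses which ones to ask about
pool = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
labelled = []

for _ in range(20):                          # only 20 queries to the teacher
    s = max(pool, key=lambda s: uncertainty(s, labelled))
    labelled.append((s, teacher(s)))
    pool.remove(s)

# Agreement with the teacher on held-out states
test = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
acc = sum(predict(s, labelled) == teacher(s) for s in test) / len(test)
```

Because queries concentrate near the teacher's decision boundary, the learner spends its limited interaction budget on exactly the states where demonstrations are most informative.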
Lastly, we introduce a novel imitation learning mode that allows a teacher to specify
shaping rewards to a computer agent in addition to demonstrations. Shaping rewards are
additional rewards supplied to an agent for accelerating policy learning via reinforcement
learning. We provide an algorithm to incorporate shaping rewards in imitation learning
and show that it learns from fewer demonstrations than pure imitation learning.
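How shaping rewards can complement sparse demonstrations can be sketched with potential-based shaping on a toy chain: demonstrations label only two states, and the shaping term guides the agent everywhere else. The chain MDP, the potential phi, and the weighting w_demo are invented for illustration and are not the thesis's algorithm:

```python
# Toy chain MDP: states 0..9, goal at the right end
N, GOAL = 10, 9
ACTIONS = (-1, 1)

demos = {2: 1, 3: 1}       # teacher demonstrated "move right" in states 2, 3 only

def phi(s):
    """Potential for shaping: states nearer the goal score higher."""
    return -abs(GOAL - s)

def policy(s, w_demo=10.0):
    """Greedy policy over demonstration agreement plus the shaping term F(s, s')."""
    def score(a):
        ns = min(max(s + a, 0), N - 1)                  # clip to the chain
        demo_bonus = w_demo if demos.get(s) == a else 0.0
        return demo_bonus + (phi(ns) - phi(s))          # potential-based shaping
    return max(ACTIONS, key=score)

s, steps = 0, 0
while s != GOAL and steps < 50:
    s += policy(s)
    steps += 1
```

Even though the demonstrations cover only two of the ten states, the shaping term fills in a sensible preference everywhere else, so the agent reaches the goal; with demonstrations alone, the seven undemonstrated states would leave the policy undetermined.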
We wrap up by presenting a prototype User-Initiated Learning (UIL) system that
allows an end user to demonstrate procedures containing optional steps and instruct the
system to autonomously learn to predict when the optional steps should be executed, and
remind the user if they forget. Our prototype supports user-initiated demonstration and
learning via a natural interface, and has a built-in automated machine learning engine
to automatically train and install a predictor for the requested prediction problem.