Chapter 2 - Data-Driven Energy Efficient Driving Control in Connected Vehicle Environment

Authors: Barth Matthew J; Boriboonsomsin Kanok; Qi Xuewei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

eScholarship - University of California

Learning Automata: a Survey

Authors: K. Narendra; M. Thatcher

Research Papers in Economics

Control Regularization for Reduced Variance Reinforcement Learning

Authors: Burdick Joel W.; Chaudhuri Swarat; Cheng Richard; Orosz Gabor; Verma Abhinav; Yue Yisong
Publication date: 13/05/2019

Dealing with high variance is a significant challenge in model-free reinforcement learning (RL). Existing methods are unreliable, exhibiting high variance in performance from run to run using different initializations/seeds. Focusing on problems arising in continuous control, we propose a functional regularization approach to augmenting model-free RL. In particular, we regularize the behavior of the deep policy to be similar to a policy prior, i.e., we regularize in function space. We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off. When the policy prior has control-theoretic stability guarantees, we further show that this regularization approximately preserves those stability guarantees throughout learning. We validate our approach empirically on a range of settings, and demonstrate significantly reduced variance, guaranteed dynamic stability, and more efficient learning than deep RL alone. Comment: Appearing in ICML 201

arXiv.org e-Print Archive

Caltech Authors
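
The idea in the abstract above of regularizing a learned policy toward a policy prior can be illustrated with a minimal sketch. It assumes a quadratic penalty on the distance to the prior's action, whose minimizer is a weighted average of the two actions. The function name `regularized_action` and the weight `lam` are hypothetical names chosen for illustration and are not taken from the listing; the authors' exact formulation may differ.

```python
def regularized_action(u_learned, u_prior, lam):
    """Blend a learned action with a prior action (illustrative sketch).

    Returns the minimizer of
        ||u - u_learned||^2 + lam * ||u - u_prior||^2,
    which is the weighted average below. lam = 0 gives the learned
    action unchanged; a large lam pulls the result toward the prior.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    w = 1.0 / (1.0 + lam)  # weight on the learned action
    return [w * a + (1.0 - w) * b for a, b in zip(u_learned, u_prior)]


# Example: a two-dimensional control input blended with equal weight.
blended = regularized_action([1.0, -2.0], [0.0, 0.0], lam=1.0)
```

A larger `lam` lowers the variance contributed by the learned policy at the cost of bias toward the prior, which is the trade-off the abstract refers to.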

Correct-by-synthesis reinforcement learning with temporal logic constraints

Authors: Ehlers Ruediger; Topcu Ufuk; Wen Min
Publication date: 05/03/2015

We consider a problem on the synthesis of reactive controllers that optimize some a priori unknown performance criterion while interacting with an uncontrolled environment such that the system satisfies a given temporal logic specification. We decouple the problem into two subproblems. First, we extract a (maximally) permissive strategy for the system, which encodes multiple (possibly all) ways in which the system can react to the adversarial environment and satisfy the specifications. Then, we quantify the a priori unknown performance criterion as a (still unknown) reward function and compute an optimal strategy for the system within the operating envelope allowed by the permissive strategy by using the so-called maximin-Q learning algorithm. We establish both correctness (with respect to the temporal logic specifications) and optimality (with respect to the a priori unknown performance criterion) of this two-step technique for a fragment of temporal logic specifications. For specifications beyond this fragment, correctness can still be preserved, but the learned strategy may be sub-optimal. We present an algorithm for the overall problem, and demonstrate its use and computational requirements on a set of robot motion planning examples. Comment: 8 pages, 3 figures, 2 tables, submitted to IROS 201

arXiv.org e-Print Archive

Crossref
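
The two-step technique in the abstract above (restrict the system to a permissive strategy, then learn within it) can be sketched as a simplified tabular maximin-Q update. This is an illustrative sketch only: the simultaneous-move game model, the `permitted` map from states to allowed system actions, the state and action labels, and the values of `alpha` and `gamma` are all assumptions, not details taken from the listing.

```python
from collections import defaultdict


def maximin_value(Q, state, sys_actions, env_actions):
    """Best worst-case value: max over permitted system actions of the
    min over environment actions."""
    return max(
        min(Q[(state, a, e)] for e in env_actions) for a in sys_actions
    )


def maximin_q_update(Q, s, a, e, reward, s_next, permitted, env_actions,
                     alpha=0.1, gamma=0.9):
    """One tabular update toward reward + gamma * maximin value of s_next.

    Only actions listed in permitted[s_next] are considered, so the
    learner never evaluates moves outside the permissive strategy.
    """
    target = reward + gamma * maximin_value(
        Q, s_next, permitted[s_next], env_actions
    )
    Q[(s, a, e)] += alpha * (target - Q[(s, a, e)])


# Hypothetical two-state example; unseen entries default to 0.0.
Q = defaultdict(float)
permitted = {"s0": ["left", "right"], "s1": ["left"]}
env_actions = ["x", "y"]
Q[("s1", "left", "x")] = 1.0
Q[("s1", "left", "y")] = 2.0
Q[("s1", "right", "x")] = 10.0  # ignored: "right" is not permitted in s1
maximin_q_update(Q, "s0", "left", "x", 1.0, "s1", permitted, env_actions,
                 alpha=0.5, gamma=0.5)
```

Because the maximization ranges only over `permitted[s_next]`, any strategy read off the learned table stays inside the envelope that guarantees the specification, while the minimization treats the environment as adversarial.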