Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word "reinforcement." The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying file
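The exploration/exploitation trade-off named among the central issues above can be illustrated with a minimal epsilon-greedy agent on a multi-armed bandit. This is a hypothetical toy setup for illustration, not a system discussed in the survey; the arm means and noise level are made up.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Toy epsilon-greedy agent: with probability epsilon it explores a
    random arm, otherwise it exploits the arm with the highest estimated
    mean reward, learning purely from trial-and-error interaction."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = true_means[arm] + rng.gauss(0.0, 0.1)            # noisy reward
        counts[arm] += 1
        # incremental mean update of this arm's reward estimate
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates

estimates = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print(max(range(3), key=lambda a: estimates[a]))  # index of the learned best arm
```

Setting epsilon to 0 recovers pure exploitation, which can lock onto a suboptimal arm; setting it to 1 is pure exploration, which never capitalizes on what has been learned.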
Sufficient Conditions for Feasibility and Optimality of Real-Time Optimization Schemes - I. Theoretical Foundations
The idea of iterative process optimization based on collected output
measurements, or "real-time optimization" (RTO), has gained much prominence in
recent decades, with many RTO algorithms being proposed, researched, and
developed. While the essential goal of these schemes is to drive the process to
its true optimal conditions without violating any safety-critical, or "hard",
constraints, no generalized, unified approach for guaranteeing this behavior
exists. In this two-part paper, we propose an implementable set of conditions
that can enforce these properties for any RTO algorithm. The first part of the
work is dedicated to the theory behind the sufficient conditions for
feasibility and optimality (SCFO), together with their basic implementation
strategy. RTO algorithms enforcing the SCFO are shown to perform as desired in
several numerical examples - allowing for feasible-side convergence to the
plant optimum where algorithms not enforcing the conditions would fail.
Comment: Working paper; supplementary material available at:
http://infoscience.epfl.ch/record/18807
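The basic RTO pattern described above, iteratively adjusting process inputs from measured outputs while staying on the feasible side of a hard constraint, can be sketched as follows. This is a hypothetical one-dimensional toy, not the SCFO scheme of the paper: the plant cost, the constraint, and the projection step are all invented for illustration.

```python
def rto_loop_sketch():
    """Toy RTO iteration: drive a scalar input u toward the plant optimum
    using gradients estimated from "measured" outputs, projecting each
    iterate onto the feasible set of the hard constraint u <= 2."""
    # Hypothetical plant: cost phi(u) = (u - 3)^2 with hard constraint
    # u <= 2, so the constrained plant optimum is u = 2.
    phi = lambda u: (u - 3.0) ** 2
    u, step, h = 0.0, 0.2, 1e-4
    for _ in range(100):
        # estimate the plant gradient from output measurements
        grad = (phi(u + h) - phi(u - h)) / (2 * h)
        u = u - step * grad
        u = min(u, 2.0)  # project back onto the feasible side
    return u

print(rto_loop_sketch())
```

Projecting every iterate keeps each applied input feasible, which is the feasible-side behavior the paper's conditions are designed to guarantee in general; this sketch has no such guarantee beyond the explicit clipping.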
Coordinate Descent with Bandit Sampling
Coordinate descent methods usually minimize a cost function by updating a
random decision variable (corresponding to one coordinate) at a time. Ideally,
we would update the decision variable that yields the largest decrease in the
cost function. However, finding this coordinate would require checking all of
them, which would effectively negate the improvement in computational
tractability that coordinate descent is intended to afford. To address this, we
propose a new adaptive method for selecting a coordinate. First, we find a
lower bound on the amount the cost function decreases when a coordinate is
updated. We then use a multi-armed bandit algorithm to learn which coordinates
yield the largest lower bound, interleaving this learning with conventional
coordinate descent updates, except that the coordinate to update is sampled in
proportion to its expected decrease. We show that our approach improves the
convergence of coordinate descent methods both theoretically and
experimentally.
Comment: appearing at NeurIPS 201
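The selection rule described in the abstract, sampling the coordinate to update in proportion to an estimate of the decrease it yields, can be sketched on a small quadratic problem. This is a simplified illustration under assumed details (an exact per-coordinate minimization and an ad-hoc running-average estimate standing in for the paper's lower bound and bandit algorithm):

```python
import random

def bandit_coordinate_descent(A, b, steps=200, seed=0):
    """Minimize f(x) = 0.5*x'Ax - b'x by coordinate descent, sampling the
    coordinate to update in proportion to an estimate of the cost decrease
    it yields (a bandit-style selection rule)."""
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    est = [1.0] * n  # optimistic initial estimates of per-coordinate decrease

    def grad(i):
        return sum(A[i][j] * x[j] for j in range(n)) - b[i]

    def cost():
        quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
        return 0.5 * quad - sum(b[i] * x[i] for i in range(n))

    for _ in range(steps):
        # sample a coordinate with probability proportional to its estimate
        r = rng.random() * sum(est)
        i, acc = 0, est[0]
        while acc < r:
            i += 1
            acc += est[i]
        before = cost()
        x[i] -= grad(i) / A[i][i]   # exact minimization along coordinate i
        observed = before - cost()  # realized decrease in the cost
        est[i] = 0.5 * est[i] + 0.5 * max(observed, 1e-12)  # update estimate
    return x

A = [[3.0, 1.0], [1.0, 2.0]]  # symmetric positive definite
b = [1.0, 1.0]
print(bandit_coordinate_descent(A, b))  # solution of Ax = b is [0.2, 0.4]
```

Because estimates decay only for coordinates that are actually picked, a coordinate whose estimate lags behind is eventually sampled again, so no coordinate is permanently starved, which is the property that distinguishes adaptive sampling from a pure greedy choice.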