Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word "reinforcement." The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying file
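The exploration/exploitation trade-off named among the central issues above can be illustrated with a minimal epsilon-greedy agent on a multi-armed bandit. This is a hypothetical toy setup for illustration, not a system discussed in the survey; the arm means and noise level are made up.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Toy epsilon-greedy agent: with probability epsilon it explores a
    random arm, otherwise it exploits the arm with the highest estimated
    mean reward, learning purely from trial-and-error interaction."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = true_means[arm] + rng.gauss(0.0, 0.1)            # noisy reward
        counts[arm] += 1
        # incremental mean update of this arm's reward estimate
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates

estimates = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print(max(range(3), key=lambda a: estimates[a]))  # index of the learned best arm
```

Setting epsilon to 0 recovers pure exploitation, which can lock onto a suboptimal arm; setting it to 1 is pure exploration, which never capitalizes on what has been learned.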
Sufficient Conditions for Feasibility and Optimality of Real-Time Optimization Schemes - I. Theoretical Foundations
The idea of iterative process optimization based on collected output
measurements, or "real-time optimization" (RTO), has gained much prominence in
recent decades, with many RTO algorithms being proposed, researched, and
developed. While the essential goal of these schemes is to drive the process to
its true optimal conditions without violating any safety-critical, or "hard",
constraints, no generalized, unified approach for guaranteeing this behavior
exists. In this two-part paper, we propose an implementable set of conditions
that can enforce these properties for any RTO algorithm. The first part of the
work is dedicated to the theory behind the sufficient conditions for
feasibility and optimality (SCFO), together with their basic implementation
strategy. RTO algorithms enforcing the SCFO are shown to perform as desired in
several numerical examples - allowing for feasible-side convergence to the
plant optimum where algorithms not enforcing the conditions would fail.
Comment: Working paper; supplementary material available at:
http://infoscience.epfl.ch/record/18807
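The basic RTO pattern described above, iteratively adjusting process inputs from measured outputs while staying on the feasible side of a hard constraint, can be sketched as follows. This is a hypothetical one-dimensional toy, not the SCFO scheme of the paper: the plant cost, the constraint, and the projection step are all invented for illustration.

```python
def rto_loop_sketch():
    """Toy RTO iteration: drive a scalar input u toward the plant optimum
    using gradients estimated from "measured" outputs, projecting each
    iterate onto the feasible set of the hard constraint u <= 2."""
    # Hypothetical plant: cost phi(u) = (u - 3)^2 with hard constraint
    # u <= 2, so the constrained plant optimum is u = 2.
    phi = lambda u: (u - 3.0) ** 2
    u, step, h = 0.0, 0.2, 1e-4
    for _ in range(100):
        # estimate the plant gradient from output measurements
        grad = (phi(u + h) - phi(u - h)) / (2 * h)
        u = u - step * grad
        u = min(u, 2.0)  # project back onto the feasible side
    return u

print(rto_loop_sketch())
```

Projecting every iterate keeps each applied input feasible, which is the feasible-side behavior the paper's conditions are designed to guarantee in general; this sketch has no such guarantee beyond the explicit clipping.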
Coordinate Descent with Bandit Sampling
Coordinate descent methods usually minimize a cost function by updating a
random decision variable (corresponding to one coordinate) at a time. Ideally,
we would update the decision variable that yields the largest decrease in the
cost function. However, finding this coordinate would require checking all of
them, which would effectively negate the improvement in computational
tractability that coordinate descent is intended to afford. To address this, we
propose a new adaptive method for selecting a coordinate. First, we find a
lower bound on the amount the cost function decreases when a coordinate is
updated. We then use a multi-armed bandit algorithm to learn which coordinates
yield the largest lower bound, interleaving this learning with conventional
coordinate descent updates, except that the coordinate to update is sampled in
proportion to its expected decrease. We show that our approach improves the
convergence of coordinate descent methods both theoretically and
experimentally.
Comment: appearing at NeurIPS 201
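The selection rule described in the abstract, sampling the coordinate to update in proportion to an estimate of the decrease it yields, can be sketched on a small quadratic problem. This is a simplified illustration under assumed details (an exact per-coordinate minimization and an ad-hoc running-average estimate standing in for the paper's lower bound and bandit algorithm):

```python
import random

def bandit_coordinate_descent(A, b, steps=200, seed=0):
    """Minimize f(x) = 0.5*x'Ax - b'x by coordinate descent, sampling the
    coordinate to update in proportion to an estimate of the cost decrease
    it yields (a bandit-style selection rule)."""
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    est = [1.0] * n  # optimistic initial estimates of per-coordinate decrease

    def grad(i):
        return sum(A[i][j] * x[j] for j in range(n)) - b[i]

    def cost():
        quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
        return 0.5 * quad - sum(b[i] * x[i] for i in range(n))

    for _ in range(steps):
        # sample a coordinate with probability proportional to its estimate
        r = rng.random() * sum(est)
        i, acc = 0, est[0]
        while acc < r:
            i += 1
            acc += est[i]
        before = cost()
        x[i] -= grad(i) / A[i][i]   # exact minimization along coordinate i
        observed = before - cost()  # realized decrease in the cost
        est[i] = 0.5 * est[i] + 0.5 * max(observed, 1e-12)  # update estimate
    return x

A = [[3.0, 1.0], [1.0, 2.0]]  # symmetric positive definite
b = [1.0, 1.0]
print(bandit_coordinate_descent(A, b))  # solution of Ax = b is [0.2, 0.4]
```

Because estimates decay only for coordinates that are actually picked, a coordinate whose estimate lags behind is eventually sampled again, so no coordinate is permanently starved, which is the property that distinguishes adaptive sampling from a pure greedy choice.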