
    Reinforcement Learning: A Survey

    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
    Comment: See http://www.jair.org/ for any accompanying file
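    The survey proposes no single algorithm, but as a minimal, hedged sketch of the trial-and-error setting it describes, the Python fragment below implements tabular Q-learning with an epsilon-greedy exploration-exploitation trade-off. The environment interface (`reset`, `step`, `actions`) is an assumed toy interface for illustration, not something taken from the paper.

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        """Tabular Q-learning with epsilon-greedy exploration.

        `env` is assumed (hypothetically) to expose reset() -> state,
        step(action) -> (next_state, reward, done), and a list of discrete
        actions in env.actions.
        """
        Q = defaultdict(float)  # Q[(state, action)] -> estimated return

        for _ in range(episodes):
            state = env.reset()
            done = False
            while not done:
                # Exploration-exploitation trade-off: act randomly with
                # probability epsilon, otherwise act greedily on current estimates.
                if random.random() < epsilon:
                    action = random.choice(env.actions)
                else:
                    action = max(env.actions, key=lambda a: Q[(state, a)])

                next_state, reward, done = env.step(action)

                # Temporal-difference update: copes with delayed reinforcement
                # by bootstrapping on the estimated value of the next state.
                best_next = max(Q[(next_state, a)] for a in env.actions)
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state
        return Q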

    Sufficient Conditions for Feasibility and Optimality of Real-Time Optimization Schemes - I. Theoretical Foundations

    The idea of iterative process optimization based on collected output measurements, or "real-time optimization" (RTO), has gained much prominence in recent decades, with many RTO algorithms being proposed, researched, and developed. While the essential goal of these schemes is to drive the process to its true optimal conditions without violating any safety-critical, or "hard", constraints, no generalized, unified approach for guaranteeing this behavior exists. In this two-part paper, we propose an implementable set of conditions that can enforce these properties for any RTO algorithm. The first part of the work is dedicated to the theory behind the sufficient conditions for feasibility and optimality (SCFO), together with their basic implementation strategy. RTO algorithms enforcing the SCFO are shown to perform as desired in several numerical examples - allowing for feasible-side convergence to the plant optimum where algorithms not enforcing the conditions would fail.
    Comment: Working paper; supplementary material available at: http://infoscience.epfl.ch/record/18807
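    As a hedged illustration of the measurement-based RTO setting described above (not the SCFO themselves), the Python sketch below runs a generic RTO loop that filters the input updates proposed by some underlying algorithm and backs off toward the last feasible operating point whenever measured hard constraints are violated. `propose_step` and `measure_constraints` are hypothetical placeholders for an algorithm and a plant.

    import numpy as np

    def rto_loop(propose_step, measure_constraints, u0, iterations=20, gain=0.5):
        """Schematic measurement-based RTO loop (an illustrative sketch only).

        `propose_step(u)` returns the input update suggested by any RTO
        algorithm, and `measure_constraints(u)` returns measured values of the
        hard constraints g(u) <= 0 after operating at u. Both are assumed
        placeholders, not interfaces from the paper.
        """
        u = np.asarray(u0, dtype=float)
        u_feasible = u.copy()                # last operating point known to be feasible
        for _ in range(iterations):
            step = propose_step(u)           # raw update from the underlying RTO scheme
            u = u_feasible + gain * step     # filtered (damped) move from the feasible point
            g = measure_constraints(u)       # plant measurements at the new point
            if np.any(g > 0.0):
                gain *= 0.5                  # violation: back off and move more cautiously
                u = u_feasible               # return to the last feasible operating point
            else:
                u_feasible = u.copy()        # accept the update as the new feasible point
                gain = min(1.0, gain * 1.1)  # cautiously relax the filter again
        return u_feasible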

    Coordinate Descent with Bandit Sampling

    Coordinate descent methods usually minimize a cost function by updating a random decision variable (corresponding to one coordinate) at a time. Ideally, we would update the decision variable that yields the largest decrease in the cost function. However, finding this coordinate would require checking all of them, which would effectively negate the improvement in computational tractability that coordinate descent is intended to afford. To address this, we propose a new adaptive method for selecting a coordinate. First, we find a lower bound on the amount the cost function decreases when a coordinate is updated. We then use a multi-armed bandit algorithm to learn which coordinates result in the largest lower bound, interleaving this learning with conventional coordinate descent updates, except that the coordinate is selected proportionately to the expected decrease. We show that our approach improves the convergence of coordinate descent methods both theoretically and experimentally.
    Comment: appearing at NeurIPS 201
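    A hedged sketch of the selection idea described in the abstract, specialized to a least-squares objective where per-coordinate decrease bounds are easy to compute; this is an illustrative simplification, not the authors' exact algorithm or bandit analysis.

    import numpy as np

    def bandit_coordinate_descent(A, b, iters=1000, explore=0.1, seed=None):
        """Coordinate descent on f(x) = 0.5 * ||Ax - b||^2 with bandit-style
        coordinate selection (a simplified sketch of the abstract's idea).

        For this objective the coordinate-wise Lipschitz constants are
        L_j = ||A[:, j]||^2, and a step of size 1/L_j on coordinate j decreases
        f by at least grad_j**2 / (2 * L_j); that bound serves as the "reward"
        estimate for each arm (coordinate).
        """
        rng = np.random.default_rng(seed)
        m, n = A.shape
        x = np.zeros(n)
        L = np.sum(A * A, axis=0) + 1e-12        # coordinate Lipschitz constants
        residual = A @ x - b
        est_decrease = np.ones(n)                # running estimate of each arm's payoff

        for _ in range(iters):
            # Bandit step: sample a coordinate proportionally to its estimated
            # decrease, mixed with uniform exploration so every arm keeps being tried.
            probs = (1 - explore) * est_decrease / est_decrease.sum() + explore / n
            j = rng.choice(n, p=probs)

            grad_j = A[:, j] @ residual
            decrease_bound = grad_j ** 2 / (2 * L[j])                        # lower bound on decrease
            est_decrease[j] = 0.5 * est_decrease[j] + 0.5 * decrease_bound   # update the arm's estimate

            # Conventional coordinate descent update on the chosen coordinate.
            delta = -grad_j / L[j]
            x[j] += delta
            residual += delta * A[:, j]
        return x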