
    Reinforcement Learning: A Survey

    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
    Comment: See http://www.jair.org/ for any accompanying file
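    For orientation, the following is a minimal, hypothetical sketch of tabular Q-learning on a toy chain environment. It is not from the survey; it only illustrates the trial-and-error loop, delayed reinforcement, and the epsilon-greedy exploration/exploitation trade-off that the survey discusses. The environment and all hyperparameters are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: tabular Q-learning on a toy 5-state chain.
# Reward 1 arrives only at the rightmost state, so credit for early
# actions must be propagated back through delayed reinforcement.

n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    """Toy chain dynamics: reward only when the rightmost state is reached."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward, s_next == n_states - 1

for episode in range(500):
    s, done = 0, False
    while not done:
        # Explore with probability epsilon; otherwise exploit, breaking ties randomly.
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
        s_next, r, done = step(s, a)
        # Temporal-difference update credits the delayed reward backward through Q.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a])
        s = s_next

print(Q)  # the "right" action should dominate in every state
```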

    A Survey on Quantum Reinforcement Learning

    Quantum reinforcement learning is an emerging field at the intersection of quantum computing and machine learning. While we intend to provide a broad overview of the literature on quantum reinforcement learning (our interpretation of this term will be clarified below), we put particular emphasis on recent developments. With a focus on already available noisy intermediate-scale quantum devices, these include variational quantum circuits acting as function approximators in an otherwise classical reinforcement learning setting. In addition, we survey quantum reinforcement learning algorithms based on future fault-tolerant hardware, some of which come with a provable quantum advantage. We provide both a bird's-eye view of the field as well as summaries and reviews for selected parts of the literature.
    Comment: 62 pages, 16 figures
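    As a rough illustration of the variational-circuit-as-function-approximator idea mentioned above, the sketch below simulates a single-qubit parameterized circuit in plain NumPy (no quantum SDK) and fits its one trainable rotation angle with the parameter-shift rule. The one-qubit ansatz, loss, and data are assumptions made for illustration, not the specific circuits reviewed in the survey.

```python
import numpy as np

# Single-qubit "variational circuit" as a trainable function approximator:
# |0> -> RY(x) (data encoding) -> RY(theta) (trainable) -> measure <Z>.
# Analytically the output is cos(x + theta).

def ry(angle):
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])

Z = np.diag([1.0, -1.0])

def circuit_output(x, theta):
    """Expectation value of Pauli-Z after the two RY rotations."""
    state = ry(theta) @ ry(x) @ np.array([1.0, 0.0])
    return float(state @ Z @ state)

# Fit theta to a target function on a few points using the parameter-shift
# rule, which gives exact gradients of the expectation value.
xs = np.linspace(0.0, 1.5, 8)
target = np.cos(xs + 0.7)               # pretend this is unknown training data
theta, lr = 0.0, 0.2
for _ in range(200):
    preds = np.array([circuit_output(x, theta) for x in xs])
    grads = np.array([(circuit_output(x, theta + np.pi / 2)
                       - circuit_output(x, theta - np.pi / 2)) / 2 for x in xs])
    theta -= lr * np.mean(2 * (preds - target) * grads)   # gradient of MSE loss

print(theta)  # should approach 0.7
```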

    Pessimistic Off-Policy Multi-Objective Optimization

    Multi-objective optimization is a class of decision-making problems in which multiple conflicting objectives are optimized. We study offline optimization of multi-objective policies from data collected by an existing policy. We propose a pessimistic estimator for the multi-objective policy values that can be easily plugged into existing formulas for hypervolume computation and optimized. The estimator is based on inverse propensity scores (IPS), and improves upon a naive IPS estimator in both theory and experiments. Our analysis is general, and applies beyond our IPS estimators and methods for optimizing them. The pessimistic estimator can be optimized by policy gradients and performs well in all of our experiments.
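    The sketch below illustrates the general idea of a pessimistic IPS estimate for a vector of objectives: importance-weight the logged vector rewards toward the target policy and subtract a per-objective confidence width. The policies, reward model, and the particular confidence width are illustrative assumptions, not the estimator or analysis from the paper.

```python
import numpy as np

# Illustrative off-policy evaluation of a vector-valued policy value.
rng = np.random.default_rng(0)
n_actions, n_objectives, n_logs = 3, 2, 5000

# Logging (behavior) policy and a target policy to evaluate.
behavior = np.array([0.5, 0.3, 0.2])
target = np.array([0.2, 0.3, 0.5])

# True mean reward vector of each action (unknown to the estimator).
true_means = rng.uniform(0.0, 1.0, size=(n_actions, n_objectives))

# Logged data: actions sampled from the behavior policy, noisy vector rewards.
actions = rng.choice(n_actions, size=n_logs, p=behavior)
rewards = true_means[actions] + 0.1 * rng.standard_normal((n_logs, n_objectives))

# IPS estimate of the target policy's vector value.
weights = target[actions] / behavior[actions]      # importance weights
ips_terms = weights[:, None] * rewards             # per-log vector-valued terms
ips_estimate = ips_terms.mean(axis=0)

# Pessimistic estimate: a simple per-objective lower-confidence bound.
width = 1.96 * ips_terms.std(axis=0, ddof=1) / np.sqrt(n_logs)
pessimistic_estimate = ips_estimate - width

print("true value  ", target @ true_means)
print("IPS estimate", ips_estimate)
print("pessimistic ", pessimistic_estimate)
```

    The pessimistic vector value could then be fed into a hypervolume computation or a policy-gradient style optimizer in place of the naive IPS estimate; those steps are omitted here.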

    When can dictionary learning uniquely recover sparse data from subsamples?

    Sparse coding or sparse dictionary learning has been widely used to recover underlying structure in many kinds of natural data. Here, we provide conditions guaranteeing when this recovery is universal; that is, when sparse codes and dictionaries are unique (up to natural symmetries). Our main tool is a useful lemma in combinatorial matrix theory that allows us to derive bounds on the sample sizes guaranteeing such uniqueness under various assumptions for how training data are generated. Whenever the conditions to one of our theorems are met, any sparsity-constrained learning algorithm that succeeds in reconstructing the data recovers the original sparse codes and dictionary. We also discuss potential applications to neuroscience and data analysis.
    Comment: 8 pages, 1 figure; IEEE Trans. Info. Theory, to appear
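    The following is a small, self-contained sketch of the kind of sparsity-constrained learning algorithm such uniqueness results apply to: alternating orthogonal matching pursuit for the codes with a least-squares (MOD-style) dictionary update. The dimensions, sparsity level, and update rules are illustrative assumptions, not the paper's constructions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k, n_samples = 20, 30, 3, 400      # signal dim, atoms, sparsity, samples

# Ground-truth dictionary with unit-norm atoms and k-sparse codes.
D_true = rng.standard_normal((m, n))
D_true /= np.linalg.norm(D_true, axis=0)
X_true = np.zeros((n, n_samples))
for j in range(n_samples):
    support = rng.choice(n, size=k, replace=False)
    X_true[support, j] = rng.standard_normal(k)
Y = D_true @ X_true                       # observed (noiseless) data

def omp(D, y, k):
    """Orthogonal matching pursuit: greedy k-sparse code for one signal."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

# Alternate sparse coding and a least-squares dictionary update from a random start.
D = rng.standard_normal((m, n))
D /= np.linalg.norm(D, axis=0)
for _ in range(30):
    X = np.column_stack([omp(D, Y[:, j], k) for j in range(n_samples)])
    D = Y @ np.linalg.pinv(X)                              # MOD-style update
    D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)      # renormalize atoms

print("relative reconstruction error:",
      np.linalg.norm(Y - D @ X) / np.linalg.norm(Y))
```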