Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word "reinforcement." The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying files
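The abstract's core ingredients (trial-and-error interaction, the exploration-exploitation trade-off, and learning from delayed reinforcement via temporal-difference updates) can be sketched with tabular Q-learning; the toy two-state MDP and all names here are invented for illustration, not taken from the paper:

```python
import random

# Hypothetical 2-state, 2-action MDP used only for illustration:
# action 1 in state 0 moves to state 1 and pays reward 1; everything else pays 0.
def step(state, action):
    if state == 0 and action == 1:
        return 1, 1.0   # (next state, reward)
    return 0, 0.0

def q_learning(steps=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]   # Q[state][action]
    state = 0
    for _ in range(steps):
        # Exploration vs. exploitation: occasionally try a random action.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: Q[state][a])
        nxt, reward = step(state, action)
        # Temporal-difference update: credit delayed reinforcement by
        # bootstrapping from the value of the next state.
        Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
        state = nxt
    return Q

Q = q_learning()
```

After training, the learned values prefer the rewarding action in state 0, which is the behavior the survey's trial-and-error framing predicts.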
A Survey on Quantum Reinforcement Learning
Quantum reinforcement learning is an emerging field at the intersection of
quantum computing and machine learning. While we intend to provide a broad
overview of the literature on quantum reinforcement learning (our
interpretation of this term will be clarified below), we put particular
emphasis on recent developments. With a focus on already available noisy
intermediate-scale quantum devices, these include variational quantum circuits
acting as function approximators in an otherwise classical reinforcement
learning setting. In addition, we survey quantum reinforcement learning
algorithms based on future fault-tolerant hardware, some of which come with a
provable quantum advantage. We provide both a bird's-eye view of the field, as
well as summaries and reviews for selected parts of the literature.
Comment: 62 pages, 16 figures
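The idea of a variational quantum circuit acting as a function approximator in an otherwise classical loop can be illustrated with a classically simulated single-qubit circuit; everything below (the RY-rotation circuit, the target value, the learning rate) is an invented toy, not an algorithm from the survey:

```python
import numpy as np

# Toy single-qubit "variational circuit", simulated classically:
# RY(theta)|0> has Z-expectation cos(theta); we treat that as the circuit's output.
def ry(theta):
    return np.array([[np.cos(theta / 2), -np.sin(theta / 2)],
                     [np.sin(theta / 2),  np.cos(theta / 2)]])

Z = np.diag([1.0, -1.0])

def circuit_output(theta):
    state = ry(theta) @ np.array([1.0, 0.0])   # apply RY to |0>
    return float(state @ Z @ state)            # <psi| Z |psi> = cos(theta)

# Parameter-shift rule: d<Z>/dtheta = (f(theta + pi/2) - f(theta - pi/2)) / 2,
# the standard way to get exact gradients of such circuits.
def grad(theta):
    return (circuit_output(theta + np.pi / 2) - circuit_output(theta - np.pi / 2)) / 2

# Classically train the circuit to output a target value of 0.3, standing in
# for the function-approximation role it would play in a hybrid RL loop.
theta, target, lr = 0.1, 0.3, 0.5
for _ in range(200):
    err = circuit_output(theta) - target
    theta -= lr * 2 * err * grad(theta)

print(round(circuit_output(theta), 3))  # → 0.3
```

On real noisy intermediate-scale hardware the expectation would be estimated from measurement samples rather than computed exactly, but the hybrid classical-optimizer structure is the same.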
Pessimistic Off-Policy Multi-Objective Optimization
Multi-objective optimization is a class of decision-making problems in which
multiple conflicting objectives are optimized. We study offline optimization of
multi-objective policies from data collected by an existing policy. We propose
a pessimistic estimator for the multi-objective policy values that can be
easily plugged into existing formulas for hypervolume computation and
optimized. The estimator is based on inverse propensity scores (IPS), and
improves upon a naive IPS estimator in both theory and experiments. Our
analysis is general, and applies beyond our IPS estimators and methods for
optimizing them. The pessimistic estimator can be optimized by policy gradients
and performs well in all of our experiments.
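The shape of a pessimistic IPS estimator can be sketched as follows; the logged bandit data, the two-arm setup, and the Hoeffding-style confidence width are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

# Toy logged data: a uniform behavior policy over 2 arms, with known
# propensities, and each pulled arm labelled with K = 2 objective rewards.
rng = np.random.default_rng(0)
n, K = 2000, 2
actions = rng.integers(0, 2, size=n)
propensity = np.full(n, 0.5)
rewards = np.where(actions[:, None] == 1,
                   np.array([0.8, 0.2]),   # arm 1: good on objective 0
                   np.array([0.3, 0.7]))   # arm 0: good on objective 1

def pessimistic_ips(target_probs, delta=0.05):
    """Lower-confidence-bound IPS estimate of each objective's value
    under a target policy, from data logged by the behavior policy."""
    w = target_probs[actions] / propensity        # importance weights
    est = (w[:, None] * rewards).mean(axis=0)     # plain IPS point estimate
    # Pessimism: subtract a Hoeffding-style width that shrinks with n,
    # so the returned vector is a high-probability lower bound.
    width = w.max() * np.sqrt(np.log(2 / delta) / (2 * n))
    return est - width

# Evaluate a target policy that plays arm 1 with probability 0.9.
lcb = pessimistic_ips(np.array([0.1, 0.9]))
```

The resulting per-objective lower bounds form a vector that can be plugged directly into a hypervolume computation in place of the true policy values, which is the "easily plugged into existing formulas" property the abstract highlights.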
When can dictionary learning uniquely recover sparse data from subsamples?
Sparse coding or sparse dictionary learning has been widely used to recover
underlying structure in many kinds of natural data. Here, we provide conditions
guaranteeing when this recovery is universal; that is, when sparse codes and
dictionaries are unique (up to natural symmetries). Our main tool is a useful
lemma in combinatorial matrix theory that allows us to derive bounds on the
sample sizes guaranteeing such uniqueness under various assumptions for how
training data are generated. Whenever the conditions to one of our theorems are
met, any sparsity-constrained learning algorithm that succeeds in
reconstructing the data recovers the original sparse codes and dictionary. We
also discuss potential applications to neuroscience and data analysis.
Comment: 8 pages, 1 figure; IEEE Trans. Info. Theory, to appear
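The uniqueness claim — any sparsity-constrained algorithm that reconstructs the data recovers the original codes — can be illustrated at toy scale by brute-force sparse coding against a fixed dictionary; the dictionary, the 1-sparse setting, and the exhaustive search are invented for illustration and are not the paper's method:

```python
import itertools
import numpy as np

# Toy dictionary: 3 unit-norm atoms in R^2, columns of D.
D = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
D = D / np.linalg.norm(D, axis=0)

def sparsest_code(y, k=1):
    """Brute-force k-sparse code minimizing ||y - D x||_2 over all supports."""
    best_x, best_err = None, np.inf
    for support in itertools.combinations(range(D.shape[1]), k):
        Ds = D[:, support]
        coef, *_ = np.linalg.lstsq(Ds, y, rcond=None)
        err = np.linalg.norm(y - Ds @ coef)
        if err < best_err:
            best_err = err
            best_x = np.zeros(D.shape[1])
            best_x[list(support)] = coef
    return best_x

x_true = np.array([0.0, 2.0, 0.0])   # 1-sparse ground-truth code
y = D @ x_true                       # observed data point
x_hat = sparsest_code(y)             # exact recovery of x_true
```

Here exact reconstruction of `y` forces recovery of the unique 1-sparse code; the theorems in the paper give sample-size conditions under which this uniqueness (up to natural symmetries) extends to the dictionary itself.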