Quantum Bandits
We consider the quantum version of the bandit problem known as best arm
identification (BAI). We first propose a quantum model of the BAI problem,
which assumes that both the learning agent and the environment are quantum; we
then propose an algorithm based on quantum amplitude amplification to solve
BAI. We formally analyze the behavior of the algorithm on all instances of the
problem and we show, in particular, that it is able to get the optimal solution
quadratically faster than what is known to hold in the classical case.
Comment: All your comments are very welcome
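To make the "quadratically faster" claim concrete, here is a minimal sketch that is not the paper's algorithm. It compares the expected number of classical trials needed to observe an outcome of probability p with the textbook number of amplitude-amplification rounds, roughly π / (4 arcsin √p). The function names are hypothetical and chosen only for illustration.

```python
import math


def classical_expected_trials(p: float) -> float:
    """Expected number of independent trials until an event of probability p occurs."""
    return 1.0 / p


def amplification_rounds(p: float) -> int:
    """Rounds of amplitude amplification that bring the success probability near 1.

    Uses the textbook count floor(pi / (4 * theta)) with theta = asin(sqrt(p)).
    """
    theta = math.asin(math.sqrt(p))
    return math.floor(math.pi / (4.0 * theta))


if __name__ == "__main__":
    # The classical count grows like 1/p, the quantum count like 1/sqrt(p).
    for p in (1e-2, 1e-4, 1e-6):
        print(p, classical_expected_trials(p), amplification_rounds(p))
```

Shrinking p by a factor of 100 multiplies the classical count by about 100 but the amplification count by only about 10, which is the quadratic gap the abstract refers to.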
Generalized models as a universal approach to the analysis of nonlinear dynamical systems
We present a universal approach to the investigation of the dynamics in
generalized models. In these models the processes that are taken into account
are not restricted to specific functional forms. Therefore a single generalized
model can describe a class of systems which share a similar structure. Despite
this generality, the proposed approach allows us to study the dynamical
properties of generalized models efficiently in the framework of local
bifurcation theory. The approach is based on a normalization procedure that is
used to identify natural parameters of the system. The Jacobian in a steady
state is then derived as a function of these parameters. The analytical
computation of local bifurcations using computer algebra reveals conditions for
the local asymptotic stability of steady states and provides certain insights
on the global dynamics of the system. The proposed approach yields a close
connection between modelling and nonlinear dynamics. We illustrate the
investigation of generalized models by considering examples from three
different disciplines of science: a socio-economic model of dynastic cycles in
China, a model for a coupled laser system and a general ecological food web.
Comment: 15 pages, 2 figures (Fig. 2 in color)
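The stability test described in this abstract can be sketched for the smallest nontrivial case. The example below is an illustration under stated assumptions, not the authors' computer-algebra procedure: it takes a 2x2 Jacobian evaluated at a steady state and applies the standard trace and determinant conditions. Both function names are hypothetical.

```python
def is_locally_stable_2x2(jacobian):
    """Return True if both eigenvalues of a 2x2 Jacobian have negative real part.

    For a 2x2 matrix this holds exactly when trace < 0 and determinant > 0.
    """
    (a, b), (c, d) = jacobian
    trace = a + d
    det = a * d - b * c
    return trace < 0 and det > 0


def classify_boundary(jacobian, tol=1e-12):
    """Name the local bifurcation condition the Jacobian meets, if any."""
    (a, b), (c, d) = jacobian
    trace = a + d
    det = a * d - b * c
    if abs(det) <= tol:
        # A zero determinant means at least one eigenvalue is zero.
        return "saddle-node type (zero eigenvalue)"
    if abs(trace) <= tol and det > 0:
        # Zero trace with positive determinant gives a purely imaginary pair.
        return "Hopf (purely imaginary pair)"
    return "none"
```

In a generalized model the Jacobian entries would be expressions in the normalized parameters, so these two conditions define bifurcation surfaces in parameter space rather than a single yes or no answer.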
Multi armed bandits and quantum channel oracles
Multi-armed bandits are one of the theoretical pillars of reinforcement
learning. Recently, the investigation of quantum algorithms for multi-armed
bandit problems was started, and it was found that a quadratic speed-up is
possible when the arms and the randomness of the rewards of the arms can be
queried in superposition. Here we introduce further bandit models where we only
have limited access to the randomness of the rewards, but we can still query
the arms in superposition. We show that this impedes any speed-up of quantum
algorithms.
Comment: 44 pages
Quantum exploration algorithms for multi-armed bandits
Identifying the best arm of a multi-armed bandit is a central problem in
bandit optimization. We study a quantum computational version of this problem
with coherent oracle access to states encoding the reward probabilities of each
arm as quantum amplitudes. Specifically, we show that we can find the best arm
with fixed confidence using $\tilde{O}\bigl(\sqrt{\sum_{i=2}^{n}\Delta_i^{-2}}\bigr)$ quantum
queries, where $\Delta_i$ represents the difference between the mean reward
of the best arm and the $i^\text{th}$-best arm. This algorithm, based on
variable-time amplitude amplification and estimation, gives a quadratic speedup
compared to the best possible classical result. We also prove a matching
quantum lower bound (up to poly-logarithmic factors).
Comment: 18 pages, 1 figure. To appear in the Thirty-Fifth AAAI Conference on
Artificial Intelligence (AAAI 2021)
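As a rough numerical illustration of the quadratic speedup, the sketch below computes the gap-dependent hardness H, the sum of inverse squared gaps over the suboptimal arms. Classical fixed-confidence best-arm identification scales with H up to logarithmic factors, so a quadratic speedup corresponds to scaling with the square root of H. Constants and logarithmic factors are ignored, a unique best arm is assumed, and the function names are hypothetical.

```python
import math


def gaps(means):
    """Gaps between the best mean reward and every other arm, smallest first."""
    ordered = sorted(means, reverse=True)
    best = ordered[0]
    return [best - m for m in ordered[1:]]


def hardness(means):
    """H = sum over suboptimal arms of gap**-2 (assumes a unique best arm)."""
    return sum(1.0 / (g * g) for g in gaps(means))


def classical_scale(means):
    """Leading-order classical sample count, ignoring constants and log factors."""
    return hardness(means)


def quantum_scale(means):
    """Leading-order quantum query count under a quadratic speedup."""
    return math.sqrt(hardness(means))


if __name__ == "__main__":
    arms = [0.9, 0.8, 0.7]
    print(classical_scale(arms), quantum_scale(arms))
```

For mean rewards 0.9, 0.8 and 0.7 the gaps are 0.1 and 0.2, so H is about 125 and its square root is about 11.2.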
Quantum Natural Policy Gradients: Towards Sample-Efficient Reinforcement Learning
Reinforcement learning is a growing field in AI with a lot of potential.
Intelligent behavior is learned automatically through trial and error in
interaction with the environment. However, this learning process is often
costly. Using variational quantum circuits as function approximators can reduce
this cost. In order to implement this, we propose the quantum natural policy
gradient (QNPG) algorithm -- a second-order gradient-based routine that takes
advantage of an efficient approximation of the quantum Fisher information
matrix. We experimentally demonstrate that QNPG outperforms first-order based
training on Contextual Bandits environments regarding convergence speed and
stability and thereby reduces the sample complexity. Furthermore, we provide
evidence for the practical feasibility of our approach by training on a
12-qubit hardware device.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible. 7 pages, 5 figures, 1 table
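For readers unfamiliar with natural gradients, the sketch below shows a generic natural-gradient ascent step for two parameters: the ordinary gradient is preconditioned by the inverse of a Fisher information matrix. It is not the QNPG routine itself. The damping term, the learning rate and the function name are assumptions made for illustration, and the Fisher matrix is passed in rather than estimated from a quantum circuit.

```python
def natural_gradient_step(theta, grad, fisher, lr=0.1, damping=1e-3):
    """One ascent step: theta + lr * (F + damping * I)^-1 @ grad, for two parameters."""
    (f00, f01), (f10, f11) = fisher
    # Damping keeps the matrix invertible when the Fisher estimate is near singular.
    f00 += damping
    f11 += damping
    det = f00 * f11 - f01 * f10
    # Solve the 2x2 system (F + damping * I) x = grad with the closed-form inverse.
    x0 = (f11 * grad[0] - f01 * grad[1]) / det
    x1 = (-f10 * grad[0] + f00 * grad[1]) / det
    return [theta[0] + lr * x0, theta[1] + lr * x1]
```

With an identity Fisher matrix and zero damping this reduces to a plain first-order gradient step, which is the baseline the abstract compares against.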