    Quantum Bandits

    We consider the quantum version of the bandit problem known as best arm identification (BAI). We first propose a quantum modeling of the BAI problem, which assumes that both the learning agent and the environment are quantum; we then propose an algorithm based on quantum amplitude amplification to solve BAI. We formally analyze the behavior of the algorithm on all instances of the problem and show, in particular, that it obtains the optimal solution quadratically faster than what is known to hold in the classical case. Comment: All your comments are very welcome.

    Generalized models as a universal approach to the analysis of nonlinear dynamical systems

    We present a universal approach to the investigation of the dynamics in generalized models. In these models the processes that are taken into account are not restricted to specific functional forms. Therefore a single generalized model can describe a class of systems that share a similar structure. Despite this generality, the proposed approach allows us to study the dynamical properties of generalized models efficiently in the framework of local bifurcation theory. The approach is based on a normalization procedure that is used to identify natural parameters of the system. The Jacobian in a steady state is then derived as a function of these parameters. The analytical computation of local bifurcations using computer algebra reveals conditions for the local asymptotic stability of steady states and provides certain insights into the global dynamics of the system. The proposed approach yields a close connection between modelling and nonlinear dynamics. We illustrate the investigation of generalized models by considering examples from three different disciplines of science: a socio-economic model of dynastic cycles in China, a model for a coupled laser system, and a general ecological food web. Comment: 15 pages, 2 figures (Fig. 2 in color).

    Multi armed bandits and quantum channel oracles

    Multi armed bandits are one of the theoretical pillars of reinforcement learning. Recently, the investigation of quantum algorithms for multi armed bandit problems was started, and it was found that a quadratic speed-up is possible when the arms and the randomness of the rewards of the arms can be queried in superposition. Here we introduce further bandit models where we only have limited access to the randomness of the rewards, but we can still query the arms in superposition. We show that this impedes any speed-up of quantum algorithms. Comment: 44 pages.

    Quantum exploration algorithms for multi-armed bandits

    Identifying the best arm of a multi-armed bandit is a central problem in bandit optimization. We study a quantum computational version of this problem with coherent oracle access to states encoding the reward probabilities of each arm as quantum amplitudes. Specifically, we show that we can find the best arm with fixed confidence using $\tilde{O}\bigl(\sqrt{\sum_{i=2}^{n} \Delta_i^{-2}}\bigr)$ quantum queries, where $\Delta_i$ represents the difference between the mean reward of the best arm and the $i^\text{th}$-best arm. This algorithm, based on variable-time amplitude amplification and estimation, gives a quadratic speedup compared to the best possible classical result. We also prove a matching quantum lower bound (up to poly-logarithmic factors). Comment: 18 pages, 1 figure. To appear in the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021).
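
    To make the query bound in the abstract above concrete, the following is a minimal sketch that evaluates the gap-dependent quantity $\sum_{i=2}^{n} \Delta_i^{-2}$ and its square root for a set of arm means. The function names and the example means are hypothetical and not taken from the paper. The sketch assumes a unique best arm and ignores constants and poly-logarithmic factors, so the two numbers only indicate how the classical and quantum query counts scale relative to each other.

```python
import math


def gap_hardness(means):
    """Return sum_{i>=2} Delta_i^{-2}, where Delta_i is the gap between
    the best mean and the i-th best mean (assumes a unique best arm)."""
    ordered = sorted(means, reverse=True)
    gaps = [ordered[0] - m for m in ordered[1:]]
    if not gaps or any(g <= 0 for g in gaps):
        raise ValueError("need at least two arms and a unique best arm")
    return sum(g ** -2 for g in gaps)


def quantum_query_scale(means):
    """Square root of the gap hardness: the scaling of the quantum bound
    quoted in the abstract, up to poly-logarithmic factors."""
    return math.sqrt(gap_hardness(means))


if __name__ == "__main__":
    # Hypothetical reward probabilities, chosen only for illustration.
    example_means = [0.9, 0.8, 0.7, 0.5]
    print("classical scale:", round(gap_hardness(example_means), 2))
    print("quantum scale:  ", round(quantum_query_scale(example_means), 2))
```

    Small gaps dominate both quantities, which is why instances whose second-best arm is nearly as good as the best are the hardest in either setting.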

    Quantum Natural Policy Gradients: Towards Sample-Efficient Reinforcement Learning

    Reinforcement learning is a growing field in AI with a lot of potential. Intelligent behavior is learned automatically through trial and error in interaction with the environment. However, this learning process is often costly. Using variational quantum circuits as function approximators can reduce this cost. In order to implement this, we propose the quantum natural policy gradient (QNPG) algorithm, a second-order gradient-based routine that takes advantage of an efficient approximation of the quantum Fisher information matrix. We experimentally demonstrate that QNPG outperforms first-order based training on Contextual Bandits environments regarding convergence speed and stability and thereby reduces the sample complexity. Furthermore, we provide evidence for the practical feasibility of our approach by training on a 12-qubit hardware device. Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. 7 pages, 5 figures, 1 table.