
    Adaptation to Easy Data in Prediction with Limited Advice

    We derive an online learning algorithm with improved regret guarantees for `easy' loss sequences. We consider two types of `easiness': (a) stochastic loss sequences and (b) adversarial loss sequences with small effective range of the losses. While a number of algorithms have been proposed for exploiting small effective range in the full information setting, Gerchinovitz and Lattimore [2016] have shown the impossibility of regret scaling with the effective range of the losses in the bandit setting. We show that just one additional observation per round is sufficient to circumvent the impossibility result. The proposed Second Order Difference Adjustments (SODA) algorithm requires no prior knowledge of the effective range of the losses, $\varepsilon$, and achieves an $O(\varepsilon \sqrt{KT \ln K}) + \tilde{O}(\varepsilon K \sqrt[4]{T})$ expected regret guarantee, where $T$ is the time horizon and $K$ is the number of actions. The scaling with the effective loss range is achieved under significantly weaker assumptions than those made by Cesa-Bianchi and Shamir [2018] in an earlier attempt to circumvent the impossibility result. We also provide a regret lower bound of $\Omega(\varepsilon\sqrt{TK})$, which almost matches the upper bound. In addition, we show that in the stochastic setting SODA achieves an $O\left(\sum_{a:\Delta_a>0} \frac{K^3 \varepsilon^2}{\Delta_a}\right)$ pseudo-regret bound that holds simultaneously with the adversarial regret guarantee. In other words, SODA is safe against an unrestricted oblivious adversary and provides improved regret guarantees for at least two different types of `easiness' simultaneously.
    Comment: Fixed a mistake in the proof and statement of Theorem
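
    The abstract states only the guarantees, so for intuition here is a minimal, hypothetical Python sketch of the limited-advice loop it describes: play one action from an exponential-weights distribution, observe one extra uniformly sampled action for free, and update with an importance-weighted estimate of the pairwise loss difference. The names (`soda_sketch`, `loss_fn`) and the learning-rate schedule are illustrative assumptions; the paper's actual second-order adjustments and tuning are not reproduced here.

```python
import numpy as np

def soda_sketch(loss_fn, K, T, seed=0):
    """Hypothetical sketch of a limited-advice bandit loop in the spirit of
    SODA: each round, play one action, observe the loss of ONE extra
    uniformly sampled action, and update exponential weights with an
    importance-weighted loss-difference estimate. The learning rate below
    is a placeholder, not the paper's tuning."""
    rng = np.random.default_rng(seed)
    log_w = np.zeros(K)                        # log-weights over K actions
    for t in range(1, T + 1):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                           # sampling distribution
        a = rng.choice(K, p=p)                 # played action (loss incurred)
        b = rng.integers(K)                    # extra observed action (free)
        diff = loss_fn(t, a) - loss_fn(t, b)   # observed loss difference
        est = np.zeros(K)
        est[a] = diff / p[a]                   # importance-weighted estimate
        eta = np.sqrt(np.log(K) / (K * t))     # placeholder learning rate
        log_w -= eta * est                     # exponential-weights update
    w = np.exp(log_w - log_w.max())
    return w / w.sum()
```

    For instance, `soda_sketch(lambda t, a: 0.1 * float(a == 0), K=5, T=1000)` runs the loop against a fixed small-range loss sequence.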

    On Adaptivity in Information-constrained Online Learning

    We study how to adapt to smoothly-varying (`easy') environments in well-known online learning problems where acquiring information is expensive. For the problem of label efficient prediction, which is a budgeted version of prediction with expert advice, we present an online algorithm whose regret depends optimally on the number of labels allowed and $Q^*$ (the quadratic variation of the losses of the best action in hindsight), along with a parameter-free counterpart whose regret depends optimally on $Q$ (the quadratic variation of the losses of all the actions). These quantities can be significantly smaller than $T$ (the total time horizon), yielding an improvement over existing, variation-independent results for the problem. We then extend our analysis to handle label efficient prediction with bandit feedback, i.e., label efficient bandits. Our work builds upon the framework of optimistic online mirror descent, and leverages second order corrections along with a carefully designed hybrid regularizer that encodes the constrained information structure of the problem. We then consider revealing-action partial monitoring games -- a version of label efficient prediction with additive information costs, which in general are known to lie in the `hard' class of games having minimax regret of order $T^{\frac{2}{3}}$. We provide a strategy with an $\mathcal{O}((Q^*T)^{\frac{1}{3}})$ bound for revealing-action games, along with one with an $\mathcal{O}((QT)^{\frac{1}{3}})$ bound for the full class of hard partial monitoring games, both being strict improvements over current bounds.
    Comment: 34th AAAI Conference on Artificial Intelligence (AAAI 2020). Short version at 11th Optimization for Machine Learning workshop (OPT 2019)
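
    As a rough illustration of the information constraint (not the paper's algorithm), the following hypothetical sketch couples exponential weights with an optimistic hint and a Bernoulli query coin: full loss vectors are revealed only on queried rounds and are importance-weighted by the query rate. The name `label_efficient_omd`, the learning rate, and the choice of the last observed loss as the hint are all assumptions made for illustration.

```python
import numpy as np

def label_efficient_omd(losses, n_queries, seed=0):
    """Hypothetical sketch of label-efficient prediction with an optimistic
    flavour: losses is a (T, K) array, but the learner sees a round's loss
    vector only when a Bernoulli(n_queries / T) coin comes up, and then
    importance-weights it. Rates and corrections here are placeholders,
    not the paper's second-order machinery."""
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    q = n_queries / T                      # query probability per round
    eta = np.sqrt(np.log(K) * q / T)       # placeholder learning rate
    cum = np.zeros(K)                      # cumulative loss estimates
    hint = np.zeros(K)                     # optimistic guess for this round
    total = 0.0
    for t in range(T):
        w = np.exp(-eta * (cum + hint))    # optimistic exponential weights
        p = w / w.sum()
        total += p @ losses[t]             # expected loss incurred this round
        if rng.random() < q:               # pay to observe the labels
            cum += losses[t] / q           # importance-weighted estimate
            hint = losses[t].copy()        # next round's optimistic hint
    return total
```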

    Robust Bandit Learning with Imperfect Context

    A standard assumption in contextual multi-armed bandits is that the true context is perfectly known before arm selection. Nonetheless, in many practical applications (e.g., cloud resource management), prior to arm selection the context can only be acquired by prediction, subject to errors or adversarial modification. In this paper, we study a contextual bandit setting in which only an imperfect context is available for arm selection, while the true context is revealed at the end of each round. We propose two robust arm selection algorithms: MaxMinUCB (Maximize Minimum UCB), which maximizes the worst-case reward, and MinWD (Minimize Worst-case Degradation), which minimizes the worst-case regret. Importantly, we analyze the robustness of MaxMinUCB and MinWD by deriving both regret and reward bounds relative to an oracle that knows the true context. Our results show that, as time goes on, MaxMinUCB and MinWD both perform asymptotically as well as their optimal counterparts that know the reward function. Finally, we apply MaxMinUCB and MinWD to online edge datacenter selection and run synthetic simulations to validate our theoretical analysis.
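
    To make the max-min selection rule concrete, here is a hypothetical sketch for a linear-reward model: score each arm by its minimum UCB over an assumed uncertainty set of contexts and play the maximizer. The feature map `phi` and the LinUCB-style quantities `theta_hat`, `V_inv`, and `beta` are assumptions for illustration; the paper's exact estimator and uncertainty-set construction may differ.

```python
import numpy as np

def maxmin_ucb_arm(phi, arms, contexts, theta_hat, V_inv, beta):
    """Hypothetical MaxMinUCB-style selection for a linear contextual
    bandit. `contexts` is an assumed uncertainty set believed to contain
    the unknown true context; phi(x, a) maps a (context, arm) pair to a
    feature vector. Each arm is scored by its WORST-case (minimum) UCB
    over the context set, and the best worst-case arm is played."""
    def ucb(x, a):
        f = phi(x, a)
        mean = f @ theta_hat                   # estimated reward
        width = beta * np.sqrt(f @ V_inv @ f)  # confidence width
        return mean + width
    scores = [min(ucb(x, a) for x in contexts) for a in arms]
    return arms[int(np.argmax(scores))]
```

    The min over `contexts` is what encodes robustness to the imperfect context: an arm only scores well if it looks good under every context the learner cannot rule out.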

    Quantum circuit architecture search: error mitigation and trainability enhancement for variational quantum solvers

    Quantum error mitigation techniques are at the heart of quantum hardware implementation, and are the key to performance improvement of the variational quantum learning scheme (VQLS). Although VQLS is partially robust to noise, both empirical and theoretical results show that noise rapidly deteriorates the performance of most variational quantum algorithms in large-scale problems. Furthermore, VQLS suffers from the barren plateau phenomenon: the gradient generated by the classical optimizer vanishes exponentially with respect to the qubit number. Here we devise a resource- and runtime-efficient scheme, the quantum architecture search scheme (QAS), to maximally improve the robustness and trainability of VQLS. In particular, given a learning task, QAS actively seeks an optimal circuit architecture to balance the benefits and side effects of adding more quantum gates: while more gates give the quantum model stronger expressive power, they also introduce more noise and a more severe barren plateau. Consequently, QAS can effectively suppress the influence of quantum noise and barren plateaus. We implement QAS on both a numerical simulator and real quantum hardware, via the IBM cloud, to accomplish data classification and quantum chemistry tasks. Numerical and experimental results show that QAS significantly outperforms conventional variational quantum algorithms with heuristic circuit architectures. Our work provides practical guidance for developing advanced learning-based quantum error mitigation techniques on near-term quantum devices.
    Comment: 8+9 pages. See also a concurrent paper that appeared yesterday [arXiv:2010.08561]
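
    Stripped of the paper's weight-sharing machinery, the search loop can be pictured as the hypothetical sketch below: sample candidate gate layouts, score each with a user-supplied `evaluate` callable (e.g. a short variational training run returning the task loss), and keep the best. `gate_pool`, `n_layers`, and `evaluate` are illustrative assumptions, not the paper's interface.

```python
import random

def quantum_architecture_search(gate_pool, n_layers, evaluate,
                                n_candidates=50, seed=0):
    """Hypothetical sketch of a QAS-style search loop: each candidate
    architecture is a list of gate choices, one per layer, drawn from a
    pool. `evaluate` scores an architecture (lower loss = better), e.g.
    via a brief variational training run on the target task."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("inf")
    for _ in range(n_candidates):
        arch = [rng.choice(gate_pool) for _ in range(n_layers)]
        score = evaluate(arch)          # includes noise/trainability effects
        if score < best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```

    Because `evaluate` is run on (simulated or real) hardware, architectures that add gates without improving the loss are penalized by the extra noise they introduce, which is the trade-off the abstract describes.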