
    Adaptation to Easy Data in Prediction with Limited Advice

    We derive an online learning algorithm with improved regret guarantees for `easy' loss sequences. We consider two types of `easiness': (a) stochastic loss sequences and (b) adversarial loss sequences with small effective range of the losses. While a number of algorithms have been proposed for exploiting small effective range in the full information setting, Gerchinovitz and Lattimore [2016] have shown the impossibility of regret scaling with the effective range of the losses in the bandit setting. We show that just one additional observation per round is sufficient to circumvent the impossibility result. The proposed Second Order Difference Adjustments (SODA) algorithm requires no prior knowledge of the effective range of the losses, $\varepsilon$, and achieves an $O(\varepsilon \sqrt{KT \ln K}) + \tilde{O}(\varepsilon K \sqrt[4]{T})$ expected regret guarantee, where $T$ is the time horizon and $K$ is the number of actions. The scaling with the effective loss range is achieved under significantly weaker assumptions than those made by Cesa-Bianchi and Shamir [2018] in an earlier attempt to circumvent the impossibility result. We also provide a regret lower bound of $\Omega(\varepsilon\sqrt{TK})$, which almost matches the upper bound. In addition, we show that in the stochastic setting SODA achieves an $O\left(\sum_{a:\Delta_a>0} \frac{K^3 \varepsilon^2}{\Delta_a}\right)$ pseudo-regret bound that holds simultaneously with the adversarial regret guarantee. In other words, SODA is safe against an unrestricted oblivious adversary and provides improved regret guarantees for at least two different types of `easiness' simultaneously.
    Comment: Fixed a mistake in the proof and statement of Theorem
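
    The abstract states only the guarantees, so for intuition here is a minimal, hypothetical Python sketch of the limited-advice loop it describes: play one action from an exponential-weights distribution, observe one extra uniformly sampled action for free, and update with an importance-weighted estimate of the pairwise loss difference. The names (`soda_sketch`, `loss_fn`) and the learning-rate schedule are illustrative assumptions; the paper's actual second-order adjustments and tuning are not reproduced here.

```python
import numpy as np

def soda_sketch(loss_fn, K, T, seed=0):
    """Hypothetical sketch of a limited-advice bandit loop in the spirit of
    SODA: each round, play one action, observe the loss of ONE extra
    uniformly sampled action, and update exponential weights with an
    importance-weighted loss-difference estimate. The learning rate below
    is a placeholder, not the paper's tuning."""
    rng = np.random.default_rng(seed)
    log_w = np.zeros(K)                        # log-weights over K actions
    for t in range(1, T + 1):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                           # sampling distribution
        a = rng.choice(K, p=p)                 # played action (loss incurred)
        b = rng.integers(K)                    # extra observed action (free)
        diff = loss_fn(t, a) - loss_fn(t, b)   # observed loss difference
        est = np.zeros(K)
        est[a] = diff / p[a]                   # importance-weighted estimate
        eta = np.sqrt(np.log(K) / (K * t))     # placeholder learning rate
        log_w -= eta * est                     # exponential-weights update
    w = np.exp(log_w - log_w.max())
    return w / w.sum()
```

    For instance, `soda_sketch(lambda t, a: 0.1 * float(a == 0), K=5, T=1000)` runs the loop against a fixed small-range loss sequence.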

    On Adaptivity in Information-constrained Online Learning

    We study how to adapt to smoothly-varying (`easy') environments in well-known online learning problems where acquiring information is expensive. For the problem of label efficient prediction, which is a budgeted version of prediction with expert advice, we present an online algorithm whose regret depends optimally on the number of labels allowed and $Q^*$ (the quadratic variation of the losses of the best action in hindsight), along with a parameter-free counterpart whose regret depends optimally on $Q$ (the quadratic variation of the losses of all the actions). These quantities can be significantly smaller than $T$ (the total time horizon), yielding an improvement over existing, variation-independent results for the problem. We then extend our analysis to handle label efficient prediction with bandit feedback, i.e., label efficient bandits. Our work builds upon the framework of optimistic online mirror descent, and leverages second order corrections along with a carefully designed hybrid regularizer that encodes the constrained information structure of the problem. We then consider revealing-action partial monitoring games -- a version of label efficient prediction with additive information costs, which in general are known to lie in the `hard' class of games having minimax regret of order $T^{\frac{2}{3}}$. We provide a strategy with an $\mathcal{O}((Q^*T)^{\frac{1}{3}})$ bound for revealing-action games, along with one with an $\mathcal{O}((QT)^{\frac{1}{3}})$ bound for the full class of hard partial monitoring games, both being strict improvements over current bounds.
    Comment: 34th AAAI Conference on Artificial Intelligence (AAAI 2020). Short version at 11th Optimization for Machine Learning workshop (OPT 2019)
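
    As a rough illustration of the information constraint (not the paper's algorithm), the following hypothetical sketch couples exponential weights with an optimistic hint and a Bernoulli query coin: full loss vectors are revealed only on queried rounds and are importance-weighted by the query rate. The name `label_efficient_omd`, the learning rate, and the choice of the last observed loss as the hint are all assumptions made for illustration.

```python
import numpy as np

def label_efficient_omd(losses, n_queries, seed=0):
    """Hypothetical sketch of label-efficient prediction with an optimistic
    flavour: losses is a (T, K) array, but the learner sees a round's loss
    vector only when a Bernoulli(n_queries / T) coin comes up, and then
    importance-weights it. Rates and corrections here are placeholders,
    not the paper's second-order machinery."""
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    q = n_queries / T                      # query probability per round
    eta = np.sqrt(np.log(K) * q / T)       # placeholder learning rate
    cum = np.zeros(K)                      # cumulative loss estimates
    hint = np.zeros(K)                     # optimistic guess for this round
    total = 0.0
    for t in range(T):
        w = np.exp(-eta * (cum + hint))    # optimistic exponential weights
        p = w / w.sum()
        total += p @ losses[t]             # expected loss incurred this round
        if rng.random() < q:               # pay to observe the labels
            cum += losses[t] / q           # importance-weighted estimate
            hint = losses[t].copy()        # next round's optimistic hint
    return total
```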

    Robust Bandit Learning with Imperfect Context

    A standard assumption in contextual multi-armed bandits is that the true context is perfectly known before arm selection. Nonetheless, in many practical applications (e.g., cloud resource management), prior to arm selection the context can only be acquired by prediction, subject to errors or adversarial modification. In this paper, we study a contextual bandit setting in which only an imperfect context is available for arm selection, while the true context is revealed at the end of each round. We propose two robust arm selection algorithms: MaxMinUCB (Maximize Minimum UCB), which maximizes the worst-case reward, and MinWD (Minimize Worst-case Degradation), which minimizes the worst-case regret. Importantly, we analyze the robustness of MaxMinUCB and MinWD by deriving both regret and reward bounds relative to an oracle that knows the true context. Our results show that, as time goes on, MaxMinUCB and MinWD both perform asymptotically as well as their optimal counterparts that know the reward function. Finally, we apply MaxMinUCB and MinWD to online edge datacenter selection and run synthetic simulations to validate our theoretical analysis.
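
    To make the max-min selection rule concrete, here is a hypothetical sketch for a linear-reward model: score each arm by its minimum UCB over an assumed uncertainty set of contexts and play the maximizer. The feature map `phi` and the LinUCB-style quantities `theta_hat`, `V_inv`, and `beta` are assumptions for illustration; the paper's exact estimator and uncertainty-set construction may differ.

```python
import numpy as np

def maxmin_ucb_arm(phi, arms, contexts, theta_hat, V_inv, beta):
    """Hypothetical MaxMinUCB-style selection for a linear contextual
    bandit. `contexts` is an assumed uncertainty set believed to contain
    the unknown true context; phi(x, a) maps a (context, arm) pair to a
    feature vector. Each arm is scored by its WORST-case (minimum) UCB
    over the context set, and the best worst-case arm is played."""
    def ucb(x, a):
        f = phi(x, a)
        mean = f @ theta_hat                   # estimated reward
        width = beta * np.sqrt(f @ V_inv @ f)  # confidence width
        return mean + width
    scores = [min(ucb(x, a) for x in contexts) for a in arms]
    return arms[int(np.argmax(scores))]
```

    The min over `contexts` is what encodes robustness to the imperfect context: an arm only scores well if it looks good under every context the learner cannot rule out.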

    Quantum circuit architecture search: error mitigation and trainability enhancement for variational quantum solvers

    Quantum error mitigation techniques are at the heart of quantum hardware implementation, and are the key to performance improvement of the variational quantum learning scheme (VQLS). Although VQLS is partially robust to noise, both empirical and theoretical results show that noise rapidly deteriorates the performance of most variational quantum algorithms in large-scale problems. Furthermore, VQLS suffers from the barren plateau phenomenon: the gradient generated by the classical optimizer vanishes exponentially with respect to the qubit number. Here we devise a resource- and runtime-efficient scheme, the quantum architecture search scheme (QAS), to maximally improve the robustness and trainability of VQLS. In particular, given a learning task, QAS actively seeks an optimal circuit architecture to balance the benefits and side effects of adding more quantum gates: while more gates give the quantum model stronger expressive power, they also introduce more noise and a more severe barren plateau. Consequently, QAS can effectively suppress the influence of quantum noise and barren plateaus. We implement QAS on both a numerical simulator and real quantum hardware, via the IBM cloud, to accomplish data classification and quantum chemistry tasks. Numerical and experimental results show that QAS significantly outperforms conventional variational quantum algorithms with heuristic circuit architectures. Our work provides practical guidance for developing advanced learning-based quantum error mitigation techniques on near-term quantum devices.
    Comment: 8+9 pages. See also a concurrent paper that appeared yesterday [arXiv:2010.08561]
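
    Stripped of the paper's weight-sharing machinery, the search loop can be pictured as the hypothetical sketch below: sample candidate gate layouts, score each with a user-supplied `evaluate` callable (e.g. a short variational training run returning the task loss), and keep the best. `gate_pool`, `n_layers`, and `evaluate` are illustrative assumptions, not the paper's interface.

```python
import random

def quantum_architecture_search(gate_pool, n_layers, evaluate,
                                n_candidates=50, seed=0):
    """Hypothetical sketch of a QAS-style search loop: each candidate
    architecture is a list of gate choices, one per layer, drawn from a
    pool. `evaluate` scores an architecture (lower loss = better), e.g.
    via a brief variational training run on the target task."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("inf")
    for _ in range(n_candidates):
        arch = [rng.choice(gate_pool) for _ in range(n_layers)]
        score = evaluate(arch)          # includes noise/trainability effects
        if score < best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```

    Because `evaluate` is run on (simulated or real) hardware, architectures that add gates without improving the loss are penalized by the extra noise they introduce, which is the trade-off the abstract describes.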