Adaptation to Easy Data in Prediction with Limited Advice
We derive an online learning algorithm with improved regret guarantees for
`easy' loss sequences. We consider two types of `easiness': (a) stochastic loss
sequences and (b) adversarial loss sequences with small effective range of the
losses. While a number of algorithms have been proposed for exploiting small
effective range in the full information setting, Gerchinovitz and Lattimore
[2016] have shown the impossibility of regret scaling with the effective range
of the losses in the bandit setting. We show that just one additional
observation per round is sufficient to circumvent the impossibility result. The
proposed Second Order Difference Adjustments (SODA) algorithm requires no prior
knowledge of the effective range of the losses, $\varepsilon$, and achieves an
$O(\varepsilon\sqrt{KT\ln K}) + \tilde{O}(\varepsilon K \sqrt[4]{T})$ expected regret guarantee, where $T$ is the time horizon and $K$ is the number
of actions. The scaling with the effective loss range is achieved under
significantly weaker assumptions than those made by Cesa-Bianchi and Shamir
[2018] in an earlier attempt to circumvent the impossibility result. We also
provide a regret lower bound of $\Omega(\varepsilon\sqrt{TK})$, which almost
matches the upper bound. In addition, we show that in the stochastic setting
SODA achieves an improved, gap-dependent pseudo-regret bound that holds simultaneously
with the adversarial regret guarantee. In other words, SODA is safe against an
unrestricted oblivious adversary and provides improved regret guarantees for at
least two different types of `easiness' simultaneously.
Comment: Fixed a mistake in the proof and statement of Theorem
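The core idea -- that a single extra observation per round restores adaptivity to the loss range -- can be illustrated with a toy learner. The sketch below is a simplified stand-in, not the paper's exact SODA construction: the function name, estimator, and learning-rate choice are all assumptions of this sketch. An exponential-weights player draws one action to play, samples one additional uniformly random action to observe, and updates with an importance-weighted loss-difference estimate.

```python
import math
import random

def soda_style_sketch(losses, seed=0):
    """Toy limited-advice learner: play one action from an exponential-weights
    distribution, observe ONE additional uniformly drawn action, and update
    with an importance-weighted loss-DIFFERENCE estimate. Illustrative only;
    the learning rate and estimator are simplifications, not the paper's SODA."""
    rng = random.Random(seed)
    T, K = len(losses), len(losses[0])
    eta = math.sqrt(math.log(K) / (K * T))  # generic rate; SODA tunes adaptively
    weights = [1.0] * K
    total_loss = 0.0
    for t in range(T):
        Z = sum(weights)
        probs = [w / Z for w in weights]
        a = rng.choices(range(K), weights=probs)[0]   # played action
        b = rng.randrange(K)                          # the one extra observation
        total_loss += losses[t][a]
        # difference estimate: nonzero only for the played action
        est = [0.0] * K
        est[a] = (losses[t][a] - losses[t][b]) / probs[a]
        weights = [w * math.exp(-eta * e) for w, e in zip(weights, est)]
        m = max(weights)
        weights = [w / m for w in weights]            # renormalize for stability
    return total_loss / T
```

On an easy sequence where one action dominates, the average loss drops well below the uniform-play baseline.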
On Adaptivity in Information-constrained Online Learning
We study how to adapt to smoothly-varying ('easy') environments in well-known
online learning problems where acquiring information is expensive. For the
problem of label efficient prediction, which is a budgeted version of
prediction with expert advice, we present an online algorithm whose regret
depends optimally on the number of labels allowed and $Q^*_T$ (the quadratic
variation of the losses of the best action in hindsight), along with a
parameter-free counterpart whose regret depends optimally on $Q_T$ (the quadratic
variation of the losses of all the actions). These quantities can be
significantly smaller than $T$ (the total time horizon), yielding an
improvement over existing, variation-independent results for the problem. We
then extend our analysis to handle label efficient prediction with bandit
feedback, i.e., label efficient bandits. Our work builds upon the framework of
optimistic online mirror descent, and leverages second order corrections along
with a carefully designed hybrid regularizer that encodes the constrained
information structure of the problem. We then consider revealing action-partial
monitoring games -- a version of label efficient prediction with additive
information costs, which in general are known to lie in the \textit{hard} class
of games having minimax regret of order $\Theta(T^{2/3})$. We provide a
strategy with a variation-dependent regret bound for revealing action
games, along with one for the
full class of hard partial monitoring games, both being strict improvements
over current bounds.
Comment: 34th AAAI Conference on Artificial Intelligence (AAAI 2020). Short version at 11th Optimization for Machine Learning workshop (OPT 2019).
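The budgeted-feedback structure of label efficient prediction can be sketched with a plain Hedge learner that only sees the loss vector when a Bernoulli coin permits a query, importance-weighting the revealed losses. This is an illustrative simplification, not the paper's optimistic-mirror-descent algorithm with hybrid regularizer; the function name and learning-rate choice are assumptions of the sketch.

```python
import math
import random

def label_efficient_hedge(losses, query_prob, seed=0):
    """Toy label-efficient forecaster: the loss vector is only revealed when a
    Bernoulli(query_prob) coin comes up heads, and revealed losses are
    importance-weighted by 1/query_prob before a plain Hedge update.
    Illustrates the budgeted feedback structure only."""
    rng = random.Random(seed)
    T, K = len(losses), len(losses[0])
    eta = math.sqrt(query_prob * math.log(K) / T)  # assumed rate choice
    weights = [1.0] * K
    total_loss, n_queries = 0.0, 0
    for t in range(T):
        Z = sum(weights)
        probs = [w / Z for w in weights]
        a = rng.choices(range(K), weights=probs)[0]
        total_loss += losses[t][a]
        if rng.random() < query_prob:        # pay for a label this round
            n_queries += 1
            weights = [w * math.exp(-eta * l / query_prob)
                       for w, l in zip(weights, losses[t])]
    return total_loss / T, n_queries
```

Even when only a fifth of the rounds are labeled, the learner concentrates on the best action while the query count stays near `query_prob * T`.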
Robust Bandit Learning with Imperfect Context
A standard assumption in contextual multi-armed bandits is that the true context
is perfectly known before arm selection. Nonetheless, in many practical
applications (e.g., cloud resource management), prior to arm selection, the
context information can only be acquired by prediction subject to errors or
adversarial modification. In this paper, we study a contextual bandit setting
in which only imperfect context is available for arm selection while the true
context is revealed at the end of each round. We propose two robust arm
selection algorithms: MaxMinUCB (Maximize Minimum UCB) which maximizes the
worst-case reward, and MinWD (Minimize Worst-case Degradation) which minimizes
the worst-case regret. Importantly, we analyze the robustness of MaxMinUCB and
MinWD by deriving both regret and reward bounds compared to an oracle that
knows the true context. Our results show that, as time goes on, MaxMinUCB and
MinWD both asymptotically perform as well as their optimal counterparts that
know the reward function. Finally, we apply MaxMinUCB and MinWD to online edge
datacenter selection, and run synthetic simulations to validate our theoretical
analysis.
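The MaxMinUCB selection rule described above can be sketched directly: given UCB scores for each arm under each candidate context in the uncertainty set, pick the arm whose worst-case UCB is largest. The function below is a minimal illustration that assumes the UCB scores (estimates plus confidence bonuses) are supplied by the caller; it is not the paper's full algorithm.

```python
import math

def maxmin_ucb_select(ucb, candidate_contexts):
    """Pick the arm maximizing the worst-case UCB over a context
    uncertainty set.  ucb[arm][context] holds the UCB score of `arm`
    if `context` were the true context; candidate_contexts indexes
    the contexts considered plausible this round."""
    best_arm, best_val = None, -math.inf
    for arm, scores in enumerate(ucb):
        worst = min(scores[c] for c in candidate_contexts)  # adversarial context
        if worst > best_val:
            best_arm, best_val = arm, worst
    return best_arm
```

Note how a larger uncertainty set can flip the choice: an arm with a high score under one context but a poor worst case loses to a more uniformly safe arm.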
Quantum circuit architecture search: error mitigation and trainability enhancement for variational quantum solvers
Quantum error mitigation techniques are at the heart of quantum hardware
implementation, and are the key to performance improvement of the variational
quantum learning scheme (VQLS). Although VQLS is partially robust to noise,
both empirical and theoretical results show that noise rapidly deteriorates
the performance of most variational quantum algorithms in
large-scale problems. Furthermore, VQLS suffers from the barren plateau
phenomenon -- the gradient seen by the classical optimizer vanishes
exponentially with respect to the qubit number. Here we devise a resource and
runtime efficient scheme, the quantum architecture search scheme (QAS), to
maximally improve the robustness and trainability of VQLS. In particular, given
a learning task, QAS actively seeks an optimal circuit architecture to balance
benefits and side-effects brought by adding more quantum gates. Specifically,
while more quantum gates enable a stronger expressive power of the quantum
model, they introduce a larger amount of noise and a more serious barren
plateau scenario. Consequently, QAS can effectively suppress the influence of
quantum noise and barren plateaus. We implement QAS on both the numerical
simulator and real quantum hardware, via the IBM cloud, to accomplish data
classification and quantum chemistry tasks. Numerical and experimental results
show that QAS significantly outperforms conventional variational quantum
algorithms with heuristic circuit architectures. Our work provides practical
guidance for developing advanced learning-based quantum error mitigation
techniques on near-term quantum devices.
Comment: 8+9 pages. See also a concurrent paper that appeared yesterday [arXiv:2010.08561].
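The search loop at the heart of QAS -- trading expressive power against noise and barren-plateau cost by choosing among candidate circuit architectures -- can be caricatured as follows. This toy uses pure random search over gate sequences with a caller-supplied scoring function standing in for training and evaluating a variational circuit; the actual QAS scheme is more sophisticated, and all names here are assumptions of the sketch.

```python
import random

def architecture_search(evaluate, gate_pool, max_gates, n_candidates, seed=0):
    """Sample candidate architectures (gate sequences of varying depth) and
    keep the best-scoring one.  `evaluate(arch)` stands in for training and
    scoring a variational circuit, where deeper circuits gain expressive power
    but lose to noise and barren plateaus -- so the score peaks at some
    intermediate depth and the search picks that balance."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_candidates):
        depth = rng.randint(1, max_gates)
        arch = tuple(rng.choice(gate_pool) for _ in range(depth))
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```

With a toy objective whose score peaks at depth 3 (mimicking the expressivity/noise trade-off), the search settles on a depth-3 architecture.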