Search CORE

268 research outputs found

The Computational Power of Optimization in Online Learning

Author: Agarwal A.
Agarwal A.
Dani V.
Dud´ık M.
Gofer E.
Hazan E.
Kakade S.
McMahan H. B.
Shalev-Shwartz S.
Zinkevich M.
Publication venue
Publication date: 27/01/2016
Field of study

We consider the fundamental problem of prediction with expert advice where the experts are "optimizable": there is a black-box optimization oracle that can be used to compute, in constant time, the leading expert in retrospect at any point in time. In this setting, we give a novel online algorithm that attains vanishing regret with respect to

N

experts in total

\widetilde{O}(\sqrt{N})

computation time. We also give a lower bound showing that this running time cannot be improved (up to log factors) in the oracle model, thereby exhibiting a quadratic speedup as compared to the standard, oracle-free setting where the required time for vanishing regret is

\widetilde{\Theta}(N)

. These results demonstrate an exponential gap between the power of optimization in online learning and its power in statistical learning: in the latter, an optimization oracle---i.e., an efficient empirical risk minimizer---allows to learn a finite hypothesis class of size

N

in time

O(\log{N})

. We also study the implications of our results to learning in repeated zero-sum games, in a setting where the players have access to oracles that compute, in constant time, their best-response to any mixed strategy of their opponent. We show that the runtime required for approximating the minimax value of the game in this setting is

\widetilde{\Theta}(\sqrt{N})

, yielding again a quadratic improvement upon the oracle-free setting, where

\widetilde{\Theta}(N)

is known to be tight

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

How to Price Shared Optimizations in the Cloud

Author: Balazinska Magdalena
Suciu Dan
Upadhyaya Prasang
Publication venue
Publication date: 01/01/2011
Field of study

Data-management-as-a-service systems are increasingly being used in collaborative settings, where multiple users access common datasets. Cloud providers have the choice to implement various optimizations, such as indexing or materialized views, to accelerate queries over these datasets. Each optimization carries a cost and may benefit multiple users. This creates a major challenge: how to select which optimizations to perform and how to share their cost among users. The problem is especially challenging when users are selfish and will only report their true values for different optimizations if doing so maximizes their utility. In this paper, we present a new approach for selecting and pricing shared optimizations by using Mechanism Design. We first show how to apply the Shapley Value Mechanism to the simple case of selecting and pricing additive optimizations, assuming an offline game where all users access the service for the same time-period. Second, we extend the approach to online scenarios where users come and go. Finally, we consider the case of substitutive optimizations. We show analytically that our mechanisms induce truth- fulness and recover the optimization costs. We also show experimentally that our mechanisms yield higher utility than the state-of-the-art approach based on regret accumulation.Comment: VLDB201

arXiv.org e-Print Archive

CiteSeerX

No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions

Author: Chang William
Jin Tiancheng
Liu Junyan
Luo Haipeng
Rouyer Chloé
Wei Chen-Yu
Publication venue
Publication date: 30/05/2023
Field of study

Existing online learning algorithms for adversarial Markov Decision Processes achieve

{O}(\sqrt{T})

regret after

T

rounds of interactions even if the loss functions are chosen arbitrarily by an adversary, with the caveat that the transition function has to be fixed. This is because it has been shown that adversarial transition functions make no-regret learning impossible. Despite such impossibility results, in this work, we develop algorithms that can handle both adversarial losses and adversarial transitions, with regret increasing smoothly in the degree of maliciousness of the adversary. More concretely, we first propose an algorithm that enjoys

\widetilde{{O}}(\sqrt{T} + C^{\textsf{P}})

regret where

C^{\textsf{P}}

measures how adversarial the transition functions are and can be at most

{O}(T)

. While this algorithm itself requires knowledge of

C^{\textsf{P}}

, we further develop a black-box reduction approach that removes this requirement. Moreover, we also show that further refinements of the algorithm not only maintains the same regret bound, but also simultaneously adapts to easier environments (where losses are generated in a certain stochastically constrained manner as in Jin et al. [2021]) and achieves

\widetilde{{O}}(U + \sqrt{UC^{\textsf{L}}} + C^{\textsf{P}})

regret, where

U

is some standard gap-dependent coefficient and

C^{\textsf{L}}

is the amount of corruption on losses.Comment: 66 page

arXiv.org e-Print Archive