The Computational Power of Optimization in Online Learning
We consider the fundamental problem of prediction with expert advice where
the experts are "optimizable": there is a black-box optimization oracle that
can be used to compute, in constant time, the leading expert in retrospect at
any point in time. In this setting, we give a novel online algorithm that
attains vanishing regret with respect to N experts in total Õ(√N)
computation time. We also give a lower bound showing
that this running time cannot be improved (up to log factors) in the oracle
model, thereby exhibiting a quadratic speedup as compared to the standard,
oracle-free setting where the required time for vanishing regret is
Θ̃(N). These results demonstrate an exponential gap between
the power of optimization in online learning and its power in statistical
learning: in the latter, an optimization oracle---i.e., an efficient empirical
risk minimizer---allows learning a finite hypothesis class of size N in time
O(log N). We also study the implications of our results for learning in
repeated zero-sum games, in a setting where the players have access to oracles
that compute, in constant time, their best response to any mixed strategy of
their opponent. We show that the runtime required for approximating the minimax
value of the game in this setting is Θ̃(√N), yielding
again a quadratic improvement upon the oracle-free setting, where Θ̃(N)
is known to be tight.
How to Price Shared Optimizations in the Cloud
Data-management-as-a-service systems are increasingly being used in
collaborative settings, where multiple users access common datasets. Cloud
providers have the choice to implement various optimizations, such as indexing
or materialized views, to accelerate queries over these datasets. Each
optimization carries a cost and may benefit multiple users. This creates a
major challenge: how to select which optimizations to perform and how to share
their cost among users. The problem is especially challenging when users are
selfish and will only report their true values for different optimizations if
doing so maximizes their utility. In this paper, we present a new approach for
selecting and pricing shared optimizations by using Mechanism Design. We first
show how to apply the Shapley Value Mechanism to the simple case of selecting
and pricing additive optimizations, assuming an offline game where all users
access the service for the same time-period. Second, we extend the approach to
online scenarios where users come and go. Finally, we consider the case of
substitutive optimizations. We show analytically that our mechanisms induce
truthfulness and recover the optimization costs. We also show experimentally
that our mechanisms yield higher utility than the state-of-the-art approach
based on regret accumulation.

Comment: VLDB201
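As background for the mechanism above, the Shapley value assigns each user the average of their marginal cost over all arrival orders. The sketch below computes exact Shapley cost shares for a toy shared optimization; it is a generic illustration of the concept (all names are hypothetical), not the paper's full mechanism, which additionally elicits truthful valuations and handles online arrivals:

```python
import math
from itertools import permutations

def shapley_values(players, cost_fn):
    """Exact Shapley cost shares, computed by averaging each player's
    marginal cost over all arrival orders (O(n!) time, fine for a few users).
    cost_fn maps a frozenset of players to the cost of serving that set."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        served = frozenset()
        for p in order:
            phi[p] += cost_fn(served | {p}) - cost_fn(served)
            served = served | {p}
    norm = math.factorial(len(players))
    return {p: v / norm for p, v in phi.items()}

# Toy example: a single shared optimization (say, an index) costing 10 units
# that must be built once any non-empty set of users is served.
index_cost = lambda served: 10.0 if served else 0.0
```

For this single additive optimization the Shapley split is an equal share of the cost among its beneficiaries, and the shares sum to the full cost, i.e., the mechanism is budget-balanced.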
No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions
Existing online learning algorithms for adversarial Markov Decision Processes
achieve O(√T) regret after T rounds of interactions even if the
loss functions are chosen arbitrarily by an adversary, with the caveat that the
transition function has to be fixed. This is because it has been shown that
adversarial transition functions make no-regret learning impossible. Despite
such impossibility results, in this work, we develop algorithms that can handle
both adversarial losses and adversarial transitions, with regret increasing
smoothly in the degree of maliciousness of the adversary. More concretely, we
first propose an algorithm that enjoys Õ(√T + C^P) regret, where C^P measures how adversarial the
transition functions are and can be at most O(T). While this algorithm
itself requires knowledge of C^P, we further develop a black-box
reduction approach that removes this requirement. Moreover, we also show that
further refinements of the algorithm not only maintain the same regret bound,
but also simultaneously adapt to easier environments (where losses are
generated in a certain stochastically constrained manner as in Jin et al.
[2021]) and achieve Õ(U + √(U·C^L) + C^P) regret, where U is some standard gap-dependent coefficient
and C^L is the amount of corruption on losses.

Comment: 66 page
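To give the corruption quantity an operational reading: one natural way to measure how much an adversary has tampered with the losses is to sum, over rounds, the largest per-action deviation between the observed and the uncorrupted losses. The helper below is a hypothetical illustration of such a measure, not the paper's formal definition of C^L:

```python
def total_loss_corruption(true_losses, observed_losses):
    """Cumulative corruption of a loss sequence: for each round, take the
    largest deviation between observed and true loss over all actions,
    then sum over rounds. Zero iff the losses were never tampered with."""
    return sum(
        max(abs(o - t) for o, t in zip(obs, tru))
        for tru, obs in zip(true_losses, observed_losses)
    )
```

Under such a measure, a stochastic environment with no tampering has corruption zero, while a fully adversarial loss sequence can accumulate corruption linear in T, matching the abstract's smooth interpolation between the two regimes.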