268 research outputs found

    The Computational Power of Optimization in Online Learning

    We consider the fundamental problem of prediction with expert advice where the experts are "optimizable": there is a black-box optimization oracle that can be used to compute, in constant time, the leading expert in retrospect at any point in time. In this setting, we give a novel online algorithm that attains vanishing regret with respect to $N$ experts in total $\widetilde{O}(\sqrt{N})$ computation time. We also give a lower bound showing that this running time cannot be improved (up to log factors) in the oracle model, thereby exhibiting a quadratic speedup as compared to the standard, oracle-free setting where the required time for vanishing regret is $\widetilde{\Theta}(N)$. These results demonstrate an exponential gap between the power of optimization in online learning and its power in statistical learning: in the latter, an optimization oracle (i.e., an efficient empirical risk minimizer) makes it possible to learn a finite hypothesis class of size $N$ in time $O(\log N)$. We also study the implications of our results for learning in repeated zero-sum games, in a setting where the players have access to oracles that compute, in constant time, their best response to any mixed strategy of their opponent. We show that the runtime required for approximating the minimax value of the game in this setting is $\widetilde{\Theta}(\sqrt{N})$, yielding again a quadratic improvement upon the oracle-free setting, where $\widetilde{\Theta}(N)$ is known to be tight.
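
For orientation, the oracle-free baseline that the abstract compares against can be illustrated with the standard Hedge (multiplicative weights) forecaster, which touches all $N$ experts in every round. This is a minimal sketch of that textbook baseline, not the paper's oracle-based algorithm; the function name `hedge`, its parameters, and the choice of learning rate are assumptions made for illustration.

```python
import math


def hedge(loss_rounds, eta):
    """Run Hedge (multiplicative weights) on a sequence of loss vectors.

    loss_rounds: list of rounds; loss_rounds[t][i] in [0, 1] is the loss
                 of expert i at round t (hypothetical input format).
    eta:         learning rate, an assumed tuning parameter.
    Returns (expected loss of the learner, loss of the best expert in hindsight).
    """
    n = len(loss_rounds[0])
    log_weights = [0.0] * n   # log-domain weights avoid numerical underflow
    cumulative = [0.0] * n    # cumulative loss of each expert
    learner_loss = 0.0

    for losses in loss_rounds:
        # Normalise the weights into a probability distribution: O(N) per round.
        top = max(log_weights)
        weights = [math.exp(w - top) for w in log_weights]
        total = sum(weights)
        probs = [w / total for w in weights]

        # Expected loss of sampling an expert from that distribution.
        learner_loss += sum(p * l for p, l in zip(probs, losses))

        # Multiplicative update: experts with larger loss are down-weighted.
        for i, l in enumerate(losses):
            log_weights[i] -= eta * l
            cumulative[i] += l

    return learner_loss, min(cumulative)
```

With `eta = sqrt(8 * ln(N) / T)`, the standard analysis bounds the regret of this forecaster by `sqrt(T * ln(N) / 2)` for losses in [0, 1]. Each round costs time linear in $N$, which is the per-round cost an optimization oracle is meant to avoid.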

    How to Price Shared Optimizations in the Cloud

    Data-management-as-a-service systems are increasingly being used in collaborative settings, where multiple users access common datasets. Cloud providers can choose to implement various optimizations, such as indexing or materialized views, to accelerate queries over these datasets. Each optimization carries a cost and may benefit multiple users. This creates a major challenge: how to select which optimizations to perform and how to share their cost among users. The problem is especially challenging when users are selfish and will only report their true values for different optimizations if doing so maximizes their utility. In this paper, we present a new approach for selecting and pricing shared optimizations by using Mechanism Design. We first show how to apply the Shapley Value Mechanism to the simple case of selecting and pricing additive optimizations, assuming an offline game where all users access the service for the same time period. Second, we extend the approach to online scenarios where users come and go. Finally, we consider the case of substitutive optimizations. We show analytically that our mechanisms induce truthfulness and recover the optimization costs. We also show experimentally that our mechanisms yield higher utility than the state-of-the-art approach based on regret accumulation. Comment: VLDB201
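
The offline case mentioned in the abstract can be illustrated with the classic equal-share form of the Shapley Value Mechanism for a single optimization: the cost is split evenly among the remaining users, anyone whose declared value is below the share is dropped, and the process repeats until the set is stable. This is a minimal sketch under those assumptions; the function name, the input format, and the restriction to one optimization are hypothetical and are not taken from the paper.

```python
def shapley_value_mechanism(cost, bids):
    """Equal-share cost sharing for one optimization (illustrative sketch).

    cost: cost of implementing the optimization.
    bids: dict mapping a user id to that user's declared value.
    Returns (set of users who are served, price each of them pays).
    An empty set means the optimization is not implemented.
    """
    serviced = set(bids)
    while serviced:
        share = cost / len(serviced)
        # Keep only the users whose declared value covers an equal share.
        staying = {user for user in serviced if bids[user] >= share}
        if staying == serviced:
            # Stable set: everyone left is willing to pay the same share,
            # so the payments add up to exactly the cost.
            return serviced, share
        serviced = staying
    return set(), 0.0
```

For example, with a cost of 90 and declared values `{"a": 50, "b": 45, "c": 20}`, user `c` is dropped at a share of 30, and `a` and `b` then each pay 45. If `b` had declared 40 instead, no stable set would remain and the optimization would not be implemented.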

    No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions

    Existing online learning algorithms for adversarial Markov Decision Processes achieve $O(\sqrt{T})$ regret after $T$ rounds of interactions even if the loss functions are chosen arbitrarily by an adversary, with the caveat that the transition function has to be fixed. This is because it has been shown that adversarial transition functions make no-regret learning impossible. Despite such impossibility results, in this work, we develop algorithms that can handle both adversarial losses and adversarial transitions, with regret increasing smoothly in the degree of maliciousness of the adversary. More concretely, we first propose an algorithm that enjoys $\widetilde{O}(\sqrt{T} + C^{\textsf{P}})$ regret, where $C^{\textsf{P}}$ measures how adversarial the transition functions are and can be at most $O(T)$. While this algorithm itself requires knowledge of $C^{\textsf{P}}$, we further develop a black-box reduction approach that removes this requirement. Moreover, we also show that further refinements of the algorithm not only maintain the same regret bound, but also simultaneously adapt to easier environments (where losses are generated in a certain stochastically constrained manner as in Jin et al. [2021]) and achieve $\widetilde{O}(U + \sqrt{U C^{\textsf{L}}} + C^{\textsf{P}})$ regret, where $U$ is some standard gap-dependent coefficient and $C^{\textsf{L}}$ is the amount of corruption on losses. Comment: 66 page