Second-order Quantile Methods for Experts and Combinatorial Games
We aim to design strategies for sequential decision making that adjust to the
difficulty of the learning problem. We study this question both in the setting
of prediction with expert advice, and for more general combinatorial decision
tasks. We are not satisfied with just guaranteeing minimax regret rates, but we
want our algorithms to perform significantly better on easy data. Two popular
ways to formalize such adaptivity are second-order regret bounds and quantile
bounds. The underlying notions of 'easy data', which may be paraphrased as "the
learning problem has small variance" and "multiple decisions are useful", are
synergetic. But even though there are sophisticated algorithms that exploit one
of the two, no existing algorithm is able to adapt to both.
In this paper we outline a new method for obtaining such adaptive algorithms,
based on a potential function that aggregates a range of learning rates (which
are essential tuning parameters). By choosing the right prior we construct
efficient algorithms and show that they reap both benefits by proving the first
bounds that are both second-order and incorporate quantiles.
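As a rough illustration of "a potential function that aggregates a range of learning rates", the sketch below mixes a finite grid of learning rates when weighting experts. The abstract does not specify the potential or the prior, so the exponent `eta * R - eta**2 * V`, the uniform priors, the grid of rates, and the names `mixed_rate_weights` and `run_experts` are all assumptions made for illustration, not the paper's construction.

```python
import math

def mixed_rate_weights(R, V, prior, etas, eta_prior):
    """Expert weights obtained by averaging over a grid of learning rates.

    R[k] is the cumulative regret to expert k, V[k] the cumulative squared
    per-round regret. The unnormalised weight of expert k is (assumed form)
    prior[k] * sum_j eta_prior[j] * eta_j * exp(eta_j * R[k] - eta_j**2 * V[k]).
    """
    expo = [[eta * r - eta * eta * v for eta in etas] for r, v in zip(R, V)]
    shift = max(max(row) for row in expo)  # numerical stabiliser, cancels below
    raw = [
        p * sum(g * eta * math.exp(e - shift)
                for g, eta, e in zip(eta_prior, etas, row))
        for p, row in zip(prior, expo)
    ]
    total = sum(raw)
    return [x / total for x in raw]

def run_experts(losses, etas=None):
    """Play the mixed-rate weights on a list of loss vectors with entries in [0, 1]."""
    K = len(losses[0])
    if etas is None:
        etas = [0.5 ** j for j in range(1, 11)]  # hypothetical grid in (0, 1/2]
    eta_prior = [1.0 / len(etas)] * len(etas)    # uniform over rates (assumption)
    prior = [1.0 / K] * K                        # uniform over experts (assumption)
    R, V = [0.0] * K, [0.0] * K
    total, w = 0.0, prior
    for loss in losses:
        w = mixed_rate_weights(R, V, prior, etas, eta_prior)
        learner = sum(wi * li for wi, li in zip(w, loss))
        total += learner
        for k in range(K):
            r = learner - loss[k]  # instantaneous regret to expert k
            R[k] += r
            V[k] += r * r
    return total, w

# Toy data: expert 0 is always right, the other two are always wrong.
total_loss, final_w = run_experts([[0.0, 1.0, 1.0]] * 200)
```

On this toy sequence the weight vector should concentrate on expert 0 while the learner's cumulative loss stays small. A non-uniform prior over experts is where a quantile-style guarantee would enter, and the squared term `V` is where a second-order one would.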
An efficient algorithm for learning with semi-bandit feedback
We consider the problem of online combinatorial optimization under
semi-bandit feedback. The goal of the learner is to sequentially select its
actions from a combinatorial decision set so as to minimize its cumulative
loss. We propose a learning algorithm for this problem based on combining the
Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss
estimation procedure called Geometric Resampling (GR). Contrary to previous
solutions, the resulting algorithm can be efficiently implemented for any
decision set where efficient offline combinatorial optimization is possible at
all. Assuming that the elements of the decision set can be described with
d-dimensional binary vectors with at most m non-zero entries, we show that the
expected regret of our algorithm after T rounds is O(m sqrt(dT log d)). As a
side result, we also improve the best known regret bounds for FPL in the full
information setting to O(m^(3/2) sqrt(T log d)), gaining a factor of sqrt(d/m)
over previous bounds for this algorithm.
Comment: submitted to ALT 201
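The combination of FPL with Geometric Resampling described in this abstract can be sketched as follows. The sketch assumes a decision set of all m-element subsets of d items, exponential perturbations, and a cap on the number of resampling draws. These choices, the parameter values, and every function name here are illustrative assumptions rather than details taken from the paper.

```python
import random

def oracle_m_set(scores, m):
    """Offline optimiser for the m-subset decision set: the m lowest-score items."""
    return set(sorted(range(len(scores)), key=scores.__getitem__)[:m])

def fpl_draw(L_hat, eta, m, rng):
    """Follow the perturbed leader: minimise eta * L_hat[i] - Z_i, Z_i ~ Exp(1)."""
    return oracle_m_set([eta * L - rng.expovariate(1.0) for L in L_hat], m)

def geometric_resampling(L_hat, eta, m, chosen, cap, rng):
    """For each chosen item i, count fresh FPL draws until i is picked again.

    The count K[i] (capped at `cap`) stands in for 1 / P(i is chosen),
    without ever computing that probability explicitly.
    """
    K, waiting = {}, set(chosen)
    for n in range(1, cap + 1):
        if not waiting:
            break
        redraw = fpl_draw(L_hat, eta, m, rng)
        for i in waiting & redraw:
            K[i] = n
        waiting -= redraw
    for i in waiting:  # never reappeared within the cap
        K[i] = cap
    return K

def run_fpl_gr(loss_rows, m, eta=0.1, cap=50, seed=0):
    """Run FPL with geometric-resampling loss estimates under semi-bandit feedback."""
    rng = random.Random(seed)
    L_hat = [0.0] * len(loss_rows[0])  # cumulative loss estimates
    total = 0.0
    for loss in loss_rows:
        chosen = fpl_draw(L_hat, eta, m, rng)
        total += sum(loss[i] for i in chosen)  # only these losses are observed
        K = geometric_resampling(L_hat, eta, m, chosen, cap, rng)
        for i in chosen:
            L_hat[i] += K[i] * loss[i]
    return total, L_hat

# Toy problem: items 0 and 1 are free, items 2-4 cost 1 per round.
total_loss, estimates = run_fpl_gr([[0.0, 0.0, 1.0, 1.0, 1.0]] * 300, m=2)
```

The only access to the decision set is through `oracle_m_set`, which reflects the abstract's point that the method runs wherever offline optimization is efficient. Swapping in a different oracle (for example a shortest-path solver) would change the decision set without touching the rest of the loop.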
First-order regret bounds for combinatorial semi-bandits
We consider the problem of online combinatorial optimization under
semi-bandit feedback, where a learner has to repeatedly pick actions from a
combinatorial decision set in order to minimize the total losses associated
with its decisions. After making each decision, the learner observes the losses
associated with its action, but not other losses. For this problem, there are
several learning algorithms that guarantee that the learner's expected regret
grows as $\widetilde{O}(\sqrt{T})$ with the number of rounds $T$. In this
paper, we propose an algorithm that improves this scaling to
$\widetilde{O}(\sqrt{L_T^*})$, where $L_T^*$ is the total loss of the best
action. Our algorithm is among the first to achieve such guarantees in a
partial-feedback scheme, and the first one to do so in a combinatorial setting.
Comment: To appear at COLT 201
High-Dimensional Prediction for Sequential Decision Making
We study the problem of making predictions of an adversarially chosen
high-dimensional state that are unbiased subject to an arbitrary collection of
conditioning events, with the goal of tailoring these events to downstream
decision makers. We give efficient algorithms for solving this problem, as well
as a number of applications that stem from choosing an appropriate set of
conditioning events.
For example, we can efficiently make predictions targeted at polynomially
many decision makers, giving each of them optimal swap regret if they
best-respond to our predictions. We generalize this to online combinatorial
optimization, where the decision makers have a very large action space, to give
the first algorithms offering polynomially many decision makers no regret on
polynomially many subsequences that may depend on their actions and the
context. We apply these results to get efficient no-subsequence-regret
algorithms in extensive-form games (EFGs), yielding a new family of regret
guarantees for EFGs that generalizes some existing EFG regret notions, e.g.
regret to informed causal deviations, and is generally incomparable to other
known such notions.
Next, we develop a novel transparent alternative to conformal prediction for
building valid online adversarial multiclass prediction sets. We produce class
scores that downstream algorithms can use for producing valid-coverage
prediction sets, as if these scores were the true conditional class
probabilities. We show this implies strong conditional validity guarantees
including set-size-conditional and multigroup-fair coverage for polynomially
many downstream prediction sets. Moreover, our class scores can be guaranteed
to have improved $L_2$ loss, cross-entropy loss, and generally any Bregman
loss, compared to any collection of benchmark models, yielding a
high-dimensional real-valued version of omniprediction.
Comment: Added references, Arxiv abstract edite
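To make concrete what it means for a downstream algorithm to use class scores "as if these scores were the true conditional class probabilities", here is a minimal sketch of one common rule: take the smallest set of labels whose scores sum to at least 1 - alpha. The rule, the function name `prediction_set`, and the example scores are assumptions for illustration. The abstract does not say which downstream rules are used.

```python
def prediction_set(scores, alpha):
    """Smallest label set whose total score reaches 1 - alpha.

    `scores` maps each label to a score, treated here as a probability.
    Labels are added in order of decreasing score.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    chosen, mass = [], 0.0
    for label in ordered:
        if mass >= 1 - alpha:
            break
        chosen.append(label)
        mass += scores[label]
    return chosen

# Hypothetical scores for a three-class problem.
example = {"cat": 0.6, "dog": 0.3, "bird": 0.1}
loose = prediction_set(example, alpha=0.5)  # ["cat"]
tight = prediction_set(example, alpha=0.2)  # ["cat", "dog"]
```

If the scores really were the conditional class probabilities, such a set would cover the true label with probability at least 1 - alpha. The abstract's claim is that its scores support coverage guarantees of this kind even against adversarial data.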