A Geometric Proof of Calibration
We provide yet another proof of the existence of calibrated forecasters; it
has two merits. First, it is valid for an arbitrary finite number of outcomes.
Second, it is short and simple and it follows from a direct application of
Blackwell's approachability theorem to a carefully chosen vector-valued payoff
function and convex target set. Our proof captures the essence of existing
proofs based on approachability (e.g., the proof by Foster, 1999, in the case of
binary outcomes) and highlights the intrinsic connection between
approachability and calibration.
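To make the quantity at stake concrete, here is a minimal Python sketch computing, for binary outcomes and a discretized forecaster, the vector of per-bin calibration gaps; an approachability-based calibrated forecaster is one that drives this vector to (a neighborhood of) zero. The binning scheme and frequency weighting below are illustrative choices, not taken from the paper.

```python
import numpy as np

def calibration_errors(forecasts, outcomes, n_bins=10):
    """Per-bin calibration gaps for binary outcomes.

    For each discretized forecast value p, the gap is the difference
    between p and the empirical frequency of outcome 1 on the rounds
    where p was forecast, weighted by how often p was forecast;
    calibration means all gaps vanish as the number of rounds grows.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    bins = np.round(forecasts * (n_bins - 1)).astype(int)  # nearest grid point
    gaps = np.zeros(n_bins)
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            p = b / (n_bins - 1)
            gaps[b] = (p - outcomes[mask].mean()) * mask.mean()
    return gaps  # the vector an approachability argument drives to 0

# Example: a well-calibrated forecaster has small total gap.
rng = np.random.default_rng(0)
p = rng.choice(np.linspace(0, 1, 10), size=5000)
y = rng.random(5000) < p
print(np.abs(calibration_errors(p, y)).sum())
```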
Do countries falsify economic data strategically? Some evidence that they might.
Using Benford's Law, we find evidence supporting the hypothesis that countries at times misreport their economic data strategically. We group countries with similar economic conditions and find that, for countries with fixed exchange rate regimes, high negative net foreign asset positions, negative current account balances, or greater vulnerability to capital flow reversals, we reject the first-digit law for the balance of payments data. This corroborates the intuition of a simple economic model. The main results do not seem to be driven by countries in Sub-Saharan Africa or those with low institutional quality ratings.
Keywords: capital flows; public information provision; misinformation; Benford's Law; transparency
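For concreteness, here is a minimal Python sketch of the kind of first-digit test involved: a chi-square goodness-of-fit test of observed leading-digit frequencies against the Benford distribution. The data below are synthetic; the paper applies such tests to balance-of-payments series grouped by country characteristics.

```python
import numpy as np
from scipy.stats import chisquare

def benford_test(values):
    """Chi-square goodness-of-fit test of the first-digit (Benford) law."""
    values = np.asarray(values, dtype=float)
    values = np.abs(values[values != 0])
    # Leading digit: rescale each value into [1, 10) and truncate.
    first_digits = (values / 10.0 ** np.floor(np.log10(values))).astype(int)
    observed = np.bincount(first_digits, minlength=10)[1:10]
    # Benford probability of leading digit d is log10(1 + 1/d).
    expected = np.log10(1 + 1 / np.arange(1, 10)) * observed.sum()
    return chisquare(observed, f_exp=expected)

# Hypothetical data spanning many orders of magnitude is close to Benford,
# so the test should not reject here.
rng = np.random.default_rng(1)
data = np.exp(rng.uniform(0, 20, size=2000))
print(benford_test(data))
```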
Robust approachability and regret minimization in games with partial monitoring
Approachability has become a standard tool in analyzing learning algorithms in
the adversarial online learning setup. We develop a variant of approachability
for games where there is ambiguity in the obtained reward: it belongs to a
set rather than being a single vector. Using this variant, we tackle the
problem of approachability in games with partial monitoring and develop simple
and efficient algorithms (i.e., with constant per-step complexity) for this
setup. We finally consider external and internal regret in repeated
games with partial monitoring and derive regret-minimizing strategies based on
approachability theory.
Improved Second-Order Bounds for Prediction with Expert Advice
This work studies external regret in sequential prediction games with both
positive and negative payoffs. External regret measures the difference between
the payoff obtained by the forecasting strategy and the payoff of the best
action. In this setting, we derive new and sharper regret bounds for the
well-known exponentially weighted average forecaster and for a new forecaster
with a different multiplicative update rule. Our analysis has two main
advantages: first, no preliminary knowledge about the payoff sequence is
needed, not even its range; second, our bounds are expressed in terms of sums
of squared payoffs, replacing larger first-order quantities appearing in
previous bounds. In addition, our most refined bounds have the natural and
desirable property of being stable under rescalings and general translations of
the payoff sequence.
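A minimal Python sketch of the exponentially weighted average forecaster on signed payoffs may help fix ideas. The learning rate is held fixed here, whereas the second-order bounds of the paper come with adaptive tunings based on the sums of squared payoffs.

```python
import numpy as np

def ewa_weights(payoffs, eta):
    """Exponentially weighted average forecaster on a payoff matrix.

    payoffs: array of shape (T, K), payoff of each of K actions per round
    (positive or negative). Returns the weight vectors played at each round:
    weights proportional to the exponential of eta times cumulative payoffs.
    """
    T, K = payoffs.shape
    weights = np.full(K, 1.0 / K)
    history = np.empty((T, K))
    cum = np.zeros(K)
    for t in range(T):
        history[t] = weights
        cum += payoffs[t]
        w = np.exp(eta * (cum - cum.max()))  # shift for numerical stability
        weights = w / w.sum()
    return history

# External regret of EWA against the best action in hindsight.
rng = np.random.default_rng(2)
pay = rng.uniform(-1, 1, size=(1000, 5))
hist = ewa_weights(pay, eta=0.1)
print(pay.sum(axis=0).max() - (hist * pay).sum())
```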
Pure Exploration for Multi-Armed Bandit Problems
We consider the framework of stochastic multi-armed bandit problems and study
the possibilities and limitations of forecasters that perform an on-line
exploration of the arms. These forecasters are assessed in terms of their
simple regret, a regret notion that captures the fact that exploration is only
constrained by the number of available rounds (not necessarily known in
advance), in contrast to the case when the cumulative regret is considered and
when exploitation needs to be performed at the same time. We believe that this
performance criterion is suited to situations when the cost of pulling an arm
is expressed in terms of resources rather than rewards. We discuss the links
between the simple and the cumulative regret. One of the main results in the
case of a finite number of arms is a general lower bound on the simple regret
of a forecaster in terms of its cumulative regret: the smaller the latter, the
larger the former. Keeping this result in mind, we then exhibit upper bounds on
the simple regret of some forecasters. The paper ends with a study devoted to
continuous-armed bandit problems; we show that the simple regret can be
minimized with respect to a family of probability distributions if and only if
the cumulative regret can be minimized for it. Based on this equivalence, we
are able to prove that the separable metric spaces are exactly the metric
spaces on which these regrets can be minimized with respect to the family of
all probability distributions with continuous mean-payoff functions.
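The following Python sketch illustrates the notion of simple regret with the simplest possible forecaster: uniform (round-robin) exploration followed by the recommendation of the empirically best arm. The Gaussian reward model is an assumption made only for the illustration.

```python
import numpy as np

def simple_regret_uniform(means, T, rng):
    """Round-robin exploration, then recommend the empirically best arm.

    Simple regret is the gap between the best mean and the mean of the
    recommended arm; unlike cumulative regret, no cost is charged for
    the rewards accrued while exploring.
    """
    K = len(means)
    pulls = np.zeros(K)
    sums = np.zeros(K)
    for t in range(T):
        a = t % K  # uniform exploration
        pulls[a] += 1
        sums[a] += rng.normal(means[a], 1.0)  # assumed Gaussian rewards
    recommended = np.argmax(sums / pulls)
    return max(means) - means[recommended]

rng = np.random.default_rng(3)
means = [0.0, 0.3, 0.5]
# The average simple regret decays as the horizon grows.
print(np.mean([simple_regret_uniform(means, 300, rng) for _ in range(200)]))
```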
Strategies for prediction under imperfect monitoring
We propose simple randomized strategies for sequential prediction under
imperfect monitoring, that is, when the forecaster does not have access to the
past outcomes but rather to a feedback signal. The proposed strategies are
consistent in the sense that they achieve, asymptotically, the best possible
average reward. It was Rustichini (1999) who first proved the existence of such
consistent predictors. The forecasters presented here offer the first
constructive proof of consistency. Moreover, the proposed algorithms are
computationally efficient. We also establish upper bounds for the rates of
convergence. In the case of deterministic feedback, these rates are optimal up
to logarithmic terms.
Comment: Journal version of a COLT conference paper.
Approachability in unknown games: Online learning meets multi-objective optimization
In the standard setting of approachability there are two players and a target
set. The players repeatedly play a known vector-valued game in which the first
player wants the average vector-valued payoff to converge to the target
set, while the other player tries to keep it away from this set. We revisit this
setting in the spirit of online learning and do not assume that the first
player knows the game structure: she receives an arbitrary vector-valued
reward at every round. She wishes to approach the smallest ("best") possible
set given the observed average payoffs in hindsight. This extension of the
standard setting has implications even when the original target set is not
approachable and when it is not obvious which expansion of it should be
approached instead. We show that it is impossible, in general, to approach the
best target set in hindsight and propose achievable though ambitious
alternative goals. We further propose a concrete strategy to approach these
goals. Our method does not require projection onto a target set and amounts to
switching between scalar regret minimization algorithms that are performed in
episodes. Applications to global cost minimization and to approachability under
sample path constraints are considered.
Online Multi-task Learning with Hard Constraints
We discuss multi-task online learning when a decision maker has to deal
simultaneously with M tasks. The tasks are related, which is modeled by
imposing that the M-tuple of actions taken by the decision maker needs to
satisfy certain constraints. We give natural examples of such restrictions and
then discuss a general class of tractable constraints, for which we introduce
computationally efficient ways of selecting actions, essentially by reducing to
an on-line shortest path problem. We briefly discuss "tracking" and "bandit"
versions of the problem and extend the model in various ways, including
non-additive global losses and uncountably infinite sets of tasks.
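To illustrate the flavor of the reduction, here is a Python sketch of its offline analogue: viewing tasks as layers of a graph and actions as nodes, a feasible M-tuple is a path, so the best tuple is found by a shortest-path (dynamic-programming) pass. The chain-compatibility constraint used below is a hypothetical example, not the general class of constraints treated in the paper.

```python
import numpy as np

def best_feasible_tuple(losses, allowed):
    """Minimum-loss M-tuple of actions under chain constraints.

    losses: (M, K) loss of each of K actions per task.
    allowed: (K, K) boolean; allowed[a, b] says action b for task m+1
    is compatible with action a for task m (hypothetical structure).
    """
    M, K = losses.shape
    cost = losses[0].copy()
    back = np.zeros((M, K), dtype=int)
    for m in range(1, M):
        # candidate[a, b]: cost of reaching action b at task m via a at m-1
        candidate = np.where(allowed, cost[:, None] + losses[m][None, :], np.inf)
        back[m] = candidate.argmin(axis=0)
        cost = candidate.min(axis=0)
    tup = [int(cost.argmin())]
    for m in range(M - 1, 0, -1):
        tup.append(int(back[m][tup[-1]]))
    return tup[::-1], float(cost.min())

rng = np.random.default_rng(4)
losses = rng.random((4, 3))
allowed = np.ones((3, 3), dtype=bool)
np.fill_diagonal(allowed, False)  # e.g., consecutive tasks may not repeat an action
print(best_feasible_tuple(losses, allowed))
```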
A Second-order Bound with Excess Losses
We study online aggregation of the predictions of experts, and first show new
second-order regret bounds in the standard setting, which are obtained via a
version of the Prod algorithm (and also a version of the polynomially weighted
average algorithm) with multiple learning rates. These bounds are in terms of
excess losses, the differences between the instantaneous losses suffered by the
algorithm and the ones of a given expert. We then demonstrate the usefulness of
these bounds in the context of experts that report their confidences as a
number in the interval [0,1], using a generic reduction to the standard setting.
We conclude with two other applications in the standard setting, which improve
the known bounds in the case of small excess losses and show a bounded regret
against i.i.d. sequences of losses.
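As an illustration, here is a Python sketch in the spirit of Prod with one learning rate per expert: each potential is updated multiplicatively by the expert's excess loss, and the played weights mix the potentials proportionally to the learning rates. The rates are held fixed here, while the paper also tunes them online.

```python
import numpy as np

def ml_prod(losses, etas):
    """Prod-style aggregation with one learning rate per expert (a sketch).

    losses: (T, K) losses in [0, 1] of K experts; etas: (K,) learning rates.
    Each potential w_k is multiplied by 1 + eta_k * (algorithm loss - loss_k),
    i.e., by the expert's excess loss, so better experts gain weight.
    Returns the algorithm's cumulative loss.
    """
    T, K = losses.shape
    w = np.full(K, 1.0 / K)
    total = 0.0
    for t in range(T):
        p = etas * w / (etas * w).sum()                 # played weights
        alg_loss = p @ losses[t]
        total += alg_loss
        w = w * (1.0 + etas * (alg_loss - losses[t]))   # excess-loss update
    return total

rng = np.random.default_rng(5)
losses = rng.random((1000, 4))
losses[:, 0] *= 0.5  # expert 0 is better on average
etas = np.full(4, 0.1)
print(ml_prod(losses, etas) - losses[:, 0].sum())  # regret against expert 0
```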
Contextual Bandits with Knapsacks for a Conversion Model
We consider contextual bandits with knapsacks, with an underlying structure
between rewards generated and cost vectors suffered. We do so motivated by
sales with commercial discounts. At each round, given the stochastic i.i.d.
context and the arm picked (corresponding, e.g., to a discount level), a
customer conversion may be obtained, in which case a reward is gained and
vector costs are suffered (corresponding, e.g., to losses of earnings).
Otherwise, in the
absence of a conversion, the reward and costs are null. The reward and costs
achieved are thus coupled through the binary variable measuring conversion or
the absence thereof. This underlying structure between rewards and costs is
different from the linear structures considered by Agrawal and Devanur [2016]
(but we show that the techniques introduced in the present article may also be
applied to the case of these linear structures). The adaptive policies
exhibited solve at each round a linear program based on upper-confidence
estimates of the probabilities of conversion given the context and the arm. This
kind of policy is most natural and achieves a regret bound of the typical order
(OPT/B) √T, where B is the total budget allowed, OPT is the optimal
expected reward achievable by a static policy, and T is the number of rounds.
Comment: Thirty-sixth Conference on Neural Information Processing Systems,
2022, New Orleans, United States.
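The per-round step of such a policy can be sketched in Python as follows: solve a small linear program over distributions on arms, maximizing the upper-confidence expected reward subject to per-round budget constraints on the expected costs, both coupled through the same conversion probability. All variable names and the feasibility setup below are illustrative assumptions, not the paper's notation.

```python
import numpy as np
from scipy.optimize import linprog

def lp_policy(ucb_conv, reward, costs, budget_rate):
    """One round of a UCB-then-LP policy sketch for bandits with knapsacks.

    ucb_conv: (K,) upper-confidence estimates of conversion probabilities
    for the current context; reward: (K,) reward if conversion occurs;
    costs: (K, d) cost vectors if conversion occurs; budget_rate: (d,)
    per-round budget B/T. Expected rewards and costs of an arm are both
    proportional to its conversion probability, which is the coupling
    the approach exploits. Returns a distribution over arms to sample from.
    """
    K = len(ucb_conv)
    # Maximize sum_a p_a * ucb_conv_a * reward_a  <=>  minimize the negative.
    c = -(ucb_conv * reward)
    A_ub = (ucb_conv[:, None] * costs).T  # (d, K) expected per-round cost rates
    A_eq = np.ones((1, K))                # p must be a probability vector
    res = linprog(c, A_ub=A_ub, b_ub=budget_rate, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * K)
    return res.x

rng = np.random.default_rng(6)
p = lp_policy(ucb_conv=np.array([0.2, 0.5, 0.8]),
              reward=np.array([3.0, 2.0, 1.0]),
              costs=rng.random((3, 2)),
              budget_rate=np.array([0.3, 0.3]))
print(p)
```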