Calibration and Internal no-Regret with Partial Monitoring
Calibrated strategies can be obtained by performing strategies that have no
internal regret in some auxiliary game. Such strategies can be constructed
explicitly with the use of Blackwell's approachability theorem, in another
auxiliary game. We establish the converse: a strategy that approaches a convex
set can be derived from the construction of a calibrated strategy. We
develop these tools in the framework of a game with partial monitoring, where
players do not observe the actions of their opponents but receive random
signals, to define a notion of internal regret and construct strategies that
have no such regret.
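As a concrete illustration of this no-regret/approachability connection in the simpler full-monitoring case, here is a minimal sketch (not taken from the paper) of Hart and Mas-Colell's regret matching, which can be read as a Blackwell strategy approaching the negative orthant of the regret space; the payoff matrix and the uniformly random opponent are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Vector-payoff view of regret: matching-pennies payoffs for the row player.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

T = 5000
cum_regret = np.zeros(2)   # cumulative regret vector (one coordinate per action)

for t in range(T):
    # Regret matching: play each action with probability proportional to the
    # positive part of its cumulative regret (uniform if all parts are zero).
    pos = np.maximum(cum_regret, 0.0)
    p = pos / pos.sum() if pos.sum() > 0 else np.full(2, 0.5)
    i = rng.choice(2, p=p)
    j = rng.integers(2)    # opponent plays uniformly at random (an assumption)
    # Instantaneous regret: payoff of each fixed action minus realized payoff.
    cum_regret += A[:, j] - A[i, j]

avg_regret = cum_regret / T   # approaches the negative orthant as T grows
```

Blackwell's theorem guarantees that the average regret vector approaches the negative orthant, so no fixed action is regretted in the long run; the partial-monitoring construction in the abstract replaces the observed opponent action j by a random signal.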
Approachability of Convex Sets in Games with Partial Monitoring
We provide a necessary and sufficient condition under which a convex set is
approachable in a game with partial monitoring, i.e.\ where players do not
observe their opponents' moves but receive random signals. This condition is an
extension of Blackwell's Criterion in the full monitoring framework, where
players observe at least their payoffs. When our condition is fulfilled, we
explicitly construct an approachability strategy, derived from a strategy
satisfying some internal consistency property in an auxiliary game. We also
provide an example of a convex set that is neither (weakly)-approachable nor
(weakly)-excludable, a situation that cannot occur in the full monitoring case.
We finally apply our result to describe an ε-optimal strategy of the
uninformed player in a zero-sum repeated game with incomplete information on
one side.
Highly-Smooth Zero-th Order Online Optimization
Vianney Perchet
The minimization of convex functions which are only available through partial
and noisy information is a key methodological problem in many disciplines. In
this paper we consider convex optimization with noisy zero-th order
information, that is, noisy function evaluations at any desired point. We focus
on problems with high degrees of smoothness, such as logistic regression. We
show that as opposed to gradient-based algorithms, high-order smoothness may be
used to improve estimation rates, with a precise dependence of our upper-bounds
on the degree of smoothness. In particular, we show that for infinitely
differentiable functions, we recover the same dependence on sample size as
gradient-based algorithms, with an extra dimension-dependent factor. This is
done for both convex and strongly-convex functions, with finite horizon and
anytime algorithms. Finally, we also recover similar results in the online
optimization setting.
Comment: Conference on Learning Theory (COLT), Jun 2016, New York, United States.
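To make the setting concrete, here is a minimal sketch (not the paper's algorithm) of a classical two-point zeroth-order gradient estimate applied to a smooth, strongly convex quadratic; the test function, noise level, perturbation size, and step sizes are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, T, delta, sigma = 5, 3000, 1e-2, 1e-3

def f(x):
    # Smooth, strongly convex test function, observed with additive noise:
    # only these noisy evaluations are available to the algorithm.
    return 0.5 * np.sum((x - 1.0) ** 2) + sigma * rng.standard_normal()

x = np.zeros(d)
for t in range(1, T + 1):
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                  # random unit direction
    # Two-point gradient estimate from two noisy function evaluations.
    g = (d / (2 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u
    x -= g / (t + d)                        # ~1/t steps (strong convexity)
```

The iterate drifts toward the minimizer (here the all-ones vector) using function values alone; the paper's point is that higher-order smoothness lets such zeroth-order schemes match gradient-based rates up to a dimension-dependent factor.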
On a unified framework for approachability in games with or without signals
We unify standard frameworks for approachability in both full and partial
monitoring by defining a new abstract game, called the "purely informative
game", where the outcome at each stage is the maximal information players can
obtain, represented as some probability measure. Objectives of players can be
rewritten as the convergence (to some given set) of sequences of averages of
these probability measures. We obtain new results extending the approachability
theory developed by Blackwell; moreover, this new abstract framework enables us
to characterize approachable sets with, as usual, a remarkably simple and clear
reformulation for convex sets. Translated into the original games, those
results become the first necessary and sufficient condition under which an
arbitrary set is approachable and they cover and extend previous known results
for convex sets. We also investigate a specific class of games where, thanks to
some unusual definition of averages and convexity, we again obtain a complete
characterization of approachable sets, along with rates of convergence.
Gains and Losses are Fundamentally Different in Regret Minimization: The Sparse Case
We demonstrate that, in the classical non-stochastic regret minimization
problem with d decisions, gains and losses to be respectively maximized or
minimized are fundamentally different. Indeed, by considering the additional
sparsity assumption (at each stage, at most s decisions incur a nonzero
outcome), we derive optimal regret bounds of different orders. Specifically,
with gains, we obtain an optimal regret guarantee after T stages of order
sqrt(T log s), so the classical dependency in the dimension is replaced by
the sparsity size. With losses, we provide matching upper and lower bounds of
order sqrt(T s log(d)/d), which is decreasing in d. Eventually, we also
study the bandit setting, and obtain an upper bound of order sqrt(T s log(d/s))
when outcomes are losses. This bound is proven to be optimal up to a
logarithmic factor.
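For context, the full-support baseline for this problem is exponential weights (Hedge). The sketch below (not the paper's sparsity-adapted algorithm) runs Hedge on s-sparse losses and checks that the average regret against the best fixed decision is small; the loss distribution and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
d, s, T = 100, 5, 2000
eta = np.sqrt(np.log(d) / T)      # standard Hedge learning rate

cum_loss = np.zeros(d)
alg_loss = 0.0
for t in range(T):
    # Exponential weights on cumulative losses (shifted for stability).
    w = np.exp(-eta * (cum_loss - cum_loss.min()))
    p = w / w.sum()
    # s-sparse losses: only s random coordinates are nonzero this round.
    loss = np.zeros(d)
    idx = rng.choice(d, size=s, replace=False)
    loss[idx] = rng.random(s)
    alg_loss += p @ loss          # expected loss of the randomized decision
    cum_loss += loss

avg_regret = (alg_loss - cum_loss.min()) / T
```

Hedge guarantees regret of order sqrt(T log d) regardless of sparsity; the abstract's point is that exploiting sparsity changes the attainable order, and differently for gains than for losses.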
Sparse Stochastic Bandits
In the classical multi-armed bandit problem, d arms are available to the
decision maker who pulls them sequentially in order to maximize his cumulative
reward. Guarantees can be obtained on a relative quantity called regret, which
scales linearly with d (or with sqrt(d) in the minimax sense). We here consider
the sparse case of this classical problem in the sense that only a small number
of arms, namely s < d, have a positive expected reward. We are able to leverage
this additional assumption to provide an algorithm whose regret scales with s
instead of d. Moreover, we prove that this algorithm is optimal by providing a
matching lower bound - at least for a wide and pertinent range of parameters
that we determine - and by evaluating its performance on simulated data.
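A minimal sketch of this stochastic setting, using standard UCB rather than the paper's sparsity-adapted algorithm: only s of the d arms have a positive expected reward, and the arm means, noise level, and confidence width below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
d, s, T, sigma = 20, 3, 5000, 0.1
mu = np.zeros(d)
mu[:s] = [0.5, 0.3, 0.1]           # only s of the d arms have positive mean

counts = np.zeros(d)
sums = np.zeros(d)
reward = 0.0
for t in range(1, T + 1):
    if t <= d:
        a = t - 1                  # initialization: pull each arm once
    else:
        # UCB index: empirical mean plus a noise-scaled confidence bonus.
        ucb = sums / counts + sigma * np.sqrt(2 * np.log(t) / counts)
        a = int(np.argmax(ucb))
    r = mu[a] + sigma * rng.standard_normal()
    counts[a] += 1
    sums[a] += r
    reward += r

avg_regret = mu.max() - reward / T
```

Plain UCB still pays an exploration cost on all d arms; the abstract's algorithm leverages the assumption s < d so that the regret scales with s instead of d.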