Robust approachability and regret minimization in games with partial monitoring
Approachability has become a standard tool in analyzing learning algorithms in
the adversarial online learning setup. We develop a variant of approachability
for games where there is ambiguity in the obtained reward that belongs to a
set, rather than being a single vector. Using this variant we tackle the
problem of approachability in games with partial monitoring and develop simple
and efficient algorithms (i.e., with constant per-step complexity) for this
setup. We finally consider external regret and internal regret in repeated
games with partial monitoring and derive regret-minimizing strategies based on
approachability theory.
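As a concrete illustration of the approachability–regret connection (not the paper's partial-monitoring algorithm), the following sketch implements regret matching, a classical full-monitoring strategy derived from Blackwell approachability of the nonpositive orthant; the game matrix and the opponent's mixed strategy are illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Payoff matrix for the row player in rock-paper-scissors (illustrative game).
A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]], dtype=float)

T = 5000
cum_regret = np.zeros(3)  # cumulative regret toward each pure action

for t in range(T):
    # Regret matching: play in proportion to the positive cumulative regrets.
    pos = np.maximum(cum_regret, 0.0)
    p = pos / pos.sum() if pos.sum() > 0 else np.ones(3) / 3
    i = rng.choice(3, p=p)
    j = rng.choice(3, p=[0.5, 0.3, 0.2])  # opponent plays a fixed mixed strategy
    # Vector payoff whose approach to the nonpositive orthant is exactly
    # vanishing external regret.
    cum_regret += A[:, j] - A[i, j]

avg_regret = cum_regret.max() / T
print(avg_regret)  # average external regret, vanishing as T grows
```

The approachability view is what generalizes: the scalar-regret guarantee falls out of steering a vector-valued average payoff toward a convex target set.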
Stochastic bandits with vector losses: Minimizing ℓ∞-norm of relative losses
Multi-armed bandits are widely applied in scenarios like recommender systems, for which the goal is to maximize the click rate. However, more factors should be considered, e.g., user stickiness, user growth rate, user experience assessment, etc. In this paper, we model this situation as a K-armed bandit problem with multiple losses. We define the relative loss vector of an arm, whose i-th entry compares the arm and the optimal arm with respect to the i-th loss. We study two goals: (a) finding the arm with the minimum ℓ∞-norm of relative losses with a given confidence level (which refers to fixed-confidence best-arm identification); (b) minimizing the ℓ∞-norm of cumulative relative losses (which refers to regret minimization). For goal (a), we derive a problem-dependent sample complexity lower bound and discuss how to achieve matching algorithms. For goal (b), we provide a regret lower bound of Ω(T^{2/3}) and a matching algorithm.
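A minimal sketch of the relative-loss objective described above, with hypothetical mean losses (the numbers are placeholders, and the paper's sampling algorithms are not reproduced here):

```python
import numpy as np

# Hypothetical mean losses: rows = arms, columns = loss criteria
# (e.g., negative click rate, a churn proxy); all numbers are illustrative.
mu = np.array([[0.30, 0.10],
               [0.20, 0.25],
               [0.25, 0.15]])

# Per-criterion optimum: the best achievable mean loss in each column.
best = mu.min(axis=0)

# Relative loss vector of each arm: entry i compares the arm to the
# optimal arm with respect to the i-th loss, as in the abstract.
relative = mu - best

# The target of goals (a) and (b): the arm minimizing the l-infinity
# norm of its relative loss vector.
linf = relative.max(axis=1)
best_arm = int(linf.argmin())
print(best_arm, linf)  # arm 2 balances both criteria best
```

Note that the best arm under the ℓ∞ criterion (arm 2 here) is optimal for neither individual loss; that trade-off is what makes the vector-loss problem distinct from standard best-arm identification.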
Approachability of convex sets in generalized quitting games
We examine Blackwell approachability in so-called generalized quitting games. These are repeated games in which each player may have quitting actions that terminate the game. We provide three simple, geometric, and strongly related conditions for the weak approachability of a convex target set. The first is sufficient: it guarantees that, for any fixed horizon, a player has a strategy ensuring that the expected time-average payoff vector converges to the target set as the horizon goes to infinity. The third is necessary: if it is not satisfied, the opponent can weakly exclude the target set. We analyze in detail the special cases where only one of the players has quitting actions. Finally, we study uniform approachability, where the strategy should not depend on the horizon, and demonstrate that, in contrast with classical Blackwell approachability for convex sets, weak approachability does not imply uniform approachability.
Decentralized Learning in Online Queuing Systems
Motivated by packet routing in computer networks, online queuing systems are
composed of queues receiving packets at different rates. Repeatedly, they send
packets to servers, each of which treats at most one packet at a time. In the
centralized case, the number of accumulated packets remains bounded (i.e., the
system is stable) as long as the ratio between service rates and arrival rates
is larger than 1. In the decentralized case, individual no-regret strategies
ensure stability when this ratio is larger than 2. Yet, myopically minimizing
regret disregards the long-term effects due to the carryover of packets to
further rounds. On the other hand, minimizing long-term costs leads to stable
Nash equilibria as soon as the ratio exceeds e/(e-1). Stability with
decentralized learning strategies for a ratio below 2 was a major remaining
question. We first argue that for ratios up to 2, cooperation is required for
stability of learning strategies, as selfish minimization of policy regret, a
patient notion of regret, might indeed still be unstable in this case. We
therefore consider cooperative queues and propose the first decentralized
learning algorithm guaranteeing stability of the system as long as the ratio
of rates is larger than 1, thus reaching performance comparable to centralized
strategies. Comment: NeurIPS 2021 camera-ready
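The stability phenomenon in the centralized regime can be illustrated with a toy simulation (the rates, the greedy longest-queue-to-fastest-server rule, and all numbers are illustrative assumptions, not the paper's model details):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative rates: service rates comfortably exceed arrival rates
# (ratio of rates > 1), the regime in which the centralized system is stable.
arrival = np.array([0.3, 0.3])
service = np.array([0.9, 0.8])

T = 2000
backlog = np.zeros(2, dtype=int)

for _ in range(T):
    backlog += rng.random(2) < arrival  # Bernoulli packet arrivals
    # Centralized scheduler: longer queues get faster servers, and each
    # server treats at most one packet per round.
    queues = np.argsort(-backlog)       # queues ordered by backlog
    servers = np.argsort(-service)      # servers ordered by service rate
    for q, s in zip(queues, servers):
        if backlog[q] > 0 and rng.random() < service[s]:
            backlog[q] -= 1

print(backlog.sum())  # total backlog stays bounded over the horizon
```

The decentralized difficulty the abstract addresses is precisely that no such coordinator exists: queues choosing servers independently can collide, and myopic regret minimization need not keep the backlog bounded.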
A Unifying Perspective on Multi-Calibration: Game Dynamics for Multi-Objective Learning
We provide a unifying framework for the design and analysis of
multicalibrated predictors. By placing the multicalibration problem in the
general setting of multi-objective learning -- where learning guarantees must
hold simultaneously over a set of distributions and loss functions -- we
exploit connections to game dynamics to achieve state-of-the-art guarantees for
a diverse set of multicalibration learning problems. In addition to shedding
light on existing multicalibration guarantees and greatly simplifying their
analysis, our approach also yields improved guarantees, such as obtaining
stronger multicalibration conditions that scale with the square-root of group
size and improving the complexity of k-class multicalibration by an
exponential factor of k. Beyond multicalibration, we use these game dynamics
to address emerging considerations in the study of group fairness and
multi-distribution learning. Comment: 45 pages. Authors are ordered alphabetically.
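A minimal sketch of what a multicalibration violation measures, on synthetic data (the predictor, the groups, and the bucketing are illustrative assumptions, not the paper's algorithms):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic setup: binary labels drawn with P(y = 1 | x) = x, a predictor
# bucketed to one decimal, and two overlapping protected groups.
n = 20000
x = rng.random(n)
y = (rng.random(n) < x).astype(float)
pred = np.round(x, 1)
groups = {"g1": x < 0.7, "g2": rng.random(n) < 0.5}

# Multicalibration asks that |E[y - pred | group, pred = v]| be small for
# every group and every predicted level v, not just on average overall.
worst = 0.0
for mask in groups.values():
    for v in np.unique(pred):
        cell = mask & (pred == v)
        if cell.sum() >= 100:  # skip cells too small to estimate
            worst = max(worst, abs((y[cell] - pred[cell]).mean()))
print(worst)  # small here, since the predictor is calibrated by construction
```

The game-dynamics view of the paper treats the group/level cells as objectives an adversary may pick, turning the simultaneous guarantee into a multi-objective learning problem.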
Contributions to robust sequential aggregation of experts: work on approximation error and probabilistic forecasting. Applications to forecasting for energy markets.
We are interested in the online forecasting of an arbitrary sequence of observations. At each time step, experts provide predictions of the next observation, and we form our own prediction by combining the expert forecasts. This is the setting of online robust aggregation of experts. The goal is to ensure a small cumulative regret; in other words, we want our cumulative loss not to exceed that of the best expert by too much. We look for worst-case guarantees: no stochastic assumption is made on the data to be predicted, and the sequence of observations is arbitrary. A first objective of this work is to improve prediction accuracy. We investigate several possibilities: one is to design fully automatic procedures that can exploit simplicity of the data whenever it is present; another relies on working on the expert set so as to improve its diversity. A second objective is to produce probabilistic predictions, coupling the point prediction with a measure of uncertainty (e.g., interval forecasts). The real-world applications of this setting are numerous, since very few assumptions are made on the data; moreover, online learning, which processes data sequentially, is crucial for handling big data sets in real time. In this thesis, we carry out for EDF several empirical studies of energy data sets (electricity consumption, electricity prices, ...) and achieve good forecasting performance.
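The baseline aggregation rule of this setting, the exponentially weighted average forecaster, can be sketched as follows (the observations and experts are synthetic placeholders; the thesis's adaptive variants are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(3)

T, K = 1000, 3
eta = np.sqrt(8 * np.log(K) / T)  # standard tuning for losses in [0, 1]

# Arbitrary observations in [0, 1] and three experts of varying quality.
y = rng.random(T)
experts = np.stack([np.clip(y + rng.normal(0, s, T), 0, 1)
                    for s in (0.05, 0.2, 0.5)])

weights = np.ones(K) / K
cum_loss, cum_expert_loss = 0.0, np.zeros(K)

for t in range(T):
    forecast = weights @ experts[:, t]       # convex combination of experts
    cum_loss += (forecast - y[t]) ** 2
    expert_loss = (experts[:, t] - y[t]) ** 2
    cum_expert_loss += expert_loss
    weights *= np.exp(-eta * expert_loss)    # exponential weight update
    weights /= weights.sum()

regret = cum_loss - cum_expert_loss.min()
print(regret / T)  # small average regret against the best expert
```

The worst-case flavor of the guarantee is the point: the regret bound holds for any sequence of observations, with no stochastic model of the data.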
Beyond Statistical Fairness
In recent years, a great many fairness notions have been proposed. Yet, most of them take a reductionist approach by indirectly viewing fairness as equalizing some error statistic across pre-defined groups. This thesis explores ideas for going beyond such statistical fairness frameworks.
First, we consider settings in which the right notion of fairness may not be captured by simple mathematical definitions but might be more complex and nuanced, and thus require elicitation from individual or collective stakeholders. By asking stakeholders to make pairwise comparisons to learn which pairs of individuals should be treated similarly, we show how to approximately learn the most accurate classifier, or converge to such a classifier, subject to the elicited fairness constraints. We consider an offline setting where the pairwise comparisons must be made prior to training a model, and an online setting where one can continually provide fairness feedback to the deployed model in each round. We also report preliminary findings of a behavioral study of our framework using human-subject fairness constraints elicited on the COMPAS criminal recidivism dataset.
Second, unlike most statistical fairness frameworks, which promise fairness only for pre-defined and often coarse groups, we provide fairness guarantees for finer subgroups, such as all possible intersections of the pre-defined groups, in the context of uncertainty estimation in both offline and online settings. Our framework gives uncertainty guarantees that are more locally sensible than those given by conformal prediction techniques: our uncertainty estimates are valid even when averaged over any subgroup, whereas uncertainty estimates in conformal prediction are usually only valid when averaged over the entire population.
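A toy illustration of why marginal validity is weaker than subgroup validity: an interval calibrated on the whole population (as in split conformal prediction) can under-cover a noisier subgroup. All distributions here are synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

# A pre-defined group whose noise scale differs from the rest.
n = 100000
g = rng.random(n) < 0.5
scale = np.where(g, 2.0, 0.5)  # group g is four times noisier
y = rng.normal(0, scale)

# One marginal 90% interval fit on the whole population, as a purely
# marginal method would do.
q = np.quantile(np.abs(y), 0.9)
covered = np.abs(y) < q

marginal = covered.mean()      # ~0.90 by construction
group_cov = covered[g].mean()  # noticeably below 0.90 for the noisy group
print(round(marginal, 2), round(group_cov, 2))
```

Subgroup-valid uncertainty estimation, as described above, requires the coverage guarantee to hold for the noisy group (and every intersection of groups), not merely on average over the population.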