Online Agnostic Boosting via Regret Minimization
Boosting is a widely used machine learning approach based on the idea of
aggregating weak learning rules. While in statistical learning numerous
boosting methods exist both in the realizable and agnostic settings, in online
learning they exist only in the realizable case. In this work we provide the
first agnostic online boosting algorithm; that is, given a weak learner with
only marginally-better-than-trivial regret guarantees, our algorithm boosts it
to a strong learner with sublinear regret.
Our algorithm is based on an abstract (and simple) reduction to online convex
optimization, which efficiently converts an arbitrary online convex optimizer
to an online booster.
Moreover, this reduction extends to the statistical as well as the online
realizable settings, thus unifying the four cases of statistical/online and
agnostic/realizable boosting.
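The weak-to-strong guarantee can be stated schematically; the notation below is illustrative rather than taken verbatim from the paper. For labels $y_t \in \{\pm 1\}$ and a comparator class $\mathcal{H}$, a $\gamma$-weak online learner $W$ is assumed to satisfy

```latex
\sum_{t=1}^{T} W(x_t)\, y_t \;\ge\; \gamma \cdot \max_{h \in \mathcal{H}} \sum_{t=1}^{T} h(x_t)\, y_t \;-\; R_W(T),
```

for a small edge $\gamma > 0$ and sublinear $R_W(T)$, while the boosted predictor $\hat y_t$ is required to achieve the full (un-discounted) comparator benchmark up to sublinear regret:

```latex
\sum_{t=1}^{T} \hat y_t\, y_t \;\ge\; \max_{h \in \mathcal{H}} \sum_{t=1}^{T} h(x_t)\, y_t \;-\; o(T).
```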
Characterizing notions of omniprediction via multicalibration
A recent line of work shows that notions of multigroup fairness imply
surprisingly strong notions of omniprediction: loss minimization guarantees
that apply not just for a specific loss function, but for any loss belonging to
a large family of losses. While prior work has derived various notions of
omniprediction from multigroup fairness guarantees of varying strength, it was
unknown whether the connection goes in both directions.
In this work, we answer this question in the affirmative, establishing
equivalences between notions of multicalibration and omniprediction. The new
definitions that hold the key to this equivalence are new notions of swap
omniprediction, which are inspired by swap regret in online learning. We show
that these can be characterized exactly by a strengthening of multicalibration
that we refer to as swap multicalibration. One can go from standard to swap
multicalibration by a simple discretization; moreover all known algorithms for
standard multicalibration in fact give swap multicalibration. In the context of
omniprediction though, introducing the notion of swapping results in provably
stronger notions, which require a predictor to minimize expected loss at least
as well as an adaptive adversary who can choose both the loss function and
hypothesis based on the value predicted by the predictor.
Building on these characterizations, we paint a complete picture of the
relationship between the various omniprediction notions in the literature by
establishing implications and separations between them. Our work deepens our
understanding of the connections between multigroup fairness, loss minimization
and outcome indistinguishability and establishes new connections to classic
notions in online learning.
A Unified View of Large-scale Zero-sum Equilibrium Computation
The task of computing approximate Nash equilibria in large zero-sum
extensive-form games has received a tremendous amount of attention due mainly
to the Annual Computer Poker Competition. Immediately after its inception, two
competing and seemingly different approaches emerged---one an application of
no-regret online learning, the other a sophisticated gradient method applied to
a convex-concave saddle-point formulation. Since then, both approaches have
grown in relative isolation, with advancements on one side not affecting the
other. In this paper, we rectify this by dissecting and, in a sense, unifying
the two views.
Comment: AAAI Workshop on Computer Poker and Imperfect Information
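The no-regret learning view mentioned in the abstract can be illustrated on a small normal-form game: when two Hedge (multiplicative-weights) learners play a zero-sum matrix game against each other, their time-averaged strategies approximate a Nash equilibrium. A minimal sketch follows; it is a textbook instance of the approach, not the competition-grade extensive-form methods the paper surveys, and all names and parameters here are illustrative.

```python
import numpy as np

def hedge_selfplay(A, T=10000, eta=0.05):
    """Hedge vs. Hedge self-play on a zero-sum matrix game A.
    Row player maximizes x^T A y; the time-averaged strategies
    approach a Nash equilibrium as regret/T -> 0."""
    m, n = A.shape
    wx, wy = np.ones(m), np.ones(n)
    avg_x, avg_y = np.zeros(m), np.zeros(n)
    for _ in range(T):
        x, y = wx / wx.sum(), wy / wy.sum()
        avg_x += x
        avg_y += y
        wx *= np.exp(eta * (A @ y))       # row player gains (A y)_i
        wy *= np.exp(-eta * (A.T @ x))    # column player loses (A^T x)_j
        wx /= wx.max()                    # rescale to avoid overflow;
        wy /= wy.max()                    # does not change the distributions
    return avg_x / T, avg_y / T

# Rock-paper-scissors: the unique equilibrium is uniform play.
rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], float)
x, y = hedge_selfplay(rps)
# x and y each end up close to the uniform distribution (1/3, 1/3, 1/3)
```

The duality gap of the averaged strategies is bounded by the sum of the two players' average regrets, which is the standard route from no-regret learning to approximate equilibrium.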
Comparative Learning: A Sample Complexity Theory for Two Hypothesis Classes
In many learning theory problems, a central role is played by a hypothesis class: we might assume that the data is labeled according to a hypothesis in the class (usually referred to as the realizable setting), or we might evaluate the learned model by comparing it with the best hypothesis in the class (the agnostic setting). Taking a step beyond these classic setups that involve only a single hypothesis class, we study a variety of problems that involve two hypothesis classes simultaneously.
We introduce comparative learning as a combination of the realizable and agnostic settings in PAC learning: given two binary hypothesis classes S and B, we assume that the data is labeled according to a hypothesis in the source class S and require the learned model to achieve an accuracy comparable to the best hypothesis in the benchmark class B. Even when both S and B have infinite VC dimensions, comparative learning can still have a small sample complexity. We show that the sample complexity of comparative learning is characterized by the mutual VC dimension VC(S,B), which we define to be the maximum size of a subset shattered by both S and B. We also show a similar result in the online setting, where we give a regret characterization in terms of the analogous mutual Littlestone dimension Ldim(S,B). These results also hold for partial hypotheses.
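For finite classes over a finite domain, the mutual VC dimension defined above can be computed directly by brute force: find the largest subset shattered by both classes. A minimal sketch, with function names and the toy hypothesis classes chosen for illustration (they do not come from the paper):

```python
from itertools import combinations, product

def shatters(hypotheses, subset):
    """True iff the class realizes all 2^|subset| labelings of `subset`."""
    labelings = {tuple(h(x) for x in subset) for h in hypotheses}
    return len(labelings) == 2 ** len(subset)

def mutual_vc_dimension(S, B, domain):
    """VC(S,B): size of the largest subset of `domain` shattered by BOTH classes."""
    best = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(S, sub) and shatters(B, sub)
               for sub in combinations(domain, k)):
            best = k
        else:
            break  # shattering is monotone: no larger set can be shattered
    return best

# Toy example on {0, 1, 2}: thresholds (VC dim 1) vs. all labelings (VC dim 3).
domain = (0, 1, 2)
S = [lambda x, t=t: int(x >= t) for t in range(4)]                 # thresholds
B = [lambda x, bits=bits: bits[x] for bits in product((0, 1), repeat=3)]
print(mutual_vc_dimension(S, B, domain))  # 1
```

Here the mutual dimension is limited by the weaker class, consistent with VC(S,B) being at most min(VC(S), VC(B)).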
We additionally show that the insights necessary to characterize the sample complexity of comparative learning can be applied to other tasks involving two hypothesis classes. In particular, we characterize the sample complexity of realizable multiaccuracy and multicalibration using the mutual fat-shattering dimension, an analogue of the mutual VC dimension for real-valued hypotheses. This not only solves an open problem proposed by Hu, Peale, and Reingold (2022), but also leads to independently interesting results extending classic ones about regression, boosting, and covering numbers to our two-hypothesis-class setting.