Online Agnostic Boosting via Regret Minimization
Boosting is a widely used machine learning approach based on the idea of
aggregating weak learning rules. While in statistical learning numerous
boosting methods exist both in the realizable and agnostic settings, in online
learning they exist only in the realizable case. In this work we provide the
first agnostic online boosting algorithm; that is, given a weak learner with
only marginally-better-than-trivial regret guarantees, our algorithm boosts it
to a strong learner with sublinear regret.
Our algorithm is based on an abstract (and simple) reduction to online convex
optimization, which efficiently converts an arbitrary online convex optimizer
to an online booster.
Moreover, this reduction extends to the statistical as well as the online
realizable settings, thus unifying the four cases of statistical/online and
agnostic/realizable boosting.
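The abstract describes a black-box reduction that turns any online convex optimizer into an online booster. The sketch below is only a schematic of that general idea — projected online gradient descent over the ensemble's vote weights driving a pool of stand-in weak online learners — and not the paper's actual reduction; the WeakLearner class, the hinge surrogate loss, and the importance-weighting heuristic are all illustrative assumptions.

```python
# Schematic only: an online boosting loop where an online convex optimizer
# (online gradient descent over vote weights) aggregates weak online
# learners. Illustrates the generic OCO-to-booster idea; WeakLearner and
# the surrogate loss are stand-ins, not the paper's construction.
import numpy as np

class WeakLearner:
    """Toy weak online learner: a perceptron-style linear predictor."""
    def __init__(self, dim):
        self.w = np.zeros(dim)

    def predict(self, x):
        return 1.0 if float(self.w @ x) >= 0.0 else -1.0

    def update(self, x, y, importance=1.0):
        # Importance-weighted perceptron update on a mistake.
        if y * float(self.w @ x) <= 0.0:
            self.w += importance * y * x

def online_boost(stream, dim, n_learners=10, eta=0.1):
    """Yield a prediction for each (x, y) in the stream, updating online."""
    learners = [WeakLearner(dim) for _ in range(n_learners)]
    alpha = np.full(n_learners, 1.0 / n_learners)  # OCO iterate: vote weights
    for x, y in stream:
        votes = np.array([h.predict(x) for h in learners])
        margin = float(alpha @ votes)
        yield 1.0 if margin >= 0.0 else -1.0
        # OCO step: gradient of the hinge surrogate max(0, 1 - y*<alpha, votes>)
        # w.r.t. alpha, then clip to nonnegative and renormalize
        # (a crude stand-in for an exact simplex projection).
        grad = -y * votes if y * margin < 1.0 else np.zeros(n_learners)
        alpha = np.clip(alpha - eta * grad, 0.0, None)
        total = alpha.sum()
        alpha = alpha / total if total > 0 else np.full(n_learners, 1.0 / n_learners)
        # Pass the example to every weak learner, weighted more heavily when
        # the ensemble's margin on it was poor (a simple heuristic).
        importance = 1.0 + max(0.0, 1.0 - y * margin)
        for h in learners:
            h.update(x, y, importance=importance)
```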
Boosting-based Construction of BDDs for Linear Threshold Functions and Its Application to Verification of Neural Networks
Understanding the characteristics of neural networks is important but
difficult due to their complex structures and behaviors. Some previous work
proposes to transform neural networks into equivalent Boolean expressions and
apply verification techniques for characteristics of interest. This approach is
promising since the rich body of verification results for circuits and other
Boolean expressions can be readily applied. The bottleneck is the time
complexity of the transformation. More precisely, (i) each neuron of the
network, i.e., a linear threshold function, is converted to a Binary Decision
Diagram (BDD), and (ii) they are further combined into some final form, such as
Boolean circuits. For a linear threshold function with $n$ variables, an
existing method constructs an ordered BDD consistent with some variable
ordering, but its running time and the size of the resulting BDD can grow
exponentially in $n$ in the worst case. Moreover, it is non-trivial to choose,
among the $n!$ candidate orderings, one that produces a small BDD.
We propose a method to convert a linear threshold function to a specific form
of a BDD based on the boosting approach in the machine learning literature.
The running time of our method and the size of the BDD it outputs are both
bounded in terms of $n$ and the margin $\gamma$ of some consistent linear
threshold function. Our method does not need to search for good variable
orderings and produces a smaller expression when the margin of the linear
threshold function is large. More precisely, our method is based on our new
boosting algorithm, which is of independent interest. We also propose a
method to combine the per-neuron BDDs into the final Boolean expression
representing the neural network.
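For context, step (i) above — turning one linear threshold function into an ordered BDD under a fixed variable ordering — can be sketched with the classic dynamic program that merges nodes sharing the same remaining threshold. This illustrates the baseline transformation whose cost is the bottleneck, not the authors' boosting-based construction; the function name and interface are hypothetical.

```python
# Illustration of the baseline transformation (not the paper's method):
# build an ordered BDD for f(x) = [sum_i w[i]*x[i] >= theta] with a fixed
# variable order, merging nodes that share the same remaining threshold.
def ltf_to_obdd(w, theta):
    """Return (nodes, root) for the OBDD of the threshold function.

    nodes maps (level, remaining_threshold) -> (low_child, high_child);
    children are either such keys or the terminals "TRUE"/"FALSE".
    Variables are tested in index order x_0, x_1, ..., x_{n-1}.
    """
    n = len(w)
    # Largest / smallest sum still achievable from level i onward.
    suffix_max = [0] * (n + 1)
    suffix_min = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix_max[i] = suffix_max[i + 1] + max(w[i], 0)
        suffix_min[i] = suffix_min[i + 1] + min(w[i], 0)

    nodes = {}

    def build(level, need):
        if need <= suffix_min[level]:
            return "TRUE"    # satisfied no matter how the rest is set
        if need > suffix_max[level]:
            return "FALSE"   # can no longer be satisfied
        key = (level, need)
        if key not in nodes:                          # merge equal sub-problems
            low = build(level + 1, need)              # branch x_level = 0
            high = build(level + 1, need - w[level])  # branch x_level = 1
            nodes[key] = (low, high)
        return key

    root = build(0, theta)
    return nodes, root

# Example: majority of three variables, f(x) = [x0 + x1 + x2 >= 2].
nodes, root = ltf_to_obdd([1, 1, 1], 2)
print(len(nodes), root)   # -> 4 internal nodes under this ordering
```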
The sample complexity of multi-distribution learning
Multi-distribution learning generalizes classic PAC learning to handle
data coming from multiple distributions. Given a set of $k$ data distributions
and a hypothesis class of VC dimension $d$, the goal is to learn a hypothesis
that minimizes the maximum population loss over the $k$ distributions, up to
additive error $\epsilon$. In this paper, we settle the sample complexity of
multi-distribution learning by giving an algorithm of sample complexity
$\widetilde{O}\bigl((d+k)\epsilon^{-2}\bigr)\cdot (k/\epsilon)^{o(1)}$. This matches the
lower bound up to sub-polynomial factors and resolves the COLT 2023 open problem
of Awasthi, Haghtalab and Zhao [AHZ23].
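Written out, the objective described above asks for a hypothesis whose worst-case population loss over the $k$ distributions is within $\epsilon$ of the best achievable; the distributions $D_i$, loss $\ell$, and class $\mathcal{H}$ below are generic stand-ins for the abstract's setting:

\[
  \max_{i \in [k]} \; \mathbb{E}_{(x,y) \sim D_i}\!\bigl[\ell(h(x), y)\bigr]
  \;\le\;
  \min_{h' \in \mathcal{H}} \; \max_{i \in [k]} \; \mathbb{E}_{(x,y) \sim D_i}\!\bigl[\ell(h'(x), y)\bigr] \;+\; \epsilon .
\]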
Loss minimization yields multicalibration for large neural networks
Multicalibration is a notion of fairness that aims to provide accurate
predictions across a large set of groups. Multicalibration is known to be a
different goal than loss minimization, even for simple predictors such as
linear functions. In this note, we show that for (almost all) large neural
network sizes, optimally minimizing squared error leads to multicalibration.
Our results are about representational aspects of neural networks, and not
about algorithmic or sample complexity considerations. Previous such results
were known only for predictors that were nearly Bayes-optimal and were
therefore representation independent. We emphasize that our results do not
apply to specific algorithms for optimizing neural networks, such as SGD, and
they should not be interpreted as "fairness comes for free from optimizing
neural networks".
The Cost of Parallelizing Boosting
We study the cost of parallelizing weak-to-strong boosting algorithms for
learning, following the recent work of Karbasi and Larsen. Our main results are
two-fold:
- First, we prove a tight lower bound, showing that even "slight"
parallelization of boosting requires an exponential blow-up in the complexity
of training.
Specifically, let $\gamma$ be the weak learner's advantage over random
guessing. The famous \textsc{AdaBoost} algorithm produces an accurate
hypothesis by interacting with the weak learner for $\tilde{O}(1/\gamma^2)$
rounds, where each round runs in polynomial time.
Karbasi and Larsen showed that "significant" parallelization must incur an
exponential blow-up: any boosting algorithm either interacts with the weak
learner for $\Omega(1/\gamma)$ rounds or incurs an $\exp(d/\gamma)$ blow-up
in the complexity of training, where $d$ is the VC dimension of the hypothesis
class. We close the gap by showing that any boosting algorithm either has
$\Omega(1/\gamma^2)$ rounds of interaction or incurs a smaller exponential
blow-up of $\exp(d)$.
- Complementing our lower bound, we show that there exists a boosting
algorithm that uses fewer rounds of interaction while suffering only a
correspondingly larger blow-up. With a suitable setting of this trade-off,
this shows that the smaller blow-up in our lower bound is tight. More
interestingly, this provides the first trade-off between the parallelism and
the total work required for boosting.
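For reference, the $\tilde{O}(1/\gamma^2)$ round count quoted above for sequential \textsc{AdaBoost} follows from its standard training-error bound (a textbook fact, restated here rather than taken from the paper): if every weak hypothesis has edge at least $\gamma$ over random guessing, then after $T$ rounds

\[
  \mathrm{err}(H_T) \;\le\; \prod_{t=1}^{T} \sqrt{1 - 4\gamma^2} \;=\; (1 - 4\gamma^2)^{T/2} \;\le\; e^{-2\gamma^2 T},
\]

so $T = O\bigl(\gamma^{-2}\log(1/\varepsilon)\bigr)$ rounds already suffice to drive the training error below $\varepsilon$ in the fully sequential setting.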