Online Agnostic Boosting via Regret Minimization
Boosting is a widely used machine learning approach based on the idea of
aggregating weak learning rules. While in statistical learning numerous
boosting methods exist both in the realizable and agnostic settings, in online
learning they exist only in the realizable case. In this work we provide the
first agnostic online boosting algorithm; that is, given a weak learner with
only marginally-better-than-trivial regret guarantees, our algorithm boosts it
to a strong learner with sublinear regret.
Our algorithm is based on an abstract (and simple) reduction to online convex
optimization, which efficiently converts an arbitrary online convex optimizer
to an online booster.
Moreover, this reduction extends to the statistical as well as the online
realizable settings, thus unifying the four cases of statistical/online and
agnostic/realizable boosting.
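The abstract describes a black-box reduction that turns any online convex optimizer into an online booster. The sketch below is only a schematic of that general idea — projected online gradient descent over the ensemble's vote weights driving a pool of stand-in weak online learners — and not the paper's actual reduction; the WeakLearner class, the hinge surrogate loss, and the importance-weighting heuristic are all illustrative assumptions.

```python
# Schematic only: an online boosting loop where an online convex optimizer
# (online gradient descent over vote weights) aggregates weak online
# learners. Illustrates the generic OCO-to-booster idea; WeakLearner and
# the surrogate loss are stand-ins, not the paper's construction.
import numpy as np

class WeakLearner:
    """Toy weak online learner: a perceptron-style linear predictor."""
    def __init__(self, dim):
        self.w = np.zeros(dim)

    def predict(self, x):
        return 1.0 if float(self.w @ x) >= 0.0 else -1.0

    def update(self, x, y, importance=1.0):
        # Importance-weighted perceptron update on a mistake.
        if y * float(self.w @ x) <= 0.0:
            self.w += importance * y * x

def online_boost(stream, dim, n_learners=10, eta=0.1):
    """Yield a prediction for each (x, y) in the stream, updating online."""
    learners = [WeakLearner(dim) for _ in range(n_learners)]
    alpha = np.full(n_learners, 1.0 / n_learners)  # OCO iterate: vote weights
    for x, y in stream:
        votes = np.array([h.predict(x) for h in learners])
        margin = float(alpha @ votes)
        yield 1.0 if margin >= 0.0 else -1.0
        # OCO step: gradient of the hinge surrogate max(0, 1 - y*<alpha, votes>)
        # w.r.t. alpha, then clip to nonnegative and renormalize
        # (a crude stand-in for an exact simplex projection).
        grad = -y * votes if y * margin < 1.0 else np.zeros(n_learners)
        alpha = np.clip(alpha - eta * grad, 0.0, None)
        total = alpha.sum()
        alpha = alpha / total if total > 0 else np.full(n_learners, 1.0 / n_learners)
        # Pass the example to every weak learner, weighted more heavily when
        # the ensemble's margin on it was poor (a simple heuristic).
        importance = 1.0 + max(0.0, 1.0 - y * margin)
        for h in learners:
            h.update(x, y, importance=importance)
```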
Boosting-based Construction of BDDs for Linear Threshold Functions and Its Application to Verification of Neural Networks
Understanding the characteristics of neural networks is important but
difficult due to their complex structures and behaviors. Some previous work
proposes to transform neural networks into equivalent Boolean expressions and
apply verification techniques for characteristics of interest. This approach is
promising since the rich body of verification results for circuits and other
Boolean expressions can be readily applied. The bottleneck is the time
complexity of the transformation. More precisely, (i) each neuron of the
network, i.e., a linear threshold function, is converted to a Binary Decision
Diagram (BDD), and (ii) they are further combined into some final form, such as
Boolean circuits. For a linear threshold function with $n$ variables, an
existing method constructs an ordered BDD consistent with some variable
ordering, but its running time and the size of the resulting BDD can grow
exponentially in $n$ in the worst case. Moreover, it is non-trivial to choose,
among the $n!$ candidate orderings, one that produces a small BDD.
We propose a method to convert a linear threshold function to a specific form
of a BDD based on the boosting approach in the machine learning literature.
The running time of our method and the size of the BDD it outputs are both
bounded in terms of $n$ and the margin $\gamma$ of some consistent linear
threshold function. Our method does not need to search for good variable
orderings and produces a smaller expression when the margin of the linear
threshold function is large. More precisely, our method is based on our new
boosting algorithm, which is of independent interest. We also propose a
method to combine the per-neuron BDDs into the final Boolean expression
representing the neural network.
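For context, step (i) above — turning one linear threshold function into an ordered BDD under a fixed variable ordering — can be sketched with the classic dynamic program that merges nodes sharing the same remaining threshold. This illustrates the baseline transformation whose cost is the bottleneck, not the authors' boosting-based construction; the function name and interface are hypothetical.

```python
# Illustration of the baseline transformation (not the paper's method):
# build an ordered BDD for f(x) = [sum_i w[i]*x[i] >= theta] with a fixed
# variable order, merging nodes that share the same remaining threshold.
def ltf_to_obdd(w, theta):
    """Return (nodes, root) for the OBDD of the threshold function.

    nodes maps (level, remaining_threshold) -> (low_child, high_child);
    children are either such keys or the terminals "TRUE"/"FALSE".
    Variables are tested in index order x_0, x_1, ..., x_{n-1}.
    """
    n = len(w)
    # Largest / smallest sum still achievable from level i onward.
    suffix_max = [0] * (n + 1)
    suffix_min = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix_max[i] = suffix_max[i + 1] + max(w[i], 0)
        suffix_min[i] = suffix_min[i + 1] + min(w[i], 0)

    nodes = {}

    def build(level, need):
        if need <= suffix_min[level]:
            return "TRUE"    # satisfied no matter how the rest is set
        if need > suffix_max[level]:
            return "FALSE"   # can no longer be satisfied
        key = (level, need)
        if key not in nodes:                          # merge equal sub-problems
            low = build(level + 1, need)              # branch x_level = 0
            high = build(level + 1, need - w[level])  # branch x_level = 1
            nodes[key] = (low, high)
        return key

    root = build(0, theta)
    return nodes, root

# Example: majority of three variables, f(x) = [x0 + x1 + x2 >= 2].
nodes, root = ltf_to_obdd([1, 1, 1], 2)
print(len(nodes), root)   # -> 4 internal nodes under this ordering
```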
The sample complexity of multi-distribution learning
Multi-distribution learning generalizes classic PAC learning to handle
data coming from multiple distributions. Given a set of $k$ data distributions
and a hypothesis class of VC dimension $d$, the goal is to learn a hypothesis
that minimizes the maximum population loss over the $k$ distributions, up to
additive error $\epsilon$. In this paper, we settle the sample complexity of
multi-distribution learning by giving an algorithm of sample complexity
$\widetilde{O}\bigl((d+k)\epsilon^{-2}\bigr)\cdot (k/\epsilon)^{o(1)}$. This matches the
lower bound up to sub-polynomial factors and resolves the COLT 2023 open problem
of Awasthi, Haghtalab and Zhao [AHZ23].
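Written out, the objective described above asks for a hypothesis whose worst-case population loss over the $k$ distributions is within $\epsilon$ of the best achievable; the distributions $D_i$, loss $\ell$, and class $\mathcal{H}$ below are generic stand-ins for the abstract's setting:

\[
  \max_{i \in [k]} \; \mathbb{E}_{(x,y) \sim D_i}\!\bigl[\ell(h(x), y)\bigr]
  \;\le\;
  \min_{h' \in \mathcal{H}} \; \max_{i \in [k]} \; \mathbb{E}_{(x,y) \sim D_i}\!\bigl[\ell(h'(x), y)\bigr] \;+\; \epsilon .
\]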
Loss minimization yields multicalibration for large neural networks
Multicalibration is a notion of fairness that aims to provide accurate
predictions across a large set of groups. Multicalibration is known to be a
different goal than loss minimization, even for simple predictors such as
linear functions. In this note, we show that for (almost all) large neural
network sizes, optimally minimizing squared error leads to multicalibration.
Our results are about representational aspects of neural networks, and not
about algorithmic or sample complexity considerations. Previous such results
were known only for predictors that were nearly Bayes-optimal and were
therefore representation independent. We emphasize that our results do not
apply to specific algorithms for optimizing neural networks, such as SGD, and
they should not be interpreted as "fairness comes for free from optimizing
neural networks".
The Cost of Parallelizing Boosting
We study the cost of parallelizing weak-to-strong boosting algorithms for
learning, following the recent work of Karbasi and Larsen. Our main results are
two-fold:
- First, we prove a tight lower bound, showing that even "slight"
parallelization of boosting requires an exponential blow-up in the complexity
of training.
Specifically, let $\gamma$ be the weak learner's advantage over random
guessing. The famous \textsc{AdaBoost} algorithm produces an accurate
hypothesis by interacting with the weak learner for $\tilde{O}(1/\gamma^2)$
rounds, where each round runs in polynomial time.
Karbasi and Larsen showed that "significant" parallelization must incur an
exponential blow-up: any boosting algorithm either interacts with the weak
learner for $\Omega(1/\gamma)$ rounds or incurs an $\exp(d/\gamma)$ blow-up
in the complexity of training, where $d$ is the VC dimension of the hypothesis
class. We close the gap by showing that any boosting algorithm either has
$\Omega(1/\gamma^2)$ rounds of interaction or incurs a smaller exponential
blow-up of $\exp(d)$.
- Complementing our lower bound, we show that there exists a boosting
algorithm that uses fewer rounds of interaction while suffering only a
correspondingly larger blow-up. With a suitable setting of this trade-off,
this shows that the smaller blow-up in our lower bound is tight. More
interestingly, this provides the first trade-off between the parallelism and
the total work required for boosting.
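For reference, the $\tilde{O}(1/\gamma^2)$ round count quoted above for sequential \textsc{AdaBoost} follows from its standard training-error bound (a textbook fact, restated here rather than taken from the paper): if every weak hypothesis has edge at least $\gamma$ over random guessing, then after $T$ rounds

\[
  \mathrm{err}(H_T) \;\le\; \prod_{t=1}^{T} \sqrt{1 - 4\gamma^2} \;=\; (1 - 4\gamma^2)^{T/2} \;\le\; e^{-2\gamma^2 T},
\]

so $T = O\bigl(\gamma^{-2}\log(1/\varepsilon)\bigr)$ rounds already suffice to drive the training error below $\varepsilon$ in the fully sequential setting.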