13 research outputs found

    Papers to Appear in Forthcoming Issues

    Online Agnostic Boosting via Regret Minimization

    Boosting is a widely used machine learning approach based on the idea of aggregating weak learning rules. While in statistical learning numerous boosting methods exist in both the realizable and agnostic settings, in online learning they exist only in the realizable case. In this work we provide the first agnostic online boosting algorithm; that is, given a weak learner with only marginally-better-than-trivial regret guarantees, our algorithm boosts it to a strong learner with sublinear regret. Our algorithm is based on an abstract (and simple) reduction to online convex optimization, which efficiently converts an arbitrary online convex optimizer to an online booster. Moreover, this reduction extends to the statistical as well as the online realizable settings, thus unifying the 4 cases of statistical/online and agnostic/realizable boosting.
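
    As a rough illustration of the recipe described above (several copies of a weak online learner whose votes are re-weighted by an online convex optimization step), here is a minimal Python sketch. The WeakLearner stub, the hinge-loss surrogate, the clip-and-renormalize step standing in for an exact simplex projection, and the toy data stream are all illustrative assumptions; this is a sketch of the general idea, not the paper's reduction.

    import numpy as np

    # Sketch of online boosting driven by an OCO-style weight update.
    # Everything here is illustrative; the paper's reduction is more general.
    class WeakLearner:
        """Toy weak online learner: a fixed, randomly signed coordinate stump."""
        def __init__(self, dim, rng):
            self.coord = int(rng.integers(dim))
            self.sign = float(rng.choice([-1.0, 1.0]))

        def predict(self, x):
            return self.sign * (1.0 if x[self.coord] >= 0 else -1.0)

        def update(self, x, y):
            pass  # a real weak online learner would adapt its state here

    def online_boost(stream, dim, n_weak=20, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        weak = [WeakLearner(dim, rng) for _ in range(n_weak)]
        w = np.ones(n_weak) / n_weak               # aggregation weights on the simplex
        mistakes = 0
        for x, y in stream:
            preds = np.array([h.predict(x) for h in weak])
            score = float(w @ preds)
            y_hat = 1.0 if score >= 0 else -1.0    # strong prediction: weighted vote
            mistakes += int(y_hat != y)
            # OCO step on the hinge surrogate max(0, 1 - y * <w, preds>)
            if y * score < 1:
                w = w + lr * y * preds
            w = np.clip(w, 0.0, None)              # clip and renormalize: a crude stand-in
            s = w.sum()                            # for an exact simplex projection
            w = w / s if s > 0 else np.ones(n_weak) / n_weak
            for h in weak:
                h.update(x, y)
        return mistakes

    # Toy usage: a stream on which a weighted vote of coordinate stumps can do well.
    rng = np.random.default_rng(1)
    xs = rng.choice([-1.0, 1.0], size=(500, 10))
    ys = np.where(xs[:, 0] + 0.5 * xs[:, 1] >= 0, 1.0, -1.0)
    print("mistakes over 500 rounds:", online_boost(zip(xs, ys), dim=10))

    The point of the sketch is only the division of labor the abstract describes: the weak learners supply barely-better-than-trivial votes, and the online convex optimizer is what carries the regret guarantee of the aggregate.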

    Boosting-based Construction of BDDs for Linear Threshold Functions and Its Application to Verification of Neural Networks

    Understanding the characteristics of neural networks is important but difficult due to their complex structures and behaviors. Some previous work proposes to transform neural networks into equivalent Boolean expressions and apply verification techniques for characteristics of interest. This approach is promising since rich results of verification techniques for circuits and other Boolean expressions can be readily applied. The bottleneck is the time complexity of the transformation. More precisely, (i) each neuron of the network, i.e., a linear threshold function, is converted to a Binary Decision Diagram (BDD), and (ii) they are further combined into some final form, such as Boolean circuits. For a linear threshold function with $n$ variables, an existing method takes $O(n 2^{\frac{n}{2}})$ time to construct an ordered BDD of size $O(2^{\frac{n}{2}})$ consistent with some variable ordering. However, it is non-trivial to choose a variable ordering producing a small BDD among the $n!$ candidates. We propose a method to convert a linear threshold function to a specific form of a BDD based on the boosting approach in the machine learning literature. Our method takes $O(2^n \mathrm{poly}(1/\rho))$ time and outputs a BDD of size $O(\frac{n^2}{\rho^4}\ln\frac{1}{\rho})$, where $\rho$ is the margin of some consistent linear threshold function. Our method does not need to search for good variable orderings and produces a smaller expression when the margin of the linear threshold function is large. More precisely, our method is based on our new boosting algorithm, which is of independent interest. We also propose a method to combine the resulting BDDs into the final Boolean expression representing the neural network.
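
    To make the margin parameter $\rho$ concrete, the following hypothetical snippet evaluates a single linear threshold function (one neuron) over all of $\{0,1\}^n$ by brute force, mirroring the $2^n$ factor in the stated running time, and reports its smallest normalized slack. The particular weights, threshold, and $\ell_1$ normalization are assumptions chosen for illustration, not definitions taken from the paper.

    from itertools import product

    # Hypothetical illustration of a linear threshold function (LTF) and its margin.
    # The l1 normalization is one common convention, assumed here for concreteness.
    def ltf(w, theta, x):
        """f(x) = 1 if w . x >= theta, else 0, for x in {0,1}^n."""
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

    def margin(w, theta):
        """Smallest normalized distance from any input to the decision threshold."""
        l1 = sum(abs(wi) for wi in w)
        return min(abs(sum(wi * xi for wi, xi in zip(w, x)) - theta)
                   for x in product((0, 1), repeat=len(w))) / l1

    w, theta = [3, -2, 5, 1], 2.5   # toy neuron: integer weights, non-integer threshold
    print([ltf(w, theta, x) for x in product((0, 1), repeat=4)])
    print("margin rho:", margin(w, theta))

    A large margin means every input is classified with room to spare, which is exactly the regime in which the $O(\frac{n^2}{\rho^4}\ln\frac{1}{\rho})$ size bound above is small.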

    The sample complexity of multi-distribution learning

    Multi-distribution learning generalizes the classic PAC learning to handle data coming from multiple distributions. Given a set of $k$ data distributions and a hypothesis class of VC dimension $d$, the goal is to learn a hypothesis that minimizes the maximum population loss over the $k$ distributions, up to $\epsilon$ additive error. In this paper, we settle the sample complexity of multi-distribution learning by giving an algorithm of sample complexity $\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$. This matches the lower bound up to a sub-polynomial factor and resolves the COLT 2023 open problem of Awasthi, Haghtalab and Zhao [AHZ23].
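
    Spelled out in symbols (with loss and hypothesis-class notation chosen here for readability rather than copied from the paper), the guarantee is that the learner outputs a hypothesis $\hat{h}$ satisfying

    \[
    \max_{i \in [k]} \; \mathbb{E}_{(x,y) \sim D_i}\big[\ell(\hat{h}(x), y)\big]
    \;\le\;
    \min_{h \in \mathcal{H}} \, \max_{i \in [k]} \; \mathbb{E}_{(x,y) \sim D_i}\big[\ell(h(x), y)\big] + \epsilon,
    \]

    using $\widetilde{O}\big((d+k)\epsilon^{-2}\big) \cdot (k/\epsilon)^{o(1)}$ samples in total from the $k$ distributions $D_1, \ldots, D_k$, where $d$ is the VC dimension of the hypothesis class $\mathcal{H}$.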

    Loss minimization yields multicalibration for large neural networks

    Multicalibration is a notion of fairness that aims to provide accurate predictions across a large set of groups. Multicalibration is known to be a different goal than loss minimization, even for simple predictors such as linear functions. In this note, we show that for (almost all) large neural network sizes, optimally minimizing squared error leads to multicalibration. Our results are about representational aspects of neural networks, and not about algorithmic or sample complexity considerations. Previous such results were known only for predictors that were nearly Bayes-optimal and were therefore representation independent. We emphasize that our results do not apply to specific algorithms for optimizing neural networks, such as SGD, and they should not be interpreted as "fairness comes for free from optimizing neural networks".
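
    For background, one common formulation of multicalibration (the note's exact definition may differ in details such as discretization and how small groups are weighted) asks that a predictor $p$ satisfy, for every group $S$ in a class $\mathcal{C}$ and every value $v$ in the range of $p$,

    \[
    \big|\, \mathbb{E}\big[\, y - v \;\big|\; x \in S,\ p(x) = v \,\big] \,\big| \;\le\; \alpha ,
    \]

    i.e., the predictions are approximately calibrated not only on average but simultaneously on every group in $\mathcal{C}$. The note's claim is that, for (almost all) large network sizes, exactly minimizing squared error already yields this property at the level of representation.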

    The Cost of Parallelizing Boosting

    We study the cost of parallelizing weak-to-strong boosting algorithms for learning, following the recent work of Karbasi and Larsen. Our main results are two-fold:
    - First, we prove a tight lower bound, showing that even "slight" parallelization of boosting requires an exponential blow-up in the complexity of training. Specifically, let $\gamma$ be the weak learner's advantage over random guessing. The famous AdaBoost algorithm produces an accurate hypothesis by interacting with the weak learner for $\tilde{O}(1/\gamma^2)$ rounds, where each round runs in polynomial time. Karbasi and Larsen showed that "significant" parallelization must incur exponential blow-up: any boosting algorithm either interacts with the weak learner for $\Omega(1/\gamma)$ rounds or incurs an $\exp(d/\gamma)$ blow-up in the complexity of training, where $d$ is the VC dimension of the hypothesis class. We close the gap by showing that any boosting algorithm either has $\Omega(1/\gamma^2)$ rounds of interaction or incurs a smaller exponential blow-up of $\exp(d)$.
    - Complementing our lower bound, we show that there exists a boosting algorithm using $\tilde{O}(1/(t\gamma^2))$ rounds that suffers a blow-up of only $\exp(d \cdot t^2)$. Plugging in $t = \omega(1)$, this shows that the smaller blow-up in our lower bound is tight. More interestingly, this provides the first trade-off between the parallelism and the total work required for boosting.
    Comment: appeared in SODA 2024.
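
    To see the trade-off concretely, take an illustrative choice such as $t = \log(1/\gamma)$ (any $t = \omega(1)$ qualifies as $\gamma \to 0$). The upper bound then uses

    \[
    \tilde{O}\!\left(\frac{1}{t\,\gamma^{2}}\right) = \tilde{O}\!\left(\frac{1}{\gamma^{2}\log(1/\gamma)}\right) \ \text{rounds}
    \qquad \text{with blow-up} \qquad
    \exp(d\,t^{2}) = \exp\!\big(d\,\log^{2}(1/\gamma)\big),
    \]

    i.e., asymptotically fewer rounds of interaction than AdaBoost's $\tilde{O}(1/\gamma^{2})$, at the price of a blow-up somewhat above the $\exp(d)$ that the lower bound shows is unavoidable once the round count drops below $\Omega(1/\gamma^{2})$.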

    Loss Minimization Through the Lens Of Outcome Indistinguishability

    Omnipredictors
