    Optimal PAC Bounds Without Uniform Convergence

    In statistical learning theory, determining the sample complexity of realizable binary classification for VC classes was a long-standing open problem. The results of Simon and Hanneke established sharp upper bounds in this setting. However, the reliance of their argument on the uniform convergence principle limits its applicability to more general learning settings such as multiclass classification. In this paper, we address this issue by providing optimal high-probability risk bounds through a framework that surpasses the limitations of uniform convergence arguments. Our framework converts the leave-one-out error of permutation-invariant predictors into high-probability risk bounds. As an application, by adapting the one-inclusion graph algorithm of Haussler, Littlestone, and Warmuth, we propose an algorithm that achieves an optimal PAC bound for binary classification. Specifically, our result shows that certain aggregations of one-inclusion graph algorithms are optimal, addressing a variant of a classic question posed by Warmuth. We further instantiate our framework in three settings where uniform convergence is provably suboptimal. For multiclass classification, we prove an optimal risk bound that scales with the one-inclusion hypergraph density of the class, addressing the suboptimality of the analysis of Daniely and Shalev-Shwartz. For partial hypothesis classification, we determine the optimal sample complexity bound, resolving a question posed by Alon, Hanneke, Holzman, and Moran. For realizable bounded regression with absolute loss, we derive an optimal risk bound that relies on a modified version of the scale-sensitive dimension, refining the results of Bartlett and Long. Our rates surpass standard uniform-convergence-based results due to the smaller complexity measure in our risk bound.
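
    For intuition, here is a minimal Python sketch of the classical (non-aggregated) one-inclusion graph predictor of Haussler, Littlestone, and Warmuth for a finite, explicitly enumerated binary hypothesis class: enumerate the realizable labelings of the n training points plus the test point, orient the edges of the resulting graph by greedy peeling, and predict along the orientation of the ambiguous edge. The brute-force enumeration, the peeling rule, and all names here are illustrative simplifications, not the paper's aggregated construction.

        def one_inclusion_predict(H, xs, ys, x_test):
            """One-inclusion graph prediction for a finite class H of
            callables h: point -> {0, 1}.  Assumes the training labels
            ys are realizable by some h in H."""
            points = xs + [x_test]
            # Vertices: distinct realizable labelings of the n+1 points.
            vertices = {tuple(h(p) for p in points) for h in H}
            # Completions of the observed labels with test label 0 or 1.
            cand = [v for v in vertices if list(v[:-1]) == list(ys)]
            if len(cand) == 1:
                return cand[0][-1]  # the test label is forced
            # Edges join labelings at Hamming distance exactly one.
            live = {v: {u for u in vertices
                        if sum(a != b for a, b in zip(u, v)) == 1}
                    for v in vertices}
            # Greedy peeling: repeatedly remove a minimum-degree vertex,
            # orienting all of its remaining edges outward.
            orient = {}  # frozenset{tail, head} -> head
            while live:
                v = min(live, key=lambda u: len(live[u]))
                for u in live[v]:
                    orient[frozenset((v, u))] = u
                    live[u].discard(v)
                del live[v]
            # The two candidates form an edge (they differ only on the
            # test point); predict the label at the edge's head.
            v0, v1 = cand
            return orient[frozenset((v0, v1))][-1]

    Since every subgraph of a one-inclusion graph has edge density at most the VC dimension d, the peeled vertex always has degree at most 2d, so this orientation has maximum out-degree at most 2d, and the standard leave-one-out symmetrization bounds the expected error by 2d/(n+1).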

    On the amortized complexity of approximate counting

    Naively storing a counter up to value $n$ would require $\Omega(\log n)$ bits of memory. Nelson and Yu [NY22], following work of Morris [Morris78], showed that if the query answers need only be $(1+\epsilon)$-approximate with probability at least $1-\delta$, then $O(\log\log n + \log\log(1/\delta) + \log(1/\epsilon))$ bits suffice, and in fact this bound is tight. Morris' original motivation for studying this problem, though, as well as modern applications, requires maintaining not just one counter but rather $k$ counters for large $k$. This motivates the following question: for $k$ large, can $k$ counters be simultaneously maintained using asymptotically less memory than $k$ times the cost of an individual counter? That is to say, does this problem benefit from an improved {\it amortized} space complexity bound? We answer this question in the negative. Specifically, we prove a lower bound for nearly the full range of parameters showing that, in terms of memory usage, there is no asymptotic benefit possible via amortization when storing multiple counters. Our main proof utilizes a certain notion of "information cost" recently introduced by Braverman, Garg, and Woodruff in FOCS 2020 to prove lower bounds for streaming algorithms.
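
    For context, here is a minimal Python sketch of the Morris-style single counter underlying the question: only an exponent X is stored, and X is incremented with probability $(1+a)^{-X}$, so X concentrates around $\log_{1+a} n$ and takes $O(\log\log n)$ bits for constant accuracy. The accuracy parameter a, the class name, and the demo are illustrative choices, not the multi-counter construction analyzed in the paper.

        import random

        class MorrisCounter:
            """Approximate counter in the style of Morris: stores only
            the exponent X, never the count itself."""

            def __init__(self, a=0.5):
                self.a = a   # smaller a => better accuracy, more bits
                self.X = 0

            def increment(self):
                # Bump the exponent with probability (1+a)^(-X).
                if random.random() < (1 + self.a) ** (-self.X):
                    self.X += 1

            def estimate(self):
                # E[(1+a)^X] = 1 + a*n, so this estimator is unbiased.
                return ((1 + self.a) ** self.X - 1) / self.a

        counter = MorrisCounter(a=0.1)
        for _ in range(100_000):
            counter.increment()
        print(counter.estimate())  # roughly 100000 on average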

    Majority-of-Three: The Simplest Optimal Learner?

    Developing an optimal PAC learning algorithm in the realizable setting, where empirical risk minimization (ERM) is suboptimal, was a major open problem in learning theory for decades. The problem was finally resolved by Hanneke a few years ago. Unfortunately, Hanneke's algorithm is quite complex, as it returns the majority vote of many ERM classifiers that are trained on carefully selected subsets of the data. It is thus a natural goal to determine the simplest algorithm that is optimal. In this work we study the arguably simplest algorithm that could be optimal: returning the majority vote of three ERM classifiers. We show that this algorithm achieves the optimal in-expectation bound on its error, which is provably unattainable by a single ERM classifier. Furthermore, we prove a near-optimal high-probability bound on this algorithm's error. We conjecture that a better analysis will prove that this algorithm is in fact optimal in the high-probability regime.
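
    The candidate algorithm itself fits in a few lines. The sketch below is one illustrative reading of the abstract in Python: partition the sample into three equal parts, train an ERM oracle on each, and return the pointwise majority vote. The disjoint three-way split, the erm interface, and all names are assumptions, not necessarily the paper's exact protocol.

        import numpy as np

        def majority_of_three(erm, X, y, seed=0):
            """Majority vote of three ERM classifiers, each trained on
            one third of the sample.  erm(X, y) is assumed to return a
            predictor f mapping points to labels in {0, 1}; in the
            realizable setting it should have zero error on its input."""
            idx = np.random.default_rng(seed).permutation(len(X))
            fs = [erm(X[p], y[p]) for p in np.array_split(idx, 3)]

            def vote(X_test):
                preds = np.stack([f(X_test) for f in fs])    # shape (3, m)
                return (preds.sum(axis=0) >= 2).astype(int)  # 2-of-3 vote

            return vote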