
Approximate resilience, monotonicity, and the complexity of agnostic learning

Authors: Dana Dachman-Soled, Vitaly Feldman, Li-Yang Tan, Andrew Wan, Karl Wimmer
Publication date: 09/07/2014

A function $f$ is $d$-resilient if all its Fourier coefficients of degree at most $d$ are zero, i.e., $f$ is uncorrelated with all low-degree parities. We study the notion of approximate resilience of Boolean functions, where we say that $f$ is $\alpha$-approximately $d$-resilient if $f$ is $\alpha$-close to a $[-1,1]$-valued $d$-resilient function in $\ell_1$ distance. We show that approximate resilience essentially characterizes the complexity of agnostic learning of a concept class $C$ over the uniform distribution. Roughly speaking, if all functions in a class $C$ are far from being $d$-resilient, then $C$ can be learned agnostically in time $n^{O(d)}$; conversely, if $C$ contains a function close to being $d$-resilient, then agnostic learning of $C$ in the statistical query (SQ) framework of Kearns has complexity at least $n^{\Omega(d)}$. This characterization is based on the duality between $\ell_1$ approximation by degree-$d$ polynomials and approximate $d$-resilience that we establish. In particular, it implies that $\ell_1$ approximation by low-degree polynomials, known to be sufficient for agnostic learning over product distributions, is in fact necessary. Focusing on monotone Boolean functions, we exhibit the existence of near-optimal $\alpha$-approximately $\widetilde{\Omega}(\alpha\sqrt{n})$-resilient monotone functions for all $\alpha>0$. Prior to our work, it was conceivable even that every monotone function is $\Omega(1)$-far from any $1$-resilient function. Furthermore, we construct simple, explicit monotone functions based on $\mathsf{Tribes}$ and $\mathsf{CycleRun}$ that are close to highly resilient functions. Our constructions are based on a fairly general resilience analysis and amplification. These structural results, together with the characterization, imply nearly optimal lower bounds for agnostic learning of monotone juntas.

arXiv.org e-Print Archive

CiteSeerX

Crossref
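The definition of $d$-resilience in the abstract above can be checked by brute force for small $n$. The Python sketch below assumes the common $\pm 1$ encoding of inputs, computes each Fourier coefficient $\hat f(S) = \mathbb{E}_x[f(x)\chi_S(x)]$ with $\chi_S(x) = \prod_{i \in S} x_i$, and tests whether every coefficient with $|S| \le d$ (including the empty set) vanishes. The helper names `fourier_coefficient` and `is_d_resilient` are hypothetical, and the sketch checks exact resilience only; deciding approximate resilience would additionally require optimizing over $[-1,1]$-valued functions.

```python
from itertools import combinations, product
from math import prod
from typing import Callable, Sequence

# Hypothetical helpers for illustration; inputs are encoded as x in {-1, 1}^n.
BoolFn = Callable[[Sequence[int]], float]


def fourier_coefficient(f: BoolFn, n: int, S: Sequence[int]) -> float:
    """Return hat f(S) = E_x[f(x) * chi_S(x)] under the uniform distribution,
    where chi_S(x) = prod_{i in S} x_i is the parity on the coordinates in S."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        total += f(x) * prod(x[i] for i in S)  # prod over an empty S is 1
    return total / 2 ** n


def is_d_resilient(f: BoolFn, n: int, d: int, tol: float = 1e-12) -> bool:
    """True if every Fourier coefficient of degree 0..d is (numerically) zero,
    i.e. f is uncorrelated with every parity on at most d variables."""
    return all(
        abs(fourier_coefficient(f, n, S)) <= tol
        for k in range(d + 1)
        for S in combinations(range(n), k)
    )


def parity3(x: Sequence[int]) -> int:
    return x[0] * x[1] * x[2]


def majority3(x: Sequence[int]) -> int:
    return 1 if sum(x) > 0 else -1


if __name__ == "__main__":
    print(is_d_resilient(parity3, 3, 2))    # True
    print(is_d_resilient(majority3, 3, 1))  # False
```

Parity on three bits is 2-resilient because its only nonzero coefficient has degree 3, whereas majority has $\hat f(\{i\}) = 1/2$ for each input bit and so is not even 1-resilient.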

Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations

Authors: Guy Blanc, Jane Lange, Li-Yang Tan
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 11th Innovations in Theoretical Computer Science Conference (ITCS 2020)
Publication date: 17/11/2019

Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$ at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the left and right subtrees, respectively; terminate once the tree is an $\varepsilon$-approximation of $f$. We analyze the quality of this heuristic, obtaining near-matching upper and lower bounds:

∘ Upper bound: For every $f$ with decision tree size $s$ and every $\varepsilon \in (0,\frac{1}{2})$, this heuristic builds a decision tree of size at most $s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}$.
∘ Lower bound: For every $\varepsilon \in (0,\frac{1}{2})$ and $s \le 2^{\tilde{O}(\sqrt{n})}$, there is an $f$ with decision tree size $s$ such that this heuristic builds a decision tree of size $s^{\tilde{\Omega}(\log s)}$.

We also obtain upper and lower bounds for monotone functions, $s^{O(\sqrt{\log s}/\varepsilon)}$ and $s^{\tilde{\Omega}(\sqrt[4]{\log s})}$, respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004) and Lee (2009). Our upper bounds yield new algorithms for properly learning decision trees under the uniform distribution. We show that these algorithms, which are motivated by widely employed and empirically successful top-down decision tree learning heuristics such as ID3, C4.5, and CART, achieve provable guarantees that compare favorably with those of the current fastest algorithm (Ehrenfeucht and Haussler, 1989). Our lower bounds shed new light on the limitations of these heuristics. Finally, we revisit the classic work of Ehrenfeucht and Haussler and extend it to give the first uniform-distribution proper learning algorithm that achieves polynomial sample and memory complexity while matching its state-of-the-art quasipolynomial runtime.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server
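The top-down heuristic described in the abstract above can be sketched by brute force over the truth table. This is a minimal sketch under stated assumptions, not the authors' algorithm: a leaf is a restriction (partial assignment), each leaf is labeled with the majority value of its subfunction, the influence of $x_i$ on a subfunction $f_\rho$ is $\Pr_x[f_\rho(x) \neq f_\rho(x^{\oplus i})]$ under the uniform distribution, and, because the abstract does not fix the order in which leaves are split, we assume the split with the largest weighted influence $2^{-\mathrm{depth}} \cdot \mathrm{Inf}_i(f_\rho)$ is applied first. All function names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Tuple

Point = Tuple[int, ...]
Restriction = Dict[int, int]  # variable index -> fixed bit (a root-to-leaf path)
BoolFn = Callable[[Point], int]  # f : {0,1}^n -> {+1, -1}


def points(n: int, rho: Restriction) -> Iterator[Point]:
    """All x in {0,1}^n consistent with the restriction rho."""
    free = [i for i in range(n) if i not in rho]
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * n
        for i, b in rho.items():
            x[i] = b
        for i, b in zip(free, bits):
            x[i] = b
        yield tuple(x)


def influence(f: BoolFn, n: int, rho: Restriction, i: int) -> float:
    """Pr[f(x) != f(x with bit i flipped)] over uniform x in the subcube rho."""
    pts = list(points(n, rho))
    flips = sum(f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:]) for x in pts)
    return flips / len(pts)


def leaf_error(f: BoolFn, n: int, rho: Restriction) -> float:
    """Error of labeling the subcube rho with the majority value of f."""
    vals = [f(x) for x in points(n, rho)]
    plus = sum(v == 1 for v in vals)
    return min(plus, len(vals) - plus) / len(vals)


def top_down(f: BoolFn, n: int, eps: float) -> List[Restriction]:
    """Split leaves on their most influential variable until the tree,
    with majority-labeled leaves, is an eps-approximation of f."""
    leaves: List[Restriction] = [{}]

    def tree_error() -> float:
        # A leaf at depth k covers a 2^-k fraction of the uniform distribution.
        return sum(2.0 ** -len(r) * leaf_error(f, n, r) for r in leaves)

    while tree_error() > eps:
        # A leaf with nonzero error is non-constant, so some free variable has
        # positive influence there; hence every iteration makes progress.
        best = None  # (weighted influence, leaf index, variable)
        for idx, r in enumerate(leaves):
            free = [i for i in range(n) if i not in r]
            if not free:
                continue
            i = max(free, key=lambda j: influence(f, n, r, j))
            score = 2.0 ** -len(r) * influence(f, n, r, i)
            if best is None or score > best[0]:
                best = (score, idx, i)
        if best is None:
            break
        _, idx, i = best
        r = leaves.pop(idx)
        leaves.extend([{**r, i: 0}, {**r, i: 1}])
    return leaves


if __name__ == "__main__":
    def and01(x: Point) -> int:  # +1 iff x0 AND x1; x2 is irrelevant
        return 1 if x[0] and x[1] else -1

    print(top_down(and01, 3, 0.0))  # [{0: 0}, {0: 1, 1: 0}, {0: 1, 1: 1}]
```

The returned list holds the root-to-leaf restrictions, so its length is the tree size; the irrelevant variable $x_2$ is never queried because its influence is zero on every subcube.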

Optimal Bounds on Approximation of Submodular and XOS Functions by Juntas

Authors: Vitaly Feldman, Jan Vondrak
Publication date: 30/03/2015

We investigate the approximability of several classes of real-valued functions by functions of a small number of variables (juntas). Our main results are tight bounds on the number of variables required to approximate a function $f:\{0,1\}^n \rightarrow [0,1]$ within $\ell_2$-error $\epsilon$ over the uniform distribution:

1. If $f$ is submodular, then it is $\epsilon$-close to a function of $O(\frac{1}{\epsilon^2} \log \frac{1}{\epsilon})$ variables. This is an exponential improvement over previously known results. We note that $\Omega(\frac{1}{\epsilon^2})$ variables are necessary even for linear functions.
2. If $f$ is fractionally subadditive (XOS), it is $\epsilon$-close to a function of $2^{O(1/\epsilon^2)}$ variables. This result holds for all functions with low total $\ell_1$-influence and is a real-valued analogue of Friedgut's theorem for Boolean functions. We show that $2^{\Omega(1/\epsilon)}$ variables are necessary even for XOS functions.

As applications of these results, we provide learning algorithms over the uniform distribution. For XOS functions, we give a PAC learning algorithm that runs in time $2^{\mathrm{poly}(1/\epsilon)}\,\mathrm{poly}(n)$. For submodular functions, we give an algorithm in the more demanding PMAC learning model (Balcan and Harvey, 2011), which requires a multiplicative $1+\gamma$ factor approximation with probability at least $1-\epsilon$ over the target distribution. Our uniform-distribution algorithm runs in time $2^{\mathrm{poly}(1/(\gamma\epsilon))}\,\mathrm{poly}(n)$. This is the first algorithm in the PMAC model that, over the uniform distribution, can achieve a constant approximation factor arbitrarily close to 1 for all submodular functions. As follows from the lower bounds in (Feldman et al., 2013), both of these algorithms are close to optimal. We also give applications for proper learning, testing, and agnostic learning with value queries of these classes.
Comment: Extended abstract appears in proceedings of FOCS 201

arXiv.org e-Print Archive

Crossref
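To make $\ell_2$-error approximation by a junta concrete: for a fixed set $J$ of variables, the best $\ell_2$ approximation of $f$ by a function of $x_J$ under the uniform distribution is the conditional expectation $g(x_J) = \mathbb{E}[f \mid x_J]$, and the error is $\sqrt{\mathbb{E}[(f-g)^2]}$. The brute-force Python sketch below computes this error for a given $J$. The function name is hypothetical, and the sketch does not search for a good $J$, which is what the bounds in the abstract above are about.

```python
from collections import defaultdict
from itertools import product
from math import sqrt
from typing import Callable, Dict, List, Sequence, Tuple


def best_junta_l2_error(
    f: Callable[[Tuple[int, ...]], float], n: int, J: Sequence[int]
) -> float:
    """l2 distance, under the uniform distribution on {0,1}^n, between f and
    its best approximation that depends only on the variables in J."""
    groups: Dict[Tuple[int, ...], List[float]] = defaultdict(list)
    for x in product((0, 1), repeat=n):
        # Points that agree on J must receive the same junta value.
        groups[tuple(x[i] for i in J)].append(f(x))
    squared = 0.0
    for vals in groups.values():
        mean = sum(vals) / len(vals)  # E[f | x_J], the optimal value on this group
        squared += sum((v - mean) ** 2 for v in vals)
    return sqrt(squared / 2 ** n)


if __name__ == "__main__":
    n = 4

    def linear(x: Tuple[int, ...]) -> float:  # a linear function with values in [0, 1]
        return sum(x) / n

    # For f = (1/n) * sum(x_i), the error is sqrt(n - |J|) / (2n); here sqrt(2)/8.
    print(best_junta_l2_error(linear, n, [0, 1]))
```

The linear example shows why linear functions already need many relevant variables: every dropped coordinate contributes $1/(4n^2)$ to the squared error, so the error shrinks only as more variables are kept in $J$.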

Online Learning of k-CNF Boolean Functions

Authors: Marcus Hutter, Joel Veness
Publication date: 26/03/2014

This paper revisits the problem of learning a k-CNF Boolean function from examples in the context of online learning under the logarithmic loss. In doing so, we give a Bayesian interpretation to one of Valiant's celebrated PAC learning algorithms, which we then build upon to derive two efficient, online, probabilistic, supervised learning algorithms for predicting the output of an unknown k-CNF Boolean function. We analyze the loss of our methods and show that the cumulative log-loss can be upper bounded, ignoring logarithmic factors, by a polynomial function of the size of each example.
Comment: 20 LaTeX pages. 2 Algorithms. Some Theorem

arXiv.org e-Print Archive

CiteSeerX
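For reference, the classic offline elimination algorithm for k-CNF from Valiant (1984), which we assume is the PAC algorithm the abstract above builds on, starts from the conjunction of all clauses with at most $k$ literals and deletes every clause falsified by a positive example; negative examples are ignored. The Python sketch below is that textbook algorithm, not the authors' Bayesian, online log-loss method, and its function names are hypothetical.

```python
from itertools import combinations, product
from typing import FrozenSet, Iterable, Sequence, Set, Tuple

Literal = Tuple[int, bool]  # (variable index, True for x_i / False for NOT x_i)
Clause = FrozenSet[Literal]


def all_clauses(n: int, k: int) -> Set[Clause]:
    """Every disjunction of 1..k literals over distinct variables."""
    literals = [(i, pos) for i in range(n) for pos in (True, False)]
    clauses: Set[Clause] = set()
    for size in range(1, k + 1):
        for combo in combinations(literals, size):
            # Skip clauses that mention a variable twice (e.g. x_i OR NOT x_i).
            if len({i for i, _ in combo}) == size:
                clauses.add(frozenset(combo))
    return clauses


def satisfies(clause: Clause, x: Sequence[int]) -> bool:
    return any(x[i] == (1 if pos else 0) for i, pos in clause)


def eliminate(
    n: int, k: int, examples: Iterable[Tuple[Sequence[int], bool]]
) -> Set[Clause]:
    """Keep only the clauses that every positive example satisfies."""
    hypothesis = all_clauses(n, k)
    for x, label in examples:
        if label:  # a positive example rules out every clause it falsifies
            hypothesis = {c for c in hypothesis if satisfies(c, x)}
    return hypothesis


def predict(hypothesis: Set[Clause], x: Sequence[int]) -> bool:
    return all(satisfies(c, x) for c in hypothesis)


if __name__ == "__main__":
    def target(x: Sequence[int]) -> bool:  # x0 AND (x1 OR x2), a 2-CNF
        return bool(x[0]) and bool(x[1] or x[2])

    data = [(x, target(x)) for x in product((0, 1), repeat=3)]
    h = eliminate(3, 2, data)
    print(all(predict(h, x) == target(x) for x, _ in data))  # True
```

Because the target's own clauses are never eliminated, the hypothesis always implies the target, so it can err only on positive examples it has not yet seen; this one-sided behavior is what the online, probabilistic versions in the paper refine.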

A Law of Large Numbers for Weighted Majority

Authors: Olle Haggstrom, Gil Kalai, Elchanan Mossel
Publication date: 01/01/2004

Consider an election between two candidates in which the voters' choices are random and independent and the probability of a voter choosing the first candidate is $p>1/2$. Condorcet's Jury Theorem, which he derived from the weak law of large numbers, asserts that if the number of voters tends to infinity, then the probability that the first candidate will be elected tends to one. The notion of influence of a voter, or its voting power, is relevant for extensions of the weak law of large numbers to voting rules that are more general than simple majority. In this paper we point out two different ways to extend the classical notions of voting power and influence to arbitrary probability distributions. The extension relevant to us is the "effect" of a voter, which is a weighted version of the correlation between the voter's vote and the election's outcome. We prove an extension of the weak law of large numbers to weighted majority games when all individual effects are small, and show that this result does not apply to any voting rule that is not based on weighted majority.

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Chalmers Research

ScholarlyCommons@Penn

Chalmers Publication Library
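Condorcet's setting described at the start of the abstract above (simple majority, independent voters, $p > 1/2$) can be checked numerically: for an odd number $n$ of voters, the first candidate wins with probability $\sum_{k > n/2} \binom{n}{k} p^k (1-p)^{n-k}$. The Python sketch below evaluates this sum directly; the function name is hypothetical, and it covers only simple majority, not the weighted-majority extension that is the paper's contribution.

```python
from math import comb


def majority_win_prob(n: int, p: float) -> float:
    """Probability that a simple majority of n (odd) independent voters,
    each voting for candidate 1 with probability p, elects candidate 1."""
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    q = 1.0 - p
    # Sum the binomial probabilities of every outcome with k > n/2 votes for candidate 1.
    return sum(comb(n, k) * p ** k * q ** (n - k) for k in range(n // 2 + 1, n + 1))


if __name__ == "__main__":
    for n in (1, 11, 101, 1001):
        print(n, majority_win_prob(n, 0.6))
```

With $p = 0.6$, three voters already give $0.6^3 + 3 \cdot 0.6^2 \cdot 0.4 = 0.648$, and the probability keeps rising toward one as $n$ grows, as the Jury Theorem asserts.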