Search CORE

149 research outputs found

Approximate resilience, monotonicity, and the complexity of agnostic learning

Author: Dachman-Soled Dana
Feldman Vitaly
Tan Li-Yang
Wan Andrew
Wimmer Karl
Publication venue
Publication date: 09/07/2014
Field of study

A function

f

d

-resilient if all its Fourier coefficients of degree at most

d

are zero, i.e.,

f

is uncorrelated with all low-degree parities. We study the notion of

\mathit{approximate}

\mathit{resilience}

of Boolean functions, where we say that

f

\alpha

-approximately

d

-resilient if

f

\alpha

-close to a

[-1,1]

-valued

d

-resilient function in

\ell_1

distance. We show that approximate resilience essentially characterizes the complexity of agnostic learning of a concept class

C

over the uniform distribution. Roughly speaking, if all functions in a class

C

are far from being

d

-resilient then

C

can be learned agnostically in time

n^{O(d)}

and conversely, if

C

contains a function close to being

d

-resilient then agnostic learning of

C

in the statistical query (SQ) framework of Kearns has complexity of at least

n^{\Omega(d)}

. This characterization is based on the duality between

\ell_1

approximation by degree-

d

polynomials and approximate

d

-resilience that we establish. In particular, it implies that

\ell_1

approximation by low-degree polynomials, known to be sufficient for agnostic learning over product distributions, is in fact necessary. Focusing on monotone Boolean functions, we exhibit the existence of near-optimal

\alpha

-approximately

\widetilde{\Omega}(\alpha\sqrt{n})

-resilient monotone functions for all

\alpha>0

. Prior to our work, it was conceivable even that every monotone function is

\Omega(1)

-far from any

1

-resilient function. Furthermore, we construct simple, explicit monotone functions based on

{\sf Tribes}

and

{\sf CycleRun}

that are close to highly resilient functions. Our constructions are based on a fairly general resilience analysis and amplification. These structural results, together with the characterization, imply nearly optimal lower bounds for agnostic learning of monotone juntas

arXiv.org e-Print Archive

CiteSeerX

Crossref

Optimal Bounds on Approximation of Submodular and XOS Functions by Juntas

Author: Feldman Vitaly
Vondrak Jan
Publication venue
Publication date: 30/03/2015
Field of study

We investigate the approximability of several classes of real-valued functions by functions of a small number of variables ({\em juntas}). Our main results are tight bounds on the number of variables required to approximate a function

f:\{0,1\}^n \rightarrow [0,1]

within

\ell_2

-error

\epsilon

over the uniform distribution: 1. If

f

is submodular, then it is

\epsilon

-close to a function of

O(\frac{1}{\epsilon^2} \log \frac{1}{\epsilon})

variables. This is an exponential improvement over previously known results. We note that

\Omega(\frac{1}{\epsilon^2})

variables are necessary even for linear functions. 2. If

f

is fractionally subadditive (XOS) it is

\epsilon

-close to a function of

2^{O(1/\epsilon^2)}

variables. This result holds for all functions with low total

\ell_1

-influence and is a real-valued analogue of Friedgut's theorem for boolean functions. We show that

2^{\Omega(1/\epsilon)}

variables are necessary even for XOS functions. As applications of these results, we provide learning algorithms over the uniform distribution. For XOS functions, we give a PAC learning algorithm that runs in time

2^{poly(1/\epsilon)} poly(n)

. For submodular functions we give an algorithm in the more demanding PMAC learning model (Balcan and Harvey, 2011) which requires a multiplicative

1+\gamma

factor approximation with probability at least

1-\epsilon

over the target distribution. Our uniform distribution algorithm runs in time

2^{poly(1/(\gamma\epsilon))} poly(n)

. This is the first algorithm in the PMAC model that over the uniform distribution can achieve a constant approximation factor arbitrarily close to 1 for all submodular functions. As follows from the lower bounds in (Feldman et al., 2013) both of these algorithms are close to optimal. We also give applications for proper learning, testing and agnostic learning with value queries of these classes.Comment: Extended abstract appears in proceedings of FOCS 201

arXiv.org e-Print Archive

Crossref

Local Correction of Juntas

Author: Alon Noga
Weinstein Amit
Publication venue
Publication date: 24/12/2011
Field of study

A Boolean function f over n variables is said to be q-locally correctable if, given a black-box access to a function g which is "close" to an isomorphism f_sigma of f, we can compute f_sigma(x) for any x in Z_2^n with good probability using q queries to g. We observe that any k-junta, that is, any function which depends only on k of its input variables, is O(2^k)-locally correctable. Moreover, we show that there are examples where this is essentially best possible, and locally correcting some k-juntas requires a number of queries which is exponential in k. These examples, however, are far from being typical, and indeed we prove that for almost every k-junta, O(k log k) queries suffice.Comment: 6 page

arXiv.org e-Print Archive

CiteSeerX

Learning DNFs under product distributions via {\mu}-biased quantum Fourier sampling

Author: Kanade Varun
Rocchetto Andrea
Severini Simone
Publication venue
Publication date: 01/01/2019
Field of study

We show that DNF formulae can be quantum PAC-learned in polynomial time under product distributions using a quantum example oracle. The best classical algorithm (without access to membership queries) runs in superpolynomial time. Our result extends the work by Bshouty and Jackson (1998) that proved that DNF formulae are efficiently learnable under the uniform distribution using a quantum example oracle. Our proof is based on a new quantum algorithm that efficiently samples the coefficients of a {\mu}-biased Fourier transform.Comment: 17 pages; v3 based on journal version; minor corrections and clarification

arXiv.org e-Print Archive

Oxford University Research Archive

Embedding Hard Learning Problems Into Gaussian Space

Author: Klivans Adam
Kothari Pravesh
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014)
Publication date: 01/01/2014
Field of study

We give the first representation-independent hardness result for agnostically learning halfspaces with respect to the Gaussian distribution. We reduce from the problem of learning sparse parities with noise with respect to the uniform distribution on the hypercube (sparse LPN), a notoriously hard problem in theoretical computer science and show that any algorithm for agnostically learning halfspaces requires n^Omega(log(1/epsilon)) time under the assumption that k-sparse LPN requires n^Omega(k) time, ruling out a polynomial time algorithm for the problem. As far as we are aware, this is the first representation-independent hardness result for supervised learning when the underlying distribution is restricted to be a Gaussian. We also show that the problem of agnostically learning sparse polynomials with respect to the Gaussian distribution in polynomial time is as hard as PAC learning DNFs on the uniform distribution in polynomial time. This complements the surprising result of Andoni et. al. 2013 who show that sparse polynomials are learnable under random Gaussian noise in polynomial time. Taken together, these results show the inherent difficulty of designing supervised learning algorithms in Euclidean space even in the presence of strong distributional assumptions. Our results use a novel embedding of random labeled examples from the uniform distribution on the Boolean hypercube into random labeled examples from the Gaussian distribution that allows us to relate the hardness of learning problems on two different domains and distributions

CiteSeerX

Dagstuhl Research Online Publication Server

Almost Optimal Testers for Concise Representations

Author: Bshouty Nader H.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)
Publication date: 07/11/2019
Field of study

We give improved and almost optimal testers for several classes of Boolean functions on n variables that have concise representation in the uniform and distribution-free model. Classes, such as k-Junta, k-Linear, s-Term DNF, s-Term Monotone DNF, r-DNF, Decision List, r-Decision List, size-s Decision Tree, size-s Boolean Formula, size-s Branching Program, s-Sparse Polynomial over the binary field and functions with Fourier Degree at most d. The approach is new and combines ideas from Diakonikolas et al. [Ilias Diakonikolas et al., 2007], Bshouty [Nader H. Bshouty, 2018], Goldreich et al. [Oded Goldreich et al., 1998], and learning theory. The method can be extended to several other classes of functions over any domain that can be approximated by functions with a small number of relevant variables

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions

Author: Bhattamishra Satwik
Blunsom Phil
Kanade Varun
Patel Arkil
Publication venue
Publication date: 22/11/2022
Field of study

Despite the widespread success of Transformers on NLP tasks, recent works have found that they struggle to model several formal languages when compared to recurrent models. This raises the question of why Transformers perform well in practice and whether they have any properties that enable them to generalize better than recurrent models. In this work, we conduct an extensive empirical study on Boolean functions to demonstrate the following: (i) Random Transformers are relatively more biased towards functions of low sensitivity. (ii) When trained on Boolean functions, both Transformers and LSTMs prioritize learning functions of low sensitivity, with Transformers ultimately converging to functions of lower sensitivity. (iii) On sparse Boolean functions which have low sensitivity, we find that Transformers generalize near perfectly even in the presence of noisy labels whereas LSTMs overfit and achieve poor generalization accuracy. Overall, our results provide strong quantifiable evidence that suggests differences in the inductive biases of Transformers and recurrent models which may help explain Transformer's effective generalization performance despite relatively limited expressiveness.Comment: Preprin

arXiv.org e-Print Archive