Search CORE

4,714 research outputs found

Optimal Quantum Sample Complexity of Learning Algorithms

Author: Arunachalam Srinivasan
de Wolf Ronald
Publication venue
Publication date: 01/01/2017
Field of study

\newcommand{\eps}{\varepsilon}

In learning theory, the VC dimension of a concept class

C

is the most common way to measure its "richness." In the PAC model \Theta\Big(\frac{d}{\eps} + \frac{\log(1/\delta)}{\eps}\Big) examples are necessary and sufficient for a learner to output, with probability

1-\delta

, a hypothesis

h

that is \eps-close to the target concept

c

. In the related agnostic model, where the samples need not come from a

c\in C

, we know that \Theta\Big(\frac{d}{\eps^2} + \frac{\log(1/\delta)}{\eps^2}\Big) examples are necessary and sufficient to output an hypothesis

h\in C

whose error is at most \eps worse than the best concept in

C

. Here we analyze quantum sample complexity, where each example is a coherent quantum state. This model was introduced by Bshouty and Jackson, who showed that quantum examples are more powerful than classical examples in some fixed-distribution settings. However, Atici and Servedio, improved by Zhang, showed that in the PAC setting, quantum examples cannot be much more powerful: the required number of quantum examples is \Omega\Big(\frac{d^{1-\eta}}{\eps} + d + \frac{\log(1/\delta)}{\eps}\Big)\mbox{ for all }\eta> 0. Our main result is that quantum and classical sample complexity are in fact equal up to constant factors in both the PAC and agnostic models. We give two approaches. The first is a fairly simple information-theoretic argument that yields the above two classical bounds and yields the same bounds for quantum sample complexity up to a \log(d/\eps) factor. We then give a second approach that avoids the log-factor loss, based on analyzing the behavior of the "Pretty Good Measurement" on the quantum state identification problems that correspond to learning. This shows classical and quantum sample complexity are equal up to constant factors.Comment: 31 pages LaTeX. Arxiv abstract shortened to fit in their 1920-character limit. Version 3: many small changes, no change in result

arXiv.org e-Print Archive

CWI's Institutional Repository

Dagstuhl Research Online Publication Server

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Agnostic Learning of Disjunctions on Symmetric Distributions

Author: Feldman Vitaly
Kothari Pravesh
Publication venue
Publication date: 25/05/2015
Field of study

We consider the problem of approximating and learning disjunctions (or equivalently, conjunctions) on symmetric distributions over

\{0,1\}^n

. Symmetric distributions are distributions whose PDF is invariant under any permutation of the variables. We give a simple proof that for every symmetric distribution

\mathcal{D}

, there exists a set of

n^{O(\log{(1/\epsilon)})}

functions

\mathcal{S}

, such that for every disjunction

c

, there is function

p

, expressible as a linear combination of functions in

\mathcal{S}

, such that

p

\epsilon

-approximates

c

\ell_1

distance on

\mathcal{D}

\mathbf{E}_{x \sim \mathcal{D}}[ |c(x)-p(x)|] \leq \epsilon

. This directly gives an agnostic learning algorithm for disjunctions on symmetric distributions that runs in time

n^{O( \log{(1/\epsilon)})}

. The best known previous bound is

n^{O(1/\epsilon^4)}

and follows from approximation of the more general class of halfspaces (Wimmer, 2010). We also show that there exists a symmetric distribution

\mathcal{D}

, such that the minimum degree of a polynomial that

1/3

-approximates the disjunction of all

n

variables is

\ell_1

distance on

\mathcal{D}

\Omega( \sqrt{n})

. Therefore the learning result above cannot be achieved via

\ell_1

-regression with a polynomial basis used in most other agnostic learning algorithms. Our technique also gives a simple proof that for any product distribution

\mathcal{D}

and every disjunction

c

, there exists a polynomial

p

of degree

O(\log{(1/\epsilon)})

such that

p

\epsilon

-approximates

c

\ell_1

distance on

\mathcal{D}

. This was first proved by Blais et al. (2008) via a more involved argument

arXiv.org e-Print Archive

CiteSeerX

Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness

Author: Bun Mark
Steinke Thomas
Publication venue
Publication date: 08/12/2014
Field of study

Polynomial approximations to boolean functions have led to many positive results in computer science. In particular, polynomial approximations to the sign function underly algorithms for agnostically learning halfspaces, as well as pseudorandom generators for halfspaces. In this work, we investigate the limits of these techniques by proving inapproximability results for the sign function. Firstly, the polynomial regression algorithm of Kalai et al. (SIAM J. Comput. 2008) shows that halfspaces can be learned with respect to log-concave distributions on

\mathbb{R}^n

in the challenging agnostic learning model. The power of this algorithm relies on the fact that under log-concave distributions, halfspaces can be approximated arbitrarily well by low-degree polynomials. We ask whether this technique can be extended beyond log-concave distributions, and establish a negative result. We show that polynomials of any degree cannot approximate the sign function to within arbitrarily low error for a large class of non-log-concave distributions on the real line, including those with densities proportional to

\exp(-|x|^{0.99})

. Secondly, we investigate the derandomization of Chernoff-type concentration inequalities. Chernoff-type tail bounds on sums of independent random variables have pervasive applications in theoretical computer science. Schmidt et al. (SIAM J. Discrete Math. 1995) showed that these inequalities can be established for sums of random variables with only

O(\log(1/\delta))

-wise independence, for a tail probability of

\delta

. We show that their results are tight up to constant factors. These results rely on techniques from weighted approximation theory, which studies how well functions on the real line can be approximated by polynomials under various distributions. We believe that these techniques will have further applications in other areas of computer science.Comment: 22 page

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

Approximate resilience, monotonicity, and the complexity of agnostic learning

Author: Dachman-Soled Dana
Feldman Vitaly
Tan Li-Yang
Wan Andrew
Wimmer Karl
Publication venue
Publication date: 09/07/2014
Field of study

A function

f

d

-resilient if all its Fourier coefficients of degree at most

d

are zero, i.e.,

f

is uncorrelated with all low-degree parities. We study the notion of

\mathit{approximate}

\mathit{resilience}

of Boolean functions, where we say that

f

\alpha

-approximately

d

-resilient if

f

\alpha

-close to a

[-1,1]

-valued

d

-resilient function in

\ell_1

distance. We show that approximate resilience essentially characterizes the complexity of agnostic learning of a concept class

C

over the uniform distribution. Roughly speaking, if all functions in a class

C

are far from being

d

-resilient then

C

can be learned agnostically in time

n^{O(d)}

and conversely, if

C

contains a function close to being

d

-resilient then agnostic learning of

C

in the statistical query (SQ) framework of Kearns has complexity of at least

n^{\Omega(d)}

. This characterization is based on the duality between

\ell_1

approximation by degree-

d

polynomials and approximate

d

-resilience that we establish. In particular, it implies that

\ell_1

approximation by low-degree polynomials, known to be sufficient for agnostic learning over product distributions, is in fact necessary. Focusing on monotone Boolean functions, we exhibit the existence of near-optimal

\alpha

-approximately

\widetilde{\Omega}(\alpha\sqrt{n})

-resilient monotone functions for all

\alpha>0

. Prior to our work, it was conceivable even that every monotone function is

\Omega(1)

-far from any

1

-resilient function. Furthermore, we construct simple, explicit monotone functions based on

{\sf Tribes}

and

{\sf CycleRun}

that are close to highly resilient functions. Our constructions are based on a fairly general resilience analysis and amplification. These structural results, together with the characterization, imply nearly optimal lower bounds for agnostic learning of monotone juntas

arXiv.org e-Print Archive

CiteSeerX

Crossref