Optimal Quantum Sample Complexity of Learning Algorithms
In learning theory, the VC dimension $d$ of a
concept class $\mathcal{C}$ is the most common way to measure its "richness." In the PAC
model, $\Theta\Big(\frac{d}{\varepsilon} + \frac{\log(1/\delta)}{\varepsilon}\Big)$
examples are necessary and sufficient for a learner to output, with probability
$1-\delta$, a hypothesis $h$ that is $\varepsilon$-close to the target concept $c$. In
the related agnostic model, where the samples need not come from a $c \in \mathcal{C}$, we
know that $\Theta\Big(\frac{d}{\varepsilon^2} + \frac{\log(1/\delta)}{\varepsilon^2}\Big)$
examples are necessary and sufficient to output a hypothesis $h \in \mathcal{C}$ whose
error is at most $\varepsilon$ worse than the best concept in $\mathcal{C}$.
Here we analyze quantum sample complexity, where each example is a coherent
quantum state. This model was introduced by Bshouty and Jackson, who showed
that quantum examples are more powerful than classical examples in some
fixed-distribution settings. However, Atici and Servedio, improved by Zhang,
showed that in the PAC setting, quantum examples cannot be much more powerful:
the required number of quantum examples is
$\Omega\Big(\frac{d^{1-\eta}}{\varepsilon} + d + \frac{\log(1/\delta)}{\varepsilon}\Big)$
for all $\eta > 0$. Our main result is that quantum and classical sample
complexity are in fact equal up to constant factors in both the PAC and
agnostic models. We give two approaches. The first is a fairly simple
information-theoretic argument that yields the above two classical bounds and
yields the same bounds for quantum sample complexity up to a $\log(d/\varepsilon)$
factor. We then give a second approach that avoids the log-factor loss, based
on analyzing the behavior of the "Pretty Good Measurement" on the quantum state
identification problems that correspond to learning. This shows classical and
quantum sample complexity are equal up to constant factors.
Comment: 31 pages LaTeX. arXiv abstract shortened to fit in their
1920-character limit. Version 3: many small changes, no change in result
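The two classical bounds quoted in this abstract can be evaluated numerically. The following is a minimal sketch that drops the unspecified constant factors hidden in the $\Theta(\cdot)$ notation, so the returned values indicate only how the bounds scale with $d$, the accuracy parameter, and $\delta$; the function names are hypothetical and chosen for illustration.

```python
import math


def pac_sample_bound(d, eps, delta):
    """Scaling of the PAC bound: d/eps + log(1/delta)/eps (constants dropped)."""
    return d / eps + math.log(1 / delta) / eps


def agnostic_sample_bound(d, eps, delta):
    """Scaling of the agnostic bound: d/eps^2 + log(1/delta)/eps^2 (constants dropped)."""
    return d / eps**2 + math.log(1 / delta) / eps**2


if __name__ == "__main__":
    d, eps, delta = 10, 0.1, 0.05
    print("PAC scaling:     ", pac_sample_bound(d, eps, delta))
    print("Agnostic scaling:", agnostic_sample_bound(d, eps, delta))
```

With constants dropped, the agnostic expression is exactly the PAC expression divided by the accuracy parameter, which is the extra $1/\varepsilon$ cost of the agnostic setting.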
Agnostic Learning of Disjunctions on Symmetric Distributions
We consider the problem of approximating and learning disjunctions (or
equivalently, conjunctions) on symmetric distributions over $\{0,1\}^n$.
Symmetric distributions are distributions whose PDF is invariant under any
permutation of the variables. We give a simple proof that for every symmetric
distribution $\mathcal{D}$, there exists a set of $n^{O(\log(1/\epsilon))}$
functions $\mathcal{S}$, such that for every disjunction $c$, there is a function
$p$, expressible as a linear combination of functions in $\mathcal{S}$, such
that $p$ $\epsilon$-approximates $c$ in $\ell_1$ distance on $\mathcal{D}$, that is,
$\mathbf{E}_{x \sim \mathcal{D}}[|c(x)-p(x)|] \leq \epsilon$. This directly
gives an agnostic learning algorithm for disjunctions on symmetric
distributions that runs in time $n^{O(\log(1/\epsilon))}$. The best known
previous bound is $n^{O(1/\epsilon^4)}$ and follows from approximation of the
more general class of halfspaces (Wimmer, 2010). We also show that there exists
a symmetric distribution $\mathcal{D}$, such that the minimum degree of a
polynomial that $1/3$-approximates the disjunction of all $n$ variables in
$\ell_1$ distance on $\mathcal{D}$ is $\Omega(\sqrt{n})$. Therefore the
learning result above cannot be achieved via $\ell_1$-regression with a
polynomial basis used in most other agnostic learning algorithms.
Our technique also gives a simple proof that for any product distribution
$\mathcal{D}$ and every disjunction $c$, there exists a polynomial $p$ of
degree $O(\log(1/\epsilon))$ such that $p$ $\epsilon$-approximates $c$ in
$\ell_1$ distance on $\mathcal{D}$. This was first proved by Blais et al.
(2008) via a more involved argument.
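To make the notion of approximating a disjunction on a symmetric distribution concrete, here is a small brute-force sketch. It assumes a symmetric distribution is specified by one probability per Hamming weight, spread uniformly over the points of that weight, and it computes the expected absolute difference between a disjunction and a candidate approximator by enumerating all of $\{0,1\}^n$. All names are hypothetical, and this illustrates only the error measure, not the paper's construction.

```python
from itertools import product
from math import comb


def symmetric_pmf(level_weights):
    """Build a symmetric pmf on {0,1}^n from per-Hamming-weight probabilities.

    level_weights[k] is the total mass on points with exactly k ones;
    it is spread uniformly, so the pmf is invariant under permuting variables.
    """
    n = len(level_weights) - 1

    def pmf(x):
        k = sum(x)
        return level_weights[k] / comb(n, k)

    return pmf


def disjunction(indices):
    """Return the 0/1-valued OR of the variables at the given positions."""
    return lambda x: 1 if any(x[i] for i in indices) else 0


def l1_error(c, p, pmf, n):
    """Expected |c(x) - p(x)| under pmf, by enumeration (small n only)."""
    return sum(pmf(x) * abs(c(x) - p(x)) for x in product((0, 1), repeat=n))


if __name__ == "__main__":
    n = 3
    pmf = symmetric_pmf([0.25, 0.25, 0.25, 0.25])
    c = disjunction(range(n))      # OR of all three variables
    constant_one = lambda x: 1     # a trivial approximator
    print(l1_error(c, constant_one, pmf, n))
```

In this example the constant function 1 disagrees with the disjunction only on the all-zeros point, so its error equals the mass placed on Hamming weight 0.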
Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness
Polynomial approximations to boolean functions have led to many positive
results in computer science. In particular, polynomial approximations to the
sign function underlie algorithms for agnostically learning halfspaces, as well
as pseudorandom generators for halfspaces. In this work, we investigate the
limits of these techniques by proving inapproximability results for the sign
function.
Firstly, the polynomial regression algorithm of Kalai et al. (SIAM J. Comput.
2008) shows that halfspaces can be learned with respect to log-concave
distributions on $\mathbb{R}^n$ in the challenging agnostic learning model. The
power of this algorithm relies on the fact that under log-concave
distributions, halfspaces can be approximated arbitrarily well by low-degree
polynomials. We ask whether this technique can be extended beyond log-concave
distributions, and establish a negative result. We show that polynomials of any
degree cannot approximate the sign function to within arbitrarily low error for
a large class of non-log-concave distributions on the real line, including
those with densities proportional to $\exp(-|x|^{0.99})$.
Secondly, we investigate the derandomization of Chernoff-type concentration
inequalities. Chernoff-type tail bounds on sums of independent random variables
have pervasive applications in theoretical computer science. Schmidt et al.
(SIAM J. Discrete Math. 1995) showed that these inequalities can be established
for sums of random variables with only $O(\log(1/\delta))$-wise independence,
for a tail probability of $\delta$. We show that their results are tight up to
constant factors.
These results rely on techniques from weighted approximation theory, which
studies how well functions on the real line can be approximated by polynomials
under various distributions. We believe that these techniques will have further
applications in other areas of computer science.
Comment: 22 page
Approximate resilience, monotonicity, and the complexity of agnostic learning
A function $f$ is $d$-resilient if all its Fourier coefficients of degree at
most $d$ are zero, i.e., $f$ is uncorrelated with all low-degree parities. We
study the notion of approximate resilience of Boolean
functions, where we say that $f$ is $\alpha$-approximately $d$-resilient if
$f$ is $\alpha$-close to a $[-1,1]$-valued $d$-resilient function in $\ell_1$
distance. We show that approximate resilience essentially characterizes the
complexity of agnostic learning of a concept class $C$ over the uniform
distribution. Roughly speaking, if all functions in a class $C$ are far from
being $d$-resilient then $C$ can be learned agnostically in time $n^{O(d)}$ and
conversely, if $C$ contains a function close to being $d$-resilient then
agnostic learning of $C$ in the statistical query (SQ) framework of Kearns has
complexity of at least $n^{\Omega(d)}$. This characterization is based on the
duality between $\ell_1$ approximation by degree-$d$ polynomials and
approximate $d$-resilience that we establish. In particular, it implies that
$\ell_1$ approximation by low-degree polynomials, known to be sufficient for
agnostic learning over product distributions, is in fact necessary.
Focusing on monotone Boolean functions, we exhibit the existence of
near-optimal $\alpha$-approximately
$\widetilde{\Omega}(\alpha\sqrt{n})$-resilient monotone functions for all
$\alpha > 0$. Prior to our work, it was conceivable even that every monotone
function is $\Omega(1)$-far from any $1$-resilient function. Furthermore, we
construct simple, explicit monotone functions based on Tribes and CycleRun that are close to highly resilient functions. Our constructions are
based on a fairly general resilience analysis and amplification. These
structural results, together with the characterization, imply nearly optimal
lower bounds for agnostic learning of monotone juntas
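The definition of resilience used in this abstract can be checked by brute force on small examples. The sketch below assumes the standard $\pm 1$ convention, where the Fourier coefficient of $f$ on a set $S$ is the average of $f(x)\prod_{i \in S} x_i$ over uniform $x \in \{-1,1\}^n$, and tests whether every coefficient of degree at most $d$ vanishes. Function names are hypothetical; exact resilience is checked, not the approximate notion studied in the paper.

```python
from itertools import combinations, product
from math import prod


def fourier_coefficient(f, n, S):
    """Average of f(x) * prod_{i in S} x_i over all x in {-1, 1}^n."""
    total = 0
    for x in product((-1, 1), repeat=n):
        total += f(x) * prod(x[i] for i in S)
    return total / 2**n


def is_resilient(f, n, d, tol=1e-12):
    """True if every Fourier coefficient of degree <= d is (numerically) zero."""
    for k in range(d + 1):
        for S in combinations(range(n), k):
            if abs(fourier_coefficient(f, n, S)) > tol:
                return False
    return True


def parity(x):
    """Product of all coordinates: the parity function in the +/-1 convention."""
    return prod(x)


def majority(x):
    """Majority of an odd number of +/-1 coordinates."""
    return 1 if sum(x) > 0 else -1


if __name__ == "__main__":
    n = 3
    print(is_resilient(parity, n, 2))    # parity has weight only on the full set
    print(is_resilient(majority, n, 1))  # majority correlates with single bits
```

Parity on $n$ bits is the extreme case: its only nonzero coefficient sits on the full set, so it is $(n-1)$-resilient, whereas majority is balanced but correlated with each individual variable.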
- …