4,714 research outputs found

    Optimal Quantum Sample Complexity of Learning Algorithms

    Get PDF
    \newcommand{\eps}{\varepsilon} In learning theory, the VC dimension of a concept class CC is the most common way to measure its "richness." In the PAC model \Theta\Big(\frac{d}{\eps} + \frac{\log(1/\delta)}{\eps}\Big) examples are necessary and sufficient for a learner to output, with probability 1δ1-\delta, a hypothesis hh that is \eps-close to the target concept cc. In the related agnostic model, where the samples need not come from a cCc\in C, we know that \Theta\Big(\frac{d}{\eps^2} + \frac{\log(1/\delta)}{\eps^2}\Big) examples are necessary and sufficient to output an hypothesis hCh\in C whose error is at most \eps worse than the best concept in CC. Here we analyze quantum sample complexity, where each example is a coherent quantum state. This model was introduced by Bshouty and Jackson, who showed that quantum examples are more powerful than classical examples in some fixed-distribution settings. However, Atici and Servedio, improved by Zhang, showed that in the PAC setting, quantum examples cannot be much more powerful: the required number of quantum examples is \Omega\Big(\frac{d^{1-\eta}}{\eps} + d + \frac{\log(1/\delta)}{\eps}\Big)\mbox{ for all }\eta> 0. Our main result is that quantum and classical sample complexity are in fact equal up to constant factors in both the PAC and agnostic models. We give two approaches. The first is a fairly simple information-theoretic argument that yields the above two classical bounds and yields the same bounds for quantum sample complexity up to a \log(d/\eps) factor. We then give a second approach that avoids the log-factor loss, based on analyzing the behavior of the "Pretty Good Measurement" on the quantum state identification problems that correspond to learning. This shows classical and quantum sample complexity are equal up to constant factors.Comment: 31 pages LaTeX. Arxiv abstract shortened to fit in their 1920-character limit. Version 3: many small changes, no change in result

    Agnostic Learning of Disjunctions on Symmetric Distributions

    Full text link
    We consider the problem of approximating and learning disjunctions (or equivalently, conjunctions) on symmetric distributions over {0,1}n\{0,1\}^n. Symmetric distributions are distributions whose PDF is invariant under any permutation of the variables. We give a simple proof that for every symmetric distribution D\mathcal{D}, there exists a set of nO(log(1/ϵ))n^{O(\log{(1/\epsilon)})} functions S\mathcal{S}, such that for every disjunction cc, there is function pp, expressible as a linear combination of functions in S\mathcal{S}, such that pp ϵ\epsilon-approximates cc in 1\ell_1 distance on D\mathcal{D} or ExD[c(x)p(x)]ϵ\mathbf{E}_{x \sim \mathcal{D}}[ |c(x)-p(x)|] \leq \epsilon. This directly gives an agnostic learning algorithm for disjunctions on symmetric distributions that runs in time nO(log(1/ϵ))n^{O( \log{(1/\epsilon)})}. The best known previous bound is nO(1/ϵ4)n^{O(1/\epsilon^4)} and follows from approximation of the more general class of halfspaces (Wimmer, 2010). We also show that there exists a symmetric distribution D\mathcal{D}, such that the minimum degree of a polynomial that 1/31/3-approximates the disjunction of all nn variables is 1\ell_1 distance on D\mathcal{D} is Ω(n)\Omega( \sqrt{n}). Therefore the learning result above cannot be achieved via 1\ell_1-regression with a polynomial basis used in most other agnostic learning algorithms. Our technique also gives a simple proof that for any product distribution D\mathcal{D} and every disjunction cc, there exists a polynomial pp of degree O(log(1/ϵ))O(\log{(1/\epsilon)}) such that pp ϵ\epsilon-approximates cc in 1\ell_1 distance on D\mathcal{D}. This was first proved by Blais et al. (2008) via a more involved argument

    Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness

    Get PDF
    Polynomial approximations to boolean functions have led to many positive results in computer science. In particular, polynomial approximations to the sign function underly algorithms for agnostically learning halfspaces, as well as pseudorandom generators for halfspaces. In this work, we investigate the limits of these techniques by proving inapproximability results for the sign function. Firstly, the polynomial regression algorithm of Kalai et al. (SIAM J. Comput. 2008) shows that halfspaces can be learned with respect to log-concave distributions on Rn\mathbb{R}^n in the challenging agnostic learning model. The power of this algorithm relies on the fact that under log-concave distributions, halfspaces can be approximated arbitrarily well by low-degree polynomials. We ask whether this technique can be extended beyond log-concave distributions, and establish a negative result. We show that polynomials of any degree cannot approximate the sign function to within arbitrarily low error for a large class of non-log-concave distributions on the real line, including those with densities proportional to exp(x0.99)\exp(-|x|^{0.99}). Secondly, we investigate the derandomization of Chernoff-type concentration inequalities. Chernoff-type tail bounds on sums of independent random variables have pervasive applications in theoretical computer science. Schmidt et al. (SIAM J. Discrete Math. 1995) showed that these inequalities can be established for sums of random variables with only O(log(1/δ))O(\log(1/\delta))-wise independence, for a tail probability of δ\delta. We show that their results are tight up to constant factors. These results rely on techniques from weighted approximation theory, which studies how well functions on the real line can be approximated by polynomials under various distributions. We believe that these techniques will have further applications in other areas of computer science.Comment: 22 page

    Approximate resilience, monotonicity, and the complexity of agnostic learning

    Full text link
    A function ff is dd-resilient if all its Fourier coefficients of degree at most dd are zero, i.e., ff is uncorrelated with all low-degree parities. We study the notion of approximate\mathit{approximate} resilience\mathit{resilience} of Boolean functions, where we say that ff is α\alpha-approximately dd-resilient if ff is α\alpha-close to a [1,1][-1,1]-valued dd-resilient function in 1\ell_1 distance. We show that approximate resilience essentially characterizes the complexity of agnostic learning of a concept class CC over the uniform distribution. Roughly speaking, if all functions in a class CC are far from being dd-resilient then CC can be learned agnostically in time nO(d)n^{O(d)} and conversely, if CC contains a function close to being dd-resilient then agnostic learning of CC in the statistical query (SQ) framework of Kearns has complexity of at least nΩ(d)n^{\Omega(d)}. This characterization is based on the duality between 1\ell_1 approximation by degree-dd polynomials and approximate dd-resilience that we establish. In particular, it implies that 1\ell_1 approximation by low-degree polynomials, known to be sufficient for agnostic learning over product distributions, is in fact necessary. Focusing on monotone Boolean functions, we exhibit the existence of near-optimal α\alpha-approximately Ω~(αn)\widetilde{\Omega}(\alpha\sqrt{n})-resilient monotone functions for all α>0\alpha>0. Prior to our work, it was conceivable even that every monotone function is Ω(1)\Omega(1)-far from any 11-resilient function. Furthermore, we construct simple, explicit monotone functions based on Tribes{\sf Tribes} and CycleRun{\sf CycleRun} that are close to highly resilient functions. Our constructions are based on a fairly general resilience analysis and amplification. These structural results, together with the characterization, imply nearly optimal lower bounds for agnostic learning of monotone juntas
    corecore