
    Improved Bounds on Quantum Learning Algorithms

    In this article we give several new results on the complexity of algorithms that learn Boolean functions from quantum queries and quantum examples. Hunziker et al. conjectured that for any class $C$ of Boolean functions, the number of quantum black-box queries required to exactly identify an unknown function from $C$ is $O(\frac{\log |C|}{\sqrt{\hat{\gamma}^{C}}})$, where $\hat{\gamma}^{C}$ is a combinatorial parameter of the class $C$. We essentially resolve this conjecture in the affirmative by giving a quantum algorithm that, for any class $C$, identifies any unknown function from $C$ using $O(\frac{\log |C| \log \log |C|}{\sqrt{\hat{\gamma}^{C}}})$ quantum black-box queries. We consider a range of natural problems intermediate between the exact learning problem (in which the learner must obtain all bits of information about the black-box function) and the usual problem of computing a predicate (in which the learner must obtain only one bit of information about the black-box function). We give positive and negative results on when the quantum and classical query complexities of these intermediate problems are polynomially related to each other. Finally, we improve the known lower bound on the number of quantum examples (as opposed to quantum black-box queries) required for $(\epsilon,\delta)$-PAC learning any concept class of Vapnik-Chervonenkis dimension $d$ over the domain $\{0,1\}^n$ from $\Omega(\frac{d}{n})$ to $\Omega(\frac{1}{\epsilon}\log \frac{1}{\delta}+d+\frac{\sqrt{d}}{\epsilon})$. This new lower bound comes closer to matching known upper bounds for classical PAC learning.
    Comment: Minor corrections. 18 pages. To appear in Quantum Information Processing. Requires algorithm.sty and algorithmic.sty to build.
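    The following Python sketch is illustrative only, not taken from the paper: it evaluates the previously known $\Omega(\frac{d}{n})$ quantum-example lower bound and the improved $\Omega(\frac{1}{\epsilon}\log \frac{1}{\delta}+d+\frac{\sqrt{d}}{\epsilon})$ bound for hypothetical parameter values, with the unknown constants hidden by the $\Omega(\cdot)$ notation omitted.

    import math

    def old_lower_bound(d: int, n: int) -> float:
        """Previously known Omega(d/n) lower bound (constant factor omitted)."""
        return d / n

    def new_lower_bound(d: int, eps: float, delta: float) -> float:
        """Improved Omega((1/eps) log(1/delta) + d + sqrt(d)/eps) bound (constants omitted)."""
        return (1.0 / eps) * math.log(1.0 / delta) + d + math.sqrt(d) / eps

    if __name__ == "__main__":
        d, n, eps, delta = 50, 1000, 0.05, 0.01    # hypothetical parameters
        print("old bound ~", old_lower_bound(d, n))           # 0.05
        print("new bound ~", new_lower_bound(d, eps, delta))  # ~283.5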

    MCMC Learning

    The theory of learning under the uniform distribution is rich and deep, with connections to cryptography, computational complexity, and the analysis of Boolean functions, to name a few areas. This theory, however, is very limited due to the fact that the uniform distribution and the corresponding Fourier basis are rarely encountered as a statistical model. A family of distributions that vastly generalizes the uniform distribution on the Boolean cube is that of distributions represented by Markov Random Fields (MRF). Markov Random Fields are one of the main tools for modeling high-dimensional data in many areas of statistics and machine learning. In this paper we initiate the investigation of extending central ideas, methods and algorithms from the theory of learning under the uniform distribution to the setup of learning concepts given examples from MRF distributions. In particular, our results establish a novel connection between properties of MCMC sampling of MRFs and learning under the MRF distribution.
    Comment: 28 pages, 1 figure.
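    As a companion illustration (not the paper's algorithm), here is a minimal Gibbs-sampling sketch for a pairwise MRF, an Ising model on $\{-1,+1\}^n$; it only shows the kind of MCMC sampling of MRF distributions the abstract refers to. The couplings $J$ and fields $h$ below are arbitrary toy values.

    import numpy as np

    def gibbs_sample(J, h, steps, seed=None):
        """Single-site Gibbs updates for the Ising model
        p(x) proportional to exp(sum_{i<j} J_ij x_i x_j + sum_i h_i x_i), x in {-1,+1}^n."""
        rng = np.random.default_rng(seed)
        n = len(h)
        x = rng.choice([-1, 1], size=n)
        for _ in range(steps):
            i = rng.integers(n)
            # Conditional of x_i given all other coordinates (diagonal of J is zero).
            field = h[i] + J[i] @ x - J[i, i] * x[i]
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
            x[i] = 1 if rng.random() < p_plus else -1
        return x

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n = 10
        J = rng.normal(scale=0.1, size=(n, n))
        J = (J + J.T) / 2.0
        np.fill_diagonal(J, 0.0)
        h = rng.normal(scale=0.1, size=n)
        print(gibbs_sample(J, h, steps=5000, seed=1))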

    Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms

    The paper studies machine learning problems where each example is described using a set of Boolean features and where hypotheses are represented by linear threshold elements. One method of increasing the expressiveness of learned hypotheses in this context is to expand the feature set to include conjunctions of basic features. This can be done explicitly or, where possible, by using a kernel function. Focusing on the well-known Perceptron and Winnow algorithms, the paper demonstrates a tradeoff between the computational efficiency with which the algorithm can be run over the expanded feature space and the generalization ability of the corresponding learning algorithm. We first describe several kernel functions which capture either limited forms of conjunctions or all conjunctions. We show that these kernels can be used to efficiently run the Perceptron algorithm over a feature space of exponentially many conjunctions; however, we also show that using such kernels, the Perceptron algorithm can provably make an exponential number of mistakes even when learning simple functions. We then consider the question of whether kernel functions can analogously be used to run the multiplicative-update Winnow algorithm over an expanded feature space of exponentially many conjunctions. Known upper bounds imply that the Winnow algorithm can learn Disjunctive Normal Form (DNF) formulae with a polynomial mistake bound in this setting. However, we prove that it is computationally hard to simulate Winnow's behavior for learning DNF over such a feature set. This implies that the kernel functions which correspond to running Winnow for this problem are not efficiently computable, and that there is no general construction that can run Winnow with kernels.
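    As a hedged illustration of the first part of the abstract, the sketch below runs the standard kernel Perceptron with a Boolean kernel that counts the monotone conjunctions satisfied by both examples, namely $K(x,y)=2^{\langle x,y\rangle}$ for $x,y\in\{0,1\}^n$ (empty conjunction included); the specific kernels analyzed in the paper may differ in detail, and the toy data and parameters are assumptions made here for demonstration.

    import numpy as np

    def conjunction_kernel(x, y):
        # Number of monotone conjunctions satisfied by both x and y: 2^(<x, y>).
        return float(2.0 ** np.dot(x, y))

    def kernel_perceptron(X, y, epochs=10):
        """Standard kernel Perceptron; alpha[t] counts mistakes made on example t."""
        m = len(y)
        alpha = np.zeros(m)
        for _ in range(epochs):
            for t in range(m):
                s = sum(alpha[i] * y[i] * conjunction_kernel(X[i], X[t]) for i in range(m))
                if y[t] * s <= 0:      # mistake: add this example to the expansion
                    alpha[t] += 1.0
        return alpha

    if __name__ == "__main__":
        # Toy target: the conjunction x1 AND x2, with labels in {-1,+1}.
        rng = np.random.default_rng(0)
        X = rng.integers(0, 2, size=(40, 6))
        y = np.where(X[:, 0] * X[:, 1] == 1, 1, -1)
        print(kernel_perceptron(X, y))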

    Learning using Local Membership Queries

    We introduce a new model of membership query (MQ) learning, where the learning algorithm is restricted to query points that are \emph{close} to random examples drawn from the underlying distribution. The learning model is intermediate between the PAC model (Valiant, 1984) and the PAC+MQ model (where the queries are allowed to be arbitrary points). Membership query algorithms are not popular among machine learning practitioners. Apart from the obvious difficulty of adaptively querying labelers, it has also been observed that querying \emph{unnatural} points leads to increased noise from human labelers (Lang and Baum, 1992). This motivates our study of learning algorithms that make queries that are close to examples generated from the data distribution. We restrict our attention to functions defined on the $n$-dimensional Boolean hypercube and say that a membership query is local if its Hamming distance from some example in the (random) training data is at most $O(\log(n))$. We show the following results in this model: (i) The class of sparse polynomials (with coefficients in $\mathbb{R}$) over $\{0,1\}^n$ is polynomial time learnable under a large class of \emph{locally smooth} distributions using $O(\log(n))$-local queries. This class also includes the class of $O(\log(n))$-depth decision trees. (ii) The class of polynomial-sized decision trees is polynomial time learnable under product distributions using $O(\log(n))$-local queries. (iii) The class of polynomial size DNF formulas is learnable under the uniform distribution using $O(\log(n))$-local queries in time $n^{O(\log(\log(n)))}$. (iv) In addition we prove a number of results relating the proposed model to the traditional PAC model and the PAC+MQ model.
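    A small sketch of the locality condition defined above (illustrative, not from the paper): a query $q\in\{0,1\}^n$ is treated as local if its Hamming distance to some training example is at most $c\log(n)$, where the constant $c$ is an assumption made here since the paper only requires $O(\log(n))$.

    import math
    import numpy as np

    def is_local_query(q, training_X, c=1.0):
        """True if q is within Hamming distance c*log(n) of some training example."""
        n = q.shape[0]
        radius = c * math.log(n)
        hamming_dists = np.sum(training_X != q, axis=1)
        return bool(np.min(hamming_dists) <= radius)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.integers(0, 2, size=(100, 64))   # random training sample on {0,1}^64
        q = X[0].copy()
        q[:3] ^= 1                               # flip 3 bits of a training point
        print(is_local_query(q, X, c=2.0))       # True: 3 <= 2*log(64) ~ 8.3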

    Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness

    Polynomial approximations to Boolean functions have led to many positive results in computer science. In particular, polynomial approximations to the sign function underlie algorithms for agnostically learning halfspaces, as well as pseudorandom generators for halfspaces. In this work, we investigate the limits of these techniques by proving inapproximability results for the sign function. Firstly, the polynomial regression algorithm of Kalai et al. (SIAM J. Comput. 2008) shows that halfspaces can be learned with respect to log-concave distributions on $\mathbb{R}^n$ in the challenging agnostic learning model. The power of this algorithm relies on the fact that under log-concave distributions, halfspaces can be approximated arbitrarily well by low-degree polynomials. We ask whether this technique can be extended beyond log-concave distributions, and establish a negative result. We show that polynomials of any degree cannot approximate the sign function to within arbitrarily low error for a large class of non-log-concave distributions on the real line, including those with densities proportional to $\exp(-|x|^{0.99})$. Secondly, we investigate the derandomization of Chernoff-type concentration inequalities. Chernoff-type tail bounds on sums of independent random variables have pervasive applications in theoretical computer science. Schmidt et al. (SIAM J. Discrete Math. 1995) showed that these inequalities can be established for sums of random variables with only $O(\log(1/\delta))$-wise independence, for a tail probability of $\delta$. We show that their results are tight up to constant factors. These results rely on techniques from weighted approximation theory, which studies how well functions on the real line can be approximated by polynomials under various distributions. We believe that these techniques will have further applications in other areas of computer science.
    Comment: 22 pages.
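    The sketch below is a minimal, hypothetical illustration of the polynomial regression idea mentioned above: fit a low-degree polynomial to noisy halfspace labels under a log-concave (here Gaussian) marginal and classify by its sign. It uses ordinary least squares rather than the L1 regression of Kalai et al., works in one dimension, and all parameters are arbitrary choices made for the example.

    import numpy as np

    def learn_threshold_by_poly_regression(x, labels, degree):
        """Fit a degree-`degree` polynomial to +/-1 labels; classify by its sign."""
        coeffs = np.polyfit(x, labels, deg=degree)
        return lambda z: np.sign(np.polyval(coeffs, z))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        x = rng.standard_normal(2000)                 # log-concave (Gaussian) marginal
        clean = np.sign(x - 0.3)                      # target halfspace sign(x - 0.3)
        flip = rng.random(2000) < 0.1                 # 10% label noise (agnostic-style)
        noisy = np.where(flip, -clean, clean)
        h = learn_threshold_by_poly_regression(x, noisy, degree=7)
        test = rng.standard_normal(2000)
        err = np.mean(h(test) != np.sign(test - 0.3))
        print("test error ~", err)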

    What Circuit Classes Can Be Learned with Non-Trivial Savings?

    Despite decades of intensive research, efficient - or even sub-exponential time - distribution-free PAC learning algorithms are not known for many important Boolean function classes. In this work we suggest a new perspective on these learning problems, inspired by a surge of recent research in complexity theory, in which the goal is to determine whether and how much of a savings over a naive 2^n runtime can be achieved. We establish a range of exploratory results towards this end. In more detail, (1) We first observe that a simple approach building on known uniform-distribution learning results gives non-trivial distribution-free learning algorithms for several well-studied classes including AC0, arbitrary functions of a few linear threshold functions (LTFs), and AC0 augmented with mod_p gates. (2) Next we present an approach, based on the method of random restrictions from circuit complexity, which can be used to obtain several distribution-free learning algorithms that do not appear to be achievable by approach (1) above. The results achieved in this way include learning algorithms with non-trivial savings for LTF-of-AC0 circuits and improved savings for learning parity-of-AC0 circuits. (3) Finally, our third contribution is a generic technique for converting lower bounds proved using Neciporuk's method to learning algorithms with non-trivial savings. This technique, which is the most involved of our three approaches, yields distribution-free learning algorithms for a range of classes where previously even non-trivial uniform-distribution learning algorithms were not known; these classes include full-basis formulas, branching programs, span programs, etc. up to some fixed polynomial size.
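    As a small companion sketch (an illustration only, not the paper's algorithms), the code below implements the random-restriction operation that approach (2) builds on: each variable is kept free with probability p and otherwise fixed to a uniformly random bit, and a restricted function over the free variables is returned.

    import numpy as np

    def random_restriction(n, p, seed=None):
        """Return (free_mask, fixed_values): free_mask[i] says whether x_i stays free,
        fixed_values[i] is the random bit used when x_i is fixed."""
        rng = np.random.default_rng(seed)
        free_mask = rng.random(n) < p
        fixed_values = rng.integers(0, 2, size=n)
        return free_mask, fixed_values

    def restrict(f, free_mask, fixed_values):
        """The restriction of f: a function of only the free variables."""
        def g(free_bits):
            x = fixed_values.copy()
            x[free_mask] = free_bits
            return f(x)
        return g

    if __name__ == "__main__":
        parity = lambda x: int(np.sum(x) % 2)
        free, fixed = random_restriction(n=16, p=0.25, seed=0)
        g = restrict(parity, free, fixed)
        k = int(np.sum(free))
        print(k, "free variables; g(0...0) =", g(np.zeros(k, dtype=int)))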