
    MCMC Learning

    The theory of learning under the uniform distribution is rich and deep, with connections to cryptography, computational complexity, and the analysis of Boolean functions, to name a few areas. This theory, however, is very limited, because the uniform distribution and the corresponding Fourier basis are rarely encountered as a statistical model. A family of distributions that vastly generalizes the uniform distribution on the Boolean cube is that of distributions represented by Markov Random Fields (MRFs). Markov Random Fields are one of the main tools for modeling high-dimensional data in many areas of statistics and machine learning. In this paper we initiate the investigation of extending central ideas, methods, and algorithms from the theory of learning under the uniform distribution to the setup of learning concepts given examples from MRF distributions. In particular, our results establish a novel connection between properties of MCMC sampling of MRFs and learning under the MRF distribution. Comment: 28 pages, 1 figure
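A minimal sketch of MCMC sampling from an MRF on the Boolean cube, assuming a pairwise Ising-type model. The abstract does not fix a model, so the fields h, the couplings J, and the function name gibbs_sample_ising are illustrative assumptions. Each Gibbs sweep resamples every coordinate from its exact conditional distribution given the others; with h = 0 and J = 0 the model reduces to the uniform distribution on {0,1}^n, the special case that MRFs generalize.

```python
import math
import random
from typing import List, Sequence


def gibbs_sample_ising(
    h: Sequence[float],
    J: Sequence[Sequence[float]],
    sweeps: int,
    rng: random.Random,
) -> List[int]:
    """Draw one approximate sample x in {0,1}^n from an Ising-type MRF.

    Assumed (illustrative) model: with spins s_i = 2*x_i - 1,
        P(x) proportional to exp(sum_i h[i]*s_i + sum_{i<j} J[i][j]*s_i*s_j),
    where J is symmetric with a zero diagonal.
    """
    n = len(h)
    # Start from a uniformly random point of the cube.
    s = [rng.choice((-1, 1)) for _ in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            # Local field acting on spin i, given all other spins.
            field = h[i] + sum(J[i][j] * s[j] for j in range(n) if j != i)
            # Exact conditional: P(s_i = +1 | rest) = 1 / (1 + exp(-2 * field)).
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
            s[i] = 1 if rng.random() < p_plus else -1
    # Map spins {-1, +1} back to Boolean values {0, 1}.
    return [(si + 1) // 2 for si in s]
```

How many sweeps suffice (the mixing time) depends on h and J; this sketch makes no claim about mixing and simply runs a fixed number of sweeps.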

    Learning pseudo-Boolean k-DNF and Submodular Functions

    We prove that any submodular function f: {0,1}^n -> {0,1,...,k} can be represented as a pseudo-Boolean 2k-DNF formula. Pseudo-Boolean DNFs are a natural generalization of DNF representation for functions with integer range. Each term in such a formula has an associated integral constant. We show that an analog of Hastad's switching lemma holds for pseudo-Boolean k-DNFs if all constants associated with the terms of the formula are bounded. This allows us to generalize Mansour's PAC-learning algorithm for k-DNFs to pseudo-Boolean k-DNFs, and hence gives a PAC-learning algorithm with membership queries under the uniform distribution for submodular functions of the form f: {0,1}^n -> {0,1,...,k}. Our algorithm runs in time polynomial in n, k^{O(k log k / ε)}, 1/ε, and log(1/δ), and works even in the agnostic setting. The line of previous work on learning submodular functions [Balcan, Harvey (STOC '11); Gupta, Hardt, Roth, Ullman (STOC '11); Cheraghchi, Klivans, Kothari, Lee (SODA '12)] implies only n^{O(k)} query complexity for learning submodular functions in this setting, for fixed ε and δ. Our learning algorithm implies a property tester for submodularity of functions f: {0,1}^n -> {0, ..., k} with query complexity polynomial in n for k = O((log n / log log n)^{1/2}) and constant proximity parameter ε.
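A small sketch of evaluating a pseudo-Boolean DNF, assuming the formula's value at x is the largest constant among its satisfied terms (0 if no term is satisfied). That aggregation rule, the term encoding, and the helper names eval_pb_dnf, width, example_terms, and truth_table are assumptions for illustration. The example writes f(x) = min(x_1 + x_2 + x_3, 2), a submodular function with range {0,1,2}, as a pseudo-Boolean DNF of width 2, within the 2k = 4 bound stated above.

```python
from itertools import product
from typing import FrozenSet, List, Sequence, Tuple

# A term is (constant a_t, indices of positive literals, indices of negated literals).
Term = Tuple[int, FrozenSet[int], FrozenSet[int]]


def eval_pb_dnf(terms: Sequence[Term], x: Sequence[int]) -> int:
    """Evaluate a pseudo-Boolean DNF at x in {0,1}^n (assumed max aggregation)."""
    best = 0
    for a, pos, neg in terms:
        # A term fires when all positive literals are 1 and all negated ones are 0.
        if all(x[i] == 1 for i in pos) and all(x[j] == 0 for j in neg):
            best = max(best, a)
    return best


def width(terms: Sequence[Term]) -> int:
    """Largest number of literals in any term (the 'k' in k-DNF)."""
    return max((len(p) + len(q) for _, p, q in terms), default=0)


def example_terms(n: int = 3) -> List[Term]:
    """Hypothetical example: terms representing f(x) = min(x_1 + ... + x_n, 2)."""
    singles = [(1, frozenset({i}), frozenset()) for i in range(n)]
    pairs = [
        (2, frozenset({i, j}), frozenset())
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return singles + pairs


def truth_table(terms: Sequence[Term], n: int) -> List[int]:
    """Values of the formula on all of {0,1}^n, in lexicographic order."""
    return [eval_pb_dnf(terms, x) for x in product((0, 1), repeat=n)]
```

A term with constant 1 fires on every point with at least one 1, and a term with constant 2 fires on every point with at least two 1s, so taking the maximum reproduces min(|x|, 2).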

    Improved Learning from Kolmogorov Complexity

    Carmosino, Impagliazzo, Kabanets, and Kolokolova (CCC, 2016) showed that the existence of natural properties in the sense of Razborov and Rudich (JCSS, 1997) implies PAC learning algorithms in the sense of Valiant (Comm. ACM, 1984) for Boolean functions in P/poly, under the uniform distribution and with membership queries. It is still an open problem to obtain, from natural properties, learning algorithms that do not rely on membership queries but instead use randomly drawn labeled examples. Natural properties may be understood as an average-case version of MCSP, the problem of deciding the minimum size of a circuit computing a given truth table. Problems related to MCSP include those concerning time-bounded Kolmogorov complexity. MKTP, for example, asks for the KT-complexity of a given string. KT-complexity is a relaxation of circuit size, as it does away with the requirement that a short description of a string be interpreted as a Boolean circuit. In this work, under assumptions of MKTP and the related problem MK^tP being easy on average, we get learning algorithms for Boolean functions in P/poly that (i) work over any distribution D samplable by a family of polynomial-size circuits (given explicitly in the case of MKTP), (ii) only use randomly drawn labeled examples from D, and (iii) are agnostic (do not require the target function to belong to the hypothesis class). Our results build upon the recent work of Hirahara and Nanashima (FOCS, 2021), who showed similar learning consequences but under the stronger assumption that NP is easy on average.

    Conspiracies between learning algorithms, circuit lower bounds, and pseudorandomness

    We prove several results giving new and stronger connections between learning theory, circuit complexity and pseudorandomness. Let C be any typical class of Boolean circuits, and C[s(n)] denote n-variable C-circuits of size ≤ s(n). We show:
    Learning Speedups. If C[poly(n)] admits a randomized weak learning algorithm under the uniform distribution with membership queries that runs in time 2^n/n^{ω(1)}, then for every k ≥ 1 and ε > 0 the class C[n^k] can be learned to high accuracy in time O(2^{n^ε}). There is ε > 0 such that C[2^{n^ε}] can be learned in time 2^n/n^{ω(1)} if and only if C[poly(n)] can be learned in time 2^{(log n)^{O(1)}}.
    Equivalences between Learning Models. We use learning speedups to obtain equivalences between various randomized learning and compression models, including sub-exponential time learning with membership queries, sub-exponential time learning with membership and equivalence queries, probabilistic function compression and probabilistic average-case function compression.
    A Dichotomy between Learnability and Pseudorandomness. In the non-uniform setting, there is non-trivial learning for C[poly(n)] if and only if there are no exponentially secure pseudorandom functions computable in C[poly(n)].
    Lower Bounds from Nontrivial Learning. If for each k ≥ 1, (depth-d)-C[n^k] admits a randomized weak learning algorithm with membership queries under the uniform distribution that runs in time 2^n/n^{ω(1)}, then for each k ≥ 1, BPE ⊄ (depth-d)-C[n^k]. If for some ε > 0 there are P-natural proofs useful against C[2^{n^ε}], then ZPEXP ⊄ C[poly(n)].
    Karp-Lipton Theorems for Probabilistic Classes. If there is a k > 0 such that BPE ⊆ i.o.Circuit[n^k], then BPEXP ⊆ i.o.EXP/O(log n). If ZPEXP ⊆ i.o.Circuit[2^{n/3}], then ZPEXP ⊆ i.o.ESUBEXP.
    Hardness Results for MCSP. All functions in non-uniform NC^1 reduce to the Minimum Circuit Size Problem via truth-table reductions computable by TC^0 circuits. In particular, if MCSP ∈ TC^0 then NC^1 = TC^0.

    Learning using Local Membership Queries

    We introduce a new model of membership query (MQ) learning, where the learning algorithm is restricted to query points that are close to random examples drawn from the underlying distribution. The learning model is intermediate between the PAC model (Valiant, 1984) and the PAC+MQ model (where the queries are allowed to be arbitrary points). Membership query algorithms are not popular among machine learning practitioners. Apart from the obvious difficulty of adaptively querying labelers, it has also been observed that querying unnatural points leads to increased noise from human labelers (Lang and Baum, 1992). This motivates our study of learning algorithms that make queries that are close to examples generated from the data distribution. We restrict our attention to functions defined on the n-dimensional Boolean hypercube and say that a membership query is local if its Hamming distance from some example in the (random) training data is at most O(log n). We show the following results in this model: (i) The class of sparse polynomials (with coefficients in R) over {0,1}^n is polynomial time learnable under a large class of locally smooth distributions using O(log n)-local queries. This class also includes the class of O(log n)-depth decision trees. (ii) The class of polynomial-sized decision trees is polynomial time learnable under product distributions using O(log n)-local queries. (iii) The class of polynomial size DNF formulas is learnable under the uniform distribution using O(log n)-local queries in time n^{O(log log n)}. (iv) In addition we prove a number of results relating the proposed model to the traditional PAC model and the PAC+MQ model.
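The locality condition above can be checked directly. A minimal sketch, assuming points are 0/1 lists and that the O(log n) radius is instantiated as c·⌈log2 n⌉ with a hypothetical constant c (the abstract only fixes the asymptotic order); the names hamming, local_radius, and is_local_query are illustrative.

```python
import math
from typing import Sequence


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    """Number of coordinates where two points of {0,1}^n differ."""
    return sum(a != b for a, b in zip(u, v))


def local_radius(n: int, c: int = 1) -> int:
    """Hypothetical instantiation of the O(log n) radius as c * ceil(log2 n)."""
    return c * math.ceil(math.log2(n)) if n > 1 else 0


def is_local_query(
    query: Sequence[int], examples: Sequence[Sequence[int]], radius: int
) -> bool:
    """True if the query lies within Hamming distance `radius` of some training example."""
    return any(hamming(query, x) <= radius for x in examples)
```

For n = 8 and c = 1 the radius is 3, so a query point that differs from every training example in 4 or more coordinates would not count as local.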

    Conspiracies Between Learning Algorithms, Circuit Lower Bounds, and Pseudorandomness

    We prove several results giving new and stronger connections between learning theory, circuit complexity and pseudorandomness. Let C be any typical class of Boolean circuits, and C[s(n)] denote n-variable C-circuits of size <= s(n). We show:
    Learning Speedups: If C[s(n)] admits a randomized weak learning algorithm under the uniform distribution with membership queries that runs in time 2^n/n^{omega(1)}, then for every k >= 1 and epsilon > 0 the class C[n^k] can be learned to high accuracy in time O(2^{n^epsilon}). There is epsilon > 0 such that C[2^{n^{epsilon}}] can be learned in time 2^n/n^{omega(1)} if and only if C[poly(n)] can be learned in time 2^{(log(n))^{O(1)}}.
    Equivalences between Learning Models: We use learning speedups to obtain equivalences between various randomized learning and compression models, including sub-exponential time learning with membership queries, sub-exponential time learning with membership and equivalence queries, probabilistic function compression and probabilistic average-case function compression.
    A Dichotomy between Learnability and Pseudorandomness: In the non-uniform setting, there is non-trivial learning for C[poly(n)] if and only if there are no exponentially secure pseudorandom functions computable in C[poly(n)].
    Lower Bounds from Nontrivial Learning: If for each k >= 1, (depth-d)-C[n^k] admits a randomized weak learning algorithm with membership queries under the uniform distribution that runs in time 2^n/n^{omega(1)}, then for each k >= 1, BPE is not contained in (depth-d)-C[n^k]. If for some epsilon > 0 there are P-natural proofs useful against C[2^{n^{epsilon}}], then ZPEXP is not contained in C[poly(n)].
    Karp-Lipton Theorems for Probabilistic Classes: If there is a k > 0 such that BPE is contained in i.o.Circuit[n^k], then BPEXP is contained in i.o.EXP/O(log(n)). If ZPEXP is contained in i.o.Circuit[2^{n/3}], then ZPEXP is contained in i.o.ESUBEXP.
    Hardness Results for MCSP: All functions in non-uniform NC^1 reduce to the Minimum Circuit Size Problem via truth-table reductions computable by TC^0 circuits. In particular, if MCSP is in TC^0 then NC^1 = TC^0.

    Learning circuits with few negations

    Monotone Boolean functions, and the monotone Boolean circuits that compute them, have been intensively studied in complexity theory. In this paper we study the structure of Boolean functions in terms of the minimum number of negations in any circuit computing them, a complexity measure that interpolates between monotone functions and the class of all functions. We study this generalization of monotonicity from the vantage point of learning theory, giving near-matching upper and lower bounds on the uniform-distribution learnability of circuits in terms of the number of negations they contain. Our upper bounds are based on a new structural characterization of negation-limited circuits that extends a classical result of A. A. Markov. Our lower bounds, which employ Fourier-analytic tools from hardness amplification, give new results even for circuits with no negations (i.e., monotone functions).
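To make the negation measure more concrete, here is a sketch of a closely related quantity, the alternation number: the largest number of times f changes value along a chain 0^n < ... < 1^n that sets one more bit to 1 at each step. Monotone functions have alternation at most 1 (along a chain the value can only go from 0 to 1), while parity on n bits has alternation n. This is an illustration, not the paper's algorithm; as we recall Markov's classical theorem, the minimum number of negations grows roughly logarithmically in a count of this kind, but the paper should be consulted for the exact statement. The function name alternation and the bitmask encoding of points are assumptions.

```python
from typing import Callable, List


def alternation(f: Callable[[int], int], n: int) -> int:
    """Max number of value changes of f along any maximal chain in {0,1}^n.

    Points are encoded as n-bit masks; a chain step sets one more bit to 1.
    Masks are processed in increasing numeric order, which is valid because
    every predecessor (the mask with one set bit cleared) is numerically smaller.
    """
    size = 1 << n
    # best[m] = most value changes on any chain from the all-zeros point to m.
    best: List[int] = [0] * size
    for mask in range(1, size):
        fm = f(mask)
        b = 0
        m = mask
        while m:
            low = m & -m       # lowest set bit not yet tried
            prev = mask ^ low  # predecessor with that bit cleared
            b = max(b, best[prev] + int(f(prev) != fm))
            m ^= low
        best[mask] = b
    return best[size - 1]
```

Every chain extends to a maximal chain without losing value changes, so maximizing over maximal chains, as the recursion does, gives the maximum over all chains.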