104 research outputs found

    Noise-Tolerant Learning, the Parity Problem, and the Statistical Query Model

    We describe a slightly sub-exponential time algorithm for learning parity functions in the presence of random classification noise. This results in a polynomial-time algorithm for the case of parity functions that depend on only the first O(log n log log n) bits of input. This is the first known instance of an efficient noise-tolerant algorithm for a concept class that is provably not learnable in the Statistical Query model of Kearns. Thus, we demonstrate that the set of problems learnable in the statistical query model is a strict subset of those problems learnable in the presence of noise in the PAC model. In coding-theory terms, what we give is a poly(n)-time algorithm for decoding linear k-by-n codes in the presence of random noise for the case of k = c log n log log n for some c > 0. (The case of k = O(log n) is trivial, since one can just individually check each of the 2^k possible messages and choose the one that yields the closest codeword.) A natural extension of the statistical query model is to allow queries about statistical properties that involve t-tuples of examples (as opposed to single examples). The second result of this paper is to show that any class of functions learnable (strongly or weakly) with t-wise queries for t = O(log n) is also weakly learnable with standard unary queries. Hence this natural extension to the statistical query model does not increase the set of weakly learnable functions.
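
    To make the parenthetical brute-force case concrete: for k = O(log n) there are only 2^k = poly(n) candidate messages, so exhaustive decoding runs in polynomial time. The sketch below is a minimal illustration of that brute-force decoder; the generator matrix, noise rate, and parameter choices are hypothetical and not taken from the paper.

```python
import itertools
import numpy as np

def brute_force_decode(G, received):
    """Decode a noisy word of a linear [n, k] code over GF(2) by enumerating
    all 2^k messages and returning the one whose codeword is closest in
    Hamming distance -- feasible exactly when 2^k = poly(n)."""
    k, n = G.shape
    best_msg, best_dist = None, n + 1
    for bits in itertools.product([0, 1], repeat=k):
        msg = np.array(bits)
        codeword = (msg @ G) % 2                  # encode over GF(2)
        dist = int(np.sum(codeword != received))  # Hamming distance
        if dist < best_dist:
            best_msg, best_dist = msg, dist
    return best_msg, best_dist

# Toy usage: random k x n generator matrix, codeword corrupted by random noise.
rng = np.random.default_rng(0)
k, n = 4, 32                                      # k = O(log n) regime
G = rng.integers(0, 2, size=(k, n))
true_msg = rng.integers(0, 2, size=k)
noise = (rng.random(n) < 0.05).astype(int)        # each bit flipped with prob 0.05
received = (true_msg @ G + noise) % 2
decoded, dist = brute_force_decode(G, received)
print(decoded, true_msg, dist)
```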

    Learning to Reason: Leveraging Neural Networks for Approximate DNF Counting

    Weighted model counting (WMC) has emerged as a prevalent approach for probabilistic inference. In its most general form, WMC is #P-hard. Weighted DNF counting (weighted #DNF) is a special case, where approximations with probabilistic guarantees are obtained in O(nm) time, where n denotes the number of variables and m the number of clauses of the input DNF, but this is not scalable in practice. In this paper, we propose a neural model counting approach for weighted #DNF that combines approximate model counting with deep learning, and accurately approximates model counts in linear time when width is bounded. We conduct experiments to validate our method, and show that our model learns and generalizes very well to large-scale #DNF instances. Comment: To appear in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20). Code and data available at: https://github.com/ralphabb/NeuralDNF
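
    For context, the O(nm)-time guarantee alludes to classical Monte Carlo estimators for DNF counting in the style of Karp and Luby. The sketch below is a minimal unweighted version of such a coverage estimator; the clause encoding and sample count are our own illustrative choices, and the paper's neural model is not shown.

```python
import random

def dnf_count_estimate(clauses, n, samples=100_000):
    """Monte Carlo estimate of the number of satisfying assignments of a DNF.
    Each clause is a list of signed ints: +v means variable v appears
    positively, -v negated (variables are numbered 1..n). Coverage estimator:
    sample a clause with probability proportional to |sat(C_i)|, complete it to
    a full assignment uniformly, and count the sample iff the chosen clause is
    the first clause that the assignment satisfies."""
    weights = [2 ** (n - len(c)) for c in clauses]   # |sat(C_i)| = 2^(n - width)
    total = sum(weights)
    if total == 0:
        return 0.0

    def satisfies(assignment, clause):
        return all(assignment[abs(lit)] == (lit > 0) for lit in clause)

    hits = 0
    for _ in range(samples):
        i = random.choices(range(len(clauses)), weights=weights)[0]
        assignment = {v: random.random() < 0.5 for v in range(1, n + 1)}
        for lit in clauses[i]:                       # fix clause i's literals
            assignment[abs(lit)] = (lit > 0)
        # Count only if no earlier clause is satisfied, so every satisfying
        # assignment contributes exactly once in expectation.
        if not any(satisfies(assignment, clauses[j]) for j in range(i)):
            hits += 1
    return total * hits / samples

# (x1 AND x2) OR (NOT x1 AND x3) over 3 variables: the clause sets are disjoint,
# so the true count is 2 + 2 = 4.
print(dnf_count_estimate([[1, 2], [-1, 3]], n=3))
```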

    Pseudorandomness and Fourier Growth Bounds for Width-3 Branching Programs

    We present an explicit pseudorandom generator for oblivious, read-once, width-3 branching programs, which can read their input bits in any order. The generator has seed length O~(log^3 n). The previously best known seed length for this model is n^{1/2+o(1)}, due to Impagliazzo, Meka, and Zuckerman (FOCS '12). Our work generalizes a recent result of Reingold, Steinke, and Vadhan (RANDOM '13) for permutation branching programs. The main technical novelty underlying our generator is a new bound on the Fourier growth of width-3, oblivious, read-once branching programs. Specifically, we show that for any f : {0,1}^n -> {0,1} computed by such a branching program, and k in [n], sum_{|s|=k} |hat{f}(s)| < n^2 * (O(log n))^k, where f(x) = sum_s hat{f}(s) (-1)^{<s,x>} is the standard Fourier transform over Z_2^n. The base O(log n) of the Fourier growth is tight up to a factor of log log n.
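
    To spell out the quantity being bounded: the level-k Fourier mass of f is the sum of |hat{f}(s)| over all s of Hamming weight k. The hypothetical snippet below computes it by brute force for a small Boolean function, directly from the definition hat{f}(s) = 2^{-n} sum_x f(x) (-1)^{<s,x>}; it illustrates the definition only, not the paper's generator or proof.

```python
import itertools

def fourier_coefficients(f, n):
    """Brute-force Fourier transform over Z_2^n:
    hat{f}(s) = 2^{-n} * sum_x f(x) * (-1)^{<s, x>} for every s in {0,1}^n."""
    points = list(itertools.product([0, 1], repeat=n))
    coeffs = {}
    for s in points:
        total = sum(f(x) * (-1) ** sum(si * xi for si, xi in zip(s, x))
                    for x in points)
        coeffs[s] = total / 2 ** n
    return coeffs

def level_k_l1_mass(coeffs, k):
    """Sum of |hat{f}(s)| over all s of Hamming weight k -- the quantity whose
    growth in k the paper bounds for width-3 branching programs."""
    return sum(abs(c) for s, c in coeffs.items() if sum(s) == k)

# Toy example: the 4-bit AND function.
n = 4
coeffs = fourier_coefficients(lambda x: int(all(x)), n)
for k in range(n + 1):
    print(k, level_k_l1_mass(coeffs, k))     # equals C(n, k) / 2^n for AND
```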

    From average case complexity to improper learning complexity

    The basic problem in the PAC model of computational learning theory is to determine which hypothesis classes are efficiently learnable. There is presently a dearth of results showing hardness of learning problems. Moreover, the existing lower bounds fall short of the best known algorithms. The biggest challenge in proving complexity results is to establish hardness of improper learning (a.k.a. representation independent learning). The difficulty in proving lower bounds for improper learning is that the standard reductions from NP-hard problems do not seem to apply in this context. There is essentially only one known approach to proving lower bounds on improper learning. It was initiated in (Kearns and Valiant 89) and relies on cryptographic assumptions. We introduce a new technique for proving hardness of improper learning, based on reductions from problems that are hard on average. We put forward a (fairly strong) generalization of Feige's assumption (Feige 02) about the complexity of refuting random constraint satisfaction problems. Combining this assumption with our new technique yields far reaching implications. In particular, 1. Learning DNFs is hard. 2. Agnostically learning halfspaces with a constant approximation ratio is hard. 3. Learning an intersection of ω(1) halfspaces is hard. Comment: 34 pages

    Optimal Bounds on Approximation of Submodular and XOS Functions by Juntas

    We investigate the approximability of several classes of real-valued functions by functions of a small number of variables (juntas). Our main results are tight bounds on the number of variables required to approximate a function f : {0,1}^n -> [0,1] within ℓ_2-error ε over the uniform distribution: 1. If f is submodular, then it is ε-close to a function of O((1/ε^2) log(1/ε)) variables. This is an exponential improvement over previously known results. We note that Ω(1/ε^2) variables are necessary even for linear functions. 2. If f is fractionally subadditive (XOS) it is ε-close to a function of 2^{O(1/ε^2)} variables. This result holds for all functions with low total ℓ_1-influence and is a real-valued analogue of Friedgut's theorem for boolean functions. We show that 2^{Ω(1/ε)} variables are necessary even for XOS functions. As applications of these results, we provide learning algorithms over the uniform distribution. For XOS functions, we give a PAC learning algorithm that runs in time 2^{poly(1/ε)} poly(n). For submodular functions we give an algorithm in the more demanding PMAC learning model (Balcan and Harvey, 2011) which requires a multiplicative 1+γ factor approximation with probability at least 1-ε over the target distribution. Our uniform distribution algorithm runs in time 2^{poly(1/(γε))} poly(n). This is the first algorithm in the PMAC model that over the uniform distribution can achieve a constant approximation factor arbitrarily close to 1 for all submodular functions. As follows from the lower bounds in (Feldman et al., 2013) both of these algorithms are close to optimal. We also give applications for proper learning, testing and agnostic learning with value queries of these classes. Comment: Extended abstract appears in proceedings of FOCS 201
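
    To make ε-closeness to a junta concrete: over the uniform distribution, the best ℓ_2 approximation of f by a function of a coordinate set J is the conditional average g(x_J) = E[f(x) | x_J], and its ℓ_2 error measures how close f is to a |J|-junta. The sketch below is an illustrative brute-force computation with a hypothetical budget-additive (hence submodular) function; it is not the paper's algorithm.

```python
import itertools
import math

def junta_l2_error(f, n, J):
    """Over the uniform distribution on {0,1}^n, the best L2 approximation of f
    by a function depending only on the coordinates in J is the conditional
    average g(x_J) = E[f(x) | x_J]; return the resulting error sqrt(E[(f-g)^2])."""
    points = list(itertools.product([0, 1], repeat=n))
    groups = {}                                  # restriction to J -> values of f
    for x in points:
        groups.setdefault(tuple(x[i] for i in J), []).append(f(x))
    avg = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    err2 = sum((f(x) - avg[tuple(x[i] for i in J)]) ** 2 for x in points) / len(points)
    return math.sqrt(err2)

# Hypothetical submodular function with range [0,1]: budget-additive on 6 bits.
n = 6
f = lambda x: min(1.0, sum(x) / 3.0)
for size in range(n + 1):
    J = list(range(size))              # f is symmetric, so any set of this size works
    print(size, round(junta_l2_error(f, n, J), 4))
```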

    Conspiracies between learning algorithms, circuit lower bounds, and pseudorandomness

    We prove several results giving new and stronger connections between learning theory, circuit complexity and pseudorandomness. Let C be any typical class of Boolean circuits, and C[s(n)] denote n-variable C-circuits of size ≤ s(n). We show: Learning Speedups. If C[poly(n)] admits a randomized weak learning algorithm under the uniform distribution with membership queries that runs in time 2^n/n^{ω(1)}, then for every k ≥ 1 and ε > 0 the class C[n^k] can be learned to high accuracy in time O(2^{n^ε}). There is ε > 0 such that C[2^{n^ε}] can be learned in time 2^n/n^{ω(1)} if and only if C[poly(n)] can be learned in time 2^{(log n)^{O(1)}}. Equivalences between Learning Models. We use learning speedups to obtain equivalences between various randomized learning and compression models, including sub-exponential time learning with membership queries, sub-exponential time learning with membership and equivalence queries, probabilistic function compression and probabilistic average-case function compression. A Dichotomy between Learnability and Pseudorandomness. In the non-uniform setting, there is non-trivial learning for C[poly(n)] if and only if there are no exponentially secure pseudorandom functions computable in C[poly(n)]. Lower Bounds from Nontrivial Learning. If for each k ≥ 1, (depth-d)-C[n^k] admits a randomized weak learning algorithm with membership queries under the uniform distribution that runs in time 2^n/n^{ω(1)}, then for each k ≥ 1, BPE ⊄ (depth-d)-C[n^k]. If for some ε > 0 there are P-natural proofs useful against C[2^{n^ε}], then ZPEXP ⊄ C[poly(n)]. Karp-Lipton Theorems for Probabilistic Classes. If there is a k > 0 such that BPE ⊆ i.o.Circuit[n^k], then BPEXP ⊆ i.o.EXP/O(log n). If ZPEXP ⊆ i.o.Circuit[2^{n/3}], then ZPEXP ⊆ i.o.ESUBEXP. Hardness Results for MCSP. All functions in non-uniform NC^1 reduce to the Minimum Circuit Size Problem via truth-table reductions computable by TC^0 circuits. In particular, if MCSP ∈ TC^0 then NC^1 = TC^0.
