    Noise-Tolerant Learning, the Parity Problem, and the Statistical Query Model

    We describe a slightly sub-exponential time algorithm for learning parity functions in the presence of random classification noise. This results in a polynomial-time algorithm for the case of parity functions that depend on only the first O(log n log log n) bits of input. This is the first known instance of an efficient noise-tolerant algorithm for a concept class that is provably not learnable in the Statistical Query model of Kearns. Thus, we demonstrate that the set of problems learnable in the statistical query model is a strict subset of those problems learnable in the presence of noise in the PAC model. In coding-theory terms, what we give is a poly(n)-time algorithm for decoding linear k-by-n codes in the presence of random noise for the case of k = c log n log log n for some c > 0. (The case of k = O(log n) is trivial since one can just individually check each of the 2^k possible messages and choose the one that yields the closest codeword.) A natural extension of the statistical query model is to allow queries about statistical properties that involve t-tuples of examples (as opposed to single examples). The second result of this paper is to show that any class of functions learnable (strongly or weakly) with t-wise queries for t = O(log n) is also weakly learnable with standard unary queries. Hence this natural extension to the statistical query model does not increase the set of weakly learnable functions.
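
    The trivial k = O(log n) case mentioned in the parenthetical can be sketched as an exhaustive search over all 2^k messages. This is only an illustration under stated assumptions: the names encode and brute_force_decode, the list-of-rows representation of the generator matrix, and the small example code are hypothetical and do not come from the paper.

```python
from itertools import product


def encode(message, generator):
    """Return the codeword message * G over GF(2).

    `generator` is assumed to be a list of k rows, each a list of n bits.
    """
    n = len(generator[0])
    return [
        sum(m * row[j] for m, row in zip(message, generator)) % 2
        for j in range(n)
    ]


def brute_force_decode(received, generator):
    """Try all 2^k messages and keep the one whose codeword is closest
    to `received` in Hamming distance. Returns (message, distance)."""
    k = len(generator)
    best_message, best_distance = None, None
    for message in product((0, 1), repeat=k):
        codeword = encode(message, generator)
        distance = sum(a != b for a, b in zip(codeword, received))
        if best_distance is None or distance < best_distance:
            best_message, best_distance = message, distance
    return best_message, best_distance


# Hypothetical example: a 3-by-7 generator matrix whose columns are all
# nonzero 3-bit vectors, so any two codewords differ in 4 positions.
G = [
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
]
clean = encode((1, 0, 1), G)
noisy = list(clean)
noisy[3] ^= 1  # flip a single bit to simulate noise
decoded, distance = brute_force_decode(noisy, G)
```

    The loop runs 2^k times and each iteration costs O(kn), which is polynomial in n only while k = O(log n); that is why the larger range k = c log n log log n needs a different algorithm.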

    Two new results about quantum exact learning

    We present two new results about exact learning by quantum computers. First, we show how to exactly learn a $k$-Fourier-sparse $n$-bit Boolean function from $O(k^{1.5}(\log k)^2)$ uniform quantum examples for that function. This improves over the bound of $\widetilde{\Theta}(kn)$ uniformly random classical examples (Haviv and Regev, CCC'15). Our main tool is an improvement of Chang's lemma for the special case of sparse functions. Second, we show that if a concept class $\mathcal{C}$ can be exactly learned using $Q$ quantum membership queries, then it can also be learned using $O\left(\frac{Q^2}{\log Q}\log|\mathcal{C}|\right)$ classical membership queries. This improves the previous-best simulation result (Servedio and Gortler, SICOMP'04) by a $\log Q$-factor. Comment: v3: 21 pages. Small corrections and clarification
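
    For reference, the terms used in the first result can be written out as below. The notation ($\chi_S$, $\widehat{f}$) and the range convention for $f$ are the usual ones from Boolean Fourier analysis and are assumptions here, since the abstract does not fix them.

```latex
% Fourier expansion of f : \{0,1\}^n \to \{-1,1\} (range convention assumed)
f(x) = \sum_{S \in \{0,1\}^n} \widehat{f}(S)\,\chi_S(x),
\qquad \chi_S(x) = (-1)^{S \cdot x}

% f is k-Fourier-sparse if at most k coefficients are nonzero
\bigl|\{\, S : \widehat{f}(S) \neq 0 \,\}\bigr| \le k

% a uniform quantum example for f (superposition over all inputs)
\frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} |x, f(x)\rangle
```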

    Learning to Reason: Leveraging Neural Networks for Approximate DNF Counting

    Weighted model counting (WMC) has emerged as a prevalent approach for probabilistic inference. In its most general form, WMC is #P-hard. Weighted DNF counting (weighted #DNF) is a special case, where approximations with probabilistic guarantees are obtained in O(nm) time, where n denotes the number of variables, and m the number of clauses of the input DNF, but this is not scalable in practice. In this paper, we propose a neural model counting approach for weighted #DNF that combines approximate model counting with deep learning, and accurately approximates model counts in linear time when width is bounded. We conduct experiments to validate our method, and show that our model learns and generalizes very well to large-scale #DNF instances. Comment: To appear in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20). Code and data available at: https://github.com/ralphabb/NeuralDNF
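
    To make "approximations with probabilistic guarantees" concrete, here is a minimal sketch of a classical Monte Carlo estimator for unweighted #DNF in the style of Karp and Luby. It is not the neural approach of this abstract, and this simple variant is not tuned to reach the O(nm) bound. The clause representation (a dict from variable index to required truth value) and all function names are assumptions made for illustration.

```python
import random
from itertools import product


def satisfies(assignment, clause):
    """A clause is a conjunction: every listed variable must match."""
    return all(assignment[var] == val for var, val in clause.items())


def karp_luby_count(clauses, n, samples, rng):
    """Estimate the number of assignments satisfying at least one clause."""
    # Clause i alone is satisfied by 2^(n - width_i) assignments.
    sizes = [2 ** (n - len(clause)) for clause in clauses]
    total = sum(sizes)  # size of the multiset of (clause, assignment) pairs
    acc = 0.0
    for _ in range(samples):
        # Pick a clause proportionally to its count, then a uniformly
        # random assignment that satisfies it.
        i = rng.choices(range(len(clauses)), weights=sizes)[0]
        assignment = [rng.random() < 0.5 for _ in range(n)]
        for var, val in clauses[i].items():
            assignment[var] = val
        # The assignment is covered by `cover` clauses (at least clause i),
        # so weighting by 1/cover counts each assignment exactly once.
        cover = sum(satisfies(assignment, clause) for clause in clauses)
        acc += 1.0 / cover
    return total * acc / samples


def exact_count(clauses, n):
    """Exponential-time reference count, usable only for tiny n."""
    return sum(
        any(satisfies(a, clause) for clause in clauses)
        for a in product((False, True), repeat=n)
    )


# Hypothetical instance: (x0 and x1) or (not x1 and x2) or (x0 and not x3)
clauses = [{0: True, 1: True}, {1: False, 2: True}, {0: True, 3: False}]
estimate = karp_luby_count(clauses, n=4, samples=20000, rng=random.Random(0))
exact = exact_count(clauses, n=4)
```

    Each sample costs O(nm) here because every clause is rechecked against the sampled assignment, which hints at why sampling-based counters become expensive on large instances.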

    Pseudorandomness and Fourier Growth Bounds for Width-3 Branching Programs

    We present an explicit pseudorandom generator for oblivious, read-once, width-3 branching programs, which can read their input bits in any order. The generator has seed length O~(log^3 n). The previously best known seed length for this model is n^{1/2+o(1)} due to Impagliazzo, Meka, and Zuckerman (FOCS '12). Our work generalizes a recent result of Reingold, Steinke, and Vadhan (RANDOM '13) for permutation branching programs. The main technical novelty underlying our generator is a new bound on the Fourier growth of width-3, oblivious, read-once branching programs. Specifically, we show that for any f : {0,1}^n -> {0,1} computed by such a branching program, and k in [n], sum_{|s|=k} |hat{f}(s)| < n^2 * (O(log n))^k, where f(x) = sum_s hat{f}(s) (-1)^{<s,x>} is the standard Fourier transform over Z_2^n. The base O(log n) of the Fourier growth is tight up to a factor of log log n.
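
    The quantity bounded above, sum_{|s|=k} |hat{f}(s)|, can be computed by brute force for small n. The sketch below only illustrates the definition; it does not construct branching programs or the generator, and the function names and the example function (AND of three bits) are assumptions chosen for illustration.

```python
from itertools import product


def fourier_coefficients(f, n):
    """Return {s: hat_f(s)} with hat_f(s) = E_x[f(x) * (-1)^<s,x>],
    where x is uniform over {0,1}^n. Takes 4^n steps, so small n only."""
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for s in points:
        total = sum(
            f(x) * (-1) ** sum(si * xi for si, xi in zip(s, x))
            for x in points
        )
        coeffs[s] = total / len(points)
    return coeffs


def level_mass(coeffs, k):
    """Sum of |hat_f(s)| over all s with exactly k ones (level k)."""
    return sum(abs(c) for s, c in coeffs.items() if sum(s) == k)


# Hypothetical example: AND of 3 bits, as a function into {0,1}.
def and3(x):
    return int(all(x))


coeffs = fourier_coefficients(and3, 3)
masses = [level_mass(coeffs, k) for k in range(4)]
```

    For this example every coefficient has absolute value 1/8, so the level-k mass is simply the number of sets of size k divided by 8.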

    Optimal Bounds on Approximation of Submodular and XOS Functions by Juntas

    We investigate the approximability of several classes of real-valued functions by functions of a small number of variables (juntas). Our main results are tight bounds on the number of variables required to approximate a function $f:\{0,1\}^n \rightarrow [0,1]$ within $\ell_2$-error $\epsilon$ over the uniform distribution: 1. If $f$ is submodular, then it is $\epsilon$-close to a function of $O(\frac{1}{\epsilon^2} \log \frac{1}{\epsilon})$ variables. This is an exponential improvement over previously known results. We note that $\Omega(\frac{1}{\epsilon^2})$ variables are necessary even for linear functions. 2. If $f$ is fractionally subadditive (XOS) it is $\epsilon$-close to a function of $2^{O(1/\epsilon^2)}$ variables. This result holds for all functions with low total $\ell_1$-influence and is a real-valued analogue of Friedgut's theorem for boolean functions. We show that $2^{\Omega(1/\epsilon)}$ variables are necessary even for XOS functions. As applications of these results, we provide learning algorithms over the uniform distribution. For XOS functions, we give a PAC learning algorithm that runs in time $2^{\mathrm{poly}(1/\epsilon)} \mathrm{poly}(n)$. For submodular functions we give an algorithm in the more demanding PMAC learning model (Balcan and Harvey, 2011) which requires a multiplicative $1+\gamma$ factor approximation with probability at least $1-\epsilon$ over the target distribution. Our uniform distribution algorithm runs in time $2^{\mathrm{poly}(1/(\gamma\epsilon))} \mathrm{poly}(n)$. This is the first algorithm in the PMAC model that over the uniform distribution can achieve a constant approximation factor arbitrarily close to 1 for all submodular functions. As follows from the lower bounds in (Feldman et al., 2013) both of these algorithms are close to optimal. We also give applications for proper learning, testing and agnostic learning with value queries of these classes. Comment: Extended abstract appears in proceedings of FOCS 201
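
    The notion of being close to a junta in l2-error can be made concrete for small n. The sketch below computes the best approximation of f that depends only on a chosen set of coordinates (the conditional average over the remaining ones) and its l2 distance from f under the uniform distribution. It illustrates the definition only, not the paper's structural results or learning algorithms; the function name and the example function are assumptions.

```python
import math
from itertools import product


def l2_distance_to_junta(f, n, keep):
    """l2 distance, under the uniform distribution on {0,1}^n, between f
    and the best function of the coordinates in `keep`. That best function
    averages f over all coordinates outside `keep`."""
    points = list(product((0, 1), repeat=n))
    groups = {}
    for x in points:
        key = tuple(x[i] for i in keep)
        groups.setdefault(key, []).append(f(x))
    means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    squared_error = sum(
        (f(x) - means[tuple(x[i] for i in keep)]) ** 2 for x in points
    )
    return math.sqrt(squared_error / len(points))


# Hypothetical example: a linear (hence submodular) function into [0, 1]
# that depends strongly on x0 and weakly on x2.
def f(x):
    return x[0] / 2 + x[2] / 8


error_one_var = l2_distance_to_junta(f, 3, keep=(0,))
error_two_vars = l2_distance_to_junta(f, 3, keep=(0, 2))
```

    Dropping the weakly influential coordinate x2 costs an l2 error of 1/16, while keeping both relevant coordinates reproduces f exactly.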