333 research outputs found

    Learning DNF Expressions from Fourier Spectrum

    Full text link
    Since its introduction by Valiant in 1984, PAC learning of DNF expressions remains one of the central problems in learning theory. We consider this problem in the setting where the underlying distribution is uniform, or more generally, a product distribution. Kalai, Samorodnitsky and Teng (2009) showed that in this setting a DNF expression can be efficiently approximated from its "heavy" low-degree Fourier coefficients alone. This is in contrast to previous approaches where boosting was used and thus Fourier coefficients of the target function modified by various distributions were needed. This property is crucial for learning of DNF expressions over smoothed product distributions, a learning model introduced by Kalai et al. (2009) and inspired by the seminal smoothed analysis model of Spielman and Teng (2001). We introduce a new approach to learning (or approximating) a polynomial threshold functions which is based on creating a function with range [-1,1] that approximately agrees with the unknown function on low-degree Fourier coefficients. We then describe conditions under which this is sufficient for learning polynomial threshold functions. Our approach yields a new, simple algorithm for approximating any polynomial-size DNF expression from its "heavy" low-degree Fourier coefficients alone. Our algorithm greatly simplifies the proof of learnability of DNF expressions over smoothed product distributions. We also describe an application of our algorithm to learning monotone DNF expressions over product distributions. Building on the work of Servedio (2001), we give an algorithm that runs in time \poly((s \cdot \log{(s/\eps)})^{\log{(s/\eps)}}, n), where ss is the size of the target DNF expression and \eps is the accuracy. This improves on \poly((s \cdot \log{(ns/\eps)})^{\log{(s/\eps)} \cdot \log{(1/\eps)}}, n) bound of Servedio (2001).Comment: Appears in Conference on Learning Theory (COLT) 201

    Learning Coverage Functions and Private Release of Marginals

    Full text link
    We study the problem of approximating and learning coverage functions. A function c:2[n]R+c: 2^{[n]} \rightarrow \mathbf{R}^{+} is a coverage function, if there exists a universe UU with non-negative weights w(u)w(u) for each uUu \in U and subsets A1,A2,,AnA_1, A_2, \ldots, A_n of UU such that c(S)=uiSAiw(u)c(S) = \sum_{u \in \cup_{i \in S} A_i} w(u). Alternatively, coverage functions can be described as non-negative linear combinations of monotone disjunctions. They are a natural subclass of submodular functions and arise in a number of applications. We give an algorithm that for any γ,δ>0\gamma,\delta>0, given random and uniform examples of an unknown coverage function cc, finds a function hh that approximates cc within factor 1+γ1+\gamma on all but δ\delta-fraction of the points in time poly(n,1/γ,1/δ)poly(n,1/\gamma,1/\delta). This is the first fully-polynomial algorithm for learning an interesting class of functions in the demanding PMAC model of Balcan and Harvey (2011). Our algorithms are based on several new structural properties of coverage functions. Using the results in (Feldman and Kothari, 2014), we also show that coverage functions are learnable agnostically with excess 1\ell_1-error ϵ\epsilon over all product and symmetric distributions in time nlog(1/ϵ)n^{\log(1/\epsilon)}. In contrast, we show that, without assumptions on the distribution, learning coverage functions is at least as hard as learning polynomial-size disjoint DNF formulas, a class of functions for which the best known algorithm runs in time 2O~(n1/3)2^{\tilde{O}(n^{1/3})} (Klivans and Servedio, 2004). As an application of our learning results, we give simple differentially-private algorithms for releasing monotone conjunction counting queries with low average error. In particular, for any knk \leq n, we obtain private release of kk-way marginals with average error αˉ\bar{\alpha} in time nO(log(1/αˉ))n^{O(\log(1/\bar{\alpha}))}

    Learning pseudo-Boolean k-DNF and Submodular Functions

    Full text link
    We prove that any submodular function f: {0,1}^n -> {0,1,...,k} can be represented as a pseudo-Boolean 2k-DNF formula. Pseudo-Boolean DNFs are a natural generalization of DNF representation for functions with integer range. Each term in such a formula has an associated integral constant. We show that an analog of Hastad's switching lemma holds for pseudo-Boolean k-DNFs if all constants associated with the terms of the formula are bounded. This allows us to generalize Mansour's PAC-learning algorithm for k-DNFs to pseudo-Boolean k-DNFs, and hence gives a PAC-learning algorithm with membership queries under the uniform distribution for submodular functions of the form f:{0,1}^n -> {0,1,...,k}. Our algorithm runs in time polynomial in n, k^{O(k \log k / \epsilon)}, 1/\epsilon and log(1/\delta) and works even in the agnostic setting. The line of previous work on learning submodular functions [Balcan, Harvey (STOC '11), Gupta, Hardt, Roth, Ullman (STOC '11), Cheraghchi, Klivans, Kothari, Lee (SODA '12)] implies only n^{O(k)} query complexity for learning submodular functions in this setting, for fixed epsilon and delta. Our learning algorithm implies a property tester for submodularity of functions f:{0,1}^n -> {0, ..., k} with query complexity polynomial in n for k=O((\log n/ \loglog n)^{1/2}) and constant proximity parameter \epsilon

    Approximate resilience, monotonicity, and the complexity of agnostic learning

    Full text link
    A function ff is dd-resilient if all its Fourier coefficients of degree at most dd are zero, i.e., ff is uncorrelated with all low-degree parities. We study the notion of approximate\mathit{approximate} resilience\mathit{resilience} of Boolean functions, where we say that ff is α\alpha-approximately dd-resilient if ff is α\alpha-close to a [1,1][-1,1]-valued dd-resilient function in 1\ell_1 distance. We show that approximate resilience essentially characterizes the complexity of agnostic learning of a concept class CC over the uniform distribution. Roughly speaking, if all functions in a class CC are far from being dd-resilient then CC can be learned agnostically in time nO(d)n^{O(d)} and conversely, if CC contains a function close to being dd-resilient then agnostic learning of CC in the statistical query (SQ) framework of Kearns has complexity of at least nΩ(d)n^{\Omega(d)}. This characterization is based on the duality between 1\ell_1 approximation by degree-dd polynomials and approximate dd-resilience that we establish. In particular, it implies that 1\ell_1 approximation by low-degree polynomials, known to be sufficient for agnostic learning over product distributions, is in fact necessary. Focusing on monotone Boolean functions, we exhibit the existence of near-optimal α\alpha-approximately Ω~(αn)\widetilde{\Omega}(\alpha\sqrt{n})-resilient monotone functions for all α>0\alpha>0. Prior to our work, it was conceivable even that every monotone function is Ω(1)\Omega(1)-far from any 11-resilient function. Furthermore, we construct simple, explicit monotone functions based on Tribes{\sf Tribes} and CycleRun{\sf CycleRun} that are close to highly resilient functions. Our constructions are based on a fairly general resilience analysis and amplification. These structural results, together with the characterization, imply nearly optimal lower bounds for agnostic learning of monotone juntas

    Agnostic Learning of Disjunctions on Symmetric Distributions

    Full text link
    We consider the problem of approximating and learning disjunctions (or equivalently, conjunctions) on symmetric distributions over {0,1}n\{0,1\}^n. Symmetric distributions are distributions whose PDF is invariant under any permutation of the variables. We give a simple proof that for every symmetric distribution D\mathcal{D}, there exists a set of nO(log(1/ϵ))n^{O(\log{(1/\epsilon)})} functions S\mathcal{S}, such that for every disjunction cc, there is function pp, expressible as a linear combination of functions in S\mathcal{S}, such that pp ϵ\epsilon-approximates cc in 1\ell_1 distance on D\mathcal{D} or ExD[c(x)p(x)]ϵ\mathbf{E}_{x \sim \mathcal{D}}[ |c(x)-p(x)|] \leq \epsilon. This directly gives an agnostic learning algorithm for disjunctions on symmetric distributions that runs in time nO(log(1/ϵ))n^{O( \log{(1/\epsilon)})}. The best known previous bound is nO(1/ϵ4)n^{O(1/\epsilon^4)} and follows from approximation of the more general class of halfspaces (Wimmer, 2010). We also show that there exists a symmetric distribution D\mathcal{D}, such that the minimum degree of a polynomial that 1/31/3-approximates the disjunction of all nn variables is 1\ell_1 distance on D\mathcal{D} is Ω(n)\Omega( \sqrt{n}). Therefore the learning result above cannot be achieved via 1\ell_1-regression with a polynomial basis used in most other agnostic learning algorithms. Our technique also gives a simple proof that for any product distribution D\mathcal{D} and every disjunction cc, there exists a polynomial pp of degree O(log(1/ϵ))O(\log{(1/\epsilon)}) such that pp ϵ\epsilon-approximates cc in 1\ell_1 distance on D\mathcal{D}. This was first proved by Blais et al. (2008) via a more involved argument

    A composition theorem for the Fourier Entropy-Influence conjecture

    Full text link
    The Fourier Entropy-Influence (FEI) conjecture of Friedgut and Kalai [FK96] seeks to relate two fundamental measures of Boolean function complexity: it states that H[f]CInf[f]H[f] \leq C Inf[f] holds for every Boolean function ff, where H[f]H[f] denotes the spectral entropy of ff, Inf[f]Inf[f] is its total influence, and C>0C > 0 is a universal constant. Despite significant interest in the conjecture it has only been shown to hold for a few classes of Boolean functions. Our main result is a composition theorem for the FEI conjecture. We show that if g1,...,gkg_1,...,g_k are functions over disjoint sets of variables satisfying the conjecture, and if the Fourier transform of FF taken with respect to the product distribution with biases E[g1],...,E[gk]E[g_1],...,E[g_k] satisfies the conjecture, then their composition F(g1(x1),...,gk(xk))F(g_1(x^1),...,g_k(x^k)) satisfies the conjecture. As an application we show that the FEI conjecture holds for read-once formulas over arbitrary gates of bounded arity, extending a recent result [OWZ11] which proved it for read-once decision trees. Our techniques also yield an explicit function with the largest known ratio of C6.278C \geq 6.278 between H[f]H[f] and Inf[f]Inf[f], improving on the previous lower bound of 4.615

    Learning using Local Membership Queries

    Full text link
    We introduce a new model of membership query (MQ) learning, where the learning algorithm is restricted to query points that are \emph{close} to random examples drawn from the underlying distribution. The learning model is intermediate between the PAC model (Valiant, 1984) and the PAC+MQ model (where the queries are allowed to be arbitrary points). Membership query algorithms are not popular among machine learning practitioners. Apart from the obvious difficulty of adaptively querying labelers, it has also been observed that querying \emph{unnatural} points leads to increased noise from human labelers (Lang and Baum, 1992). This motivates our study of learning algorithms that make queries that are close to examples generated from the data distribution. We restrict our attention to functions defined on the nn-dimensional Boolean hypercube and say that a membership query is local if its Hamming distance from some example in the (random) training data is at most O(log(n))O(\log(n)). We show the following results in this model: (i) The class of sparse polynomials (with coefficients in R) over {0,1}n\{0,1\}^n is polynomial time learnable under a large class of \emph{locally smooth} distributions using O(log(n))O(\log(n))-local queries. This class also includes the class of O(log(n))O(\log(n))-depth decision trees. (ii) The class of polynomial-sized decision trees is polynomial time learnable under product distributions using O(log(n))O(\log(n))-local queries. (iii) The class of polynomial size DNF formulas is learnable under the uniform distribution using O(log(n))O(\log(n))-local queries in time nO(log(log(n)))n^{O(\log(\log(n)))}. (iv) In addition we prove a number of results relating the proposed model to the traditional PAC model and the PAC+MQ model

    Optimal Bounds on Approximation of Submodular and XOS Functions by Juntas

    Full text link
    We investigate the approximability of several classes of real-valued functions by functions of a small number of variables ({\em juntas}). Our main results are tight bounds on the number of variables required to approximate a function f:{0,1}n[0,1]f:\{0,1\}^n \rightarrow [0,1] within 2\ell_2-error ϵ\epsilon over the uniform distribution: 1. If ff is submodular, then it is ϵ\epsilon-close to a function of O(1ϵ2log1ϵ)O(\frac{1}{\epsilon^2} \log \frac{1}{\epsilon}) variables. This is an exponential improvement over previously known results. We note that Ω(1ϵ2)\Omega(\frac{1}{\epsilon^2}) variables are necessary even for linear functions. 2. If ff is fractionally subadditive (XOS) it is ϵ\epsilon-close to a function of 2O(1/ϵ2)2^{O(1/\epsilon^2)} variables. This result holds for all functions with low total 1\ell_1-influence and is a real-valued analogue of Friedgut's theorem for boolean functions. We show that 2Ω(1/ϵ)2^{\Omega(1/\epsilon)} variables are necessary even for XOS functions. As applications of these results, we provide learning algorithms over the uniform distribution. For XOS functions, we give a PAC learning algorithm that runs in time 2poly(1/ϵ)poly(n)2^{poly(1/\epsilon)} poly(n). For submodular functions we give an algorithm in the more demanding PMAC learning model (Balcan and Harvey, 2011) which requires a multiplicative 1+γ1+\gamma factor approximation with probability at least 1ϵ1-\epsilon over the target distribution. Our uniform distribution algorithm runs in time 2poly(1/(γϵ))poly(n)2^{poly(1/(\gamma\epsilon))} poly(n). This is the first algorithm in the PMAC model that over the uniform distribution can achieve a constant approximation factor arbitrarily close to 1 for all submodular functions. As follows from the lower bounds in (Feldman et al., 2013) both of these algorithms are close to optimal. We also give applications for proper learning, testing and agnostic learning with value queries of these classes.Comment: Extended abstract appears in proceedings of FOCS 201
    corecore