63 research outputs found

    Learning DNF Expressions from Fourier Spectrum

    Since its introduction by Valiant in 1984, PAC learning of DNF expressions has remained one of the central problems in learning theory. We consider this problem in the setting where the underlying distribution is uniform, or more generally, a product distribution. Kalai, Samorodnitsky and Teng (2009) showed that in this setting a DNF expression can be efficiently approximated from its "heavy" low-degree Fourier coefficients alone. This is in contrast to previous approaches where boosting was used and thus Fourier coefficients of the target function modified by various distributions were needed. This property is crucial for learning of DNF expressions over smoothed product distributions, a learning model introduced by Kalai et al. (2009) and inspired by the seminal smoothed analysis model of Spielman and Teng (2001). We introduce a new approach to learning (or approximating) polynomial threshold functions that is based on creating a function with range $[-1,1]$ that approximately agrees with the unknown function on low-degree Fourier coefficients. We then describe conditions under which this is sufficient for learning polynomial threshold functions. Our approach yields a new, simple algorithm for approximating any polynomial-size DNF expression from its "heavy" low-degree Fourier coefficients alone. Our algorithm greatly simplifies the proof of learnability of DNF expressions over smoothed product distributions. We also describe an application of our algorithm to learning monotone DNF expressions over product distributions. Building on the work of Servedio (2001), we give an algorithm that runs in time $\mathrm{poly}((s \cdot \log(s/\epsilon))^{\log(s/\epsilon)}, n)$, where $s$ is the size of the target DNF expression and $\epsilon$ is the accuracy. This improves on the $\mathrm{poly}((s \cdot \log(ns/\epsilon))^{\log(s/\epsilon) \cdot \log(1/\epsilon)}, n)$ bound of Servedio (2001). Comment: Appears in Conference on Learning Theory (COLT) 201
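To make the phrase "heavy low-degree Fourier coefficients" concrete, the following is a minimal sketch that computes them by brute-force enumeration for a toy DNF under the uniform distribution. It is not the algorithm from the abstract: the function names, the example DNF, and the convention (inputs in {0,1}, outputs in {-1,+1}, character $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$) are all assumptions chosen for illustration, and a real learner would estimate the coefficients from random examples instead of enumerating all $2^n$ points.

```python
from itertools import combinations, product


def fourier_coefficient(f, n, S):
    """Exact coefficient E[f(x) * chi_S(x)] under the uniform distribution on {0,1}^n."""
    total = 0
    for x in product((0, 1), repeat=n):
        # chi_S(x) is -1 when an odd number of the bits indexed by S are set.
        chi = -1 if sum(x[i] for i in S) % 2 else 1
        total += f(x) * chi
    return total / 2 ** n


def heavy_low_degree(f, n, degree, threshold):
    """Return {S: coefficient} for all |S| <= degree with |coefficient| >= threshold."""
    heavy = {}
    for d in range(degree + 1):
        for S in combinations(range(n), d):
            coeff = fourier_coefficient(f, n, S)
            if abs(coeff) >= threshold:
                heavy[S] = coeff
    return heavy


def toy_dnf(x):
    """Hypothetical 2-term DNF (x0 AND x1) OR x2, mapped to +1 (true) / -1 (false)."""
    return 1 if (x[0] and x[1]) or x[2] else -1


# With degree = n and threshold = 0 this returns the full spectrum.
all_coeffs = heavy_low_degree(toy_dnf, 3, degree=3, threshold=0.0)
# Restricting the degree and raising the threshold keeps only the "heavy" part.
heavy = heavy_low_degree(toy_dnf, 3, degree=1, threshold=0.5)
```

Since the toy function takes values in {-1,+1}, the squares of all eight coefficients sum to 1 (Parseval), which is a quick sanity check on the enumeration.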

    Learning Coverage Functions and Private Release of Marginals

    We study the problem of approximating and learning coverage functions. A function $c: 2^{[n]} \rightarrow \mathbf{R}^{+}$ is a coverage function if there exists a universe $U$ with non-negative weights $w(u)$ for each $u \in U$ and subsets $A_1, A_2, \ldots, A_n$ of $U$ such that $c(S) = \sum_{u \in \cup_{i \in S} A_i} w(u)$. Alternatively, coverage functions can be described as non-negative linear combinations of monotone disjunctions. They are a natural subclass of submodular functions and arise in a number of applications. We give an algorithm that for any $\gamma,\delta>0$, given random and uniform examples of an unknown coverage function $c$, finds a function $h$ that approximates $c$ within factor $1+\gamma$ on all but $\delta$-fraction of the points in time $\mathrm{poly}(n,1/\gamma,1/\delta)$. This is the first fully-polynomial algorithm for learning an interesting class of functions in the demanding PMAC model of Balcan and Harvey (2011). Our algorithms are based on several new structural properties of coverage functions. Using the results in (Feldman and Kothari, 2014), we also show that coverage functions are learnable agnostically with excess $\ell_1$-error $\epsilon$ over all product and symmetric distributions in time $n^{\log(1/\epsilon)}$. In contrast, we show that, without assumptions on the distribution, learning coverage functions is at least as hard as learning polynomial-size disjoint DNF formulas, a class of functions for which the best known algorithm runs in time $2^{\tilde{O}(n^{1/3})}$ (Klivans and Servedio, 2004). As an application of our learning results, we give simple differentially-private algorithms for releasing monotone conjunction counting queries with low average error. In particular, for any $k \leq n$, we obtain private release of $k$-way marginals with average error $\bar{\alpha}$ in time $n^{O(\log(1/\bar{\alpha}))}$.
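The definition of a coverage function above can be evaluated directly. The sketch below does so for a small, hypothetical instance; the universe, weights, and subsets are made up for illustration and do not come from the paper.

```python
def coverage(S, subsets, weights):
    """c(S): total weight of the universe elements covered by the sets A_i, i in S."""
    covered = set().union(*(subsets[i] for i in S))
    return sum(weights[u] for u in covered)


# Hypothetical toy instance: universe U = {a, b, c} with non-negative weights.
weights = {"a": 1.0, "b": 2.0, "c": 0.5}
subsets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}

c_empty = coverage([], subsets, weights)
c_1 = coverage([1], subsets, weights)
c_12 = coverage([1, 2], subsets, weights)
c_13 = coverage([1, 3], subsets, weights)
c_123 = coverage([1, 2, 3], subsets, weights)

# Marginal value of adding set 2 shrinks as the base set grows (submodularity).
gain_small = c_12 - c_1
gain_large = c_123 - c_13
```

On this instance, adding $A_2$ to $\{1\}$ newly covers element c, while adding it to $\{1,3\}$ covers nothing new, which illustrates the diminishing-returns property that makes coverage functions submodular.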

    Distribution-Independent Evolvability of Linear Threshold Functions

    Valiant's (2007) model of evolvability models the evolutionary process of acquiring useful functionality as a restricted form of learning from random examples. Linear threshold functions and their various subclasses, such as conjunctions and decision lists, play a fundamental role in learning theory and hence their evolvability has been the primary focus of research on Valiant's framework (2007). One of the main open problems regarding the model is whether conjunctions are evolvable distribution-independently (Feldman and Valiant, 2008). We show that the answer is negative. Our proof is based on a new combinatorial parameter of a concept class that lower-bounds the complexity of learning from correlations. We contrast the lower bound with a proof that linear threshold functions having a non-negligible margin on the data points are evolvable distribution-independently via a simple mutation algorithm. Our algorithm relies on a non-linear loss function being used to select the hypotheses instead of 0-1 loss in Valiant's (2007) original definition. The proof of evolvability requires that the loss function satisfies several mild conditions that are, for example, satisfied by the quadratic loss function studied in several other works (Michael, 2007; Feldman, 2009; Valiant, 2010). An important property of our evolution algorithm is monotonicity, that is, the algorithm guarantees evolvability without any decreases in performance. Previously, monotone evolvability was only shown for conjunctions with quadratic loss (Feldman, 2009) or when the distribution on the domain is severely restricted (Michael, 2007; Feldman, 2009; Kanade et al., 2010).

    Efficiently Learning Monotone Decision Trees with ID3

    Since the Probably Approximately Correct learning model was introduced in 1984, there has been much effort in designing computationally efficient algorithms for learning Boolean functions from random examples drawn from a uniform distribution. In this paper, I take the ID3 information-gain-first classification algorithm and apply it to the task of learning monotone Boolean functions from examples that are uniformly distributed over $\{0,1\}^n$. I limited my scope to the class of monotone Boolean functions that can be represented as read-2 width-2 disjunctive normal form expressions. I modeled these functions as graphs and examined each type of connected component contained in these models, i.e. path graphs and cycle graphs. I determined the influence of the variables in the pieces of these graph models in order to understand how ID3 behaves when learning these functions. My findings show that ID3 will produce an optimal decision tree for this class of Boolean functions.
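As a rough illustration of the quantity ID3 uses to pick a split, the sketch below computes the exact information gain of each variable for one small monotone read-2 width-2 DNF under the uniform distribution. The example function (a path graph on three variables, with terms as edges) and all names are assumptions made for illustration; this is not the analysis from the paper.

```python
from itertools import product
from math import log2


def entropy(p):
    """Binary entropy of a label that is true with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def information_gain(f, n, i):
    """Exact information gain of splitting on x_i, uniform distribution on {0,1}^n."""
    points = list(product((0, 1), repeat=n))

    def frac_true(pts):
        return sum(f(x) for x in pts) / len(pts)

    gain = entropy(frac_true(points))
    for b in (0, 1):
        side = [x for x in points if x[i] == b]
        gain -= len(side) / len(points) * entropy(frac_true(side))
    return gain


def path_dnf(x):
    """Hypothetical example: x0 x1 OR x1 x2, a path graph x0 - x1 - x2."""
    return int((x[0] and x[1]) or (x[1] and x[2]))


gains = [information_gain(path_dnf, 3, i) for i in range(3)]
best = max(range(3), key=lambda i: gains[i])
```

Here the middle variable x1 appears in both terms, and setting it to 0 forces the function to 0, so it has the largest gain and is the variable ID3 would place at the root for this example.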

    Almost Optimal Testers for Concise Representations

    We give improved and almost optimal testers for several classes of Boolean functions on n variables that have concise representation, in both the uniform and the distribution-free models. The classes include k-Junta, k-Linear, s-Term DNF, s-Term Monotone DNF, r-DNF, Decision List, r-Decision List, size-s Decision Tree, size-s Boolean Formula, size-s Branching Program, s-Sparse Polynomial over the binary field, and functions with Fourier Degree at most d. The approach is new and combines ideas from Diakonikolas et al. (2007), Bshouty (2018), Goldreich et al. (1998), and learning theory. The method can be extended to several other classes of functions over any domain that can be approximated by functions with a small number of relevant variables.