
    Agnostic Learning from Tolerant Natural Proofs

    We generalize the "learning algorithms from natural properties" framework of [CIKK16] to get agnostic learning algorithms from natural properties with extra features. We show that if a natural property (in the sense of Razborov and Rudich [RR97]) is useful not only against "easy" functions but also against functions that are close to the class of "easy" functions, then it can be used to get an agnostic learning algorithm over the uniform distribution with membership queries.
    * For AC0[q], for any prime q (constant-depth circuits of polynomial size, with AND, OR, NOT, and MODq gates of unbounded fanin), which has a natural property with the requisite extra feature by [Raz87, Smo87, RR97], we obtain the first agnostic learning algorithm for AC0[q]. Our algorithm runs in randomized quasi-polynomial time, uses membership queries, and outputs a circuit for a given Boolean function f that agrees with f on all but at most a polylog(n)·opt fraction of inputs, where opt is the relative distance between f and the closest function h in the class AC0[q] (see the formula after this abstract).
    * In the ideal case, a natural proof of strongly exponential correlation circuit lower bounds against a circuit class C containing AC0[2] (i.e., circuits of size exp(Omega(n)) cannot compute some n-variate function even with exp(-Omega(n)) advantage over random guessing) would yield a polynomial-time query agnostic learning algorithm for C with approximation error O(opt).
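    Stated as a formula, the guarantee in the first bullet reads as follows, where \hat{h} denotes the learner's output circuit (standard agnostic-learning notation, not quoted verbatim from the abstract):

        opt = \min_{h \in AC^0[q]} \Pr_{x \in \{0,1\}^n} [ f(x) \neq h(x) ],  \qquad
        \Pr_{x} [ f(x) \neq \hat{h}(x) ] \le polylog(n) \cdot opt.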

    A Complete Characterization of Statistical Query Learning with Applications to Evolvability

    The statistical query (SQ) learning model of Kearns (1993) is a natural restriction of the PAC learning model in which a learning algorithm is allowed to obtain estimates of statistical properties of the examples but cannot see the examples themselves. We describe a new and simple characterization of the query complexity of learning in the SQ learning model. Unlike the previously known bounds on SQ learning, our characterization preserves the accuracy and the efficiency of learning. The preservation of accuracy implies that our characterization gives the first characterization of SQ learning in the agnostic learning framework. The preservation of efficiency is achieved using a new boosting technique and allows us to derive a new approach to the design of evolutionary algorithms in Valiant's (2006) model of evolvability. We use this approach to demonstrate the existence of a large class of monotone evolutionary learning algorithms based on square loss performance estimation. These results differ significantly from the few known evolutionary algorithms and give evidence that evolvability in Valiant's model is a more versatile phenomenon than there had been previous reason to suspect.
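    For concreteness, SQ access can be simulated from labeled examples; a minimal Python sketch of the STAT oracle interface (the names are illustrative, and the sample bound is the standard Hoeffding estimate, not anything from the paper):

        import random

        def stat_oracle(examples, phi, tau):
            # Kearns' STAT oracle: return an estimate of E[phi(x, y)] for a
            # query phi with range [0, 1], accurate to within tolerance tau.
            # Averaging O(1/tau^2) samples suffices w.h.p. (Hoeffding).
            m = max(1, int(4 / tau ** 2))
            batch = random.choices(examples, k=m)
            return sum(phi(x, y) for x, y in batch) / m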

    Privately Releasing Conjunctions and the Statistical Query Barrier

    Suppose we would like to know the answers to all statistical queries in a set C on a data set, up to small error, but we can only access the data itself using statistical queries. A trivial solution is to exhaustively ask all queries in C. Can we do any better?
    + We show that the number of statistical queries necessary and sufficient for this task is, up to polynomial factors, equal to the agnostic learning complexity of C in Kearns' statistical query (SQ) model. This gives a complete answer to the question when running time is not a concern.
    + We then show that the problem can be solved efficiently (allowing arbitrary error on a small fraction of queries) whenever the answers to C can be described by a submodular function. This includes many natural concept classes, such as graph cuts and Boolean disjunctions and conjunctions.
    While interesting from a learning-theoretic point of view, our main applications are in privacy-preserving data analysis: here, our second result leads to the first algorithm that efficiently releases differentially private answers to all Boolean conjunctions with 1% average error. This presents significant progress on a key open problem in privacy-preserving data analysis. Our first result, on the other hand, gives unconditional lower bounds on any differentially private algorithm that admits a (potentially non-privacy-preserving) implementation using only statistical queries. Not only our algorithms, but also most known private algorithms, can be implemented using only statistical queries, and hence are constrained by these lower bounds. Our result therefore isolates the complexity of agnostic learning in the SQ model as a new barrier in the design of differentially private algorithms.
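    The bridge between SQ algorithms and privacy rests on the standard observation that a single statistical query can be answered privately by adding Laplace noise; a minimal sketch of that building block (not the paper's release algorithm):

        import random

        def laplace_noise(scale):
            # The difference of two i.i.d. exponentials with mean `scale`
            # is distributed as Laplace(0, scale).
            return random.expovariate(1 / scale) - random.expovariate(1 / scale)

        def private_query_answer(data, q, eps):
            # Answer one statistical (counting) query q: row -> {0, 1} on n
            # rows with eps-differential privacy. Changing a single row moves
            # the true answer by at most 1/n, so noise of scale 1/(eps * n)
            # suffices (the Laplace mechanism).
            n = len(data)
            true_answer = sum(q(x) for x in data) / n
            return true_answer + laplace_noise(1.0 / (eps * n))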

    Sampling Correctors

    In many situations, sample data is obtained from a noisy or imperfect source. In order to address such corruptions, this paper introduces the concept of a sampling corrector. Such algorithms use structure that the distribution is purported to have in order to allow one to make "on-the-fly" corrections to samples drawn from probability distributions. These algorithms then act as filters between the noisy data and the end user. We show connections between sampling correctors, distribution learning algorithms, and distribution property testing algorithms. We show that these connections can be utilized to expand the applicability of known distribution learning and property testing algorithms, as well as to achieve improved algorithms for those tasks. As a first step, we show how to design sampling correctors using proper learning algorithms. We then focus on the question of whether sampling correctors can be more efficient in terms of sample complexity than learning algorithms for the analogous families of distributions. When correcting monotonicity, we show that this is indeed the case when also granted query access to the cumulative distribution function. We also obtain sampling correctors for monotonicity without this stronger type of access, provided that the distribution is originally very close to monotone (namely, at a distance O(1/log^2 n)). In addition, we consider a restricted error model that aims at capturing "missing data" corruptions. In this model, we show that distributions that are close to monotone have sampling correctors that are significantly more efficient than achievable by the learning approach. We also consider the question of whether an additional source of independent random bits is required by sampling correctors to implement the correction process.
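    The "filter" role described above is an interface more than an algorithm; a purely illustrative Python sketch (the repair step is a hypothetical placeholder, and the paper's correctors are substantially more sophisticated than simple per-sample repair):

        def corrected_sampler(noisy_sampler, repair):
            # Sit between the noisy source and the end user: draw a sample
            # and apply a repair step that exploits the structure the
            # distribution is purported to have (e.g., monotonicity).
            # `repair` is a placeholder for an actual correction procedure.
            def sample():
                return repair(noisy_sampler())
            return sample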

    Moment-Matching Polynomials

    We give a new framework for proving the existence of low-degree polynomial approximators for Boolean functions with respect to broad classes of non-product distributions. Our proofs use techniques related to the classical moment problem and deviate significantly from known Fourier-based methods, which require the underlying distribution to have some product structure. Our main application is the first polynomial-time algorithm for agnostically learning any function of a constant number of halfspaces with respect to any log-concave distribution (for any constant accuracy parameter). This result was not known even for the case of learning the intersection of two halfspaces without noise. Additionally, we show that in the "smoothed-analysis" setting, the above results hold with respect to distributions that have sub-exponential tails, a property satisfied by many natural and well-studied distributions in machine learning. Given that our algorithms can be implemented using Support Vector Machines (SVMs) with a polynomial kernel, these results give a rigorous theoretical explanation as to why many kernel methods work so well in practice.
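    Since the abstract notes that its algorithms can be implemented with polynomial-kernel SVMs, here is a minimal illustration of that primitive on synthetic halfspace data (scikit-learn; the data, degree, and other parameters are arbitrary choices for the example, not the paper's):

        import numpy as np
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        X = rng.integers(0, 2, size=(2000, 20)) * 2 - 1   # points in {-1, 1}^20
        w = rng.standard_normal(20)
        y = np.sign(X @ w + 0.5)                          # labels from a halfspace

        # Fit with a polynomial kernel of (constant) degree 3; this is the
        # kind of kernel-SVM primitive the abstract refers to.
        clf = SVC(kernel="poly", degree=3, coef0=1.0)
        clf.fit(X, y)
        print("training agreement:", clf.score(X, y))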

    Agnostic Membership Query Learning with Nontrivial Savings: New Results, Techniques

    (Abridged) Designing computationally efficient algorithms in the agnostic learning model (Haussler, 1992; Kearns et al., 1994) is notoriously difficult. In this work, we consider agnostic learning with membership queries for touchstone classes at the frontier of agnostic learning, with a focus on how much computation can be saved over the trivial runtime of 2^n. This approach is inspired by and continues the study of "learning with nontrivial savings" (Servedio and Tan, 2017). To this end, we establish multiple agnostic learning algorithms, highlighted by the following (the arithmetic behind the stated savings is spelled out after the list):
    1. An agnostic learning algorithm for circuits consisting of a sublinear number of gates, each of which can be any function computable by a polynomial threshold function of sublogarithmic degree k (the depth of the circuit is bounded only by its size). This algorithm runs in time 2^{n-s(n)} for s(n) ≈ n/(k+1), and learns over the uniform distribution over unlabelled examples on {0,1}^n.
    2. An agnostic learning algorithm for circuits consisting of a sublinear number of gates, each of which can be any function computable by a SYM^+ circuit of subexponential size and sublogarithmic degree k. This algorithm runs in time 2^{n-s(n)} for s(n) ≈ n/(k+1), and learns over distributions of unlabelled examples that are products of k+1 arbitrary and unknown distributions, each over {0,1}^{n/(k+1)} (assume without loss of generality that k+1 divides n).
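    The arithmetic behind the savings (a simple calculation, not quoted from the abstract): with s(n) = n/(k+1),

        2^{n - s(n)} = 2^{n - n/(k+1)} = 2^{(k/(k+1)) n} = (2^n)^{k/(k+1)},

    i.e., the runtime is the trivial 2^n raised to the constant power k/(k+1) < 1.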

    Testing k-Monotonicity

    A Boolean k-monotone function defined over a finite poset domain D alternates between the values 0 and 1 at most k times on any ascending chain in D. Therefore, k-monotone functions are natural generalizations of the classical monotone functions, which are the 1-monotone functions. Motivated by the recent interest in k-monotone functions in the context of circuit complexity and learning theory, and by the central role that monotonicity testing plays in the context of property testing, we initiate a systematic study of k-monotone functions in the property testing model. In this model, the goal is to distinguish functions that are k-monotone (or are close to being k-monotone) from functions that are far from being k-monotone. Our results include the following:
    1. We demonstrate a separation between testing k-monotonicity and testing monotonicity on the hypercube domain {0,1}^d, for k >= 3.
    2. We demonstrate a separation between testing and learning on {0,1}^d, for k = omega(log d): testing k-monotonicity can be performed with 2^{O(sqrt(d) · log d · log(1/eps))} queries, while learning k-monotone functions requires 2^{Omega(k · sqrt(d) · 1/eps)} queries (Blais et al., RANDOM 2015).
    3. We present a tolerant test for functions f: [n]^d -> {0,1} with complexity independent of n, which makes progress on a problem left open by Berman et al. (STOC 2014).
    Our techniques exploit the testing-by-learning paradigm, use novel applications of Fourier analysis on the grid [n]^d, and draw connections to distribution testing techniques.
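    The chain condition in the definition above is easy to state operationally; a minimal check for a single chain (illustrative; a full k-monotonicity check would quantify over all ascending chains of the poset):

        def alternations(bits):
            # Number of value changes along a sequence of 0/1 values.
            return sum(1 for a, b in zip(bits, bits[1:]) if a != b)

        def k_monotone_on_chain(f, chain, k):
            # The defining condition from the abstract, restricted to one
            # ascending chain: f alternates between 0 and 1 at most k times.
            return alternations([f(x) for x in chain]) <= k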

    Auditing: Active Learning with Outcome-Dependent Query Costs

    We propose a learning setting in which unlabeled data is free, and the cost of a label depends on its value, which is not known in advance. We study binary classification in an extreme case, where the algorithm only pays for negative labels. Our motivation comes from applications such as fraud detection, in which investigating an honest transaction should be avoided if possible. We term this setting auditing, and consider the auditing complexity of an algorithm: the number of negative labels the algorithm requires in order to learn a hypothesis with low relative error. We design auditing algorithms for simple hypothesis classes (thresholds and rectangles), and show that with these algorithms, the auditing complexity can be significantly lower than the active label complexity. We also discuss a general competitive approach for auditing and possible modifications to the framework.
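    To give intuition for why auditing can be cheap for thresholds, here is a toy, noiseless-case sketch (an illustration of the setting, not necessarily the paper's algorithm): positive labels cost nothing, so a learner can sweep downward and pay for only a single negative label.

        def audit_threshold(points, query_label):
            # Noiseless thresholds on the line: the label is 1 iff x >= t.
            # Query points from largest to smallest; positive labels are free
            # in the auditing model, and the first negative label localizes t,
            # so this sweep pays for at most one negative label.
            negatives_paid = 0
            lowest_positive = None
            for x in sorted(points, reverse=True):
                if query_label(x):
                    lowest_positive = x        # free positive label
                else:
                    negatives_paid += 1        # the single label we pay for
                    break
            return lowest_positive, negatives_paid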

    Learning Coverage Functions and Private Release of Marginals

    We study the problem of approximating and learning coverage functions. A function c: 2^{[n]} \rightarrow \mathbf{R}^{+} is a coverage function if there exists a universe U with non-negative weights w(u) for each u \in U and subsets A_1, A_2, \ldots, A_n of U such that c(S) = \sum_{u \in \cup_{i \in S} A_i} w(u). Alternatively, coverage functions can be described as non-negative linear combinations of monotone disjunctions. They are a natural subclass of submodular functions and arise in a number of applications. We give an algorithm that, for any \gamma, \delta > 0, given random and uniform examples of an unknown coverage function c, finds a function h that approximates c within factor 1+\gamma on all but a \delta-fraction of the points, in time poly(n, 1/\gamma, 1/\delta). This is the first fully-polynomial algorithm for learning an interesting class of functions in the demanding PMAC model of Balcan and Harvey (2011). Our algorithms are based on several new structural properties of coverage functions. Using the results in (Feldman and Kothari, 2014), we also show that coverage functions are learnable agnostically with excess \ell_1-error \epsilon over all product and symmetric distributions in time n^{\log(1/\epsilon)}. In contrast, we show that, without assumptions on the distribution, learning coverage functions is at least as hard as learning polynomial-size disjoint DNF formulas, a class of functions for which the best known algorithm runs in time 2^{\tilde{O}(n^{1/3})} (Klivans and Servedio, 2004). As an application of our learning results, we give simple differentially-private algorithms for releasing monotone conjunction counting queries with low average error. In particular, for any k \leq n, we obtain private release of k-way marginals with average error \bar{\alpha} in time n^{O(\log(1/\bar{\alpha}))}.
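    The definition above translates directly into code; a minimal sketch (the names and toy data are illustrative):

        def coverage_value(S, sets, weight):
            # c(S) = sum of w(u) over u in the union of the sets A_i with
            # i in S, exactly as in the definition above.
            covered = set().union(*(sets[i] for i in S)) if S else set()
            return sum(weight[u] for u in covered)

        # Toy example: universe {a, b, c} with unit weights.
        sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
        weight = {"a": 1.0, "b": 1.0, "c": 1.0}
        print(coverage_value({1, 2}, sets, weight))   # union is {a, b, c} -> 3.0

    Monotonicity and submodularity are immediate from this form: enlarging S can only enlarge the covered union, and it does so with diminishing returns.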