Agnostic Learning from Tolerant Natural Proofs
We generalize the "learning algorithms from natural properties" framework of [CIKK16] to get agnostic learning algorithms from natural properties with extra features. We show that if a natural property (in the sense of Razborov and Rudich [RR97]) is useful also against functions that are close to the class of "easy" functions, rather than just against "easy" functions, then it can be used to get an agnostic learning algorithm over the uniform distribution with membership queries.
* For AC0[q], any prime q (constant-depth circuits of polynomial size, with AND, OR, NOT, and MODq gates of unbounded fanin), which happens to have a natural property with the requisite extra feature by [Raz87, Smo87, RR97], we obtain the first agnostic learning algorithm for AC0[q], for every prime q. Our algorithm runs in randomized quasi-polynomial time, uses membership queries, and outputs a circuit for a given Boolean function f that agrees with f on all but at most polylog(n)*opt fraction of inputs, where opt is the relative distance between f and the closest function h in the class AC0[q].
* For the ideal case, a natural proof of strongly exponential correlation circuit lower bounds against a circuit class C containing AC0[2] (i.e., circuits of size exp(Omega(n)) cannot compute some n-variate function even with exp(-Omega(n)) advantage over random guessing) would yield a polynomial-time query agnostic learning algorithm for C with approximation error O(opt).
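The guarantee above is stated relative to opt, the distance from f to the closest function in the class. The following is only a toy illustration of that quantity, not the learning algorithm: it uses monotone conjunctions on a handful of variables as a stand-in for AC0[q] (an assumption made so the class can be enumerated), and computes opt by brute force under the uniform distribution. All function names are hypothetical.

```python
from itertools import combinations, product

def monotone_conjunctions(n):
    """Yield every monotone conjunction over n variables as a tuple of
    variable indices; the empty tuple is the constant-1 function."""
    for k in range(n + 1):
        yield from combinations(range(n), k)

def evaluate_conjunction(indices, x):
    """Return 1 if every variable named in indices is set in x, else 0."""
    return int(all(x[i] for i in indices))

def opt_distance(f, n):
    """Relative distance, under the uniform distribution on {0,1}^n, from f
    to the closest monotone conjunction (brute force over all inputs)."""
    points = list(product((0, 1), repeat=n))
    best = 1.0
    for indices in monotone_conjunctions(n):
        errors = sum(f(x) != evaluate_conjunction(indices, x) for x in points)
        best = min(best, errors / len(points))
    return best

# Target: x0 AND x1, corrupted on the single input (1, 1, 1).
def noisy_target(x):
    return 0 if x == (1, 1, 1) else int(x[0] and x[1])

print(opt_distance(noisy_target, 3))  # 0.125: one corrupted point out of 8
```

An agnostic learner in the sense above would then be judged by whether its hypothesis errs on at most about polylog(n)*opt of the inputs.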
A Complete Characterization of Statistical Query Learning with Applications to Evolvability
The statistical query (SQ) learning model of Kearns (1993) is a natural
restriction of the PAC learning model in which a learning algorithm is allowed
to obtain estimates of statistical properties of the examples but cannot see
the examples themselves. We describe a new and simple characterization of the
query complexity of learning in the SQ learning model. Unlike the previously
known bounds on SQ learning, our characterization preserves the accuracy and the
efficiency of learning. The preservation of accuracy implies that our
characterization gives the first characterization of SQ learning in the
agnostic learning framework. The preservation of efficiency is achieved using a
new boosting technique and allows us to derive a new approach to the design of
evolutionary algorithms in Valiant's (2006) model of evolvability. We use this
approach to demonstrate the existence of a large class of monotone evolutionary
learning algorithms based on square loss performance estimation. These results
differ significantly from the few known evolutionary algorithms and give
evidence that evolvability in Valiant's model is a more versatile phenomenon
than there had been previous reason to suspect.
Comment: Simplified Lemma 3.8 and its application
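In the SQ model the learner never sees examples; it only receives estimates of expectations E[phi(x, y)] accurate to within a tolerance tau. A minimal sketch, assuming [0,1]-valued queries, of the standard way such an oracle is simulated from random examples: average phi over enough samples that Hoeffding's inequality guarantees tolerance tau with probability at least 1 - delta. The names and the toy target below are hypothetical.

```python
import math
import random

def sq_sample_size(tau, delta):
    """Number of examples so that the empirical mean of a [0,1]-valued query
    lies within tau of its expectation with probability >= 1 - delta
    (Hoeffding's inequality: 2 * exp(-2 * m * tau**2) <= delta)."""
    return math.ceil(math.log(2 / delta) / (2 * tau ** 2))

def simulate_sq_oracle(query, draw_example, tau, delta, rng):
    """Answer the statistical query E[query(x, y)] to tolerance tau by
    averaging over freshly drawn labeled examples (x, y)."""
    m = sq_sample_size(tau, delta)
    total = 0.0
    for _ in range(m):
        x, y = draw_example(rng)
        total += query(x, y)
    return total / m

# Hypothetical setup: uniform x in {0,1}^5, labeled by the dictator y = x[0].
def draw_example(rng):
    x = tuple(rng.randint(0, 1) for _ in range(5))
    return x, x[0]

# Query: how often does the label agree with x[1]? True answer: 0.5.
estimate = simulate_sq_oracle(lambda x, y: float(y == x[1]), draw_example,
                              tau=0.05, delta=1e-6, rng=random.Random(0))
print(round(estimate, 3))
```

The learner is only promised that the answer lies within tau of the truth, which is why SQ algorithms are automatically robust to some kinds of noise.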
Privately Releasing Conjunctions and the Statistical Query Barrier
Suppose we would like to know all answers to a set of statistical queries C
on a data set up to small error, but we can only access the data itself using
statistical queries. A trivial solution is to exhaustively ask all queries in
C. Can we do any better?
+ We show that the number of statistical queries necessary and sufficient for
this task is, up to polynomial factors, equal to the agnostic learning
complexity of C in Kearns' statistical query (SQ) model. This gives a complete
answer to the question when running time is not a concern.
+ We then show that the problem can be solved efficiently (allowing arbitrary
error on a small fraction of queries) whenever the answers to C can be
described by a submodular function. This includes many natural concept classes,
such as graph cuts and Boolean disjunctions and conjunctions.
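A small sketch of the objects in the two results above, on a hypothetical toy dataset of Boolean records. The "trivial solution" evaluates every query in C, here all 2^3 monotone conjunction counts. For disjunctions, the map from a variable set S to the fraction of records satisfying the disjunction of S is a coverage function and hence submodular, which the brute-force check confirms; monotone conjunction counts equal one minus disjunction counts on the bitwise-complemented data. Helper names are illustrative only.

```python
from itertools import combinations

def disjunction_fraction(data, S):
    """Fraction of records satisfying OR_{i in S} x_i (0 when S is empty)."""
    return sum(any(row[i] for i in S) for row in data) / len(data)

def conjunction_fraction(data, S):
    """Fraction of records satisfying AND_{i in S} x_i (1 when S is empty)."""
    return sum(all(row[i] for i in S) for row in data) / len(data)

def all_subsets(n):
    for k in range(n + 1):
        for c in combinations(range(n), k):
            yield frozenset(c)

def is_submodular(f, n, tol=1e-12):
    """Check diminishing returns: f(A+i) - f(A) >= f(B+i) - f(B)
    for all A subset of B and i not in B (brute force, exponential in n)."""
    subsets = list(all_subsets(n))
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for i in set(range(n)) - B:
                if f(A | {i}) - f(A) < f(B | {i}) - f(B) - tol:
                    return False
    return True

data = [(1, 0, 1), (0, 1, 1), (1, 1, 0), (0, 0, 0)]
complemented = [tuple(1 - b for b in row) for row in data]

# The "trivial solution": evaluate every query in C, here all 2^3 conjunctions.
answers = {tuple(sorted(S)): conjunction_fraction(data, S) for S in all_subsets(3)}

print(is_submodular(lambda S: disjunction_fraction(data, S), 3))  # True
```

The exhaustive evaluation is exponential in the number of attributes, which is exactly the cost the results above aim to avoid.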
While interesting from a learning theoretic point of view, our main
applications are in privacy-preserving data analysis:
Here, our second result leads to the first algorithm that efficiently
releases differentially private answers to all Boolean conjunctions with 1%
average error. This represents significant progress on a key open problem in
privacy-preserving data analysis.
Our first result, on the other hand, gives unconditional lower bounds on any
differentially private algorithm that admits a (potentially
non-privacy-preserving) implementation using only statistical queries. Not only
our algorithms but also most known private algorithms can be implemented using
only statistical queries, and hence are constrained by these lower bounds. Our
result therefore isolates the complexity of agnostic learning in the SQ model
as a new barrier in the design of differentially private algorithms.
Sampling Correctors
In many situations, sample data is obtained from a noisy or imperfect source.
In order to address such corruptions, this paper introduces the concept of a
sampling corrector. Such algorithms use structure that the distribution is
purported to have, in order to allow one to make "on-the-fly" corrections to
samples drawn from probability distributions. These algorithms then act as
filters between the noisy data and the end user.
We show connections between sampling correctors, distribution learning
algorithms, and distribution property testing algorithms. We show that these
connections can be utilized to expand the applicability of known distribution
learning and property testing algorithms as well as to achieve improved
algorithms for those tasks.
As a first step, we show how to design sampling correctors using proper
learning algorithms. We then focus on the question of whether algorithms for
sampling correctors can be more efficient in terms of sample complexity than
learning algorithms for the analogous families of distributions. When
correcting monotonicity, we show that this is indeed the case when also granted
query access to the cumulative distribution function. We also obtain sampling
correctors for monotonicity without this stronger type of access, provided that
the distribution be originally very close to monotone (namely, at a distance
). In addition to that, we consider a restricted error model
that aims at capturing "missing data" corruptions. In this model, we show that
distributions that are close to monotone have sampling correctors that are
significantly more efficient than achievable by the learning approach.
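The first step above builds sampling correctors from proper learning algorithms: learn a hypothesis in the class from the noisy samples, then output fresh samples from that hypothesis. A minimal sketch of that pattern for non-increasing distributions on {0, ..., n-1}, assuming the learner is the L2 projection of the empirical histogram onto non-increasing sequences via pool-adjacent-violators. This learner is a swapped-in choice for illustration, not the paper's construction, and the function names are hypothetical.

```python
import random

def project_nonincreasing(p):
    """L2 projection of p onto non-increasing sequences (pool adjacent
    violators). Block sums are preserved, so total mass is unchanged."""
    blocks = []  # each entry is [sum_of_values, number_of_values]
    for v in p:
        blocks.append([v, 1])
        # Merge while an earlier block has a smaller mean than the one after it.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    result = []
    for total, count in blocks:
        result.extend([total / count] * count)
    return result

def correct_samples(noisy_samples, n, k, rng):
    """Hypothetical corrector: fit a non-increasing distribution on
    {0, ..., n-1} to the noisy samples, then emit k samples from the fit."""
    counts = [0] * n
    for s in noisy_samples:
        counts[s] += 1
    empirical = [c / len(noisy_samples) for c in counts]
    return rng.choices(range(n), weights=project_nonincreasing(empirical), k=k)

print(project_nonincreasing([0.5, 0.1, 0.3, 0.1]))  # approximately [0.5, 0.2, 0.2, 0.1]
```

The corrector's output is guaranteed to come from a monotone distribution; how close that distribution is to the original depends entirely on the learner's accuracy, which is why the sample-complexity comparison with learning matters.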
We also consider the question of whether an additional source of independent
random bits is required by sampling correctors to implement the correction
process.
Moment-Matching Polynomials
We give a new framework for proving the existence of low-degree polynomial
approximators for Boolean functions with respect to broad classes of
non-product distributions. Our proofs use techniques related to the classical
moment problem and deviate significantly from known Fourier-based methods,
which require the underlying distribution to have some product structure.
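One elementary fact explains why moments are the right currency here: by linearity, the expectation of a degree-k polynomial depends only on the first k moments of the distribution. The sketch below checks this on two hypothetical one-dimensional distributions that share their first three moments (uniform on {-1, 1} versus -sqrt(2), 0, sqrt(2) with probabilities 1/4, 1/2, 1/4). A cubic has the same expectation under both, while x^4 does not. This illustrates the principle only, not the paper's proof technique.

```python
import math

def moments(dist, k):
    """[E[X^0], ..., E[X^k]] for a finite distribution given as (value, prob) pairs."""
    return [sum(p * v ** j for v, p in dist) for j in range(k + 1)]

def expectation(coeffs, dist):
    """E[c_0 + c_1 X + c_2 X^2 + ...] for coefficient list coeffs."""
    return sum(p * sum(c * v ** j for j, c in enumerate(coeffs)) for v, p in dist)

r = math.sqrt(2)
uniform_signs = [(-1.0, 0.5), (1.0, 0.5)]
three_point = [(-r, 0.25), (0.0, 0.5), (r, 0.25)]

print(moments(uniform_signs, 4))  # [1.0, 0.0, 1.0, 0.0, 1.0]
print(moments(three_point, 4))    # E[X^0..X^3] agree up to rounding; E[X^4] is about 2.0

cubic = [1.0, -2.0, 3.0, 5.0]     # 1 - 2x + 3x^2 + 5x^3
print(expectation(cubic, uniform_signs), expectation(cubic, three_point))
```

So a low-degree approximator that works for one distribution transfers to any other distribution with matching low-order moments, up to the approximation error.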
Our main application is the first polynomial-time algorithm for agnostically
learning any function of a constant number of halfspaces with respect to any
log-concave distribution (for any constant accuracy parameter). This result was
not known even for the case of learning the intersection of two halfspaces
without noise. Additionally, we show that in the "smoothed-analysis" setting,
the above results hold with respect to distributions that have sub-exponential
tails, a property satisfied by many natural and well-studied distributions in
machine learning.
Given that our algorithms can be implemented using Support Vector Machines
(SVMs) with a polynomial kernel, these results give a rigorous theoretical
explanation of why many kernel methods work so well in practice.
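The link to SVMs rests on the fact that a polynomial kernel is an inner product over monomials, so a kernel SVM is a linear classifier over exactly the low-degree polynomial features the approximators use. A small sanity check for degree 2 in two dimensions (a hypothetical toy case): (1 + <x, y>)^2 equals the inner product of the explicit feature maps.

```python
import math

def poly2_kernel(x, y):
    """Inhomogeneous degree-2 polynomial kernel (1 + <x, y>)^2 on R^2."""
    return (1 + x[0] * y[0] + x[1] * y[1]) ** 2

def poly2_features(x):
    """Explicit feature map (all monomials of degree <= 2, suitably scaled)
    whose inner product reproduces poly2_kernel."""
    r = math.sqrt(2)
    return [1.0, r * x[0], r * x[1], x[0] ** 2, x[1] ** 2, r * x[0] * x[1]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, y))                         # 4.0
print(dot(poly2_features(x), poly2_features(y)))  # 4.0 up to rounding
```

The kernel form lets the SVM work with degree-d polynomials without ever materializing the roughly n^d monomial features.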
Agnostic Membership Query Learning with Nontrivial Savings: New Results, Techniques
(Abridged) Designing computationally efficient algorithms in the agnostic
learning model (Haussler, 1992; Kearns et al., 1994) is notoriously difficult.
In this work, we consider agnostic learning with membership queries for
touchstone classes at the frontier of agnostic learning, with a focus on how
much computation can be saved over the trivial runtime of 2^n. This approach
is inspired by and continues the study of "learning with nontrivial savings"
(Servedio and Tan, 2017). To this end, we establish multiple agnostic learning
algorithms, highlighted by:
1. An agnostic learning algorithm for circuits consisting of a sublinear
number of gates, which can each be any function computable by a sublogarithmic
degree k polynomial threshold function (the depth of the circuit is bounded
only by size). This algorithm runs in time 2^{n-s(n)} for s(n) ≈
n/(k+1), and learns over the uniform distribution over unlabelled examples on
{0,1}^n.
2. An agnostic learning algorithm for circuits consisting of a sublinear
number of gates, where each can be any function computable by a SYM^+ circuit
of subexponential size and sublogarithmic degree k. This algorithm runs in time
2^{n-s(n)} for s(n) ≈ n/(k+1), and learns over distributions of
unlabelled examples that are products of k+1 arbitrary and unknown
distributions, each over {0,1}^{n/(k+1)} (assuming without loss of generality
that k+1 divides n).
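The 2^n baseline that these algorithms improve on is easy to make concrete: with membership queries, a learner can query every point of {0,1}^n and return the full truth table. That hypothesis agrees with the target everywhere, so its error is 0, which is at most opt for any class, but it costs 2^n queries. A minimal sketch with hypothetical names:

```python
from itertools import product

def trivial_mq_learner(query, n):
    """Query the membership oracle on all 2^n points and return the truth table
    as a hypothesis. It agrees with the target everywhere (error 0 <= opt), so
    it is a valid, improper agnostic learner that costs 2^n queries and time."""
    table = {x: query(x) for x in product((0, 1), repeat=n)}
    return lambda x: table[tuple(x)]

queries = []
def parity_oracle(x):
    queries.append(x)          # record each membership query
    return sum(x) % 2

h = trivial_mq_learner(parity_oracle, 4)
print(len(queries))  # 16 = 2^4
```

"Nontrivial savings" means matching this agnostic guarantee (up to the stated error) in time 2^{n-s(n)} for some growing s(n).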
Testing k-Monotonicity
A Boolean k-monotone function defined over a finite poset domain D alternates between the values 0 and 1 at most k times on any ascending chain in D. Therefore, k-monotone functions are natural generalizations of the classical monotone functions, which are the 1-monotone functions.
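For small hypercubes the definition can be checked directly. A brute-force sketch, assuming the convention that each chain is counted as if preceded by a virtual 0 (so that monotone non-decreasing functions are exactly the 1-monotone ones, as stated above). It uses dynamic programming over covering relations: inserting intermediate points into a chain never reduces the number of alternations, so it suffices to extend chains one bit at a time. This is exponential in d and is not a tester; names are hypothetical.

```python
from itertools import product

def max_alternations(f, d):
    """Largest number of value changes of f along any ascending chain in
    {0,1}^d, counting a change at the start if the chain begins where f = 1
    (equivalently, a virtual 0 is prepended to every chain)."""
    best = {}
    # Visit points in order of Hamming weight so predecessors come first.
    for x in sorted(product((0, 1), repeat=d), key=sum):
        fx = f(x)
        score = fx  # chain consisting of x alone: virtual 0 -> f(x)
        for i in range(d):
            if x[i] == 1:
                y = x[:i] + (0,) + x[i + 1:]  # immediate predecessor of x
                score = max(score, best[y][0] + (best[y][1] != fx))
        best[x] = (score, fx)
    return max(s for s, _ in best.values())

def is_k_monotone(f, d, k):
    return max_alternations(f, d) <= k

print(max_alternations(lambda x: sum(x) % 2, 3))  # 3: parity alternates on every step
```

For example, OR on {0,1}^3 is 1-monotone, while NOR needs k = 2 and parity on three bits needs k = 3.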
Motivated by the recent interest in k-monotone functions in the context of circuit complexity and learning theory, and by the central role that monotonicity testing plays in the context of property testing, we initiate a systematic study of k-monotone functions, in the property testing model. In this model, the goal is to distinguish functions that are k-monotone (or are close to being k-monotone) from functions that are far from being k-monotone.
Our results include the following:
1. We demonstrate a separation between testing k-monotonicity and testing monotonicity, on the hypercube domain {0,1}^d, for k >= 3;
2. We demonstrate a separation between testing and learning on {0,1}^d, for k = omega(log d): testing k-monotonicity can be performed with 2^{O(sqrt(d) * log d * log(1/eps))} queries, while learning k-monotone functions requires 2^{Omega(k * sqrt(d) * (1/eps))} queries (Blais et al., RANDOM 2015).
3. We present a tolerant test for functions f: [n]^d -> {0,1} with complexity independent of n, which makes progress on a problem left open by Berman et al. (STOC 2014).
Our techniques exploit the testing-by-learning paradigm, use novel applications of Fourier analysis on the grid [n]^d, and draw connections to distribution testing techniques.
Auditing: Active Learning with Outcome-Dependent Query Costs
We propose a learning setting in which unlabeled data is free, and the cost
of a label depends on its value, which is not known in advance. We study binary
classification in an extreme case, where the algorithm only pays for negative
labels. We are motivated by applications such as fraud detection, in which
investigating an honest transaction should be avoided if possible. We term the
setting auditing, and consider the auditing complexity of an algorithm: the
number of negative labels the algorithm requires in order to learn a hypothesis
with low relative error. We design auditing algorithms for simple hypothesis
classes (thresholds and rectangles), and show that with these algorithms, the
auditing complexity can be significantly lower than the active label
complexity. We also discuss a general competitive approach for auditing and
possible modifications to the framework.
Comment: Corrections in section
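A toy illustration of why auditing complexity can be much smaller than label complexity, for thresholds in the realizable (noise-free) case only; it is not the paper's algorithm, which targets low relative error. Assume a point is positive iff x >= t. Scanning from the largest point downward, every positive label is free, and the first negative label certifies that all smaller points are negative too, so exactly one negative label is paid. Binary search would request only about log2(m) labels, but some of them may be negative. Names are hypothetical.

```python
def audit_threshold(points, label):
    """Learn a threshold 'x >= t is positive' on a finite point set in the
    realizable case, paying only for negative labels.

    Scans points from largest to smallest; positives are free, and the first
    negative label reveals that every smaller point is negative too.
    Returns (threshold, number_of_negative_labels_paid); a threshold of
    +inf means no point is positive."""
    negatives_paid = 0
    threshold = float("inf")  # no positive point seen yet
    for x in sorted(points, reverse=True):
        if label(x):
            threshold = x  # smallest positive point so far
        else:
            negatives_paid += 1
            break
    return threshold, negatives_paid

points = list(range(100))
t, paid = audit_threshold(points, lambda x: x >= 37)
print(t, paid)  # 37 1
```

In this example the scan requests 64 labels in total, but only one of them is negative, so the auditing cost is 1.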
Learning Coverage Functions and Private Release of Marginals
We study the problem of approximating and learning coverage functions. A
function c: 2^{[n]} -> R_+ is a coverage function if
there exists a universe U with non-negative weights w(u) for each u in U
and subsets A_1, ..., A_n of U such that c(S) = sum_{u in union_{i in S} A_i} w(u). Alternatively, coverage functions can be described
as non-negative linear combinations of monotone disjunctions. They are a
natural subclass of submodular functions and arise in a number of applications.
We give an algorithm that for any , given random and uniform
examples of an unknown coverage function , finds a function that
approximates within factor on all but -fraction of the
points in time . This is the first fully-polynomial
algorithm for learning an interesting class of functions in the demanding PMAC
model of Balcan and Harvey (2011). Our algorithms are based on several new
structural properties of coverage functions. Using the results in (Feldman and
Kothari, 2014), we also show that coverage functions are learnable agnostically
with excess -error over all product and symmetric
distributions in time . In contrast, we show that,
without assumptions on the distribution, learning coverage functions is at
least as hard as learning polynomial-size disjoint DNF formulas, a class of
functions for which the best known algorithm runs in time
(Klivans and Servedio, 2004).
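The two descriptions of coverage functions given above can be checked against each other on a small example. A coverage function assigns to an index set S the total weight of the union of the subsets indexed by S; equivalently, it is the sum over universe elements u of w(u) times the monotone disjunction "some i in S has u in A_i". The universe, weights, and subsets below are hypothetical, and sets[0] plays the role of A_1.

```python
from itertools import combinations

def coverage(weights, sets, S):
    """c(S): total weight of the union of sets[i] over i in S."""
    covered = set().union(*(sets[i] for i in S))
    return sum(weights[u] for u in covered)

def coverage_via_disjunctions(weights, sets, S):
    """The same value as sum_u w(u) * [OR_{i : u in sets[i]} (i in S)],
    i.e. a non-negative combination of monotone disjunctions of the indicator of S."""
    return sum(w for u, w in weights.items() if any(u in sets[i] for i in S))

weights = {"a": 1.0, "b": 2.0, "c": 0.5}   # hypothetical universe and weights
sets = [{"a", "b"}, {"b", "c"}, {"c"}]      # hypothetical subsets A_1, A_2, A_3

print(coverage(weights, sets, {0, 1}))  # 3.5
```

Because each element contributes its weight once no matter how many chosen subsets cover it, adding an index to a larger S can only help less, which is the submodularity mentioned above.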
As an application of our learning results, we give simple
differentially-private algorithms for releasing monotone conjunction counting
queries with low average error. In particular, for any , we obtain
private release of -way marginals with average error in time
- …