Learning circuits with few negations
Monotone Boolean functions, and the monotone Boolean circuits that compute
them, have been intensively studied in complexity theory. In this paper we
study the structure of Boolean functions in terms of the minimum number of
negations in any circuit computing them, a complexity measure that interpolates
between monotone functions and the class of all functions. We study this
generalization of monotonicity from the vantage point of learning theory,
giving near-matching upper and lower bounds on the uniform-distribution
learnability of circuits in terms of the number of negations they contain. Our
upper bounds are based on a new structural characterization of negation-limited
circuits that extends a classical result of A. A. Markov. Our lower bounds,
which employ Fourier-analytic tools from hardness amplification, give new
results even for circuits with no negations (i.e., monotone functions).
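As background on the classical result the upper bounds extend, the sketch below brute-forces one standard formulation of Markov's theorem. It states that the minimum number of negations in a circuit computing $f$ is $\lceil \log_2(d(f)+1) \rceil$, where $d(f)$ is the largest number of 1-to-0 drops of $f$ along an increasing chain in $\{0,1\}^n$. The code is exponential in $n$ and only illustrates Markov's theorem. It is not the paper's structural characterization or learning algorithm, and the function names are hypothetical.

```python
# Brute-force illustration of Markov's theorem (exponential in n; toy sizes only).
# Points of {0,1}^n are encoded as n-bit integers; f maps such an integer to 0 or 1.


def decrease_number(f, n):
    """Max number of 1 -> 0 drops of f along an increasing chain in {0,1}^n."""
    # dp[x] = most drops on a monotone path from 0...0 up to x (one bit turned on per step).
    # Refining a chain into single-bit steps never loses a drop, so such paths suffice.
    dp = [0] * (1 << n)
    # Visit points by Hamming weight so every predecessor is finished first.
    for x in sorted(range(1 << n), key=lambda m: bin(m).count("1")):
        for i in range(n):
            if (x >> i) & 1:
                y = x ^ (1 << i)  # predecessor of x with bit i turned off
                drop = 1 if f(y) == 1 and f(x) == 0 else 0
                dp[x] = max(dp[x], dp[y] + drop)
    # Extending a chain never removes drops, so the all-ones point holds the maximum.
    return dp[(1 << n) - 1]


def min_negations(f, n):
    """Markov: ceil(log2(d(f) + 1)) negations are necessary and sufficient."""
    # For an integer d >= 0, d.bit_length() == ceil(log2(d + 1)).
    return decrease_number(f, n).bit_length()


parity = lambda m: bin(m).count("1") % 2
majority3 = lambda m: 1 if bin(m).count("1") >= 2 else 0
print(min_negations(parity, 2), min_negations(parity, 4), min_negations(majority3, 3))  # 1 2 0
```

Monotone functions such as majority never drop along a chain, so they need no negations, matching the $t = 0$ end of the interpolation described above.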
Approximate resilience, monotonicity, and the complexity of agnostic learning
A function $f$ is $d$-resilient if all its Fourier coefficients of degree at
most $d$ are zero, i.e., $f$ is uncorrelated with all low-degree parities. We
study the notion of approximate resilience of Boolean
functions, where we say that $f$ is $\alpha$-approximately $d$-resilient if $f$
is $\alpha$-close to a $[-1,1]$-valued $d$-resilient function in $\ell_1$
distance. We show that approximate resilience essentially characterizes the
complexity of agnostic learning of a concept class $C$ over the uniform
distribution. Roughly speaking, if all functions in a class $C$ are far from
being $d$-resilient then $C$ can be learned agnostically in time $n^{O(d)}$ and
conversely, if $C$ contains a function close to being $d$-resilient then
agnostic learning of $C$ in the statistical query (SQ) framework of Kearns has
complexity of at least $n^{\Omega(d)}$. This characterization is based on the
duality between $\ell_1$ approximation by degree-$d$ polynomials and
approximate $d$-resilience that we establish. In particular, it implies that
$\ell_1$ approximation by low-degree polynomials, known to be sufficient for
agnostic learning over product distributions, is in fact necessary.
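The resilience definition can be checked directly on toy examples. This minimal sketch assumes Boolean functions are written as $f: \{-1,1\}^n \to \{-1,1\}$ and computes each Fourier coefficient exactly by enumerating all $2^n$ points, so only small $n$ is feasible. The helper names are hypothetical. Parity on 3 bits is 2-resilient but not 3-resilient. Majority on 3 bits is balanced (0-resilient) but correlates with each input, so it is not 1-resilient.

```python
from fractions import Fraction
from itertools import combinations, product


def fourier_coefficient(f, n, S):
    """Exact hat{f}(S) = E_x[f(x) * prod_{i in S} x_i] over uniform x in {-1,1}^n."""
    total = 0
    for x in product((-1, 1), repeat=n):
        chi = 1
        for i in S:
            chi *= x[i]
        total += f(x) * chi
    return Fraction(total, 2 ** n)


def is_resilient(f, n, d):
    """True iff every Fourier coefficient of degree at most d (including degree 0) is zero."""
    return all(
        fourier_coefficient(f, n, S) == 0
        for k in range(d + 1)
        for S in combinations(range(n), k)
    )


parity3 = lambda x: x[0] * x[1] * x[2]
maj3 = lambda x: 1 if sum(x) > 0 else -1
print(is_resilient(parity3, 3, 2), is_resilient(parity3, 3, 3))  # True False
print(is_resilient(maj3, 3, 0), is_resilient(maj3, 3, 1))        # True False
```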
Focusing on monotone Boolean functions, we exhibit the existence of
near-optimal $\alpha$-approximately
$\widetilde{\Omega}(\alpha\sqrt{n})$-resilient monotone functions for all
$\alpha > 0$. Prior to our work, it was conceivable even that every monotone
function is $\Omega(1)$-far from any $1$-resilient function. Furthermore, we
construct simple, explicit monotone functions based on Tribes and CycleRun that are close to highly resilient functions. Our constructions are
based on a fairly general resilience analysis and amplification. These
structural results, together with the characterization, imply nearly optimal
lower bounds for agnostic learning of monotone juntas.
Learning pseudo-Boolean k-DNF and Submodular Functions
We prove that any submodular function $f: \{0,1\}^n \to \{0,1,\dots,k\}$ can be
represented as a pseudo-Boolean 2k-DNF formula. Pseudo-Boolean DNFs are a
natural generalization of DNF representation for functions with integer range.
Each term in such a formula has an associated integral constant. We show that
an analog of Håstad's switching lemma holds for pseudo-Boolean k-DNFs if all
constants associated with the terms of the formula are bounded.
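A minimal evaluator for pseudo-Boolean DNFs, written under an assumption the abstract does not state: the formula outputs the largest constant among its satisfied terms, and 0 when no term is satisfied. The example represents the submodular function $\min(|x|, 2)$ with terms of width at most 2, within the $2k = 4$ bound for $k = 2$. The function name and the example are hypothetical.

```python
from itertools import combinations, product


def eval_pb_dnf(terms, x):
    """Evaluate a pseudo-Boolean DNF at x in {0,1}^n.

    Each term is (a, pos, neg): integer constant a, variables that must be 1 (pos),
    and variables that must be 0 (neg). ASSUMPTION: the formula returns the largest
    constant among satisfied terms, and 0 if no term is satisfied.
    """
    value = 0
    for a, pos, neg in terms:
        if all(x[i] == 1 for i in pos) and all(x[j] == 0 for j in neg):
            value = max(value, a)
    return value


n = 4
# Hypothetical example: f(x) = min(|x|, 2), a submodular function with range {0, 1, 2}.
terms = [(1, (i,), ()) for i in range(n)]                      # width-1 terms, constant 1
terms += [(2, pair, ()) for pair in combinations(range(n), 2)]  # width-2 terms, constant 2
assert all(eval_pb_dnf(terms, x) == min(sum(x), 2) for x in product((0, 1), repeat=n))
```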
This allows us to generalize Mansour's PAC-learning algorithm for k-DNFs to
pseudo-Boolean k-DNFs, and hence gives a PAC-learning algorithm with membership
queries under the uniform distribution for submodular functions of the form
$f: \{0,1\}^n \to \{0,1,\dots,k\}$. Our algorithm runs in time polynomial in $n$, $k^{O(k
\log k / \epsilon)}$, $1/\epsilon$ and $\log(1/\delta)$ and works even in the
agnostic setting. The line of previous work on learning submodular functions
[Balcan, Harvey (STOC '11), Gupta, Hardt, Roth, Ullman (STOC '11), Cheraghchi,
Klivans, Kothari, Lee (SODA '12)] implies only $n^{O(k)}$ query complexity for
learning submodular functions in this setting, for fixed $\epsilon$ and $\delta$.
Our learning algorithm implies a property tester for submodularity of
functions $f: \{0,1\}^n \to \{0,\dots,k\}$ with query complexity polynomial in $n$ for
$k = O((\log n / \log\log n)^{1/2})$ and constant proximity parameter $\epsilon$.
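For reference, $f$ on $\{0,1\}^n$ is submodular when $f(x \vee y) + f(x \wedge y) \le f(x) + f(y)$ for all $x, y$. The exhaustive check below inspects all $4^n$ pairs, so it is only a ground-truth oracle for tiny $n$ and not the query-efficient tester described above. The function names and examples are hypothetical.

```python
from itertools import product


def is_submodular(f, n):
    """Exhaustively check f(x | y) + f(x & y) <= f(x) + f(y) on {0,1}^n (4^n pairs)."""
    points = list(product((0, 1), repeat=n))
    for x in points:
        for y in points:
            join = tuple(a | b for a, b in zip(x, y))  # coordinatewise OR
            meet = tuple(a & b for a, b in zip(x, y))  # coordinatewise AND
            if f(join) + f(meet) > f(x) + f(y):
                return False
    return True


capped_count = lambda x: min(sum(x), 2)  # concave function of |x|, hence submodular
and2 = lambda x: x[0] & x[1]              # AND violates the inequality at x=(1,0), y=(0,1)
print(is_submodular(capped_count, 3), is_submodular(and2, 2))  # True False
```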
Optimal Bounds on Approximation of Submodular and XOS Functions by Juntas
We investigate the approximability of several classes of real-valued
functions by functions of a small number of variables (juntas). Our main
results are tight bounds on the number of variables required to approximate a
function $f: \{0,1\}^n \to [0,1]$ within $\ell_2$-error $\epsilon$ over
the uniform distribution:
1. If $f$ is submodular, then it is $\epsilon$-close to a function of $O(\frac{1}{\epsilon^2} \log \frac{1}{\epsilon})$ variables. This is an exponential improvement over previously known results. We note that $\Omega(\frac{1}{\epsilon^2})$ variables are necessary even for linear functions.
2. If $f$ is fractionally subadditive (XOS), then it is $\epsilon$-close to a function of $2^{O(1/\epsilon^2)}$ variables. This result holds for all functions with low total $\ell_1$-influence and is a real-valued analogue of Friedgut's theorem for Boolean functions. We show that $2^{\Omega(1/\epsilon)}$ variables are necessary even for XOS functions.
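To make the $\ell_2$-error of a junta approximation concrete: for a fixed coordinate set $J$, the best $\ell_2$ approximation of $f$ by a function of $x_J$ is the conditional expectation $E[f \mid x_J]$. The error can therefore be computed by averaging $f$ over the remaining coordinates. The brute-force sketch below assumes the uniform distribution on $\{0,1\}^n$ and small $n$. It measures the error for a given $J$ and does not search for a good $J$ as the results above do. The names are hypothetical.

```python
import math
from itertools import product


def junta_l2_error(f, n, J):
    """l2 distance, under the uniform distribution on {0,1}^n, between f and its
    best approximation by a function of the coordinates in J.

    The best such approximation is g(x) = E[f | x_J]: average f over the
    coordinates outside J.
    """
    groups = {}
    for x in product((0, 1), repeat=n):
        key = tuple(x[i] for i in sorted(J))  # restriction of x to J
        groups.setdefault(key, []).append(f(x))
    squared = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)  # value of g on this restriction
        squared += sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared / 2 ** n)


f = lambda x: (x[0] + x[1]) / 2  # a linear function with range [0, 1]
print(junta_l2_error(f, 3, {0, 1}), junta_l2_error(f, 3, {0}))  # 0.0 0.25
```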
As applications of these results, we provide learning algorithms over the
uniform distribution. For XOS functions, we give a PAC learning algorithm that
runs in time $2^{\mathrm{poly}(1/\epsilon)} \mathrm{poly}(n)$. For submodular functions we give
an algorithm in the more demanding PMAC learning model (Balcan and Harvey,
2011), which requires a multiplicative $(1+\gamma)$-factor approximation with
probability at least $1-\epsilon$ over the target distribution. Our uniform
distribution algorithm runs in time $2^{\mathrm{poly}(1/(\gamma\epsilon))} \mathrm{poly}(n)$.
This is the first algorithm in the PMAC model that over the uniform
distribution can achieve a constant approximation factor arbitrarily close to 1
for all submodular functions. As follows from the lower bounds in (Feldman et
al., 2013) both of these algorithms are close to optimal. We also give
applications for proper learning, testing and agnostic learning with value
queries of these classes.
Comment: Extended abstract appears in proceedings of FOCS 201
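The PMAC success criterion can be checked empirically on a sample of points. This sketch assumes the convention $f(x) \le h(x) \le \alpha f(x)$ with $\alpha = 1+\gamma$. That orientation of the inequality is an assumption, since the abstract only says "multiplicative factor". The function name and examples are hypothetical.

```python
def pmac_success_rate(f, h, points, alpha):
    """Fraction of points with f(x) <= h(x) <= alpha * f(x) (assumed PMAC convention)."""
    good = sum(1 for x in points if f(x) <= h(x) <= alpha * f(x))
    return good / len(points)


f = lambda x: x + 1
print(pmac_success_rate(f, lambda x: 1.1 * (x + 1), range(4), 1.2))  # 1.0
print(pmac_success_rate(f, lambda x: 2, range(4), 1.2))              # 0.25
```

A PMAC learner must push this rate to at least $1-\epsilon$ with high probability over the target distribution, not just on a fixed sample.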
Learning Coverage Functions and Private Release of Marginals
We study the problem of approximating and learning coverage functions. A
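function $c: 2^{[n]} \to \mathbb{R}^{+}$ is a coverage function if
there exists a universe $U$ with non-negative weights $w(u)$ for each $u \in U$
and subsets $A_1, A_2, \dots, A_n$ of $U$ such that $c(S) = \sum_{u \in \bigcup_{i \in S} A_i} w(u)$. Alternatively, coverage functions can be described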
as non-negative linear combinations of monotone disjunctions. They are a
natural subclass of submodular functions and arise in a number of applications.
We give an algorithm that for any $\gamma, \delta > 0$, given random and uniform
examples of an unknown coverage function $c$, finds a function $h$ that
approximates $c$ within factor $1+\gamma$ on all but a $\delta$-fraction of the
points in time $\mathrm{poly}(n, 1/\gamma, 1/\delta)$. This is the first fully-polynomial
algorithm for learning an interesting class of functions in the demanding PMAC
model of Balcan and Harvey (2011). Our algorithms are based on several new
structural properties of coverage functions. Using the results in (Feldman and
Kothari, 2014), we also show that coverage functions are learnable agnostically
with excess $\ell_1$-error $\epsilon$ over all product and symmetric
distributions in time $n^{\log(1/\epsilon)}$. In contrast, we show that,
without assumptions on the distribution, learning coverage functions is at
least as hard as learning polynomial-size disjoint DNF formulas, a class of
functions for which the best known algorithm runs in time
$2^{\tilde{O}(n^{1/3})}$ (Klivans and Servedio, 2004).
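The coverage definition and its description as a non-negative linear combination of monotone disjunctions can be checked against each other on a toy instance. In the disjunction form, each element $u$ contributes $w(u)$ exactly when some $i \in S$ has $u \in A_i$. The universe, weights, sets, and function names below are made up for illustration, and set indices are 0-based.

```python
from itertools import combinations


def coverage(S, sets, weight):
    """c(S): total weight of the union of A_i over i in S."""
    covered = set()
    for i in S:
        covered |= sets[i]
    return sum(weight[u] for u in covered)


def coverage_via_disjunctions(S, sets, weight):
    """Same value written as sum_u w(u) * OR_{i : u in A_i} [i in S]."""
    return sum(
        w for u, w in weight.items()
        if any(i in S for i in range(len(sets)) if u in sets[i])
    )


sets = [{"a", "b"}, {"b", "c"}, {"d"}]  # hypothetical A_1, A_2, A_3 (0-indexed)
weight = {"a": 1, "b": 2, "c": 3, "d": 4}
print(coverage({0, 1}, sets, weight))  # 6
```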
As an application of our learning results, we give simple
differentially-private algorithms for releasing monotone conjunction counting
queries with low average error. In particular, for any $k \le n$, we obtain
private release of $k$-way marginals with average error $\bar{\alpha}$ in time $n^{O(\log(1/\bar{\alpha}))}$.
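For concreteness, a monotone conjunction counting query asks what fraction of records have every attribute in $S$ equal to 1. The $k$-way marginals are these queries for all $|S| = k$. The sketch computes the exact, non-private answers only. It contains none of the differential-privacy machinery, and the names and data are hypothetical.

```python
from itertools import combinations


def conjunction_count(data, S):
    """Fraction of records whose attributes in S are all 1 (a monotone conjunction query)."""
    return sum(all(row[i] == 1 for i in S) for row in data) / len(data)


def k_way_marginals(data, k):
    """Exact (non-private) answers to every k-way monotone conjunction query."""
    d = len(data[0])
    return {S: conjunction_count(data, S) for S in combinations(range(d), k)}


data = [(1, 1, 0), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
print(k_way_marginals(data, 1))  # {(0,): 0.75, (1,): 0.75, (2,): 0.75}
```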
Distribution-Independent Evolvability of Linear Threshold Functions
Valiant's (2007) model of evolvability casts the evolutionary process of
acquiring useful functionality as a restricted form of learning from random
examples. Linear threshold functions and their various subclasses, such as
conjunctions and decision lists, play a fundamental role in learning theory and
hence their evolvability has been the primary focus of research on Valiant's
framework (2007). One of the main open problems regarding the model is whether
conjunctions are evolvable distribution-independently (Feldman and Valiant,
2008). We show that the answer is negative. Our proof is based on a new
combinatorial parameter of a concept class that lower-bounds the complexity of
learning from correlations.
We contrast the lower bound with a proof that linear threshold functions
having a non-negligible margin on the data points are evolvable
distribution-independently via a simple mutation algorithm. Our algorithm
relies on a non-linear loss function being used to select the hypotheses
instead of 0-1 loss in Valiant's (2007) original definition. The proof of
evolvability requires that the loss function satisfies several mild conditions
that are, for example, satisfied by the quadratic loss function studied in
several other works (Michael, 2007; Feldman, 2009; Valiant, 2010). An important
property of our evolution algorithm is monotonicity, that is, the algorithm
guarantees evolvability without any decrease in performance. Previously,
monotone evolvability was only shown for conjunctions with quadratic loss
(Feldman, 2009) or when the distribution on the domain is severely restricted
(Michael, 2007; Feldman, 2009; Kanade et al., 2010).
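The positive result depends on two quantities that can be written down directly: the margin of a linear threshold function on the data, and the quadratic loss used to select hypotheses. This sketch of the definitions assumes labels in $\{-1,1\}$ and a normalized (cosine) margin. It is not the mutation algorithm itself, and the names and data are hypothetical.

```python
import math


def margin(w, data):
    """Smallest normalized margin y * <w, x> / (|w| |x|) over labeled points (x, y), y in {-1, 1}."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * sum(wi * xi for wi, xi in zip(w, x)) / (norm_w * math.sqrt(sum(xi * xi for xi in x)))
        for x, y in data
    )


def quadratic_loss(h, data):
    """Empirical quadratic loss E[(h(x) - y)^2] of a real-valued hypothesis h."""
    return sum((h(x) - y) ** 2 for x, y in data) / len(data)


data = [((1.0, 0.0), 1), ((-1.0, 1.0), -1)]
print(margin((1.0, 0.0), data))  # about 0.7071
```

Unlike 0-1 loss, the quadratic loss changes smoothly as a real-valued hypothesis moves toward the target, which is the kind of non-linear selection criterion the abstract contrasts with Valiant's original definition.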
Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations
Consider the following heuristic for building a decision tree for a function
$f: \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$
at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the
left and right subtrees respectively; terminate once the tree is an
$\epsilon$-approximation of $f$. We analyze the quality of this heuristic,
obtaining near-matching upper and lower bounds:
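Upper bound: For every $f$ with decision tree size $s$ and every
$\epsilon \in (0, \frac{1}{2})$, this heuristic builds a decision tree of size
at most $s^{O(\log(s/\epsilon)\log(1/\epsilon))}$.
Lower bound: For every $\epsilon \in (0, \frac{1}{2})$ and $s \le 2^{\tilde{O}(\sqrt{n})}$, there is an $f$ with decision tree size $s$ such that
this heuristic builds a decision tree of size $s^{\tilde{\Omega}(\log s)}$.
We also obtain upper and lower bounds for monotone functions:
$s^{O(\sqrt{\log s}/\epsilon)}$ and $s^{\tilde{\Omega}(\sqrt[4]{\log s})}$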
respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004)
and Lee (2009).
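A brute-force sketch of the top-down heuristic for small $n$, with $f: \{0,1\}^n \to \{0,1\}$ given as a Python function. As a simplifying assumption, it stops splitting a node once the node's majority label is wrong on at most an $\epsilon$ fraction of its subcube. Every leaf then meets that bound, so the whole tree is an $\epsilon$-approximation of $f$. However, this local rule may grow a different tree than the termination rule analyzed in the paper. Ties in influence go to the lowest index, and the names are hypothetical.

```python
from itertools import product


def subcube(n, fixed):
    """All x in {0,1}^n that agree with the partial assignment `fixed` (dict index -> bit)."""
    free = [i for i in range(n) if i not in fixed]
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * n
        for i, v in fixed.items():
            x[i] = v
        for i, b in zip(free, bits):
            x[i] = b
        yield tuple(x)


def influence(f, n, fixed, i):
    """Pr[f(x) != f(x with bit i flipped)] for x uniform on the current subcube."""
    points = list(subcube(n, fixed))
    flips = sum(f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:]) for x in points)
    return flips / len(points)


def build_tree(f, n, eps, fixed=None):
    """Split on the most influential free variable until each leaf's majority
    label errs on at most an eps fraction of its subcube."""
    fixed = {} if fixed is None else fixed
    values = [f(x) for x in subcube(n, fixed)]
    ones = sum(values)
    label = 1 if 2 * ones >= len(values) else 0
    err = min(ones, len(values) - ones) / len(values)
    free = [i for i in range(n) if i not in fixed]
    if err <= eps or not free:
        return label  # leaf
    i = max(free, key=lambda j: influence(f, n, fixed, j))  # first maximum wins ties
    return (i, build_tree(f, n, eps, {**fixed, i: 0}), build_tree(f, n, eps, {**fixed, i: 1}))


def evaluate(tree, x):
    """Follow internal nodes (i, left, right) down to a leaf label."""
    while isinstance(tree, tuple):
        i, left, right = tree
        tree = right if x[i] == 1 else left
    return tree


and01 = lambda x: x[0] & x[1]
print(build_tree(and01, 3, 0.0))  # (0, 0, (1, 0, 1))
```

For $f = x_0 \wedge x_1$ the root splits on $x_0$ (influence $1/2$, tied with $x_1$), and only the $x_0 = 1$ branch needs a further split.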
Our upper bounds yield new algorithms for properly learning decision trees
under the uniform distribution. We show that these algorithms---which are
motivated by widely employed and empirically successful top-down decision tree
learning heuristics such as ID3, C4.5, and CART---achieve provable guarantees
that compare favorably with those of the current fastest algorithm (Ehrenfeucht
and Haussler, 1989). Our lower bounds shed new light on the limitations of
these heuristics.
Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend
it to give the first uniform-distribution proper learning algorithm that
achieves polynomial sample and memory complexity, while matching its
state-of-the-art quasipolynomial runtime.
Learning DNF Expressions from Fourier Spectrum
Since its introduction by Valiant in 1984, PAC learning of DNF expressions
remains one of the central problems in learning theory. We consider this
problem in the setting where the underlying distribution is uniform, or more
generally, a product distribution. Kalai, Samorodnitsky and Teng (2009) showed
that in this setting a DNF expression can be efficiently approximated from its
"heavy" low-degree Fourier coefficients alone. This is in contrast to previous
approaches where boosting was used and thus Fourier coefficients of the target
function modified by various distributions were needed. This property is
crucial for learning of DNF expressions over smoothed product distributions, a
learning model introduced by Kalai et al. (2009) and inspired by the seminal
smoothed analysis model of Spielman and Teng (2001).
We introduce a new approach to learning (or approximating) polynomial
threshold functions which is based on creating a function with range [-1,1]
that approximately agrees with the unknown function on low-degree Fourier
coefficients. We then describe conditions under which this is sufficient for
learning polynomial threshold functions. Our approach yields a new, simple
algorithm for approximating any polynomial-size DNF expression from its "heavy"
low-degree Fourier coefficients alone. Our algorithm greatly simplifies the
proof of learnability of DNF expressions over smoothed product distributions.
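Estimating the heavy low-degree Fourier coefficients from examples can be sketched as below, assuming inputs in $\{-1,1\}^n$ and labels in $\{-1,1\}$. The hypothesis here is the sign of the truncated Fourier expansion, as in the low-degree algorithm of Linial, Mansour, and Nisan. That is a swapped-in, simpler technique, not the paper's construction of a $[-1,1]$-valued function. The names and parameters are hypothetical, and the example uses every point of the cube instead of random samples so that the output is deterministic.

```python
from itertools import combinations, product


def chi(S, x):
    """Parity character prod_{i in S} x_i on x in {-1,1}^n."""
    p = 1
    for i in S:
        p *= x[i]
    return p


def heavy_low_degree(examples, n, d, theta):
    """Empirical Fourier coefficients of degree <= d whose magnitude is >= theta."""
    heavy = {}
    for k in range(d + 1):
        for S in combinations(range(n), k):
            c = sum(y * chi(S, x) for x, y in examples) / len(examples)
            if abs(c) >= theta:
                heavy[S] = c
    return heavy


def sign_hypothesis(heavy):
    """Low-degree-algorithm-style hypothesis: sign of the truncated Fourier expansion."""
    return lambda x: 1 if sum(c * chi(S, x) for S, c in heavy.items()) >= 0 else -1


n = 3
target = lambda x: x[0] * x[1]
examples = [(x, target(x)) for x in product((-1, 1), repeat=n)]  # all points, for determinism
heavy = heavy_low_degree(examples, n, 2, 0.5)
print(heavy)  # {(0, 1): 1.0}
```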
We also describe an application of our algorithm to learning monotone DNF
expressions over product distributions. Building on the work of Servedio
(2001), we give an algorithm that runs in time
$\mathrm{poly}((s \cdot \log(s/\epsilon))^{\log(s/\epsilon)}, n)$, where $s$ is the size of the target DNF
expression and $\epsilon$ is the accuracy. This improves on the
$\mathrm{poly}((s \cdot \log(ns/\epsilon))^{\log(s/\epsilon) \cdot \log(1/\epsilon)}, n)$ bound of Servedio (2001).
Comment: Appears in Conference on Learning Theory (COLT) 201