1,707 research outputs found
Approximate resilience, monotonicity, and the complexity of agnostic learning
A function is -resilient if all its Fourier coefficients of degree at
most are zero, i.e., is uncorrelated with all low-degree parities. We
study the notion of of Boolean
functions, where we say that is -approximately -resilient if
is -close to a -valued -resilient function in
distance. We show that approximate resilience essentially characterizes the
complexity of agnostic learning of a concept class over the uniform
distribution. Roughly speaking, if all functions in a class are far from
being -resilient then can be learned agnostically in time and
conversely, if contains a function close to being -resilient then
agnostic learning of in the statistical query (SQ) framework of Kearns has
complexity of at least . This characterization is based on the
duality between approximation by degree- polynomials and
approximate -resilience that we establish. In particular, it implies that
approximation by low-degree polynomials, known to be sufficient for
agnostic learning over product distributions, is in fact necessary.
Focusing on monotone Boolean functions, we exhibit the existence of
near-optimal -approximately
-resilient monotone functions for all
. Prior to our work, it was conceivable even that every monotone
function is -far from any -resilient function. Furthermore, we
construct simple, explicit monotone functions based on and that are close to highly resilient functions. Our constructions are
based on a fairly general resilience analysis and amplification. These
structural results, together with the characterization, imply nearly optimal
lower bounds for agnostic learning of monotone juntas
Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations
Consider the following heuristic for building a decision tree for a function
. Place the most influential variable of
at the root, and recurse on the subfunctions and on the
left and right subtrees respectively; terminate once the tree is an
-approximation of . We analyze the quality of this heuristic,
obtaining near-matching upper and lower bounds:
Upper bound: For every with decision tree size and every
, this heuristic builds a decision tree of size
at most .
Lower bound: For every and , there is an with decision tree size such that
this heuristic builds a decision tree of size .
We also obtain upper and lower bounds for monotone functions:
and
respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004)
and Lee (2009).
Our upper bounds yield new algorithms for properly learning decision trees
under the uniform distribution. We show that these algorithms---which are
motivated by widely employed and empirically successful top-down decision tree
learning heuristics such as ID3, C4.5, and CART---achieve provable guarantees
that compare favorably with those of the current fastest algorithm (Ehrenfeucht
and Haussler, 1989). Our lower bounds shed new light on the limitations of
these heuristics.
Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend
it to give the first uniform-distribution proper learning algorithm that
achieves polynomial sample and memory complexity, while matching its
state-of-the-art quasipolynomial runtime
Optimal Bounds on Approximation of Submodular and XOS Functions by Juntas
We investigate the approximability of several classes of real-valued
functions by functions of a small number of variables ({\em juntas}). Our main
results are tight bounds on the number of variables required to approximate a
function within -error over
the uniform distribution: 1. If is submodular, then it is -close
to a function of variables.
This is an exponential improvement over previously known results. We note that
variables are necessary even for linear
functions. 2. If is fractionally subadditive (XOS) it is -close
to a function of variables. This result holds for all
functions with low total -influence and is a real-valued analogue of
Friedgut's theorem for boolean functions. We show that
variables are necessary even for XOS functions.
As applications of these results, we provide learning algorithms over the
uniform distribution. For XOS functions, we give a PAC learning algorithm that
runs in time . For submodular functions we give
an algorithm in the more demanding PMAC learning model (Balcan and Harvey,
2011) which requires a multiplicative factor approximation with
probability at least over the target distribution. Our uniform
distribution algorithm runs in time .
This is the first algorithm in the PMAC model that over the uniform
distribution can achieve a constant approximation factor arbitrarily close to 1
for all submodular functions. As follows from the lower bounds in (Feldman et
al., 2013) both of these algorithms are close to optimal. We also give
applications for proper learning, testing and agnostic learning with value
queries of these classes.Comment: Extended abstract appears in proceedings of FOCS 201
Online Learning of k-CNF Boolean Functions
This paper revisits the problem of learning a k-CNF Boolean function from
examples in the context of online learning under the logarithmic loss. In doing
so, we give a Bayesian interpretation to one of Valiant's celebrated PAC
learning algorithms, which we then build upon to derive two efficient, online,
probabilistic, supervised learning algorithms for predicting the output of an
unknown k-CNF Boolean function. We analyze the loss of our methods, and show
that the cumulative log-loss can be upper bounded, ignoring logarithmic
factors, by a polynomial function of the size of each example.Comment: 20 LaTeX pages. 2 Algorithms. Some Theorem
A Law of Large Numbers for Weighted Majority
Consider an election between two candidates in which the voters' choices are
random and independent and the probability of a voter choosing the first
candidate is . Condorcet's Jury Theorem which he derived from the weak
law of large numbers asserts that if the number of voters tends to infinity
then the probability that the first candidate will be elected tends to one. The
notion of influence of a voter or its voting power is relevant for extensions
of the weak law of large numbers for voting rules which are more general than
simple majority. In this paper we point out two different ways to extend the
classical notions of voting power and influences to arbitrary probability
distributions. The extension relevant to us is the ``effect'' of a voter, which
is a weighted version of the correlation between the voter's vote and the
election's outcomes. We prove an extension of the weak law of large numbers to
weighted majority games when all individual effects are small and show that
this result does not apply to any voting rule which is not based on weighted
majority
- …