Distribution-Independent Evolvability of Linear Threshold Functions
Valiant's (2007) model of evolvability models the evolutionary process of
acquiring useful functionality as a restricted form of learning from random
examples. Linear threshold functions and their various subclasses, such as
conjunctions and decision lists, play a fundamental role in learning theory and
hence their evolvability has been the primary focus of research on Valiant's
framework (2007). One of the main open problems regarding the model is whether
conjunctions are evolvable distribution-independently (Feldman and Valiant,
2008). We show that the answer is negative. Our proof is based on a new
combinatorial parameter of a concept class that lower-bounds the complexity of
learning from correlations.
We contrast the lower bound with a proof that linear threshold functions
having a non-negligible margin on the data points are evolvable
distribution-independently via a simple mutation algorithm. Our algorithm
relies on a non-linear loss function being used to select the hypotheses
instead of 0-1 loss in Valiant's (2007) original definition. The proof of
evolvability requires that the loss function satisfies several mild conditions
that are, for example, satisfied by the quadratic loss function studied in
several other works (Michael, 2007; Feldman, 2009; Valiant, 2010). An important
property of our evolution algorithm is monotonicity; that is, the algorithm
guarantees evolvability without any decreases in performance. Previously,
monotone evolvability was only shown for conjunctions with quadratic loss
(Feldman, 2009) or when the distribution on the domain is severely restricted
(Michael, 2007; Feldman, 2009; Kanade et al., 2010).
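The monotonicity property can be illustrated with a toy mutation-and-selection loop. This is only a minimal sketch under stated assumptions, not the algorithm from the paper: the function names, the single-coordinate mutation, the step size, and the four-point data set are all hypothetical. It selects hypotheses by empirical quadratic loss and accepts a mutation only when that loss strictly decreases, so performance never gets worse.

```python
import random

def quadratic_loss(w, data):
    """Mean squared difference between the clipped linear score and the +/-1 label."""
    total = 0.0
    for x, y in data:
        score = sum(wi * xi for wi, xi in zip(w, x))
        score = max(-1.0, min(1.0, score))  # keep the hypothesis value in [-1, 1]
        total += (score - y) ** 2
    return total / len(data)

def evolve(data, dim, generations=200, step=0.1, seed=0):
    """Toy monotone evolution: mutate one weight, keep it only if the loss drops."""
    rng = random.Random(seed)
    w = [0.0] * dim
    history = [quadratic_loss(w, data)]
    for _ in range(generations):
        candidate = list(w)
        i = rng.randrange(dim)                      # hypothetical mutation:
        candidate[i] += rng.choice((-step, step))   # nudge a single coordinate
        cand_loss = quadratic_loss(candidate, data)
        if cand_loss < history[-1]:                 # selection by quadratic loss
            w = candidate
            history.append(cand_loss)
        else:                                       # reject: performance is unchanged
            history.append(history[-1])
    return w, history

# Hypothetical data: the label is the sign of the first coordinate (margin 1).
data = [((1.0, 1.0), 1), ((1.0, -1.0), 1), ((-1.0, 1.0), -1), ((-1.0, -1.0), -1)]
w, history = evolve(data, dim=2)
```

The recorded `history` is non-increasing by construction, which is the sense of "monotone" illustrated here. Valiant's model additionally specifies how performance is estimated from samples and how neutral mutations are handled; the sketch ignores both.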
Learning DNF Expressions from Fourier Spectrum
Since its introduction by Valiant in 1984, PAC learning of DNF expressions
remains one of the central problems in learning theory. We consider this
problem in the setting where the underlying distribution is uniform, or more
generally, a product distribution. Kalai, Samorodnitsky and Teng (2009) showed
that in this setting a DNF expression can be efficiently approximated from its
"heavy" low-degree Fourier coefficients alone. This is in contrast to previous
approaches where boosting was used and thus Fourier coefficients of the target
function modified by various distributions were needed. This property is
crucial for learning of DNF expressions over smoothed product distributions, a
learning model introduced by Kalai et al. (2009) and inspired by the seminal
smoothed analysis model of Spielman and Teng (2001).
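To make "heavy low-degree Fourier coefficients" concrete, the following minimal sketch computes them exactly by enumeration for a small Boolean function under the uniform distribution. It is an illustration only: the function names, the +1 = true / -1 = false encoding, the threshold value, and the one-term example are assumptions, and a real learner would estimate the coefficients from random examples rather than enumerate all inputs.

```python
from itertools import combinations, product

def chi(S, x):
    """Parity character chi_S(x): product of x[i] for i in S, with x in {-1,+1}^n."""
    value = 1
    for i in S:
        value *= x[i]
    return value

def heavy_low_degree_coefficients(f, n, degree, threshold):
    """Exact Fourier coefficients E[f(x) * chi_S(x)] under the uniform distribution,
    kept only when |S| <= degree and the magnitude is at least threshold."""
    points = list(product((-1, 1), repeat=n))
    heavy = {}
    for d in range(degree + 1):
        for S in combinations(range(n), d):
            coeff = sum(f(x) * chi(S, x) for x in points) / len(points)
            if abs(coeff) >= threshold:
                heavy[S] = coeff
    return heavy

# Hypothetical target: the one-term DNF (x0 AND x1) on 3 variables, +1 meaning true.
def one_term_dnf(x):
    return 1 if x[0] == 1 and x[1] == 1 else -1

heavy = heavy_low_degree_coefficients(one_term_dnf, n=3, degree=1, threshold=0.25)
# heavy == {(): -0.5, (0,): 0.5, (1,): 0.5}
```

The coefficient on the irrelevant variable `x2` is zero and is filtered out, while the degree-2 coefficient on `{0, 1}` (also 0.5) is excluded only because of the degree cutoff.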
We introduce a new approach to learning (or approximating) polynomial
threshold functions, which is based on creating a function with range [-1,1]
that approximately agrees with the unknown function on low-degree Fourier
coefficients. We then describe conditions under which this is sufficient for
learning polynomial threshold functions. Our approach yields a new, simple
algorithm for approximating any polynomial-size DNF expression from its "heavy"
low-degree Fourier coefficients alone. Our algorithm greatly simplifies the
proof of learnability of DNF expressions over smoothed product distributions.
We also describe an application of our algorithm to learning monotone DNF
expressions over product distributions. Building on the work of Servedio
(2001), we give an algorithm that runs in time $\mathrm{poly}((s \cdot
\log(s/\epsilon))^{\log(s/\epsilon)}, n)$, where $s$ is the size of the target DNF
expression and $\epsilon$ is the accuracy. This improves on the $\mathrm{poly}((s \cdot
\log(ns/\epsilon))^{\log(s/\epsilon) \cdot \log(1/\epsilon)}, n)$ bound of Servedio
(2001).
Comment: Appears in Conference on Learning Theory (COLT) 201
A Complete Characterization of Statistical Query Learning with Applications to Evolvability
The statistical query (SQ) learning model of Kearns (1993) is a natural
restriction of the PAC learning model in which a learning algorithm is allowed
to obtain estimates of statistical properties of the examples but cannot see
the examples themselves. We describe a new and simple characterization of the
query complexity of learning in the SQ learning model. Unlike the previously
known bounds on SQ learning, our characterization preserves the accuracy and the
efficiency of learning. The preservation of accuracy implies that our
characterization gives the first characterization of SQ learning in the
agnostic learning framework. The preservation of efficiency is achieved using a
new boosting technique and allows us to derive a new approach to the design of
evolutionary algorithms in Valiant's (2006) model of evolvability. We use this
approach to demonstrate the existence of a large class of monotone evolutionary
learning algorithms based on square loss performance estimation. These results
differ significantly from the few known evolutionary algorithms and give
evidence that evolvability in Valiant's model is a more versatile phenomenon
than there had been previous reason to suspect.
Comment: Simplified Lemma 3.8 and its application
Statistical Active Learning Algorithms for Noise Tolerance and Differential Privacy
We describe a framework for designing efficient active learning algorithms
that are tolerant to random classification noise and are
differentially-private. The framework is based on active learning algorithms
that are statistical in the sense that they rely on estimates of expectations
of functions of filtered random examples. It builds on the powerful statistical
query framework of Kearns (1993).
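The idea of estimating an expectation over filtered examples can be sketched as follows. This is a minimal illustration under assumptions, not the paper's definition: the function and parameter names are hypothetical, the filter is a deterministic yes/no predicate, and the unlabeled pool is a fixed grid rather than a random sample. The point it shows is that labels are requested only for points that pass the filter.

```python
def filtered_query_estimate(pool, label_of, filter_fn, query_fn):
    """Estimate E[query_fn(x, label) | filter_fn(x)] from an unlabeled pool.

    Labels are requested only for points that pass the filter, which is
    where an active learner saves label queries.
    """
    labels_used = 0
    total = 0.0
    for x in pool:
        if not filter_fn(x):
            continue                 # no label is requested for filtered-out points
        y = label_of(x)              # label request (the costly step)
        labels_used += 1
        total += query_fn(x, y)
    if labels_used == 0:
        raise ValueError("no pool point passed the filter")
    return total / labels_used, labels_used

# Hypothetical setup: a threshold target on [0, 1), probed only inside [0.25, 0.5).
pool = [i / 100 for i in range(100)]

def target(x):
    return 1 if x >= 0.37 else -1

estimate, labels_used = filtered_query_estimate(
    pool,
    target,
    filter_fn=lambda x: 0.25 <= x < 0.5,
    query_fn=lambda x, y: y,         # average label inside the filtered region
)
```

Here only 25 of the 100 pool points are labeled, and the average label in the region indicates on which side of its midpoint the threshold lies.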
We show that any efficient active statistical learning algorithm can be
automatically converted to an efficient active learning algorithm which is
tolerant to random classification noise as well as other forms of
"uncorrelated" noise. The complexity of the resulting algorithms has
information-theoretically optimal quadratic dependence on $1/(1-2\eta)$, where
$\eta$ is the noise rate.
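A standard way to see where a quadratic dependence on $1/(1-2\eta)$ comes from, with $\eta$ denoting the noise rate, is the correlation statistic: flipping each $\pm 1$ label independently with probability $\eta$ shrinks the expected correlation by a factor of $1-2\eta$. The sketch below is a hedged illustration, not the paper's conversion: all names are hypothetical, the expectation is computed exactly instead of by sampling, and the sample-size estimate uses a plain Hoeffding bound for values in $[-1, 1]$.

```python
import math

def expected_noisy_correlation(points, labels, g, eta):
    """Exact expected correlation of labels with g when each +/-1 label
    is flipped independently with probability eta."""
    # A flipped label contributes -y, so E[noisy y] = (1 - eta) * y + eta * (-y).
    return sum((1 - 2 * eta) * y * g(x) for x, y in zip(points, labels)) / len(points)

def denoise(noisy_value, eta):
    """Undo the (1 - 2 * eta) shrinkage; requires eta < 1/2."""
    return noisy_value / (1 - 2 * eta)

def samples_for_tolerance(tau, eta, delta=0.05):
    """Hoeffding sample bound for estimating the clean correlation within tau.

    The noisy statistic must be estimated within tau * (1 - 2 * eta), so the
    bound grows quadratically in 1 / (1 - 2 * eta).
    """
    noisy_tau = tau * (1 - 2 * eta)
    return 2 * math.log(2 / delta) / noisy_tau ** 2

# Hypothetical four-point example.
points = [0, 1, 2, 3]
labels = [1, 1, -1, 1]

def g(x):
    return 1 if x < 2 else -1

clean = expected_noisy_correlation(points, labels, g, eta=0.0)   # 0.5
noisy = expected_noisy_correlation(points, labels, g, eta=0.25)  # 0.25
recovered = denoise(noisy, eta=0.25)                             # 0.5
```

With $\eta = 0.25$ the factor $1-2\eta$ is $0.5$, so `samples_for_tolerance` returns four times the noise-free value for the same `tau`.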
We show that commonly studied concept classes including thresholds,
rectangles, and linear separators can be efficiently actively learned in our
framework. These results combined with our generic conversion lead to the first
computationally-efficient algorithms for actively learning some of these
concept classes in the presence of random classification noise that provide
exponential improvement in the dependence on the error $\epsilon$ over their
passive counterparts. In addition, we show that our algorithms can be
automatically converted to efficient active differentially-private algorithms.
This leads to the first differentially-private active learning algorithms with
exponential label savings over the passive case.
Comment: Extended abstract appears in NIPS 201
On Boosting Sparse Parities
While boosting has been extensively studied, considerably less attention has been devoted to the task of designing good weak learning algorithms. In this paper we consider the problem of designing weak learners that are especially well suited to the boosting procedure, and specifically to the AdaBoost algorithm. First we describe conditions desirable for a weak learning algorithm. We then propose using sparse parity functions, which have many of our desired properties, as weak learners in boosting. Our experimental tests show the proposed weak learners to be competitive with the most widely used ones: decision stumps and pruned decision trees.
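The combination of AdaBoost with sparse parity weak learners can be sketched as below. This is a minimal illustration under assumptions, not the paper's implementation: the exhaustive search over feature subsets, the size limit `max_size`, the error clamping, and the XOR data set are all hypothetical choices. The code itself is textbook AdaBoost with a signed parity on at most `max_size` binary features as the weak hypothesis.

```python
import math
from itertools import combinations

def parity(S, x):
    """+1 if the 0/1 features of x indexed by S sum to an even number, else -1."""
    return 1 if sum(x[i] for i in S) % 2 == 0 else -1

def best_sparse_parity(X, y, weights, max_size):
    """Exhaustively pick the signed parity on at most max_size features
    with the smallest weighted error (weights are assumed to sum to 1)."""
    n = len(X[0])
    best = None
    for k in range(1, max_size + 1):
        for S in combinations(range(n), k):
            err = sum(w for x, label, w in zip(X, y, weights) if parity(S, x) != label)
            # The negated parity errs exactly where the parity is correct.
            for sign, e in ((1, err), (-1, 1.0 - err)):
                if best is None or e < best[0]:
                    best = (e, S, sign)
    return best

def adaboost(X, y, rounds, max_size=2):
    """Standard AdaBoost loop using sparse parities as weak hypotheses."""
    m = len(X)
    weights = [1.0 / m] * m
    ensemble = []
    for _ in range(rounds):
        err, S, sign = best_sparse_parity(X, y, weights, max_size)
        err = min(max(err, 1e-12), 1 - 1e-12)   # avoid log(0) on perfect hypotheses
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, S, sign))
        weights = [w * math.exp(-alpha * label * sign * parity(S, x))
                   for x, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * sign * parity(S, x) for alpha, S, sign in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical data: label is +1 exactly when x0 XOR x1 holds; x2 is irrelevant.
X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [1 if a != b else -1 for a, b, c in X]
ensemble = adaboost(X, y, rounds=3)
```

On this data every single-feature hypothesis (the analogue of a decision stump) has weighted error 1/2, while the parity on features 0 and 1 fits the labels exactly, so the first weak hypothesis selected is already perfect. Exhaustive search is only practical for small `max_size`.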