642 research outputs found
Learning DNF Expressions from Fourier Spectrum
Since its introduction by Valiant in 1984, PAC learning of DNF expressions
remains one of the central problems in learning theory. We consider this
problem in the setting where the underlying distribution is uniform, or more
generally, a product distribution. Kalai, Samorodnitsky and Teng (2009) showed
that in this setting a DNF expression can be efficiently approximated from its
"heavy" low-degree Fourier coefficients alone. This is in contrast to previous
approaches where boosting was used and thus Fourier coefficients of the target
function modified by various distributions were needed. This property is
crucial for learning of DNF expressions over smoothed product distributions, a
learning model introduced by Kalai et al. (2009) and inspired by the seminal
smoothed analysis model of Spielman and Teng (2001).
We introduce a new approach to learning (or approximating) a polynomial
threshold functions which is based on creating a function with range [-1,1]
that approximately agrees with the unknown function on low-degree Fourier
coefficients. We then describe conditions under which this is sufficient for
learning polynomial threshold functions. Our approach yields a new, simple
algorithm for approximating any polynomial-size DNF expression from its "heavy"
low-degree Fourier coefficients alone. Our algorithm greatly simplifies the
proof of learnability of DNF expressions over smoothed product distributions.
We also describe an application of our algorithm to learning monotone DNF
expressions over product distributions. Building on the work of Servedio
(2001), we give an algorithm that runs in time \poly((s \cdot
\log{(s/\eps)})^{\log{(s/\eps)}}, n), where is the size of the target DNF
expression and \eps is the accuracy. This improves on \poly((s \cdot
\log{(ns/\eps)})^{\log{(s/\eps)} \cdot \log{(1/\eps)}}, n) bound of Servedio
(2001).Comment: Appears in Conference on Learning Theory (COLT) 201
Learning Geometric Concepts with Nasty Noise
We study the efficient learnability of geometric concept classes -
specifically, low-degree polynomial threshold functions (PTFs) and
intersections of halfspaces - when a fraction of the data is adversarially
corrupted. We give the first polynomial-time PAC learning algorithms for these
concept classes with dimension-independent error guarantees in the presence of
nasty noise under the Gaussian distribution. In the nasty noise model, an
omniscient adversary can arbitrarily corrupt a small fraction of both the
unlabeled data points and their labels. This model generalizes well-studied
noise models, including the malicious noise model and the agnostic (adversarial
label noise) model. Prior to our work, the only concept class for which
efficient malicious learning algorithms were known was the class of
origin-centered halfspaces.
Specifically, our robust learning algorithm for low-degree PTFs succeeds
under a number of tame distributions -- including the Gaussian distribution
and, more generally, any log-concave distribution with (approximately) known
low-degree moments. For LTFs under the Gaussian distribution, we give a
polynomial-time algorithm that achieves error , where
is the noise rate. At the core of our PAC learning results is an efficient
algorithm to approximate the low-degree Chow-parameters of any bounded function
in the presence of nasty noise. To achieve this, we employ an iterative
spectral method for outlier detection and removal, inspired by recent work in
robust unsupervised learning. Our aforementioned algorithm succeeds for a range
of distributions satisfying mild concentration bounds and moment assumptions.
The correctness of our robust learning algorithm for intersections of
halfspaces makes essential use of a novel robust inverse independence lemma
that may be of broader interest
The Limitations of Optimization from Samples
In this paper we consider the following question: can we optimize objective
functions from the training data we use to learn them? We formalize this
question through a novel framework we call optimization from samples (OPS). In
OPS, we are given sampled values of a function drawn from some distribution and
the objective is to optimize the function under some constraint.
While there are interesting classes of functions that can be optimized from
samples, our main result is an impossibility. We show that there are classes of
functions which are statistically learnable and optimizable, but for which no
reasonable approximation for optimization from samples is achievable. In
particular, our main result shows that there is no constant factor
approximation for maximizing coverage functions under a cardinality constraint
using polynomially-many samples drawn from any distribution.
We also show tight approximation guarantees for maximization under a
cardinality constraint of several interesting classes of functions including
unit-demand, additive, and general monotone submodular functions, as well as a
constant factor approximation for monotone submodular functions with bounded
curvature
Empirical Risk Minimization for Probabilistic Grammars: Sample Complexity and Hardness of Learning
Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. They are used ubiquitously in computational linguistics. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of probabilistic grammars using the log-loss. We derive sample complexity bounds in this framework that apply both to the supervised setting and the unsupervised setting. By making assumptions about the underlying distribution that are appropriate for natural language scenarios, we are able to derive distribution-dependent sample complexity bounds for probabilistic grammars. We also give simple algorithms for carrying out empirical risk minimization using this framework in both the supervised and unsupervised settings. In the unsupervised case, we show that the problem of minimizing empirical risk is NP-hard. We therefore suggest an approximate algorithm, similar to expectation-maximization, to minimize the empirical risk. Learning from data is central to contemporary computational linguistics. It is in common in such learning to estimate a model in a parametric family using the maximum likelihood principle. This principle applies in the supervised case (i.e., using annotate
- …