A Complete Characterization of Statistical Query Learning with Applications to Evolvability
The statistical query (SQ) learning model of Kearns (1993) is a natural
restriction of the PAC learning model in which a learning algorithm is allowed
to obtain estimates of statistical properties of the examples but cannot see
the examples themselves. We describe a new and simple characterization of the
query complexity of learning in the SQ learning model. Unlike the previously
known bounds on SQ learning, our characterization preserves the accuracy and the
efficiency of learning. The preservation of accuracy implies that our
characterization gives the first characterization of SQ learning in the
agnostic learning framework. The preservation of efficiency is achieved using a
new boosting technique and allows us to derive a new approach to the design of
evolutionary algorithms in Valiant's (2006) model of evolvability. We use this
approach to demonstrate the existence of a large class of monotone evolutionary
learning algorithms based on square loss performance estimation. These results
differ significantly from the few known evolutionary algorithms and give
evidence that evolvability in Valiant's model is a more versatile phenomenon
than there had been previous reason to suspect.
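To make the SQ model concrete, here is a minimal Python sketch, with assumed names (`sq_oracle`, `sample_source`, `correlation_query`), of an oracle that answers a bounded query to within a requested tolerance using fresh examples the learner never sees; this illustrates the access model only, not the characterization or boosting technique of the paper.

```python
import math

def sq_oracle(sample_source, phi, tau, delta=0.01):
    """Simulate a statistical query oracle: estimate E[phi(x, f(x))] to
    within additive tolerance tau (with probability 1 - delta) from fresh
    labeled examples, without ever exposing the examples to the learner."""
    # Hoeffding bound: O(log(1/delta) / tau^2) samples suffice for a
    # query phi taking values in [-1, 1].
    m = math.ceil(2 * math.log(2 / delta) / tau ** 2)
    total = 0.0
    for _ in range(m):
        x, y = sample_source()   # one labeled example (x, f(x))
        total += phi(x, y)       # phi must map into [-1, 1]
    return total / m

def correlation_query(i):
    # The correlational queries phi(x, y) = x[i] * y used by CSQ learners,
    # for x in {-1, +1}^n and y in {-1, +1}.
    return lambda x, y: x[i] * y
```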
Distribution-Independent Evolvability of Linear Threshold Functions
Valiant's (2007) model of evolvability captures the evolutionary process of
acquiring useful functionality as a restricted form of learning from random
examples. Linear threshold functions and their various subclasses, such as
conjunctions and decision lists, play a fundamental role in learning theory and
hence their evolvability has been the primary focus of research on Valiant's
framework (2007). One of the main open problems regarding the model is whether
conjunctions are evolvable distribution-independently (Feldman and Valiant,
2008). We show that the answer is negative. Our proof is based on a new
combinatorial parameter of a concept class that lower-bounds the complexity of
learning from correlations.
We contrast the lower bound with a proof that linear threshold functions
having a non-negligible margin on the data points are evolvable
distribution-independently via a simple mutation algorithm. Our algorithm
relies on a non-linear loss function being used to select the hypotheses
instead of the 0-1 loss in Valiant's (2007) original definition. The proof of
evolvability requires that the loss function satisfies several mild conditions
that are, for example, satisfied by the quadratic loss function studied in
several other works (Michael, 2007; Feldman, 2009; Valiant, 2010). An important
property of our evolution algorithm is monotonicity; that is, the algorithm
guarantees evolvability without any decreases in performance. Previously,
monotone evolvability was only shown for conjunctions with quadratic loss
(Feldman, 2009) or when the distribution on the domain is severely restricted
(Michael, 2007; Feldman, 2009; Kanade et al., 2010).
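The following Python sketch conveys the flavor of such a mutation algorithm under stated assumptions (a linear-threshold representation, Gaussian mutations, and square-loss performance estimated from samples); it is an illustration, not the paper's algorithm or its analysis.

```python
import random

def square_loss_perf(h, examples):
    """Empirical square-loss performance of hypothesis h (lower is better)."""
    return sum((h(x) - y) ** 2 for x, y in examples) / len(examples)

def threshold(v):
    """Linear threshold hypothesis sign(<v, x>) with values in {-1, +1}."""
    return lambda x: 1.0 if sum(vi * xi for vi, xi in zip(v, x)) >= 0 else -1.0

def evolve_step(w, examples, sigma=0.05, tolerance=1e-3, neighborhood=20):
    """One evolution step: draw a polynomial-size set of random mutations of
    the representation w and move only to a mutation whose estimated
    square-loss performance is strictly better; otherwise stay put. Never
    accepting a worse mutation is what makes the dynamics monotone."""
    best, best_perf = w, square_loss_perf(threshold(w), examples)
    for _ in range(neighborhood):
        v = [wi + random.gauss(0, sigma) for wi in w]
        perf = square_loss_perf(threshold(v), examples)
        if perf < best_perf - tolerance:
            best, best_perf = v, perf
    return best
```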
Evolution with Drifting Targets
We consider the question of the stability of evolutionary algorithms to
gradual changes, or drift, in the target concept. We define an algorithm to be
resistant to drift if, for some inverse polynomial drift rate in the target
function, it converges to accuracy $1 - \epsilon$, with polynomial resources,
and then stays within that accuracy indefinitely, except with probability
$\epsilon$, at any one time. We show that every evolution algorithm, in the
sense of Valiant (2007; 2009), can be converted, using the Correlational Query
technique of Feldman (2008), into such a drift-resistant algorithm. For certain
evolutionary algorithms, such as for Boolean conjunctions, we give bounds on
the rates of drift that they can resist. We develop some new evolution
algorithms that are resistant to significant drift. In particular, we give an
algorithm for evolving linear separators over the spherically symmetric
distribution that is resistant to a drift rate of $O(\epsilon/n)$, and another
algorithm over the more general product normal distributions that resists a
smaller drift rate.
The above translation result can also be interpreted as one on the robustness
of the notion of evolvability itself under changes of definition. As a second
result in that direction we show that every evolution algorithm can be
converted to a quasi-monotonic one that can evolve from any starting point
without the performance ever dipping significantly below that of the starting
point. This permits the somewhat unnatural feature of arbitrary performance
degradations to be removed from several known robustness translations.
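As a toy illustration of drift resistance (an assumption-laden sketch, not any of the paper's algorithms), the snippet below rotates the target hyperplane slightly each generation and lets the evolving hypothesis re-estimate its performance on fresh examples every generation, so that accepted mutations track the moving target.

```python
import math
import random

def drifting_target_demo(n=5, rounds=200, drift=0.01, sigma=0.05, m=300):
    """Evolve a halfspace while the target halfspace slowly drifts."""
    target = [1.0] + [0.0] * (n - 1)
    w = [random.gauss(0, 1) for _ in range(n)]
    for _ in range(rounds):
        # Small random drift of the target, renormalized to unit length.
        target = [t + random.gauss(0, drift) for t in target]
        norm = math.sqrt(sum(t * t for t in target))
        target = [t / norm for t in target]
        # Fresh examples labeled by the current (drifted) target.
        examples = []
        for _ in range(m):
            x = [random.gauss(0, 1) for _ in range(n)]
            y = sum(t * xi for t, xi in zip(target, x)) >= 0
            examples.append((x, y))
        def err(v):
            return sum((sum(vi * xi for vi, xi in zip(v, x)) >= 0) != y
                       for x, y in examples) / m
        # Accept a random mutation only if it does not hurt performance.
        cand = [wi + random.gauss(0, sigma) for wi in w]
        if err(cand) <= err(w):
            w = cand
    return w, target
```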
Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Connections to Evolvability
In this paper we revisit some classic problems on classification under
misspecification. In particular, we study the problem of learning halfspaces
under Massart noise with rate $\eta$. In a recent work, Diakonikolas,
Gouleakis, and Tzamos resolved a long-standing problem by giving the first
efficient algorithm for learning to accuracy $\eta + \epsilon$ for any
$\epsilon > 0$. However, their algorithm outputs a complicated hypothesis,
which partitions space into $\mathrm{poly}(d, 1/\epsilon)$ regions. Here we give a
much simpler algorithm and in the process resolve a number of outstanding open
questions:
(1) We give the first proper learner for Massart halfspaces that achieves
$\eta + \epsilon$. We also give improved bounds on the sample complexity
achievable by polynomial time algorithms.
(2) Based on (1), we develop a blackbox knowledge distillation procedure to
convert an arbitrarily complex classifier to an equally good proper classifier.
(3) By leveraging a simple but overlooked connection to evolvability, we show
any SQ algorithm requires super-polynomially many queries to achieve
$\mathsf{OPT} + \epsilon$.
Moreover, we study generalized linear models where
$\mathbb{E}[Y \mid \mathbf{X}] = \sigma(\langle \mathbf{w}^*, \mathbf{X} \rangle)$
for any odd, monotone, and Lipschitz function $\sigma$. This family includes
the previously mentioned
halfspace models as a special case, but is much richer and includes other
fundamental models like logistic regression. We introduce a challenging new
corruption model that generalizes Massart noise, and give a general algorithm
for learning in this setting. Our algorithms are based on a small set of core
recipes for learning to classify in the presence of misspecification.
Finally we study our algorithm for learning halfspaces under Massart noise
empirically and find that it exhibits some appealing fairness properties.
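To fix ideas, here is a hypothetical Python sketch of the Massart noise model itself (not of the paper's learner): each example drawn from a halfspace target has its label flipped with a point-dependent probability that an adversary may choose, capped at the rate $\eta$; the `near_boundary` rule below merely stands in for such an adversary.

```python
import random

def massart_sample(n, eta, flip_prob, m=1000):
    """Draw m labeled examples from a (fixed, assumed) halfspace target,
    flipping each label with probability min(flip_prob(x), eta), so the
    Massart condition eta(x) <= eta holds pointwise."""
    target = [1.0] * n                       # illustrative target halfspace
    data = []
    for _ in range(m):
        x = [random.gauss(0, 1) for _ in range(n)]
        y = 1 if sum(t * xi for t, xi in zip(target, x)) >= 0 else -1
        if random.random() < min(flip_prob(x), eta):
            y = -y
        data.append((x, y))
    return data

# A stand-in adversary: noise concentrated near the decision boundary,
# the regime that makes proper learning under Massart noise delicate.
near_boundary = lambda x: 0.4 if abs(sum(x)) < 0.5 else 0.0
```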
Statistical Query Algorithms for Mean Vector Estimation and Stochastic Convex Optimization
Stochastic convex optimization, in which the objective is the expectation of a random convex function, is an important and widely used method with numerous applications in machine learning, statistics, operations research, and other areas. We study the complexity of stochastic convex optimization given only statistical query (SQ) access to the objective function. We show that well-known and popular first-order iterative methods can be implemented using only statistical queries. For many cases of interest, we derive nearly matching upper and lower bounds on the estimation (sample) complexity, including linear optimization in the most general setting. We then present several consequences for machine learning, differential privacy, and proving concrete lower bounds on the power of convex optimization-based methods. The key ingredient of our work is SQ algorithms and lower bounds for estimating the mean vector of a distribution over vectors supported on a convex body in $\mathbb{R}^d$. This natural problem has not been previously studied, and we show that our solutions can be used to get substantially improved SQ versions of Perceptron and other online algorithms for learning halfspaces.
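A minimal sketch, under assumed names and a bounded-gradient assumption, of the reduction's flavor: each coordinate of the expected gradient is obtained through one statistical query (a tolerance-accurate mean estimate) and plugged into plain gradient descent. This illustrates how a first-order method can run on SQ access alone; it does not reproduce the paper's algorithms or bounds.

```python
import math

def sq_mean(sample, tau, delta=0.01):
    """One statistical query: a tau-accurate estimate (with probability
    1 - delta) of the mean of a [-1, 1]-bounded random quantity."""
    m = math.ceil(2 * math.log(2 / delta) / tau ** 2)
    return sum(sample() for _ in range(m)) / m

def sq_gradient_descent(grad_coord_sample, dim, steps=100, lr=0.1, tau=0.05):
    """Stochastic convex optimization with SQ access only: recover each
    coordinate of the expected gradient E_z[grad f(w, z)] by a statistical
    query, then take an ordinary gradient step. grad_coord_sample(w, i)
    returns one sample of gradient coordinate i at w, assumed in [-1, 1]."""
    w = [0.0] * dim
    for _ in range(steps):
        g = [sq_mean(lambda: grad_coord_sample(w, i), tau) for i in range(dim)]
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w
```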
Approximate resilience, monotonicity, and the complexity of agnostic learning
A function $f$ is $d$-resilient if all its Fourier coefficients of degree at
most $d$ are zero, i.e., $f$ is uncorrelated with all low-degree parities. We
study the notion of approximate resilience of Boolean
functions, where we say that $f$ is $\alpha$-approximately $d$-resilient if
$f$ is $\alpha$-close to a $[-1,1]$-valued $d$-resilient function in $\ell_1$
distance. We show that approximate resilience essentially characterizes the
complexity of agnostic learning of a concept class $C$ over the uniform
distribution. Roughly speaking, if all functions in a class $C$ are far from
being $d$-resilient then $C$ can be learned agnostically in time $n^{O(d)}$
and, conversely, if $C$ contains a function close to being $d$-resilient then
agnostic learning of $C$ in the statistical query (SQ) framework of Kearns has
complexity of at least $n^{\Omega(d)}$. This characterization is based on the
duality between $\ell_1$ approximation by degree-$d$ polynomials and
approximate $d$-resilience that we establish. In particular, it implies that
$\ell_1$ approximation by low-degree polynomials, known to be sufficient for
agnostic learning over product distributions, is in fact necessary.
Focusing on monotone Boolean functions, we exhibit the existence of
near-optimal $\alpha$-approximately
$\widetilde{\Omega}(\alpha\sqrt{n})$-resilient monotone functions for all
$\alpha > 0$. Prior to our work, it was conceivable even that every monotone
function is $\Omega(1)$-far from any $1$-resilient function. Furthermore, we
construct simple, explicit monotone functions based on $\mathsf{Tribes}$ and
$\mathsf{CycleRun}$ that are close to highly resilient functions. Our constructions are
based on a fairly general resilience analysis and amplification. These
structural results, together with the characterization, imply nearly optimal
lower bounds for agnostic learning of monotone juntas.
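To make the definition concrete, here is a small brute-force Python sketch (illustrative only, exponential in $n$) that computes Fourier coefficients over the uniform distribution on $\{-1,+1\}^n$ and tests $d$-resilience exactly as defined above.

```python
from itertools import combinations, product

def fourier_coefficient(f, n, S):
    """Brute-force hat{f}(S) = E_x[f(x) * prod_{i in S} x_i] under the
    uniform distribution on {-1, +1}^n."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        chi = 1
        for i in S:
            chi *= x[i]
        total += f(x) * chi
    return total / (2 ** n)

def is_d_resilient(f, n, d, tol=1e-12):
    """f is d-resilient iff every Fourier coefficient of degree at most d
    vanishes, i.e., f is uncorrelated with all parities on <= d variables."""
    return all(abs(fourier_coefficient(f, n, S)) <= tol
               for k in range(d + 1)
               for S in combinations(range(n), k))

# Sanity check: the parity of all n bits is (n - 1)-resilient, since its
# only nonzero Fourier coefficient sits on the full degree-n parity.
n = 4
parity = lambda x: x[0] * x[1] * x[2] * x[3]
print(is_d_resilient(parity, n, n - 1))   # True
```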