Noise-Tolerant Learning, the Parity Problem, and the Statistical Query Model
We describe a slightly sub-exponential time algorithm for learning parity
functions in the presence of random classification noise. This results in a
polynomial-time algorithm for the case of parity functions that depend on only
the first O(log n log log n) bits of input. This is the first known instance of
an efficient noise-tolerant algorithm for a concept class that is provably not
learnable in the Statistical Query model of Kearns. Thus, we demonstrate that
the set of problems learnable in the statistical query model is a strict subset
of those problems learnable in the presence of noise in the PAC model.
In coding-theory terms, what we give is a poly(n)-time algorithm for decoding
linear k by n codes in the presence of random noise for the case of k = c log n
log log n for some c > 0. (The case of k = O(log n) is trivial since one can
just individually check each of the 2^k possible messages and choose the one
that yields the closest codeword.)
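To make the trivial case concrete, here is a minimal sketch (in Python; the
generator matrix and received word are illustrative, not from the paper) of
brute-force decoding for a k by n binary linear code: enumerate all 2^k
messages, encode each, and return the one whose codeword is closest in Hamming
distance to the received word.

```python
import itertools
import numpy as np

def brute_force_decode(G, received):
    """Decode a noisy word for a k x n binary generator matrix G by enumerating
    all 2^k messages; feasible only when k = O(log n)."""
    k, n = G.shape
    best_msg, best_dist = None, n + 1
    for bits in itertools.product([0, 1], repeat=k):
        msg = np.array(bits)
        codeword = msg @ G % 2                    # encode over GF(2)
        dist = int(np.sum(codeword != received))  # Hamming distance to received
        if dist < best_dist:
            best_msg, best_dist = msg, dist
    return best_msg, best_dist

# Toy 3 x 7 code and a received word one bit-flip away from the codeword
# for message (1, 1, 0).
G = np.array([[1, 0, 0, 1, 1, 0, 1],
              [0, 1, 0, 1, 0, 1, 1],
              [0, 0, 1, 0, 1, 1, 1]])
received = np.array([1, 1, 0, 0, 1, 1, 1])
print(brute_force_decode(G, received))
```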
A natural extension of the statistical query model is to allow queries about
statistical properties that involve t-tuples of examples (as opposed to single
examples). The second result of this paper is to show that any class of
functions learnable (strongly or weakly) with t-wise queries for t = O(log n)
is also weakly learnable with standard unary queries. Hence this natural
extension to the statistical query model does not increase the set of weakly
learnable functions.
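As a rough formalization of this extension (the notation here is ours, not the
paper's): a t-wise statistical query supplies a bounded function of t labeled
examples and receives its expectation up to an additive tolerance.

```latex
% A t-wise statistical query (illustrative notation): the learner submits
% \chi : (X \times \{0,1\})^t \to [0,1] together with a tolerance \tau > 0,
% and the oracle returns some value v satisfying
\[
  \Bigl|\, v - \mathbb{E}_{x_1,\dots,x_t \sim D}
      \bigl[\chi\bigl((x_1, f(x_1)), \dots, (x_t, f(x_t))\bigr)\bigr] \,\Bigr| \le \tau .
\]
% The standard (unary) statistical query model is the special case t = 1.
```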
A Complete Characterization of Statistical Query Learning with Applications to Evolvability
The statistical query (SQ) learning model of Kearns (1993) is a natural
restriction of the PAC learning model in which a learning algorithm is allowed
to obtain estimates of statistical properties of the examples but cannot see
the examples themselves. We describe a new and simple characterization of the
query complexity of learning in the SQ learning model. Unlike the previously
known bounds on SQ learning, our characterization preserves the accuracy and the
efficiency of learning. The preservation of accuracy implies that our
characterization gives the first characterization of SQ learning in the
agnostic learning framework. The preservation of efficiency is achieved using a
new boosting technique and allows us to derive a new approach to the design of
evolutionary algorithms in Valiant's (2006) model of evolvability. We use this
approach to demonstrate the existence of a large class of monotone evolutionary
learning algorithms based on square loss performance estimation. These results
differ significantly from the few known evolutionary algorithms and give
evidence that evolvability in Valiant's model is a more versatile phenomenon
than there had been previous reason to suspect.
Comment: Simplified Lemma 3.8 and its application.
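For readers less familiar with the model: an SQ algorithm interacts with an
oracle that returns estimates of expectations E_{x~D}[g(x, f(x))] for bounded
query functions g. The following is a minimal sketch of how such an oracle is
usually simulated from labeled examples (the function names and the crude
sample bound are ours, for illustration only).

```python
import numpy as np

def answer_sq(examples, labels, g, tolerance, rng=np.random.default_rng(0)):
    """Answer a statistical query g : (x, y) -> [0, 1] to within the requested
    additive tolerance by empirical averaging; by a Hoeffding bound,
    O(1/tolerance^2) samples suffice with high probability (constants here
    are illustrative)."""
    m = int(np.ceil(4.0 / tolerance ** 2))
    idx = rng.integers(0, len(examples), size=m)
    return float(np.mean([g(examples[i], labels[i]) for i in idx]))

# Toy usage: for a parity of two bits, estimate how often feature 0 alone
# agrees with the label; the answer is close to 1/2 (no correlation), which
# is the kind of statistic an SQ learner has to work with.
X = np.random.default_rng(1).integers(0, 2, size=(10000, 5))
y = X[:, 0] ^ X[:, 1]
print(answer_sq(X, y, lambda x, label: float(x[0] == label), tolerance=0.05))
```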
Learning Coverage Functions and Private Release of Marginals
We study the problem of approximating and learning coverage functions. A
function c from subsets of [n] to the non-negative reals is a coverage function
if there exists a universe U with non-negative weights w(u) for each u in U and
subsets A_1, ..., A_n of U such that c(S) = \sum_{u \in \cup_{i \in S} A_i} w(u).
Alternatively, coverage functions can be described
as non-negative linear combinations of monotone disjunctions. They are a
natural subclass of submodular functions and arise in a number of applications.
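A minimal sketch of the definition in code (the universe, weights, and subsets
below are a made-up toy instance): the value of a coverage function on an index
set S is the total weight of the universe elements covered by the union of the
corresponding subsets.

```python
def coverage_value(S, subsets, weights):
    """Evaluate a coverage function: the total weight of the universe elements
    covered by the union of the subsets A_i with i in S."""
    covered = set().union(*(subsets[i] for i in S)) if S else set()
    return sum(weights[u] for u in covered)

# Toy instance: universe {a, b, c, d} with weights and three subsets.
weights = {"a": 1.0, "b": 2.0, "c": 0.5, "d": 3.0}
subsets = [{"a", "b"}, {"b", "c"}, {"d"}]
print(coverage_value({0, 1}, subsets, weights))  # covers {a, b, c} -> 3.5
print(coverage_value({0, 2}, subsets, weights))  # covers {a, b, d} -> 6.0
```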
We give an algorithm that, for any γ, δ > 0, given random and uniform
examples of an unknown coverage function c, finds a function h that
approximates c within factor 1+γ on all but a δ-fraction of the
points in time poly(n, 1/γ, 1/δ). This is the first fully-polynomial
algorithm for learning an interesting class of functions in the demanding PMAC
model of Balcan and Harvey (2011). Our algorithms are based on several new
structural properties of coverage functions. Using the results in (Feldman and
Kothari, 2014), we also show that coverage functions are learnable agnostically
with excess ℓ_1-error ε over all product and symmetric
distributions in time n^{O(log(1/ε))}. In contrast, we show that,
without assumptions on the distribution, learning coverage functions is at
least as hard as learning polynomial-size disjoint DNF formulas, a class of
functions for which the best known algorithm runs in time 2^{Õ(n^{1/3})}
(Klivans and Servedio, 2004).
As an application of our learning results, we give simple
differentially-private algorithms for releasing monotone conjunction counting
queries with low average error. In particular, for any k, we obtain
private release of k-way marginals with average error α in time n^{O(log(1/α))}.
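For context only, the following is a minimal sketch of the standard
Laplace-mechanism baseline for privately answering conjunction counting queries
(the function name and toy data are ours; the paper's algorithms obtain low
average error via the learnability of coverage functions rather than plain
per-query noise addition).

```python
import numpy as np

def laplace_release(data, conjunctions, epsilon, rng=np.random.default_rng(0)):
    """epsilon-differentially-private answers to monotone conjunction counting
    queries via the Laplace mechanism. Each query is the fraction of rows
    satisfying a conjunction of binary attributes; changing one row moves each
    answer by at most 1/n, so the L1 sensitivity of the answer vector is
    len(conjunctions)/n and the per-query noise scale is len(conjunctions)/(epsilon*n)."""
    n = len(data)
    scale = len(conjunctions) / (epsilon * n)
    answers = []
    for attrs in conjunctions:
        frac = float(np.mean([all(row[a] == 1 for a in attrs) for row in data]))
        answers.append(frac + rng.laplace(0.0, scale))
    return answers

# Toy usage: three binary attributes, two 2-way marginal queries.
data = np.random.default_rng(1).integers(0, 2, size=(1000, 3))
print(laplace_release(data, [(0, 1), (1, 2)], epsilon=1.0))
```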
On Statistical Query Sampling and NMR Quantum Computing
We introduce a "Statistical Query Sampling" model, in which the goal of an
algorithm is to produce an element x of a hidden set S with reasonable
probability. The algorithm gains information about S through oracle calls
(statistical queries): the algorithm submits a query function g and receives
an approximation to the average value of g over the hidden set S. We
show how this model is related to NMR quantum computing, in which only
statistical properties of an ensemble of quantum systems can be measured, and
in particular to the question of whether one can translate standard quantum
algorithms to the NMR setting without putting all of their classical
post-processing into the quantum system. Using Fourier analysis techniques
developed in the related context of statistical query learning, we prove a
number of lower bounds (both information-theoretic and cryptographic) on the
ability of algorithms to produce an element of S, even when the set S is fairly
simple. These lower bounds point out a difficulty in efficiently applying NMR
quantum computing to algorithms, such as Shor's and Simon's, that
involve significant classical post-processing. We also explicitly relate the
notion of statistical query sampling to that of statistical query learning.
An extended abstract appeared in the 18th Annual IEEE Conference on
Computational Complexity (CCC 2003), 2003.
Keywords: statistical query, NMR quantum computing, lower bound
Comment: 17 pages, no figures. Appeared in the 18th Annual IEEE Conference on
Computational Complexity (CCC 2003).
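To make the statistical query sampling setup above concrete, here is a minimal
sketch of such an oracle (the interface and the uniform noise model are ours;
the model also allows adversarial perturbations within the tolerance).

```python
import numpy as np

def make_sq_sampling_oracle(hidden_set, tolerance, rng=np.random.default_rng(0)):
    """Return an oracle for the hidden set S: given a query function g mapping
    elements to [0, 1], it replies with the average of g over S perturbed by
    at most `tolerance` (uniform noise here, purely for illustration)."""
    hidden = list(hidden_set)
    def oracle(g):
        true_value = float(np.mean([g(x) for x in hidden]))
        return true_value + rng.uniform(-tolerance, tolerance)
    return oracle

# Toy usage: the hidden set is all 8-bit strings whose top bit is 1; the
# sampler's task would be to output some member of this set using only
# answers like the one below.
hidden = [x for x in range(256) if x & 0x80]
oracle = make_sq_sampling_oracle(hidden, tolerance=0.01)
print(oracle(lambda x: float(bool(x & 0x40))))  # fraction with the next bit set, about 0.5
```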
Quantum machine learning: a classical perspective
Recently, increased computational power and data availability, as well as
algorithmic advances, have led machine learning techniques to impressive
results in regression, classification, data-generation and reinforcement
learning tasks. Despite these successes, the proximity to the physical limits
of chip fabrication alongside the increasing size of datasets are motivating a
growing number of researchers to explore the possibility of harnessing the
power of quantum computation to speed-up classical machine learning algorithms.
Here we review the literature in quantum machine learning and discuss
perspectives for a mixed readership of classical machine learning and quantum
computation experts. Particular emphasis will be placed on clarifying the
limitations of quantum algorithms, how they compare with their best classical
counterparts and why quantum resources are expected to provide advantages for
learning problems. Learning in the presence of noise and certain
computationally hard problems in machine learning are identified as promising
directions for the field. Practical questions, like how to upload classical
data into quantum form, will also be addressed.
Comment: v3, 33 pages; typos corrected and references added.
Making Risk Minimization Tolerant to Label Noise
In many applications, the training data, from which one needs to learn a
classifier, is corrupted with label noise. Many standard algorithms such as SVM
perform poorly in the presence of label noise. In this paper we investigate the
robustness of risk minimization to label noise. We prove a sufficient condition
on a loss function for the risk minimization under that loss to be tolerant to
uniform label noise. We show that the 0-1 loss, sigmoid loss, ramp loss and
probit loss satisfy this condition though none of the standard convex loss
functions satisfy it. We also prove that, by choosing a sufficiently large
value of a parameter in the loss function, the sigmoid loss, ramp loss and
probit loss can be made tolerant to non-uniform label noise also if we can
assume the classes to be separable under noise-free data distribution. Through
extensive empirical studies, we show that risk minimization under the 0-1
loss, the sigmoid loss and the ramp loss has much better robustness to label
noise when compared to the SVM algorithm.
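For illustration, one common parameterization of the bounded losses discussed
above, next to the (unbounded, convex) hinge loss used by SVMs; the exact
scaling in the paper may differ, and beta stands in for the parameter that is
taken sufficiently large to handle non-uniform noise.

```python
import numpy as np

# z = y * f(x) is the margin of a prediction on an example with label y in {-1, +1}.

def sigmoid_loss(z, beta=1.0):
    """Sigmoid loss: smooth, bounded in (0, 1); approaches the 0-1 loss as beta grows."""
    return 1.0 / (1.0 + np.exp(beta * z))

def ramp_loss(z):
    """Ramp loss: the hinge loss clipped at 1, hence bounded."""
    return np.clip(1.0 - z, 0.0, 1.0)

def hinge_loss(z):
    """Hinge loss (SVM): convex but unbounded, and not noise-tolerant in the above sense."""
    return np.maximum(0.0, 1.0 - z)

margins = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid_loss(margins, beta=3.0))
print(ramp_loss(margins))
print(hinge_loss(margins))
```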
General Bounds on Statistical Query Learning and PAC Learning with Noise via Hypothesis Boosting
We derive general bounds on the complexity of learning in the statistical
query (SQ) model and in the PAC model with classification noise. We do so by
considering the problem of boosting the accuracy of weak learning algorithms
which fall within the SQ model. The SQ model was introduced by Kearns to
provide a general framework for efficient PAC learning in the presence of
classification noise. We first show a general scheme for boosting the accuracy
of weak SQ learning algorithms, proving that weak SQ learning is equivalent to
strong SQ learning. The boosting is efficient and is used to show our main
result of the first general upper bounds on the complexity of strong SQ
learning. Since all SQ algorithms can be simulated in the PAC model with
classification noise, we also obtain general upper bounds on learning in the
presence of classification noise for classes which can be learned in the SQ
model.
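As a generic illustration of accuracy boosting (a plain AdaBoost-style sketch
over examples, with decision stumps standing in for the weak learner; this is
not the paper's SQ-specific boosting scheme and all names are ours):

```python
import numpy as np

def train_stump(X, y, w):
    """Weak learner: the best single-feature threshold stump under example
    weights w, for labels in {-1, +1}."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] >= thr, 1, -1)
                err = float(np.sum(w[pred != y]))
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def boost(X, y, rounds=15):
    """Boost a weak learner into a more accurate weighted-majority hypothesis
    by reweighting the examples the current weak hypotheses get wrong."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        err, j, thr, sign = train_stump(X, y, w)
        err = min(max(err, 1e-12), 1.0 - 1e-12)
        alpha = 0.5 * np.log((1.0 - err) / err)
        pred = sign * np.where(X[:, j] >= thr, 1, -1)
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        ensemble.append((alpha, j, thr, sign))

    def predict(X_new):
        score = sum(a * s * np.where(X_new[:, j_] >= t, 1, -1)
                    for a, j_, t, s in ensemble)
        return np.sign(score)

    return predict

# Toy usage: boost stumps on a simple threshold concept.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))
y = np.where(X[:, 0] + 0.3 * X[:, 1] > 0.65, 1, -1)
print(np.mean(boost(X, y)(X) == y))
```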
Learning DNFs under product distributions via μ-biased quantum Fourier sampling
We show that DNF formulae can be quantum PAC-learned in polynomial time under
product distributions using a quantum example oracle. The best classical
algorithm (without access to membership queries) runs in superpolynomial time.
Our result extends the work by Bshouty and Jackson (1998) that proved that DNF
formulae are efficiently learnable under the uniform distribution using a
quantum example oracle. Our proof is based on a new quantum algorithm that
efficiently samples the coefficients of a μ-biased Fourier transform.
Comment: 17 pages; v3 based on journal version; minor corrections and
clarifications.
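For context, the μ-biased Fourier transform is the expansion of a function on
the Boolean cube in the orthonormal basis associated with a product
distribution; one standard form of that basis (our notation) is the following.

```latex
% mu-biased Fourier basis (standard form; notation is ours). Let D_mu be the
% product distribution on \{0,1\}^n with \Pr[x_i = 1] = \mu_i. For S \subseteq [n],
\[
  \phi_S^{\mu}(x) \;=\; \prod_{i \in S} \frac{x_i - \mu_i}{\sqrt{\mu_i (1 - \mu_i)}},
\]
% and the \phi_S^{\mu} form an orthonormal basis with respect to D_\mu, so every
% f : \{0,1\}^n \to \mathbb{R} expands as
\[
  f(x) \;=\; \sum_{S \subseteq [n]} \hat{f}^{\mu}(S)\, \phi_S^{\mu}(x),
  \qquad
  \hat{f}^{\mu}(S) \;=\; \mathbb{E}_{x \sim D_\mu}\bigl[f(x)\, \phi_S^{\mu}(x)\bigr].
\]
```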
Distribution-Independent Evolvability of Linear Threshold Functions
Valiant's (2007) model of evolvability casts the evolutionary process of
acquiring useful functionality as a restricted form of learning from random
examples. Linear threshold functions and their various subclasses, such as
conjunctions and decision lists, play a fundamental role in learning theory and
hence their evolvability has been the primary focus of research on Valiant's
framework (2007). One of the main open problems regarding the model is whether
conjunctions are evolvable distribution-independently (Feldman and Valiant,
2008). We show that the answer is negative. Our proof is based on a new
combinatorial parameter of a concept class that lower-bounds the complexity of
learning from correlations.
We contrast the lower bound with a proof that linear threshold functions
having a non-negligible margin on the data points are evolvable
distribution-independently via a simple mutation algorithm. Our algorithm
relies on a non-linear loss function being used to select the hypotheses
instead of 0-1 loss in Valiant's (2007) original definition. The proof of
evolvability requires that the loss function satisfies several mild conditions
that are, for example, satisfied by the quadratic loss function studied in
several other works (Michael, 2007; Feldman, 2009; Valiant, 2010). An important
property of our evolution algorithm is monotonicity, that is, the algorithm
guarantees evolvability without any decreases in performance. Previously,
monotone evolvability was only shown for conjunctions with quadratic loss
(Feldman, 2009) or when the distribution on the domain is severely restricted
(Michael, 2007; Feldman, 2009; Kanade et al., 2010).
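A minimal sketch of the kind of monotone mutate-and-select step described above
(our simplified construction, with sign predictions and an empirical
quadratic-loss performance estimate standing in for the paper's precise
algorithm): a random perturbation of the current linear threshold hypothesis is
adopted only if its estimated performance does not drop.

```python
import numpy as np

def performance(w, X, y):
    """Empirical quadratic-loss performance of sign(w . x) against labels y in
    {-1, +1}: 1 - 0.5 * mean squared error, which for sign predictions equals
    the correlation with the target and lies in [-1, 1]."""
    preds = np.sign(X @ w)
    preds[preds == 0] = 1
    return 1.0 - 0.5 * float(np.mean((preds - y) ** 2))

def mutate_step(w, X, y, num_mutations=20, step=0.1, rng=np.random.default_rng(0)):
    """One monotone selection step: try random perturbations of w and move to
    the best one only if its estimated performance is at least the current one."""
    current = performance(w, X, y)
    candidates = [w + step * rng.standard_normal(w.shape) for _ in range(num_mutations)]
    scores = [performance(c, X, y) for c in candidates]
    best = int(np.argmax(scores))
    return candidates[best] if scores[best] >= current else w

# Toy usage: evolve toward a hidden linear threshold function.
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 5))
y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 1.5]))
w = np.zeros(5)
for _ in range(200):
    w = mutate_step(w, X, y, rng=rng)
print(performance(w, X, y))
```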