135,744 research outputs found
A Uniform Lower Error Bound for Half-Space Learning
We give a lower bound for the error of any unitarily invariant algorithm learning half-spaces against the uniform or related distributions on the unit sphere. The bound is uniform in the choice of the target half-space and has an exponentially decaying deviation probability in the sample. The technique of proof is related to a proof of the Johnson Lindenstrauss Lemma. We argue that, unlike previous lower bounds, our result is well suited to evaluate the benefits of multi-task or transfer learning, or other cases where an expense in the acquisition of domain knowledge has to be justified
The Benefit of Multitask Representation Learning
We discuss a general method to learn data representations from multiple
tasks. We provide a justification for this method in both settings of multitask
learning and learning-to-learn. The method is illustrated in detail in the
special case of linear feature learning. Conditions on the theoretical
advantage offered by multitask representation learning over independent task
learning are established. In particular, focusing on the important example of
half-space learning, we derive the regime in which multitask representation
learning is beneficial over independent task learning, as a function of the
sample size, the number of tasks and the intrinsic data dimensionality. Other
potential applications of our results include multitask feature learning in
reproducing kernel Hilbert spaces and multilayer, deep networks.Comment: To appear in Journal of Machine Learning Research (JMLR). 31 page
Efficient Learning of Linear Separators under Bounded Noise
We study the learnability of linear separators in in the presence of
bounded (a.k.a Massart) noise. This is a realistic generalization of the random
classification noise model, where the adversary can flip each example with
probability . We provide the first polynomial time algorithm
that can learn linear separators to arbitrarily small excess error in this
noise model under the uniform distribution over the unit ball in , for
some constant value of . While widely studied in the statistical learning
theory community in the context of getting faster convergence rates,
computationally efficient algorithms in this model had remained elusive. Our
work provides the first evidence that one can indeed design algorithms
achieving arbitrarily small excess error in polynomial time under this
realistic noise model and thus opens up a new and exciting line of research.
We additionally provide lower bounds showing that popular algorithms such as
hinge loss minimization and averaging cannot lead to arbitrarily small excess
error under Massart noise, even under the uniform distribution. Our work
instead, makes use of a margin based technique developed in the context of
active learning. As a result, our algorithm is also an active learning
algorithm with label complexity that is only a logarithmic the desired excess
error
What Can We Learn Privately?
Learning problems form an important category of computational tasks that
generalizes many of the computations researchers apply to large real-life data
sets. We ask: what concept classes can be learned privately, namely, by an
algorithm whose output does not depend too heavily on any one input or specific
training example? More precisely, we investigate learning algorithms that
satisfy differential privacy, a notion that provides strong confidentiality
guarantees in contexts where aggregate information is released about a database
containing sensitive information about individuals. We demonstrate that,
ignoring computational constraints, it is possible to privately agnostically
learn any concept class using a sample size approximately logarithmic in the
cardinality of the concept class. Therefore, almost anything learnable is
learnable privately: specifically, if a concept class is learnable by a
(non-private) algorithm with polynomial sample complexity and output size, then
it can be learned privately using a polynomial number of samples. We also
present a computationally efficient private PAC learner for the class of parity
functions. Local (or randomized response) algorithms are a practical class of
private algorithms that have received extensive investigation. We provide a
precise characterization of local private learning algorithms. We show that a
concept class is learnable by a local algorithm if and only if it is learnable
in the statistical query (SQ) model. Finally, we present a separation between
the power of interactive and noninteractive local learning algorithms.Comment: 35 pages, 2 figure
Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations
Consider the following heuristic for building a decision tree for a function
. Place the most influential variable of
at the root, and recurse on the subfunctions and on the
left and right subtrees respectively; terminate once the tree is an
-approximation of . We analyze the quality of this heuristic,
obtaining near-matching upper and lower bounds:
Upper bound: For every with decision tree size and every
, this heuristic builds a decision tree of size
at most .
Lower bound: For every and , there is an with decision tree size such that
this heuristic builds a decision tree of size .
We also obtain upper and lower bounds for monotone functions:
and
respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004)
and Lee (2009).
Our upper bounds yield new algorithms for properly learning decision trees
under the uniform distribution. We show that these algorithms---which are
motivated by widely employed and empirically successful top-down decision tree
learning heuristics such as ID3, C4.5, and CART---achieve provable guarantees
that compare favorably with those of the current fastest algorithm (Ehrenfeucht
and Haussler, 1989). Our lower bounds shed new light on the limitations of
these heuristics.
Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend
it to give the first uniform-distribution proper learning algorithm that
achieves polynomial sample and memory complexity, while matching its
state-of-the-art quasipolynomial runtime
Active classification with comparison queries
We study an extension of active learning in which the learning algorithm may
ask the annotator to compare the distances of two examples from the boundary of
their label-class. For example, in a recommendation system application (say for
restaurants), the annotator may be asked whether she liked or disliked a
specific restaurant (a label query); or which one of two restaurants did she
like more (a comparison query).
We focus on the class of half spaces, and show that under natural
assumptions, such as large margin or bounded bit-description of the input
examples, it is possible to reveal all the labels of a sample of size using
approximately queries. This implies an exponential improvement over
classical active learning, where only label queries are allowed. We complement
these results by showing that if any of these assumptions is removed then, in
the worst case, queries are required.
Our results follow from a new general framework of active learning with
additional queries. We identify a combinatorial dimension, called the
\emph{inference dimension}, that captures the query complexity when each
additional query is determined by examples (such as comparison queries,
each of which is determined by the two compared examples). Our results for half
spaces follow by bounding the inference dimension in the cases discussed above.Comment: 23 pages (not including references), 1 figure. The new version
contains a minor fix in the proof of Lemma 4.
- …