85 research outputs found
Universum Prescription: Regularization using Unlabeled Data
This paper shows that simply prescribing "none of the above" labels to
unlabeled data has a beneficial regularization effect on supervised learning.
We call this universum prescription, because the prescribed labels cannot be
any of the supervised labels. Despite its simplicity, universum prescription
obtains competitive results when training deep convolutional networks on the
CIFAR-10, CIFAR-100, STL-10, and ImageNet datasets. A qualitative
justification of the approach using Rademacher complexity is presented. The
effect of a regularization parameter -- the probability of sampling from
unlabeled data -- is also studied empirically.
Comment: 7 pages for article, 3 pages for supplemental material. To appear in
AAAI-1
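The prescription itself is simple to sketch: when assembling a training batch for a (K+1)-class classifier, each slot is filled from the unlabeled pool with some probability p (the regularization parameter studied in the abstract) and given the extra "none of the above" label. A minimal sketch, assuming in-memory lists of examples; the function and parameter names are ours, not the paper's:

```python
import random

def universum_batch(labeled, unlabeled, num_classes, p, batch_size, rng=random):
    """Build a training batch for universum prescription.

    With probability p a slot is filled by an unlabeled example
    prescribed the extra "none of the above" label (index num_classes);
    otherwise a labeled (x, y) pair is drawn as-is.
    """
    batch = []
    for _ in range(batch_size):
        if rng.random() < p:
            x = rng.choice(unlabeled)
            batch.append((x, num_classes))  # universum label = K
        else:
            batch.append(rng.choice(labeled))
    return batch
```

With p = 0 this reduces to ordinary supervised training; p = 1 trains only on the prescribed universum class.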
Multi-class SVMs: From Tighter Data-Dependent Generalization Bounds to Novel Algorithms
This paper studies the generalization performance of multi-class
classification algorithms, for which we obtain, for the first time, a
data-dependent generalization error bound with a logarithmic dependence on the
class size, substantially improving the state-of-the-art linear dependence in
the existing data-dependent generalization analysis. The theoretical analysis
motivates us to introduce a new multi-class classification machine based on
ℓ_p-norm regularization, where the parameter p controls the complexity
of the corresponding bounds. We derive an efficient optimization algorithm
based on Fenchel duality theory. Benchmarks on several real-world datasets show
that the proposed algorithm can achieve significant accuracy gains over the
state of the art.
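The ℓ_p-norm penalty couples the per-class weight vectors of a multi-class linear machine. A minimal sketch of one plausible form of such a regularizer, (Σ_j ||w_j||₂^p)^(1/p) over class rows; this is our reading of the abstract, not the paper's exact objective:

```python
import numpy as np

def lp_block_regularizer(W, p):
    """l_p-norm regularizer over per-class weight vectors.

    W has shape (num_classes, num_features); the penalty is
    (sum_j ||w_j||_2 ** p) ** (1/p).  The exponent p tunes how
    strongly the class-wise weights are coupled, and thereby the
    complexity of the associated generalization bound.
    """
    class_norms = np.linalg.norm(W, axis=1)  # ||w_j||_2 for each class j
    return (class_norms ** p).sum() ** (1.0 / p)
```

For p = 2 this is the Frobenius-type penalty of a standard multi-class SVM; smaller p penalizes spreading weight mass over many classes.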
Human activity recognition on smartphones for mobile context awareness
Activity-Based Computing [1] aims to capture the state of the user and their environment
by exploiting heterogeneous sensors in order to provide adaptation to
exogenous computing resources. When these sensors are attached to the subject’s
body, they permit continuous monitoring of numerous physiological signals. This
has appealing use in healthcare applications, e.g. the exploitation of Ambient Intelligence
(AmI) in daily activity monitoring for elderly people. In this paper,
we present a system for human physical Activity Recognition (AR) using smartphone
inertial sensors. As these mobile phones are limited in terms of energy and
computing power, we propose a novel hardware-friendly approach for multiclass
classification. This method adapts the standard Support Vector Machine (SVM)
and exploits fixed-point arithmetic. In addition to the clear computational advantages
of fixed-point arithmetic, it is easy to show the regularization effect of the
number of bits and then the connections with the Statistical Learning Theory. A
comparison with the traditional SVM shows a significant improvement in terms
of computational cost while maintaining similar accuracy, which can contribute
to the development of more sustainable systems for AmI.
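The computational trick is to replace floating-point weights and features with small signed integers. A minimal sketch of a fixed-point SVM decision function, assuming an already-trained linear model; bit widths, names, and the quantization scheme are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def quantize(v, bits, scale):
    """Map floats to signed fixed point: round(v * scale), saturated
    to the representable range of `bits`-bit two's complement."""
    q = np.round(v * scale)
    lo, hi = -2 ** (bits - 1), 2 ** (bits - 1) - 1
    return np.clip(q, lo, hi).astype(np.int64)

def fixed_point_decision(w, b, x, bits=8, frac_bits=6):
    """Binary SVM prediction sign(w.x + b) using integer-only arithmetic.

    Fewer fractional bits coarsen the model, which is the source of
    the regularization effect discussed in the abstract.
    """
    scale = 2 ** frac_bits
    wq = quantize(w, bits, scale)
    xq = quantize(x, bits, scale)
    bq = int(round(b * scale * scale))  # w and x each carry one factor of scale
    acc = int(wq @ xq) + bq             # pure integer multiply-accumulate
    return 1 if acc >= 0 else -1
```

On a phone this avoids floating-point units entirely; the number of bits acts as a capacity knob, which is the link to Statistical Learning Theory the abstract mentions.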
Classification with Asymmetric Label Noise: Consistency and Maximal Denoising
In many real-world classification problems, the labels of training examples
are randomly corrupted. Most previous theoretical work on classification with
label noise assumes that the two classes are separable, that the label noise is
independent of the true class label, or that the noise proportions for each
class are known. In this work, we give conditions that are necessary and
sufficient for the true class-conditional distributions to be identifiable.
These conditions are weaker than those analyzed previously, and allow for the
classes to be nonseparable and the noise levels to be asymmetric and unknown.
The conditions essentially state that a majority of the observed labels are
correct and that the true class-conditional distributions are "mutually
irreducible," a concept we introduce that limits the similarity of the two
distributions. For any label noise problem, there is a unique pair of true
class-conditional distributions satisfying the proposed conditions, and we
argue that this pair corresponds in a certain sense to maximal denoising of the
observed distributions.
Our results are facilitated by a connection to "mixture proportion
estimation," which is the problem of estimating the maximal proportion of one
distribution that is present in another. We establish a novel rate of
convergence result for mixture proportion estimation, and apply this to obtain
consistency of a discrimination rule based on surrogate loss minimization.
Experimental results on benchmark data and a nuclear particle classification
problem demonstrate the efficacy of our approach.
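The noise model under study is easy to state concretely: each binary label flips with a probability that depends on its true class, and the two rates need not be equal or known. A minimal sketch of generating such asymmetric label noise (names are ours):

```python
import random

def corrupt_labels(labels, rho_pos, rho_neg, rng=random):
    """Apply asymmetric, class-conditional label noise to labels in {+1, -1}.

    A +1 flips to -1 with probability rho_pos; a -1 flips to +1 with
    probability rho_neg.  The paper's setting allows rho_pos != rho_neg,
    both unknown, provided a majority of observed labels stay correct.
    """
    noisy = []
    for y in labels:
        flip_prob = rho_pos if y == 1 else rho_neg
        noisy.append(-y if rng.random() < flip_prob else y)
    return noisy
```

Recovering the clean class-conditional distributions from data corrupted this way is exactly the identifiability question the abstract answers via mutual irreducibility and mixture proportion estimation.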
Scalable Greedy Algorithms for Transfer Learning
In this paper we consider the binary transfer learning problem, focusing on
how to select and combine sources from a large pool to yield a good performance
on a target task. Constraining our scenario to the real world, we do not assume
direct access to the source data; rather, we employ the source hypotheses
trained from them. We propose an efficient algorithm that selects relevant
source hypotheses and feature dimensions simultaneously, building on the
literature on the best subset selection problem. Our algorithm achieves
state-of-the-art results on three computer vision datasets, substantially
outperforming both transfer learning and popular feature selection baselines in
a small-sample setting. We also present a randomized variant that achieves the
same results with a computational cost independent of the number of source
hypotheses and feature dimensions. Moreover, we prove theoretically that, under
reasonable assumptions on the source hypotheses, our algorithm can learn
effectively from few examples.
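The "best subset selection" flavor of the method can be illustrated with plain greedy forward selection: repeatedly add the source hypothesis whose inclusion most improves a target-task score, up to a budget. This is a generic sketch of that family of algorithms, with an abstract score callback standing in for target validation accuracy; it is not the paper's specific procedure:

```python
def greedy_source_selection(sources, score, budget):
    """Greedy forward selection over source hypotheses.

    `score(selected)` evaluates a candidate subset on the target task
    (e.g. validation accuracy of a combined predictor).  Stops early
    when no remaining source improves the score.
    """
    selected = []
    remaining = list(sources)
    best = score(selected)
    for _ in range(budget):
        gains = [(score(selected + [s]), s) for s in remaining]
        top_score, top = max(gains, key=lambda t: t[0])
        if top_score <= best:
            break  # no remaining source helps the target task
        selected.append(top)
        remaining.remove(top)
        best = top_score
    return selected
```

Each round costs one score evaluation per remaining source, which is what motivates the randomized variant with cost independent of the pool size.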
Robustness and Regularization of Support Vector Machines
We consider regularized support vector machines (SVMs) and show that they are
precisely equivalent to a new robust optimization formulation. We show that
this equivalence of robust optimization and regularization has implications for
both algorithms and analysis. In terms of algorithms, the equivalence suggests
more general SVM-like classification algorithms that explicitly build in
protection against noise while at the same time controlling overfitting. On the
analysis front, the equivalence of robustness and regularization provides a
robust-optimization interpretation of the success of regularized SVMs. We use
this new robustness interpretation of SVMs to give a new proof of consistency
of (kernelized) SVMs, thus establishing robustness as the reason regularized
SVMs generalize well.
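The pointwise version of this equivalence can be checked numerically: under an ℓ2-ball perturbation of radius c, the worst-case hinge loss of a sample equals the unperturbed hinge loss with c·||w||₂ added to the margin deficit, because the adversary's best move is to push x against the margin along -y·w. A small sketch of that identity (function names are ours, and this is the per-sample identity, not the paper's full minimax derivation):

```python
import numpy as np

def hinge(y, margin):
    """Standard hinge loss for y in {+1, -1}."""
    return max(0.0, 1.0 - y * margin)

def worst_case_hinge(w, b, x, y, c):
    """Hinge loss at the worst l2 perturbation ||delta|| <= c.

    The maximizer is delta* = -c * y * w / ||w||, which shifts the
    margin by exactly -c * ||w||."""
    delta = -c * y * w / np.linalg.norm(w)
    return hinge(y, w @ (x + delta) + b)

def regularized_hinge(w, b, x, y, c):
    """Equivalent penalized form: hinge with c * ||w||_2 added inside."""
    return max(0.0, 1.0 - y * (w @ x + b) + c * np.linalg.norm(w))
```

Summed over a training set, the right-hand form is hinge loss plus a norm penalty, which is how robust optimization recovers the regularized SVM objective.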