Multi-view Metric Learning in Vector-valued Kernel Spaces
We consider the problem of metric learning for multi-view data and present a
novel method for learning within-view as well as between-view metrics in
vector-valued kernel spaces, as a way to capture the multi-modal structure of the
data. We formulate two convex optimization problems to jointly learn the metric
and the classifier or regressor in kernel feature spaces. An iterative
three-step multi-view metric learning algorithm is derived from the
optimization problems. In order to scale the computation to large training
sets, a block-wise Nyström approximation of the multi-view kernel matrix is
introduced. We justify our approach theoretically and experimentally, and show
its performance on real-world datasets against relevant state-of-the-art
methods.
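As a rough illustration of the scaling step, the sketch below shows a block-wise Nyström approximation applied to a single within-view kernel block, assuming toy two-view data, an RBF kernel, and illustrative landmark counts; none of the names or parameter choices are taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel between the rows of A and the rows of B.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def nystrom_factor(X, n_landmarks=50, gamma=1.0, seed=0):
    # Returns L (n x m) such that L @ L.T approximates the kernel block K(X, X).
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_landmarks, len(X)), replace=False)
    C = rbf_kernel(X, X[idx], gamma)                  # n x m cross-kernel
    W = C[idx]                                        # m x m landmark kernel
    U, s, _ = np.linalg.svd(W)
    W_inv_sqrt = U @ np.diag(1.0 / np.sqrt(np.maximum(s, 1e-12))) @ U.T
    return C @ W_inv_sqrt

# Toy two-view data: each within-view block is approximated independently,
# and the multi-view kernel matrix is then assembled block by block.
X_view1 = np.random.randn(500, 20)
X_view2 = np.random.randn(500, 8)
L1 = nystrom_factor(X_view1)
L2 = nystrom_factor(X_view2)
K1_approx = L1 @ L1.T                                 # within-view block, view 1
K2_approx = L2 @ L2.T                                 # within-view block, view 2
```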
Two view learning: SVM-2K, theory and practice
Kernel methods make it relatively easy to define complex high-dimensional
feature spaces. This raises the question of how we can
identify the relevant subspaces for a particular learning task. When two
views of the same phenomenon are available kernel Canonical Correlation
Analysis (KCCA) has been shown to be an effective preprocessing
step that can improve the performance of classification algorithms such
as the Support Vector Machine (SVM). This paper takes this observation
to its logical conclusion and proposes a method that combines this
two-stage learning (KCCA followed by SVM) into a single optimisation
termed SVM-2K. We present both experimental and theoretical analysis
of the approach, showing encouraging results and insights.
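For contrast with the single-optimisation approach, here is a minimal sketch of the two-stage pipeline the paper starts from (CCA followed by an SVM), using scikit-learn's linear CCA as a stand-in for kernel CCA on synthetic two-view data; the shapes, component count, and kernel choice are assumptions, not details from the paper.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.svm import SVC

# Synthetic two-view classification data (hypothetical shapes).
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, n)
X_a = rng.normal(size=(n, 30)) + y[:, None]      # view A
X_b = rng.normal(size=(n, 20)) + y[:, None]      # view B

# Stage 1: find correlated subspaces of the two views
# (linear CCA here as a stand-in for kernel CCA).
cca = CCA(n_components=5).fit(X_a, X_b)
Z_a, Z_b = cca.transform(X_a, X_b)

# Stage 2: train an SVM on the correlated features from both views.
Z = np.hstack([Z_a, Z_b])
clf = SVC(kernel="rbf").fit(Z, y)
print("train accuracy:", clf.score(Z, y))
```

SVM-2K replaces this pipeline with a single optimisation that learns a classifier per view while constraining the two classifiers' outputs to agree on the training points.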
Online Learning: Beyond Regret
We study online learnability of a wide class of problems, extending the
results of (Rakhlin, Sridharan, Tewari, 2010) to general notions of performance
measure well beyond external regret. Our framework simultaneously captures such
well-known notions as internal and general Phi-regret, learning with
non-additive global cost functions, Blackwell's approachability, calibration of
forecasters, adaptive regret, and more. We show that learnability in all these
situations is due to control of the same three quantities: a martingale
convergence term, a term describing the ability to perform well if the future is
known, and a generalization of sequential Rademacher complexity, studied in
(Rakhlin, Sridharan, Tewari, 2010). Since we directly study the complexity of the
problem instead of focusing on efficient algorithms, we are able to improve and
extend many known results that have previously been derived via an algorithmic
construction.
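For orientation, one standard way to write two of the regret notions mentioned above; the notation here is assumed for illustration rather than taken from the paper.

```latex
% External regret of plays f_1,\dots,f_T against losses \ell_1,\dots,\ell_T:
\mathrm{Reg}_T = \sum_{t=1}^{T} \ell_t(f_t) - \inf_{f \in \mathcal{F}} \sum_{t=1}^{T} \ell_t(f)

% \Phi-regret replaces the single best fixed decision with the best
% transformation \phi from a class \Phi of maps on the decision set
% (internal regret is the special case of single-action swaps):
\mathrm{Reg}^{\Phi}_T = \sum_{t=1}^{T} \ell_t(f_t) - \inf_{\phi \in \Phi} \sum_{t=1}^{T} \ell_t(\phi(f_t))
```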
Spectrally-normalized margin bounds for neural networks
This paper presents a margin-based multiclass generalization bound for neural
networks that scales with their margin-normalized "spectral complexity": their
Lipschitz constant, meaning the product of the spectral norms of the weight
matrices, times a certain correction factor. This bound is empirically
investigated for a standard AlexNet network trained with SGD on the MNIST and
CIFAR-10 datasets, with both original and random labels; the bound, the
Lipschitz constants, and the excess risks are all in direct correlation,
suggesting both that SGD selects predictors whose complexity scales with the
difficulty of the learning task, and secondly that the presented bound is
sensitive to this complexity.
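To make the complexity measure concrete, a minimal sketch of its main ingredient, the product of spectral norms of the layer weight matrices, for a hypothetical fully connected network; the paper's margin normalization and correction factor are omitted here.

```python
import numpy as np

def lipschitz_upper_bound(weight_matrices):
    # Product of spectral norms (largest singular values) of the weight
    # matrices: an upper bound on the Lipschitz constant of the linear layers.
    return float(np.prod([np.linalg.norm(W, ord=2) for W in weight_matrices]))

# Hypothetical three-layer network weights (shapes chosen arbitrarily).
weights = [np.random.randn(256, 784) * 0.05,
           np.random.randn(128, 256) * 0.05,
           np.random.randn(10, 128) * 0.05]
print("product of spectral norms:", lipschitz_upper_bound(weights))
```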