Supervised Learning with Similarity Functions
We address the problem of general supervised learning when data can only be
accessed through an (indefinite) similarity function between data points.
Existing work on learning with indefinite kernels has concentrated solely on
binary/multi-class classification problems. We propose a model that is generic
enough to handle any supervised learning task and also subsumes the model
previously proposed for classification. We give a "goodness" criterion for
similarity functions w.r.t. a given supervised learning task and then adapt a
well-known landmarking technique to provide efficient algorithms for supervised
learning using "good" similarity functions. We demonstrate the effectiveness of
our model on three important supervised learning problems: a) real-valued
regression, b) ordinal regression and c) ranking where we show that our method
guarantees bounded generalization error. Furthermore, for the case of
real-valued regression, we give a natural goodness definition that, when used
in conjunction with a recent result in sparse vector recovery, guarantees a
sparse predictor with bounded generalization error. Finally, we report results
of our learning algorithms on regression and ordinal regression tasks using
non-PSD similarity functions and demonstrate the effectiveness of our
algorithms, especially that of the sparse landmark selection algorithm that
achieves significantly higher accuracies than the baseline methods while
offering reduced computational costs.
Comment: To appear in the proceedings of NIPS 2012, 30 pages
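The landmarking idea described above can be sketched concretely: pick a small set of landmark points, represent every example by its vector of similarities to those landmarks, and train an ordinary linear predictor on that representation. Below is a minimal NumPy illustration; the tanh similarity (which is indefinite, i.e. non-PSD), the random landmark choice, and the ridge regressor are illustrative assumptions, not the paper's exact "goodness"-based construction.

```python
import numpy as np

def landmark_features(X, landmarks, sim):
    """Map each row of X to its vector of similarities to the landmarks."""
    return np.array([[sim(x, l) for l in landmarks] for x in X])

# An indefinite (non-PSD) similarity function: the sigmoid/tanh "kernel".
sim = lambda x, z: np.tanh(x @ z)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])       # toy regression target

L = X[rng.choice(len(X), size=20, replace=False)]  # 20 random landmarks
Phi = landmark_features(X, L, sim)                 # landmark representation

# Ordinary ridge regression on the landmark features (closed form).
lam = 1e-2
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)
pred = Phi @ w
```

A sparsity-inducing penalty in place of the ridge term would play the role of the sparse landmark selection the abstract mentions, keeping only the informative landmarks.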
Improved Subsampled Randomized Hadamard Transform for Linear SVM
Subsampled Randomized Hadamard Transform (SRHT), a popular random projection
method that can efficiently project d-dimensional data into an r-dimensional
space (r ≪ d) in O(nd log d) time, has been widely used to address the
challenge of high dimensionality in machine learning. SRHT works by rotating
the input data matrix with a randomized Walsh-Hadamard transform, followed by
uniform column sampling on the rotated matrix. Despite these advantages, one
limitation of SRHT is
that it generates the new low-dimensional embedding without considering any
specific properties of a given dataset. Therefore, this data-independent random
projection method may result in inferior and unstable performance when used for
a particular machine learning task, e.g., classification. To overcome this
limitation, we analyze the effect of using SRHT for random projection in the
context of linear SVM classification. Based on our analysis, we propose
importance sampling and deterministic top-r sampling to produce an effective
low-dimensional embedding instead of the uniform sampling used in SRHT. In
addition, we also propose a new supervised non-uniform sampling method. Our
experimental results demonstrate that the proposed methods achieve higher
classification accuracies than SRHT and other random projection methods on
six real-life datasets.
Comment: AAAI-2
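The SRHT pipeline described above — a diagonal of random signs, a Walsh-Hadamard rotation, then uniform column sampling — can be sketched as follows. This is a generic illustration of the transform itself, not the paper's supervised sampling variants; padding the feature dimension to a power of two and the sqrt(p/r) rescaling are standard choices assumed here.

```python
import numpy as np

def fwht(a):
    """Fast Walsh-Hadamard transform along the last axis (length a power of 2)."""
    a = a.copy()
    n = a.shape[-1]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            x = a[..., i:i + h].copy()
            y = a[..., i + h:i + 2 * h].copy()
            a[..., i:i + h] = x + y          # butterfly: sums
            a[..., i + h:i + 2 * h] = x - y  # butterfly: differences
        h *= 2
    return a

def srht(X, r, rng):
    """Project the d feature columns of X down to r via SRHT."""
    n, d = X.shape
    p = 1 << (d - 1).bit_length()                # pad features to a power of 2
    Xp = np.zeros((n, p))
    Xp[:, :d] = X
    signs = rng.choice([-1.0, 1.0], size=p)      # diagonal of random signs D
    R = fwht(Xp * signs) / np.sqrt(p)            # orthonormal rotation X D H
    cols = rng.choice(p, size=r, replace=False)  # uniform column sampling
    return R[:, cols] * np.sqrt(p / r)           # rescale: unbiased norms

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
Y = srht(X, 16, rng)                             # 50 features -> 16
```

The paper's proposals would replace the `cols = rng.choice(...)` line: sample columns with probabilities proportional to an importance score, or deterministically keep the top-r columns, rather than uniformly at random.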
A Modern Introduction to Online Learning
In this monograph, I introduce the basic concepts of Online Learning through
a modern view of Online Convex Optimization. Here, online learning refers to
the framework of regret minimization under worst-case assumptions. I present
first-order and second-order algorithms for online learning with convex losses,
in Euclidean and non-Euclidean settings. All the algorithms are clearly
presented as instantiations of Online Mirror Descent or
Follow-The-Regularized-Leader and their variants. Particular attention is given
to the issue of tuning the parameters of the algorithms and learning in
unbounded domains, through adaptive and parameter-free online learning
algorithms. Non-convex losses are handled through convex surrogate losses and
through randomization. The bandit setting is also briefly discussed, touching
on the problem of adversarial and stochastic multi-armed bandits. These notes
do not require prior knowledge of convex analysis and all the required
mathematical tools are rigorously explained. Moreover, all the proofs have been
carefully chosen to be as simple and as short as possible.
Comment: Fixed more typos, added more history bits, added local norms bounds
for OMD and FTR
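A concrete instance of the regret-minimization framework the monograph covers is projected Online Gradient Descent, the Euclidean case of Online Mirror Descent: play a point, suffer a convex loss, step against its gradient, and project back onto the feasible set. The toy stream of squared losses, the 1/sqrt(T) step size, and the comparator grid below are illustrative assumptions, not taken from the monograph.

```python
import numpy as np

def online_gradient_descent(grads, eta, d, radius=1.0):
    """Projected OGD: Euclidean OMD on the L2 ball of the given radius."""
    x = np.zeros(d)
    iterates = []
    for g in grads:
        iterates.append(x.copy())       # play current point
        x = x - eta * g(x)              # gradient step on the revealed loss
        nrm = np.linalg.norm(x)
        if nrm > radius:                # project back onto the ball
            x *= radius / nrm
    return iterates

# Toy adversarial stream: squared losses (x - z_t)^2, targets z_t in [-1, 1].
rng = np.random.default_rng(0)
T = 500
targets = rng.uniform(-1, 1, size=T)
grads = [(lambda x, z=z: 2 * (x - z)) for z in targets]

xs = online_gradient_descent(grads, eta=1 / np.sqrt(T), d=1)
losses = [float((x - z) ** 2) for x, z in zip(xs, targets)]

# Regret against the best fixed point in hindsight (coarse grid search).
best_fixed = min(float(np.mean((u - targets) ** 2))
                 for u in np.linspace(-1, 1, 201))
regret = sum(losses) - T * best_fixed   # grows like O(sqrt(T)), not O(T)
```

With diameter D = 2 and gradient bound G = 4, the standard OGD analysis bounds this regret by D²/(2η) + ηG²T/2, which is sublinear in T for η = 1/sqrt(T).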
Exploiting Smoothness in Statistical Learning, Sequential Prediction, and Stochastic Optimization
In the last several years, the intimate connection between convex
optimization and learning problems, in both statistical and sequential
frameworks, has shifted the focus of algorithmic machine learning to
examining this interplay. On one hand, this intertwinement brings forward
new challenges in reassessing the performance of learning algorithms,
including generalization and regret bounds, under the analytical
assumptions that convexity imposes on loss functions (e.g., Lipschitzness,
strong convexity, and smoothness). On the other hand, the emergence of
datasets of unprecedented size demands the development of novel, more
efficient optimization algorithms to tackle large-scale learning problems.
The overarching goal of this thesis is to reassess the smoothness of loss
functions in statistical learning, sequential prediction/online learning, and
stochastic optimization, and to explicate its consequences. In particular,
we examine how smoothness of the loss function can be beneficial or
detrimental in these settings in terms of sample complexity, statistical
consistency, regret analysis, and convergence rate, and we investigate how
smoothness can be leveraged to devise more efficient learning algorithms.
Comment: Ph.D. Thesi
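One classical way smoothness pays off in optimization: for a convex L-smooth objective, gradient descent with the fixed step size 1/L decreases the objective monotonically and satisfies f(x_t) - f* ≤ L‖x_0 - x*‖²/(2t), a rate unavailable for general nonsmooth convex functions. The least-squares objective below is a numerical illustration of this standard fact, not an example from the thesis.

```python
import numpy as np

# f(x) = 0.5 * ||Ax - b||^2 is convex and L-smooth with L = ||A||_2^2
# (the largest eigenvalue of the Hessian A^T A).
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 10))
b = rng.normal(size=30)

L = np.linalg.norm(A, 2) ** 2                    # smoothness constant
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)   # a minimizer of f
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)

x = np.zeros(10)
vals = [f(x)]
for t in range(200):
    x = x - (1.0 / L) * (A.T @ (A @ x - b))      # gradient step, size 1/L
    vals.append(f(x))

gap = vals[-1] - f(x_star)                       # suboptimality after 200 steps
bound = L * np.linalg.norm(x_star) ** 2 / (2 * 200)  # L||x0 - x*||^2 / (2t)
```

Here `gap` stays within the theoretical bound, and the recorded values `vals` decrease at every step, as the descent lemma for 1/L steps guarantees.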