L-SVMs: Landmarks-based Linear Local Support Vector Machines
For their ability to capture non-linearities in the data and to scale to large training sets, local Support Vector Machines (SVMs) have received special attention during the past decade. In this paper, we introduce a new
local SVM method, called L-SVMs, which clusters the input space, carries
out dimensionality reduction by projecting the data on landmarks, and jointly
learns a linear combination of local models. Simple and effective, our
algorithm is also theoretically well-founded. Using the framework of Uniform
Stability, we show that our SVM formulation comes with generalization
guarantees on the true risk. Experiments based on the simplest configuration of our model (i.e. randomly selected landmarks, linear projection, linear kernel) show that L-SVMs is very competitive with the state of the art and opens the door to exciting new lines of research.
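For a concrete picture of that simplest configuration, here is a minimal sketch (ours, not the authors' code) assuming scikit-learn, with random landmarks, a linear projection, and a linear kernel; for simplicity the per-cluster models are trained independently rather than jointly as in the paper.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

# Sketch of a landmark-based local linear SVM pipeline: cluster the
# input space, project every point on random landmarks, and train one
# linear SVM per cluster (this simplification assumes every cluster
# contains both classes).
def fit_local_svms(X, y, n_clusters=4, n_landmarks=20, seed=0):
    rng = np.random.default_rng(seed)
    landmarks = X[rng.choice(len(X), n_landmarks, replace=False)]
    Z = X @ landmarks.T                  # linear projection on landmarks
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    models = {c: LinearSVC(dual=False).fit(Z[km.labels_ == c], y[km.labels_ == c])
              for c in range(n_clusters)}
    return km, landmarks, models

def predict_local_svms(km, landmarks, models, X):
    Z = X @ landmarks.T
    return np.array([models[c].predict(z[None, :])[0]
                     for c, z in zip(km.predict(X), Z)])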
Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2: Applications and Future Perspectives
Part 2 of this monograph builds on the introduction to tensor networks and
their operations presented in Part 1. It focuses on tensor network models for
super-compressed higher-order representation of data/parameters and related
cost functions, while providing an outline of their applications in machine
learning and data analytics. A particular emphasis is on the tensor train (TT)
and Hierarchical Tucker (HT) decompositions, and their physically meaningful
interpretations which reflect the scalability of the tensor network approach.
Through a graphical approach, we also elucidate how, by virtue of the
underlying low-rank tensor approximations and sophisticated contractions of
core tensors, tensor networks have the ability to perform distributed
computations on otherwise prohibitively large volumes of data/parameters,
thereby alleviating or even eliminating the curse of dimensionality. The
usefulness of this concept is illustrated over a number of applied areas,
including generalized regression and classification (support tensor machines,
canonical correlation analysis, higher order partial least squares),
generalized eigenvalue decomposition, Riemannian optimization, and in the
optimization of deep neural networks. Part 1 and Part 2 of this work can be
used either as stand-alone separate texts, or indeed as a conjoint
comprehensive review of the exciting field of low-rank tensor networks and
tensor decompositions.
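To make the tensor train (TT) decomposition concrete, the following NumPy sketch (our illustration, not taken from the monograph) implements the standard TT-SVD procedure: a dense tensor is factorized into a chain of 3-way cores by sequential truncated SVDs, so storage drops from exponential to linear in the number of modes for fixed ranks.

import numpy as np

# TT-SVD sketch: factor a dense tensor into cores G_k of shape
# (r_{k-1}, n_k, r_k) via sequential truncated SVDs.
def tt_svd(T, max_rank=8):
    dims, cores, r_prev = T.shape, [], 1
    M = T.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        r = min(max_rank, int((s > 1e-12).sum()))
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        M = (s[:r, None] * Vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(M.reshape(r_prev, dims[-1], 1))
    return cores

# Contract the train back into a dense tensor (for checking the fit).
def tt_reconstruct(cores):
    out = cores[0]
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=([-1], [0]))
    return out.squeeze(axis=(0, -1))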
Monotonic Calibrated Interpolated Look-Up Tables
Real-world machine learning applications may require functions that are
fast-to-evaluate and interpretable. In particular, guaranteed monotonicity of
the learned function can be critical to user trust. We propose meeting these
goals for low-dimensional machine learning problems by learning flexible,
monotonic functions using calibrated interpolated look-up tables. We extend the
structural risk minimization framework of lattice regression to train monotonic
look-up tables by solving a convex problem with appropriate linear inequality
constraints. In addition, we propose jointly learning interpretable
calibrations of each feature to normalize continuous features and handle
categorical or missing data, at the cost of making the objective non-convex. We
address large-scale learning through parallelization and mini-batching, and propose random sampling of additive regularizer terms. Case studies with
real-world problems with five to sixteen features and thousands to millions of
training samples demonstrate the proposed monotonic functions can achieve
state-of-the-art accuracy on practical problems while providing greater
transparency to users.
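As a toy illustration of the convex fit described above (our sketch under simplifying assumptions, not the paper's full lattice model), a one-dimensional interpolated look-up table can be made monotonic by least squares subject to the linear inequality constraints theta[i+1] >= theta[i]:

import numpy as np
from scipy.optimize import minimize

# Linear-interpolation weights of each sample on the knot grid.
def interp_basis(x, knots):
    B = np.zeros((len(x), len(knots)))
    j = np.clip(np.searchsorted(knots, x) - 1, 0, len(knots) - 2)
    w = (x - knots[j]) / (knots[j + 1] - knots[j])
    B[np.arange(len(x)), j] = 1 - w
    B[np.arange(len(x)), j + 1] = w
    return B

# Convex least squares with monotonicity constraints on the
# look-up-table values (SLSQP handles the linear inequalities).
def fit_monotone_lut(x, y, n_knots=10):
    knots = np.linspace(x.min(), x.max(), n_knots)
    B = interp_basis(x, knots)
    cons = [{"type": "ineq", "fun": (lambda t, i=i: t[i + 1] - t[i])}
            for i in range(n_knots - 1)]
    t0 = np.linspace(y.min(), y.max(), n_knots)
    res = minimize(lambda t: np.sum((B @ t - y) ** 2), t0, constraints=cons)
    return knots, res.x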
Deep Transductive Semi-supervised Maximum Margin Clustering
Semi-supervised clustering is a very important topic in machine learning and computer vision. The key challenge of this problem is how to learn a metric such that instances sharing the same label are likely to be close to each other in the embedded space. However, little attention has been paid to learning better representations when the data lie on a non-linear manifold. Fortunately, deep learning has recently led to great success in feature learning. Inspired
by the advances of deep learning, we propose a deep transductive
semi-supervised maximum margin clustering approach. More specifically, given
pairwise constraints, we exploit both labeled and unlabeled data to learn a
non-linear mapping under a maximum-margin framework for clustering analysis.
Thus, our model unifies transductive learning, feature learning and maximum
margin techniques in the semi-supervised clustering framework. We greedily pretrain the deep network structure layer by layer with restricted Boltzmann machines (RBMs) and optimize our objective function with gradient descent. By
checking the most violated constraints, our approach updates the model
parameters through error backpropagation, in which deep features are learned
automatically. The experimental results show that our model is significantly better than the state of the art on semi-supervised clustering.
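For context, the pairwise max-margin idea can be sketched in a few lines of PyTorch (a schematic of the loss only; the RBM pretraining and transductive machinery described above are omitted): must-link pairs are pulled together while cannot-link pairs are pushed apart by a hinge on the margin.

import torch
import torch.nn as nn

# A small embedding network standing in for the pretrained deep model.
net = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 8))

# Contrastive-style max-margin loss over pairwise constraints:
# must-link pairs (same=True) minimize distance, cannot-link pairs
# pay a hinge penalty when closer than the margin.
def pairwise_margin_loss(z1, z2, same, margin=1.0):
    d = (z1 - z2).pow(2).sum(dim=1).sqrt()
    return torch.where(same, d, (margin - d).clamp(min=0)).mean()

opt = torch.optim.SGD(net.parameters(), lr=1e-2)
x1, x2 = torch.randn(16, 64), torch.randn(16, 64)   # toy pair batch
same = torch.randint(0, 2, (16,)).bool()            # toy constraints
loss = pairwise_margin_loss(net(x1), net(x2), same)
loss.backward()
opt.step()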
A new boosting algorithm based on the dual averaging scheme
The fields of machine learning and mathematical optimization are increasingly intertwined, and supervised learning and convex optimization provide a natural setting in which to examine this interplay. The training part of most supervised learning
algorithms can usually be reduced to an optimization problem that minimizes a
loss between model predictions and training data. While most optimization
techniques focus on accuracy and speed of convergence, the qualities of a good optimization algorithm from the machine learning perspective can be quite different, since machine learning is more than fitting the data. An algorithm that better minimizes the training loss may nevertheless give very poor generalization performance. In this paper, we examine a particular kind of
machine learning algorithm, boosting, whose training process can be viewed as
functional coordinate descent on the exponential loss. We study the relation
between optimization techniques and machine learning by implementing a new boosting algorithm, DABoost, based on the dual-averaging scheme, and by studying its generalization performance. We show that DABoost, although slower in reducing the training error, generally enjoys a better generalization error than AdaBoost.
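The coordinate-descent view mentioned above is easy to see in code. The sketch below implements the classic AdaBoost steps on the exponential loss; DABoost itself is not reproduced here, but as described it would replace the greedy per-round step with an update based on the running average of past "gradients" (the weight distributions).

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Boosting as functional coordinate descent on the exponential loss:
# each round picks the stump (coordinate) that best fits the current
# weights and takes a greedy line-search step alpha.
def boost(X, y, n_rounds=50):               # y in {-1, +1}
    w = np.full(len(y), 1.0 / len(y))
    stumps, alphas = [], []
    for _ in range(n_rounds):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        err = float(w @ (pred != y))
        if err >= 0.5:
            break
        a = 0.5 * np.log((1 - err) / max(err, 1e-12))
        w *= np.exp(-a * y * pred)
        w /= w.sum()
        stumps.append(h)
        alphas.append(a)
    return stumps, alphas

def boost_predict(stumps, alphas, X):
    return np.sign(sum(a * h.predict(X) for h, a in zip(stumps, alphas)))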
Rapid Feature Learning with Stacked Linear Denoisers
We investigate unsupervised pre-training of deep architectures as feature
generators for "shallow" classifiers. Stacked Denoising Autoencoders (SdA),
when used as feature pre-processing tools for SVM classification, can lead to significant improvements in accuracy, but at the price of a substantial increase in computational cost. In this paper we present a simple algorithm that mimics the layer-by-layer training of SdAs. However, in contrast to SdAs,
our algorithm requires no training through gradient descent as the parameters
can be computed in closed-form. It can be implemented in less than 20 lines of
MATLAB and reduces the computation time from several hours to mere seconds. We show that our feature transformation reliably and significantly improves the results of SVM classification on all our data sets, often outperforming SdAs and even deep neural networks in three out of four deep learning benchmarks.
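The closed-form layer described above resembles the marginalized linear denoiser of mSDA-style methods; the NumPy sketch below is our reconstruction under that assumption, not necessarily the authors' exact formulation. Corruption with probability p is marginalized out analytically, so each layer is a single linear solve followed by a tanh.

import numpy as np

# One closed-form marginalized linear denoising layer. X has shape
# (d, n); a bias row is appended, the expected scatter statistics
# under feature dropout with probability p are formed analytically,
# and the reconstruction weights come from a single linear solve.
def marginalized_denoiser_layer(X, p=0.5):
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])
    S = Xb @ Xb.T
    q = np.append(np.full(d, 1.0 - p), 1.0)      # survival probabilities
    Q = S * np.outer(q, q)
    np.fill_diagonal(Q, q * np.diag(S))          # diagonal survives once
    P = S[:d] * q                                # expected cross-scatter
    W = np.linalg.solve(Q + 1e-5 * np.eye(d + 1), P.T).T
    return np.tanh(W @ Xb)

# Stack layers greedily; each output feeds the next layer.
def stacked_features(X, n_layers=3, p=0.5):
    H = X
    for _ in range(n_layers):
        H = marginalized_denoiser_layer(H, p)
    return H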
Projectron: A Shallow and Interpretable Network for Classifying Medical Images
This paper introduces the 'Projectron' as a new neural network architecture that uses Radon projections to both classify and represent medical images. The motivation is to build shallow networks that are more interpretable in the medical imaging domain. The Radon transform is an established technique that can reconstruct images from parallel projections. The Projectron first applies a global Radon transform to each image using equidistant angles, then feeds these projections to a single layer of neurons for encoding, followed by a layer of suitable kernels to facilitate a linear separation of the projections. Finally, the Projectron passes the encoded output to two more layers for final classification. We validate the Projectron on five
publicly available datasets, a general dataset (namely MNIST) and four medical
datasets (namely Emphysema, IDC, IRMA, and Pneumonia). The results are
encouraging when the Projectron's performance is compared against MLPs given raw images and Radon projections as inputs, respectively. Experiments clearly demonstrate the potential of the proposed Projectron for representing and classifying medical images.
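A sketch of the Radon front end (our illustration, assuming scikit-image and scikit-learn; the paper's kernel layer is replaced here by a plain hidden layer):

import numpy as np
from skimage.transform import radon
from sklearn.neural_network import MLPClassifier

# Project each image at equidistant angles and flatten the resulting
# sinogram into a feature vector (all images assumed the same size).
def sinogram_features(images, n_angles=16):
    theta = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    return np.stack([radon(img, theta=theta).ravel() for img in images])

# A shallow classifier on top of the projections, e.g.:
# clf = MLPClassifier(hidden_layer_sizes=(64,)).fit(sinogram_features(imgs), y)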
Multi-Task Kernel Null-Space for One-Class Classification
One-class kernel spectral regression (OC-KSR), the regression-based formulation of the kernel null-space approach, has been found to be an effective Fisher criterion-based methodology for one-class classification (OCC), achieving state-of-the-art performance while providing relatively high robustness against data corruption. This work extends
the OC-KSR methodology to a multi-task setting where multiple one-class
problems share information for improved performance. By viewing the multi-task structure learning problem as one of compositional function learning, the OC-KSR method is first extended to learn multiple tasks' structure linearly, by posing it as an instantiation of the separable kernel learning problem in a vector-valued reproducing kernel Hilbert space, where an output kernel encodes the tasks' structure while another kernel captures input similarities. Next, a non-linear structure learning mechanism is proposed which captures multiple tasks' relationships non-linearly via an output kernel. The non-linear structure learning method is then extended to a sparse
setting where different tasks compete in an output composition mechanism,
leading to a sparse non-linear structure among multiple problems. Through
extensive experiments on different data sets, the merits of the proposed
multi-task kernel null-space techniques are verified against the baseline as
well as other existing multi-task one-class learning techniques.
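The separable-kernel construction underlying the linear variant above can be illustrated with a Kronecker product (a generic illustration of the kernel, not the OC-KSR algorithm itself): the joint Gram matrix over (task, input) pairs is the Kronecker product of an output kernel encoding task structure and an input kernel.

import numpy as np

def rbf_gram(X, gamma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# Separable multi-task kernel: Omega (n_tasks x n_tasks) encodes task
# relationships, K (n x n) encodes input similarities.
def multitask_gram(X, Omega, gamma=1.0):
    return np.kron(Omega, rbf_gram(X, gamma))

# Ridge-style fit of all tasks at once; Y has shape (n_tasks, n).
def fit_all_tasks(X, Y, Omega, lam=1e-2):
    G = multitask_gram(X, Omega)
    return np.linalg.solve(G + lam * np.eye(G.shape[0]), Y.ravel())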
Shared latent subspace modelling within Gaussian-Binary Restricted Boltzmann Machines for NIST i-Vector Challenge 2014
This paper presents a novel approach to speaker subspace modelling based on
Gaussian-Binary Restricted Boltzmann Machines (GRBM). The proposed model is
based on the idea of shared factors as in the Probabilistic Linear Discriminant
Analysis (PLDA). The GRBM hidden layer is divided into speaker and channel factors, where the speaker factor is shared across all vectors of a given speaker. Maximum Likelihood Parameter Estimation (MLE) for the proposed model is then introduced.
Various new scoring techniques for speaker verification using GRBM are
proposed. The results for the NIST i-vector Challenge 2014 dataset are presented.
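For readers unfamiliar with GRBMs, a minimal Gaussian-binary RBM update (contrastive divergence, CD-1) is sketched below; the paper's split of hidden units into shared speaker factors and channel factors, and the PLDA-style sharing, are not shown.

import numpy as np

rng = np.random.default_rng(0)
n_v, n_h, lr = 20, 10, 1e-3
W = 0.01 * rng.standard_normal((n_v, n_h))
b_v, b_h = np.zeros(n_v), np.zeros(n_h)     # unit-variance visible units

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One CD-1 step: sample binary hiddens, reconstruct Gaussian visibles
# by their mean, and update with the difference of correlations.
def cd1(v0):
    global W, b_v, b_h
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    v1 = h0 @ W.T + b_v
    ph1 = sigmoid(v1 @ W + b_h)
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (ph0 - ph1).mean(axis=0)

cd1(rng.standard_normal((32, n_v)))         # one update on a toy batch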
Max-Margin based Discriminative Feature Learning
In this paper, we propose a new max-margin based discriminative feature
learning method. Specifically, we aim at learning a low-dimensional feature
representation, so as to maximize the global margin of the data and make the
samples from the same class as close as possible. In order to enhance robustness to noise, a norm constraint is introduced to impose group sparsity on the transformation matrix. In addition, for multi-class classification tasks, we further learn and leverage the correlations among the class-specific tasks to assist in learning discriminative features. The experimental results demonstrate the power of the
proposed method against related state-of-the-art methods.
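The group-sparsity constraint on the transformation matrix is commonly realized as an l2,1 penalty, whose proximal operator is row-wise soft thresholding; the sketch below illustrates that regularizer only (our assumption about the norm, not the paper's full max-margin objective).

import numpy as np

# Proximal operator of tau * ||W||_{2,1}: rows are the groups, and
# rows with small l2 norm shrink to exactly zero (group sparsity).
def prox_l21(W, tau):
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return W * scale

W = np.random.randn(5, 3)
print(prox_l21(W, tau=1.0))   # rows with norm below tau become all-zero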