Noncrossing Ordinal Classification
Ordinal data are often seen in real applications. Regular multicategory
classification methods are not designed for this data type and a more proper
treatment is needed. We consider a framework of ordinal classification which
pools the results from binary classifiers together. An inherent difficulty of
this framework is that the class prediction can be ambiguous due to boundary
crossing. To fix this issue, we propose a noncrossing ordinal classification
method which materializes the framework by imposing noncrossing constraints. An
asymptotic study of the proposed method is conducted. We show by simulated and
data examples that the proposed method can improve the classification
performance for ordinal data without the ambiguity caused by boundary
crossings.
Comment: 32 pages, 9 figures. Accepted for publication in Statistics and Its
Interface.
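The pooling step and the ambiguity it can produce may be sketched as follows. This is a minimal illustration, not the paper's method: it assumes K ordered classes numbered 1..K, K-1 binary classifiers that each answer "is the label greater than k?", and a simple vote-counting rule for the pooled prediction. The function name `pool_ordinal` is hypothetical.

```python
def pool_ordinal(decisions):
    """Pool K-1 binary decisions into one ordinal prediction.

    decisions[k] is True when the k-th binary classifier says the label
    is greater than class k+1 (classes are numbered 1..K).
    Returns (predicted_class, is_ambiguous).
    """
    # Assumed pooling rule: each "greater than" vote moves the prediction up one class.
    predicted = 1 + sum(1 for d in decisions if d)
    # Consistent answers look like True, ..., True, False, ..., False.
    # A True that directly follows a False means two boundaries have crossed.
    ambiguous = any((not a) and b for a, b in zip(decisions, decisions[1:]))
    return predicted, ambiguous


if __name__ == "__main__":
    print(pool_ordinal([True, True, False]))   # consistent decisions
    print(pool_ordinal([True, False, True]))   # crossed boundaries
```

Noncrossing constraints are meant to rule out the second kind of input, so that the pooled prediction is always well defined.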
A distributed block coordinate descent method for training regularized linear classifiers
Distributed training of regularized classifiers has received great
attention recently. Most existing methods approach this problem by taking steps
obtained from approximating the objective by a quadratic approximation that is
decoupled at the individual variable level. These methods are designed for
multicore and MPI platforms where communication costs are low. They are
inefficient on systems such as Hadoop running on a cluster of commodity
machines where communication costs are substantial. In this paper we design a
distributed algorithm for regularization that is much better suited for
such systems than existing algorithms. A careful cost analysis is used to
support these points and motivate our method. The main idea of our algorithm is
to do block optimization of many variables on the actual objective function
within each computing node; this increases the computational cost per step that
is matched with the communication cost, and decreases the number of outer
iterations, thus yielding a faster overall method. Distributed Gauss-Seidel and
Gauss-Southwell greedy schemes are used for choosing variables to update in
each step. We establish global convergence theory for our algorithm, including
Q-linear rate of convergence. Experiments on two benchmark problems show our
method to be much faster than existing methods.
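The Gauss-Southwell greedy selection rule can be illustrated on a single machine. The sketch below is an assumption-laden toy, not the distributed algorithm of the paper: it minimizes a small strictly convex quadratic, updates one coordinate per step rather than a block, and uses the hypothetical name `gauss_southwell_cd`.

```python
def gauss_southwell_cd(A, b, steps=200, tol=1e-12):
    """Minimize 0.5 * x^T A x - b^T x for a symmetric positive definite A."""
    n = len(b)
    x = [0.0] * n
    for _ in range(steps):
        # Gradient of the quadratic objective: A x - b.
        grad = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        # Gauss-Southwell rule: choose the coordinate with the largest |gradient|.
        i = max(range(n), key=lambda k: abs(grad[k]))
        if abs(grad[i]) < tol:
            break
        # Exact minimization of the objective along coordinate i.
        x[i] -= grad[i] / A[i][i]
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(gauss_southwell_cd(A, b))  # approaches the solution of A x = b
```

A Gauss-Seidel scheme would instead cycle through the coordinates in a fixed order; only the line that picks `i` changes.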
Enhancing Multi-Class Classification of Random Forest using Random Vector Functional Neural Network and Oblique Decision Surfaces
Both neural networks and decision trees are popular machine learning methods
and are widely used to solve problems from diverse domains. These two
classifiers are commonly used base classifiers in an ensemble framework. In
this paper, we first present a new variant of oblique decision tree based on a
linear classifier, then construct an ensemble classifier based on the fusion of
a fast neural network, random vector functional link network and oblique
decision trees. Random Vector Functional Link Network has an elegant closed
form solution with extremely short training time. The neural network partitions
each training bag (obtained using bagging) at the root level into C subsets
where C is the number of classes in the dataset and subsequently, C oblique
decision trees are trained on such partitions. The proposed method provides a
rich insight into the data by grouping the confusing or hard to classify
samples for each class and thus, provides an opportunity to employ fine-grained
classification rule over the data. The performance of the ensemble classifier
is evaluated on several multi-class datasets where it demonstrates a superior
performance compared to other state-of-the-art classifiers.
Comment: 8 pages, 5 figures
Asymptotic distribution and sparsistency for l1-penalized parametric M-estimators with applications to linear SVM and logistic regression
Since its early use in least squares regression problems, the l1-penalization
framework for variable selection has been employed in conjunction with a wide
range of loss functions encompassing regression, classification and survival
analysis. While a well developed theory exists for the l1-penalized least
squares estimates, few results concern the behavior of l1-penalized estimates
for general loss functions. In this paper, we derive two results concerning
penalized estimates for a wide array of penalty and loss functions. Our first
result characterizes the asymptotic distribution of penalized parametric
M-estimators under mild conditions on the loss and penalty functions in the
classical setting (fixed-p-large-n). Our second result gives explicit necessary and
sufficient generalized irrepresentability (GI) conditions for l1-penalized
parametric M-estimates to consistently select the components of a model
(sparsistency) as well as their sign (sign consistency). In general, the GI
conditions depend on the Hessian of the risk function at the true value of the
unknown parameter. Under Gaussian predictors, we obtain a set of conditions
under which the GI conditions can be re-expressed solely in terms of the second
moment of the predictors. We apply our theory to contrast l1-penalized SVM and
logistic regression classifiers and find conditions under which they have the
same behavior in terms of their model selection consistency (sparsistency and
sign consistency). Finally, we provide simulation evidence for the theory based
on these classification examples.
Comment: 55 pages, 4 figures, also available as a technical report from the
Statistics Department at Indiana University
Iteratively-Reweighted Least-Squares Fitting of Support Vector Machines: A Majorization--Minimization Algorithm Approach
Support vector machines (SVMs) are an important tool in modern data analysis.
Traditionally, support vector machines have been fitted via quadratic
programming, either using purpose-built or off-the-shelf algorithms. We present
an alternative approach to SVM fitting via the majorization--minimization (MM)
paradigm. Algorithms that are derived via MM algorithm constructions can be
shown to monotonically decrease their objectives at each iteration, as well as
be globally convergent to stationary points. We demonstrate the construction of
iteratively-reweighted least-squares (IRLS) algorithms, via the MM paradigm,
for SVM risk minimization problems involving the hinge, least-square,
squared-hinge, and logistic losses, and 1-norm, 2-norm, and elastic net
penalizations. Successful implementations of our algorithms are presented via
some numerical examples.
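The majorization step behind an IRLS scheme for the hinge loss can be sketched as below. This shows one standard quadratic majorizer and is not claimed to be the exact construction used in the paper. It assumes u = 1 - y * f(x), so the hinge loss is max(0, u), and it uses a small `eps` (an assumption) to avoid division by zero. The function names are hypothetical.

```python
def hinge(u):
    """Hinge loss written in terms of u = 1 - y * f(x)."""
    return max(0.0, u)


def hinge_majorizer(u, u_k, eps=1e-8):
    """Quadratic upper bound on hinge(u), tight at u = u_k when |u_k| >= eps.

    Uses max(0, u) = (|u| + u) / 2 together with the bound
    |u| <= u**2 / (2 * c) + c / 2 for any c > 0, taking c = |u_k|.
    The two pieces combine to (u + c)**2 / (4 * c).
    """
    c = max(abs(u_k), eps)
    return (u + c) ** 2 / (4.0 * c)


if __name__ == "__main__":
    u_k = 0.7
    for u in (-1.0, 0.0, 0.7, 2.0):
        print(u, hinge(u), hinge_majorizer(u, u_k))
```

Because the bound is quadratic in u, and u is linear in the model parameters for a linear classifier, minimizing it is a weighted least-squares problem with weights 1 / (4 * c). Repeating that step with c refreshed at the current iterate gives an iteratively-reweighted scheme.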
Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms
Many different machine learning algorithms exist; taking into account each
algorithm's hyperparameters, there is a staggeringly large number of possible
alternatives overall. We consider the problem of simultaneously selecting a
learning algorithm and setting its hyperparameters, going beyond previous work
that addresses these issues in isolation. We show that this problem can be
addressed by a fully automated approach, leveraging recent innovations in
Bayesian optimization. Specifically, we consider a wide range of feature
selection techniques (combining 3 search and 8 evaluator methods) and all
classification approaches implemented in WEKA, spanning 2 ensemble methods, 10
meta-methods, 27 base classifiers, and hyperparameter settings for each
classifier. On each of 21 popular datasets from the UCI repository, the KDD Cup
09, variants of the MNIST dataset and CIFAR-10, we show classification
performance often much better than using standard selection/hyperparameter
optimization methods. We hope that our approach will help non-expert users to
more effectively identify machine learning algorithms and hyperparameter
settings appropriate to their applications, and hence to achieve improved
performance.
Comment: 9 pages, 3 figures
Ensembles of Deep LSTM Learners for Activity Recognition using Wearables
Recently, deep learning (DL) methods have been introduced very successfully
into human activity recognition (HAR) scenarios in ubiquitous and wearable
computing. In particular, the prospect of overcoming the need for manual feature
design, combined with superior classification capabilities, renders deep neural
networks very attractive for real-life HAR applications. Even though DL-based
approaches now outperform the state of the art in a number of recognition
tasks in the field, substantial challenges remain. Most prominently, issues
with real-life datasets, typically including imbalanced datasets and
problematic data quality, still limit the effectiveness of activity recognition
using wearables. In this paper we tackle such challenges through Ensembles of
deep Long Short Term Memory (LSTM) networks. We have developed modified
training procedures for LSTM networks and combine sets of diverse LSTM learners
into classifier collectives. We demonstrate, both formally and empirically,
that Ensembles of deep LSTM learners outperform the individual LSTM networks.
Through an extensive experimental evaluation on three standard benchmarks
(Opportunity, PAMAP2, Skoda) we demonstrate the excellent recognition
capabilities of our approach and its potential for real-life applications of
human activity recognition.
Comment: accepted for publication in ACM IMWUT (Ubicomp) 201
An Optimization Framework for Semi-Supervised and Transfer Learning using Multiple Classifiers and Clusterers
Unsupervised models can provide supplementary soft constraints to help
classify new, "target" data since similar instances in the target set are more
likely to share the same class label. Such models can also help detect possible
differences between training and target distributions, which is useful in
applications where concept drift may take place, as in transfer learning
settings. This paper describes a general optimization framework that takes as
input class membership estimates from existing classifiers learnt on previously
encountered "source" data, as well as a similarity matrix from a cluster
ensemble operating solely on the target data to be classified, and yields a
consensus labeling of the target data. This framework admits a wide range of
loss functions and classification/clustering methods. It exploits properties of
Bregman divergences in conjunction with Legendre duality to yield a principled
and scalable approach. A variety of experiments show that the proposed
framework can yield results substantially superior to those provided by popular
transductive learning techniques or by naively applying classifiers learnt on
the original task to the target data.
Monotonic Calibrated Interpolated Look-Up Tables
Real-world machine learning applications may require functions that are
fast-to-evaluate and interpretable. In particular, guaranteed monotonicity of
the learned function can be critical to user trust. We propose meeting these
goals for low-dimensional machine learning problems by learning flexible,
monotonic functions using calibrated interpolated look-up tables. We extend the
structural risk minimization framework of lattice regression to train monotonic
look-up tables by solving a convex problem with appropriate linear inequality
constraints. In addition, we propose jointly learning interpretable
calibrations of each feature to normalize continuous features and handle
categorical or missing data, at the cost of making the objective non-convex. We
address large-scale learning through parallelization and mini-batching, and
propose random sampling of additive regularizer terms. Case studies with
real-world problems with five to sixteen features and thousands to millions of
training samples demonstrate the proposed monotonic functions can achieve
state-of-the-art accuracy on practical problems while providing greater
transparency to users.
Comment: To appear (with minor revisions), Journal Machine Learning Research
201
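A one-dimensional version of an interpolated look-up table, and the linear inequality constraints that make it monotonic, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's multi-dimensional lattice model: it covers a single feature, clamps inputs outside the keypoint range, and uses the hypothetical names `interpolate` and `is_monotonic`.

```python
from bisect import bisect_right


def interpolate(keypoints, values, x):
    """Piecewise-linear look-up over sorted keypoints; clamps outside the range."""
    if x <= keypoints[0]:
        return values[0]
    if x >= keypoints[-1]:
        return values[-1]
    # After the clamps above, keypoints[j - 1] <= x < keypoints[j].
    j = bisect_right(keypoints, x)
    t = (x - keypoints[j - 1]) / (keypoints[j] - keypoints[j - 1])
    return (1 - t) * values[j - 1] + t * values[j]


def is_monotonic(values):
    """In 1-D the function is nondecreasing iff values[i] <= values[i + 1] for all i."""
    return all(a <= b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    keypoints = [0.0, 1.0, 2.0]
    values = [0.0, 0.5, 2.0]
    print(interpolate(keypoints, values, 1.5), is_monotonic(values))
```

Each check `values[i] <= values[i + 1]` is linear in the table entries, which is why imposing monotonicity during training can be expressed as linear inequality constraints on the parameters.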
Classification for Dynamical Systems: Model-based Approach and Support Vector Machines
We consider the problem of classifying trajectories generated by dynamical
systems. We investigate a model-based approach, the common approach in control
engineering, and a data-driven approach based on Support Vector Machines, a
popular method in the area of machine learning. The analysis points out
connections between the two approaches and their relative merits.