Laplacian Support Vector Machines Trained in the Primal
In the last few years, due to the growing ubiquity of unlabeled data, much
effort has been spent by the machine learning community to better understand
and improve the quality of classifiers that exploit unlabeled data.
Following the manifold regularization approach, Laplacian Support Vector
Machines (LapSVMs) have shown state-of-the-art performance in
semi-supervised classification. In this paper we present two strategies to
solve the primal LapSVM problem, in order to overcome some issues of the
original dual formulation. Whereas training a LapSVM in the dual requires two
steps, using the primal form allows us to collapse training to a single step.
Moreover, the computational complexity of the training algorithm is reduced
from O(n^3) to O(n^2) using preconditioned conjugate gradient, where n is the
combined number of labeled and unlabeled examples. We speed up training by
using an early stopping strategy based on the prediction on unlabeled data or,
if available, on labeled validation examples. This allows the algorithm to
quickly compute approximate solutions with roughly the same classification
accuracy as the optimal ones, considerably reducing the training time. Due to
its simplicity, training LapSVM in the primal can be the starting point for
additional enhancements of the original LapSVM formulation, such as those for
dealing with large datasets. We present an extensive experimental evaluation on
real-world data showing the benefits of the proposed approach.
Comment: 39 pages, 14 figures
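A minimal sketch of the primal idea: with a squared loss standing in for the hinge, the manifold-regularized objective reduces to a single linear system in the kernel expansion coefficients, which conjugate gradient solves iteratively (capping the CG iterations is only a crude stand-in for the paper's prediction-based early stopping). Kernel, graph, and regularization choices below are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch: manifold-regularized kernel classifier trained in the primal
# with conjugate gradient; a squared loss stands in for LapSVM's hinge loss.
import numpy as np
from scipy.sparse.linalg import cg
from scipy.spatial.distance import cdist

def rbf_kernel(A, B, sigma=1.0):
    return np.exp(-cdist(A, B, "sqeuclidean") / (2 * sigma ** 2))

def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian of a symmetrized kNN graph."""
    D2 = cdist(X, X, "sqeuclidean")
    W = np.zeros_like(D2)
    idx = np.argsort(D2, axis=1)[:, 1:k + 1]
    for i, nbrs in enumerate(idx):
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)
    return np.diag(W.sum(1)) - W

def train_primal(X, y_labeled, n_labeled, gamma_A=1e-2, gamma_I=1e-2, sigma=1.0):
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    L = knn_laplacian(X)
    J = np.zeros((n_labeled, n))
    J[np.arange(n_labeled), np.arange(n_labeled)] = 1.0
    # Stationarity of the squared-loss primal in the expansion coefficients alpha:
    # (K J'J K / l + gamma_A K + gamma_I K L K) alpha = K J' y / l
    A = K @ J.T @ J @ K / n_labeled + gamma_A * K + gamma_I * K @ L @ K
    b = K @ J.T @ y_labeled / n_labeled
    alpha, _ = cg(A, b, maxiter=200)   # crude early stopping via an iteration cap
    return alpha, K

# Toy usage: the first 10 points are labeled, the rest unlabeled.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(1, 0.5, (50, 2))])
y = np.array([-1.0] * 5 + [1.0] * 5)
X_train = np.vstack([X[:5], X[50:55], X[5:50], X[55:]])
alpha, K = train_primal(X_train, y, n_labeled=10)
pred = np.sign(K @ alpha)
```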
Dual SVM Training on a Budget
We present a dual subspace ascent algorithm for support vector machine
training that respects a budget constraint limiting the number of support
vectors. Budget methods are effective for reducing the training time of kernel
SVM while retaining high accuracy. To date, budget training is available only
for primal (SGD-based) solvers. Dual subspace ascent methods like sequential
minimal optimization are attractive for their good adaptation to the problem
structure, their fast convergence rate, and their practical speed. By
incorporating a budget constraint into a dual algorithm, our method enjoys the
best of both worlds. We demonstrate considerable speed-ups over primal budget
training methods.
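A rough sketch of the idea, assuming a bias-free hinge-loss dual and the simplest possible budget maintenance (dropping the smallest-|alpha| support vector); the paper's actual algorithm and any merging-based maintenance are more refined than this.

```python
# Hedged sketch: kernel SVM dual coordinate ascent with a budget on the number
# of support vectors; budget maintenance here is plain removal of the
# smallest-|alpha| support vector.
import numpy as np

def rbf(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def budgeted_dual_ascent(X, y, C=1.0, budget=20, epochs=10, gamma=0.5):
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(epochs):
        for i in np.random.permutation(n):
            sv = np.flatnonzero(alpha)
            f_i = sum(alpha[j] * y[j] * rbf(X[j], X[i], gamma) for j in sv)
            # coordinate-wise maximization of the (bias-free) hinge-loss dual
            alpha[i] = np.clip(alpha[i] + (1.0 - y[i] * f_i) / rbf(X[i], X[i], gamma),
                               0.0, C)
            sv = np.flatnonzero(alpha)
            if len(sv) > budget:
                # budget maintenance: drop the support vector with smallest |alpha|
                alpha[sv[np.argmin(np.abs(alpha[sv]))]] = 0.0
    return alpha

def predict(X_train, y, alpha, X_test, gamma=0.5):
    sv = np.flatnonzero(alpha)
    scores = [sum(alpha[j] * y[j] * rbf(X_train[j], x, gamma) for j in sv)
              for x in X_test]
    return np.sign(scores)
```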
An Efficient Primal-Dual Prox Method for Non-Smooth Optimization
We study non-smooth optimization problems in machine learning, where both
the loss function and the regularizer are non-smooth functions. Previous
studies on efficient empirical loss minimization assume either a smooth loss
function or a strongly convex regularizer, making them unsuitable for
non-smooth optimization. We develop a simple yet efficient method for a family
of non-smooth optimization problems where the dual form of the loss function is
bilinear in primal and dual variables. We cast a non-smooth optimization
problem into a minimax optimization problem, and develop a primal-dual prox
method that solves the minimax optimization problem at a rate of O(1/T),
assuming that the proximal step can be efficiently solved. This is significantly
faster than a standard subgradient descent method, which has an O(1/sqrt(T))
convergence rate. Our empirical study verifies the efficiency of the proposed
method for various non-smooth optimization problems that arise ubiquitously in
machine learning by comparing it to state-of-the-art first-order methods.
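To illustrate the kind of saddle-point reformulation involved, here is a hedged, Chambolle-Pock-style primal-dual prox sketch for L1-regularized hinge-loss minimization; it is not the paper's exact update rule, and the step-size choice below is an assumption.

```python
# Hedged sketch: primal-dual prox updates for the saddle-point reformulation
#   min_w max_{lam in [0,1]^n} (1/n) lam' (1 - diag(y) X w) + mu ||w||_1,
# using proximal steps (box projection for lam, soft-thresholding for w).
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def primal_dual_prox(X, y, mu=0.1, iters=500):
    n, d = X.shape
    A = -(y[:, None] * X) / n            # bilinear coupling term lam' A w
    L = np.linalg.norm(A, 2)             # spectral norm used for step sizes
    tau = sigma = 1.0 / L
    w = np.zeros(d); w_bar = w.copy(); lam = np.zeros(n)
    for _ in range(iters):
        # dual prox step: gradient ascent in lam, then projection onto [0,1]^n
        lam = np.clip(lam + sigma * (A @ w_bar + 1.0 / n), 0.0, 1.0)
        # primal prox step: gradient descent in w, then the L1 prox
        w_new = soft_threshold(w - tau * (A.T @ lam), tau * mu)
        w_bar = 2 * w_new - w            # extrapolation
        w = w_new
    return w
```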
Towards Ultrahigh Dimensional Feature Selection for Big Data
In this paper, we present a new adaptive feature scaling scheme for
ultrahigh-dimensional feature selection on Big Data. To solve this problem
effectively, we first reformulate it as a convex semi-infinite programming
(SIP) problem and then propose an efficient \emph{feature generating paradigm}.
In contrast with traditional gradient-based approaches that conduct
optimization on all input features, the proposed method iteratively activates a
group of features and solves a sequence of multiple kernel learning (MKL)
subproblems of much reduced scale. To further speed up the training, we propose
to solve the MKL subproblems in their primal forms through a modified
accelerated proximal gradient approach. This optimization scheme also enables
some efficient caching techniques. The feature generating
paradigm can guarantee that the solution converges globally under mild
conditions and achieve lower feature selection bias. Moreover, the proposed
method can tackle two challenging tasks in feature selection: 1) group-based
feature selection with complex structures and 2) nonlinear feature selection
with explicit feature mappings. Comprehensive experiments on a wide range of
synthetic and real-world datasets containing tens of millions of data points
with extremely high-dimensional features demonstrate the competitive performance of the proposed
method over state-of-the-art feature selection methods in terms of
generalization performance and training efficiency.
Comment: 61 pages
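A hedged sketch of a feature-generating loop: inactive features are scored by the gradient of the loss, the most violated group is activated, and a small subproblem restricted to the active set is re-solved. A ridge least-squares subproblem stands in for the paper's MKL subproblems; the group size and number of rounds are arbitrary.

```python
# Hedged sketch: iterative "feature generating" active-set loop.
import numpy as np

def feature_generating(X, y, group_size=10, rounds=5, lam=1e-2):
    n, d = X.shape
    active = np.array([], dtype=int)
    w_active = np.zeros(0)
    for _ in range(rounds):
        residual = X[:, active] @ w_active - y if active.size else -y
        scores = np.abs(X.T @ residual) / n           # violation score per feature
        scores[active] = -np.inf                      # skip already-active features
        new = np.argsort(scores)[::-1][:group_size]   # activate the most violated group
        active = np.concatenate([active, new])
        Xa = X[:, active]
        # small subproblem on the active features only (ridge least squares)
        w_active = np.linalg.solve(Xa.T @ Xa / n + lam * np.eye(active.size),
                                   Xa.T @ y / n)
    return active, w_active
```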
Classification of Diabetes Mellitus using Modified Particle Swarm Optimization and Least Squares Support Vector Machine
Diabetes Mellitus is a major health problem all over the world. Many
classification algorithms have been applied for its diagnosis and treatment. In
this paper, a hybrid algorithm of Modified Particle Swarm Optimization and
Least Squares Support Vector Machine is proposed for the classification of
type II DM patients. The LS-SVM algorithm is used for classification by finding
the optimal hyperplane which separates the various classes. Since LS-SVM is
highly sensitive to changes in its parameter values, the Modified-PSO algorithm
is used as an optimization technique for the LS-SVM parameters, which guarantees
the robustness of the hybrid algorithm by searching for the optimal values of
the LS-SVM parameters. The proposed algorithm is implemented and evaluated using
the Pima Indians Diabetes data set from the UCI repository of machine learning
databases. It is also compared with different classifier algorithms that were
applied to the same database. The experimental results showed the superiority
of the proposed algorithm, which achieved an average classification accuracy
of 97.833%.
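A hedged sketch combining the two ingredients: LS-SVM classification by solving its KKT linear system, and a plain particle swarm search over the regularization constant and RBF kernel width scored by validation accuracy. The paper's specific PSO modification is not reproduced, and all constants below are illustrative.

```python
# Hedged sketch: LS-SVM classifier (KKT linear system) tuned by basic PSO.
import numpy as np
from scipy.spatial.distance import cdist

def lssvm_fit(X, y, gamma, sigma):
    n = len(y)
    K = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma ** 2))
    Omega = (y[:, None] * y[None, :]) * K
    A = np.block([[np.zeros((1, 1)), y[None, :]],
                  [y[:, None], Omega + np.eye(n) / gamma]])
    sol = np.linalg.solve(A, np.concatenate([[0.0], np.ones(n)]))
    return sol[0], sol[1:]                       # bias b, dual coefficients alpha

def lssvm_predict(X_tr, y_tr, b, alpha, X_te, sigma):
    K = np.exp(-cdist(X_te, X_tr, "sqeuclidean") / (2 * sigma ** 2))
    return np.sign(K @ (alpha * y_tr) + b)

def pso_tune(X_tr, y_tr, X_val, y_val, n_particles=10, iters=20):
    rng = np.random.default_rng(0)
    pos = rng.uniform(-2, 2, (n_particles, 2))   # particles in (log10 gamma, log10 sigma)
    vel = np.zeros_like(pos)
    def fitness(p):
        gamma, sigma = 10.0 ** p
        b, alpha = lssvm_fit(X_tr, y_tr, gamma, sigma)
        return np.mean(lssvm_predict(X_tr, y_tr, b, alpha, X_val, sigma) == y_val)
    pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmax(pbest_fit)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, 2))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = pos + vel
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[np.argmax(pbest_fit)].copy()
    return 10.0 ** gbest                         # best (gamma, sigma) found
```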
Max-Margin Feature Selection
Many machine learning applications, such as those in vision, biology and social
networking, deal with high-dimensional data. Feature selection is typically
employed to select a subset of features which improves generalization
accuracy as well as reduces the computational cost of learning the model. One
of the criteria used for feature selection is to jointly minimize the
redundancy and maximize the relevance of the selected features. In this
paper, we formulate the task of feature selection as a one-class SVM problem in
a space where features correspond to the data points and instances correspond
to the dimensions. The goal is to look for a representative subset of the
features (support vectors) which describes the boundary of the region where
the set of features (data points) lies. This leads to a joint
optimization of relevance and redundancy in a principled max-margin framework.
Additionally, our formulation enables us to leverage existing techniques for
optimizing the SVM objective resulting in highly computationally efficient
solutions for the task of feature selection. Specifically, we employ the dual
coordinate descent algorithm (Hsieh et al., 2008), originally proposed for
SVMs, for our formulation. We use a sparse representation to deal with data in
very high dimensions. Experiments on seven publicly available benchmark
datasets from a variety of domains show that our approach results in orders of
magnitude faster solutions even while retaining the same level of accuracy
compared to state-of-the-art feature selection techniques.
Comment: Submitted to PR Letters
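A minimal sketch of the formulation, assuming scikit-learn's OneClassSVM as the solver instead of the dual coordinate descent algorithm used in the paper: the data matrix is transposed so that features become the training points, and the returned support vectors are the selected features.

```python
# Hedged sketch: feature selection via a one-class SVM on the transposed data.
import numpy as np
from sklearn.svm import OneClassSVM

def max_margin_feature_selection(X, nu=0.1):
    # rows of X_t are features, each described by its values across all instances
    X_t = X.T
    ocsvm = OneClassSVM(kernel="linear", nu=nu).fit(X_t)
    return ocsvm.support_                 # indices of support-vector features

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
print(max_margin_feature_selection(X, nu=0.2))
```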
Componentwise Least Squares Support Vector Machines
This chapter describes componentwise Least Squares Support Vector Machines
(LS-SVMs) for the estimation of additive models consisting of a sum of
nonlinear components. The primal-dual derivations characterizing LS-SVMs for
the estimation of the additive model result in a single set of linear equations
whose size grows with the number of data points. The derivation is elaborated
for the classification as well as the regression case. Furthermore, different
techniques are proposed to discover structure in the data by looking for sparse
components in the model based on dedicated regularization schemes on the one
hand and fusion of the componentwise LS-SVMs training with a validation
criterion on the other hand. (keywords: LS-SVMs, additive models,
regularization, structure detection)
Comment: 22 pages. Accepted for publication in Support Vector Machines: Theory
and Applications, ed. L. Wang, 200
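A hedged sketch of the regression case: the additive-model kernel is the sum of per-feature RBF kernels, a single LS-SVM linear system is solved, and each nonlinear component can then be evaluated from its own kernel block. The kernel width and regularization constant are illustrative.

```python
# Hedged sketch: componentwise LS-SVM regression for an additive model.
import numpy as np

def component_kernels(X, Z, sigma=1.0):
    # one RBF kernel per input feature, evaluated on that feature alone
    return [np.exp(-(X[:, [d]] - Z[:, [d]].T) ** 2 / (2 * sigma ** 2))
            for d in range(X.shape[1])]

def componentwise_lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    K = sum(component_kernels(X, X, sigma))       # additive-model kernel
    n = len(y)
    A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                  [np.ones((n, 1)), K + np.eye(n) / gamma]])
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]                        # bias, dual coefficients

def component_contribution(X_tr, alpha, X_te, d, sigma=1.0):
    # the d-th nonlinear component of the additive model at the test points
    Kd = np.exp(-(X_te[:, [d]] - X_tr[:, [d]].T) ** 2 / (2 * sigma ** 2))
    return Kd @ alpha
```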
Managing Randomization in the Multi-Block Alternating Direction Method of Multipliers for Quadratic Optimization
The Alternating Direction Method of Multipliers (ADMM) has gained a lot of
attention for solving large-scale, objective-separable constrained
optimization problems. However, the two-block variable structure of the ADMM still
limits the practical computational efficiency of the method, because at least one
large matrix factorization is needed even for linear and convex
quadratic programming. This drawback may be overcome by enforcing a multi-block
structure of the decision variables in the original optimization problem.
Unfortunately, the multi-block ADMM, with more than two blocks, is not
guaranteed to be convergent. On the other hand, two positive developments have
been made: first, if in each cyclic loop one randomly permutes the updating
order of the multiple blocks, then the method converges in expectation for
solving any system of linear equations with any number of blocks. Second,
such a randomly permuted ADMM also works for equality-constrained convex
quadratic programming even when the objective function is not separable. The
goal of this paper is twofold. First, we add more randomness into the ADMM by
developing a randomly assembled cyclic ADMM (RAC-ADMM) where the decision
variables in each block are randomly assembled. We discuss the theoretical
properties of RAC-ADMM and show when random assembling helps and when it hurts,
and develop a criterion to guarantee that it converges almost surely. Secondly,
using the theoretical guidance on RAC-ADMM, we conduct multiple numerical tests
on solving both randomly generated and large-scale benchmark quadratic
optimization problems, which include continuous problems, binary graph-partition
and quadratic-assignment problems, and selected machine learning problems. Our numerical
tests show that RAC-ADMM, with a variable-grouping strategy, can significantly
improve the computational efficiency of solving most quadratic
optimization problems.
Comment: Expanded and streamlined theoretical sections. Added comparisons with
other multi-block ADMM variants. Updated Computational Studies Section on
continuous problems -- reporting primal and dual residuals instead of
objective value gap. Added selected machine learning problems
(ElasticNet/Lasso and Support Vector Machine) to Computational Studies
Section
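A hedged sketch of one way to run a randomly assembled cyclic ADMM sweep for an equality-constrained convex QP: each sweep shuffles the variables, partitions them into blocks, minimizes the augmented Lagrangian block by block, and then takes a dual step. The convergence safeguards and grouping strategies analyzed in the paper are omitted, and the block count and penalty parameter are arbitrary.

```python
# Hedged sketch: randomly assembled multi-block ADMM for
#   min 1/2 x'Hx + c'x  subject to  Ax = b.
import numpy as np

def rac_admm_qp(H, c, A, b, n_blocks=4, beta=1.0, sweeps=200, seed=0):
    rng = np.random.default_rng(seed)
    n = len(c)
    x = np.zeros(n)
    mu = np.zeros(len(b))
    for _ in range(sweeps):
        perm = rng.permutation(n)                     # randomly assemble the blocks
        for S in np.array_split(perm, n_blocks):
            T = np.setdiff1d(np.arange(n), S)         # the remaining variables
            AS, AT = A[:, S], A[:, T]
            # minimize the augmented Lagrangian over the block x_S only
            M = H[np.ix_(S, S)] + beta * AS.T @ AS
            rhs = -(c[S] + H[np.ix_(S, T)] @ x[T]
                    + AS.T @ (mu + beta * (AT @ x[T] - b)))
            x[S] = np.linalg.solve(M, rhs)
        mu = mu + beta * (A @ x - b)                  # dual (multiplier) update
    return x
```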
Convex Optimization for Binary Classifier Aggregation in Multiclass Problems
Multiclass problems are often decomposed into multiple binary problems that
are solved by individual binary classifiers whose results are integrated into a
final answer. Various methods, including all-pairs (APs), one-versus-all (OVA),
and error correcting output code (ECOC), have been studied to decompose
multiclass problems into binary problems. However, little work has been done on
optimally aggregating the binary classifiers' outputs to determine a final answer to the
multiclass problem. In this paper we present a convex optimization method for
an optimal aggregation of binary classifiers to estimate class membership
probabilities in multiclass problems. We model the class membership probability
as a softmax function that takes, as input, a conic combination of discrepancies
induced by the individual binary classifiers. With this model, we formulate
the regularized maximum likelihood estimation as a convex optimization problem,
which is solved by the primal-dual interior point method. Connections of our
method to large margin classifiers are presented, showing that the large margin
formulation can be considered as a limiting case of our convex formulation.
Numerical experiments on synthetic and real-world data sets demonstrate that
our method outperforms existing aggregation methods as well as direct methods,
in terms of the classification accuracy and the quality of class membership
probability estimates.
Comment: Appeared in Proceedings of the 2014 SIAM International Conference on
Data Mining (SDM 2014).
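A hedged sketch of the model: nonnegative weights of a conic combination of per-class discrepancies are fitted under a softmax likelihood with an L2 penalty. A bound-constrained quasi-Newton solver stands in for the paper's primal-dual interior point method, and the layout of the discrepancy tensor is an assumption for illustration.

```python
# Hedged sketch: softmax aggregation of binary classifier discrepancies with
# nonnegative (conic) weights, fitted by regularized maximum likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def fit_aggregation(D, y, lam=1e-2):
    """D: (n, K, M) discrepancies of M binary classifiers toward K classes."""
    n, K, M = D.shape
    def neg_log_lik(w):
        scores = D @ w                                 # (n, K) class scores
        ll = scores[np.arange(n), y] - logsumexp(scores, axis=1)
        return -ll.sum() + lam * w @ w
    res = minimize(neg_log_lik, x0=np.ones(M) / M, method="L-BFGS-B",
                   bounds=[(0.0, None)] * M)           # conic (nonnegative) weights
    return res.x

def predict_proba(D, w):
    scores = D @ w
    return np.exp(scores - logsumexp(scores, axis=1, keepdims=True))
```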
Primal-Dual Rates and Certificates
We propose an algorithm-independent framework to equip existing optimization
methods with primal-dual certificates. Such certificates and the corresponding
convergence-rate guarantees are important for practitioners to diagnose progress,
in particular in machine learning applications. We obtain new primal-dual
convergence rates, e.g., for the Lasso as well as many L1, Elastic Net, group
Lasso and TV-regularized problems. The theory applies to any norm-regularized
generalized linear model. Our approach provides efficiently computable duality
gaps which are globally defined, without modifying the original problems in the
region of interest.
Comment: Appearing at ICML 2016 - Proceedings of the 33rd International
Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 4
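As a concrete instance of such a certificate, here is a hedged sketch of a globally defined duality gap for the Lasso, obtained by rescaling the residual into a dual-feasible point; it can be evaluated at any iterate of any Lasso solver.

```python
# Hedged sketch: duality-gap certificate for the Lasso,
#   P(w) = 1/2 ||Xw - y||^2 + lam ||w||_1.
import numpy as np

def lasso_duality_gap(X, y, w, lam):
    residual = y - X @ w
    primal = 0.5 * residual @ residual + lam * np.abs(w).sum()
    # scale the residual so the dual constraint ||X' theta||_inf <= lam holds
    scale = min(1.0, lam / np.max(np.abs(X.T @ residual)))
    theta = scale * residual
    dual = 0.5 * y @ y - 0.5 * np.sum((y - theta) ** 2)
    return primal - dual        # nonnegative; zero only at the optimum

# Usage: check the progress of any Lasso solver at an iterate w.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 20)), rng.normal(size=100)
print(lasso_duality_gap(X, y, np.zeros(20), lam=0.5))
```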