Laplacian Support Vector Machines Trained in the Primal
In the last few years, due to the growing ubiquity of unlabeled data, much
effort has been spent by the machine learning community to better understand
and improve the quality of classifiers that exploit unlabeled data.
Following the manifold regularization approach, Laplacian Support Vector
Machines (LapSVMs) have shown state-of-the-art performance in
semi-supervised classification. In this paper we present two strategies to
solve the primal LapSVM problem, in order to overcome some issues of the
original dual formulation. Whereas training a LapSVM in the dual requires two
steps, using the primal form allows us to collapse training to a single step.
Moreover, the computational complexity of the training algorithm is reduced
from O(n^3) to O(n^2) using preconditioned conjugate gradient, where n is the
combined number of labeled and unlabeled examples. We speed up training by
using an early stopping strategy based on the prediction on unlabeled data or,
if available, on labeled validation examples. This allows the algorithm to
quickly compute approximate solutions with roughly the same classification
accuracy as the optimal ones, considerably reducing the training time. Due to
its simplicity, training LapSVM in the primal can be the starting point for
additional enhancements of the original LapSVM formulation, such as those for
dealing with large datasets. We present an extensive experimental evaluation on
real-world data showing the benefits of the proposed approach.
Comment: 39 pages, 14 figures
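A minimal sketch of the primal idea: with a squared loss standing in for the hinge, the manifold-regularized objective reduces to a single linear system in the kernel expansion coefficients, which conjugate gradient solves iteratively (capping the CG iterations is only a crude stand-in for the paper's prediction-based early stopping). Kernel, graph, and regularization choices below are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch: manifold-regularized kernel classifier trained in the primal
# with conjugate gradient; a squared loss stands in for LapSVM's hinge loss.
import numpy as np
from scipy.sparse.linalg import cg
from scipy.spatial.distance import cdist

def rbf_kernel(A, B, sigma=1.0):
    return np.exp(-cdist(A, B, "sqeuclidean") / (2 * sigma ** 2))

def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian of a symmetrized kNN graph."""
    D2 = cdist(X, X, "sqeuclidean")
    W = np.zeros_like(D2)
    idx = np.argsort(D2, axis=1)[:, 1:k + 1]
    for i, nbrs in enumerate(idx):
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)
    return np.diag(W.sum(1)) - W

def train_primal(X, y_labeled, n_labeled, gamma_A=1e-2, gamma_I=1e-2, sigma=1.0):
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    L = knn_laplacian(X)
    J = np.zeros((n_labeled, n))
    J[np.arange(n_labeled), np.arange(n_labeled)] = 1.0
    # Stationarity of the squared-loss primal in the expansion coefficients alpha:
    # (K J'J K / l + gamma_A K + gamma_I K L K) alpha = K J' y / l
    A = K @ J.T @ J @ K / n_labeled + gamma_A * K + gamma_I * K @ L @ K
    b = K @ J.T @ y_labeled / n_labeled
    alpha, _ = cg(A, b, maxiter=200)   # crude early stopping via an iteration cap
    return alpha, K

# Toy usage: the first 10 points are labeled, the rest unlabeled.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(1, 0.5, (50, 2))])
y = np.array([-1.0] * 5 + [1.0] * 5)
X_train = np.vstack([X[:5], X[50:55], X[5:50], X[55:]])
alpha, K = train_primal(X_train, y, n_labeled=10)
pred = np.sign(K @ alpha)
```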
Dual SVM Training on a Budget
We present a dual subspace ascent algorithm for support vector machine
training that respects a budget constraint limiting the number of support
vectors. Budget methods are effective for reducing the training time of kernel
SVM while retaining high accuracy. To date, budget training is available only
for primal (SGD-based) solvers. Dual subspace ascent methods like sequential
minimal optimization are attractive for their good adaptation to the problem
structure, their fast convergence rate, and their practical speed. By
incorporating a budget constraint into a dual algorithm, our method enjoys the
best of both worlds. We demonstrate considerable speed-ups over primal budget
training methods.
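A rough sketch of the idea, assuming a bias-free hinge-loss dual and the simplest possible budget maintenance (dropping the smallest-|alpha| support vector); the paper's actual algorithm and any merging-based maintenance are more refined than this.

```python
# Hedged sketch: kernel SVM dual coordinate ascent with a budget on the number
# of support vectors; budget maintenance here is plain removal of the
# smallest-|alpha| support vector.
import numpy as np

def rbf(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def budgeted_dual_ascent(X, y, C=1.0, budget=20, epochs=10, gamma=0.5):
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(epochs):
        for i in np.random.permutation(n):
            sv = np.flatnonzero(alpha)
            f_i = sum(alpha[j] * y[j] * rbf(X[j], X[i], gamma) for j in sv)
            # coordinate-wise maximization of the (bias-free) hinge-loss dual
            alpha[i] = np.clip(alpha[i] + (1.0 - y[i] * f_i) / rbf(X[i], X[i], gamma),
                               0.0, C)
            sv = np.flatnonzero(alpha)
            if len(sv) > budget:
                # budget maintenance: drop the support vector with smallest |alpha|
                alpha[sv[np.argmin(np.abs(alpha[sv]))]] = 0.0
    return alpha

def predict(X_train, y, alpha, X_test, gamma=0.5):
    sv = np.flatnonzero(alpha)
    scores = [sum(alpha[j] * y[j] * rbf(X_train[j], x, gamma) for j in sv)
              for x in X_test]
    return np.sign(scores)
```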
An Efficient Primal-Dual Prox Method for Non-Smooth Optimization
We study non-smooth optimization problems in machine learning, where both
the loss function and the regularizer are non-smooth functions. Previous
studies on efficient empirical loss minimization assume either a smooth loss
function or a strongly convex regularizer, making them unsuitable for
non-smooth optimization. We develop a simple yet efficient method for a family
of non-smooth optimization problems where the dual form of the loss function is
bilinear in primal and dual variables. We cast a non-smooth optimization
problem into a minimax optimization problem, and develop a primal-dual prox
method that solves the minimax optimization problem at a rate of O(1/T),
assuming that the proximal step can be efficiently solved. This is significantly
faster than a standard subgradient descent method, which has an O(1/sqrt(T))
convergence rate. Our empirical study verifies the efficiency of the proposed
method for various non-smooth optimization problems that arise ubiquitously in
machine learning by comparing it to state-of-the-art first-order methods.
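To illustrate the kind of saddle-point reformulation involved, here is a hedged, Chambolle-Pock-style primal-dual prox sketch for L1-regularized hinge-loss minimization; it is not the paper's exact update rule, and the step-size choice below is an assumption.

```python
# Hedged sketch: primal-dual prox updates for the saddle-point reformulation
#   min_w max_{lam in [0,1]^n} (1/n) lam' (1 - diag(y) X w) + mu ||w||_1,
# using proximal steps (box projection for lam, soft-thresholding for w).
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def primal_dual_prox(X, y, mu=0.1, iters=500):
    n, d = X.shape
    A = -(y[:, None] * X) / n            # bilinear coupling term lam' A w
    L = np.linalg.norm(A, 2)             # spectral norm used for step sizes
    tau = sigma = 1.0 / L
    w = np.zeros(d); w_bar = w.copy(); lam = np.zeros(n)
    for _ in range(iters):
        # dual prox step: gradient ascent in lam, then projection onto [0,1]^n
        lam = np.clip(lam + sigma * (A @ w_bar + 1.0 / n), 0.0, 1.0)
        # primal prox step: gradient descent in w, then the L1 prox
        w_new = soft_threshold(w - tau * (A.T @ lam), tau * mu)
        w_bar = 2 * w_new - w            # extrapolation
        w = w_new
    return w
```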
Towards Ultrahigh Dimensional Feature Selection for Big Data
In this paper, we present a new adaptive feature scaling scheme for
ultrahigh-dimensional feature selection on Big Data. To solve this problem
effectively, we first reformulate it as a convex semi-infinite programming
(SIP) problem and then propose an efficient \emph{feature generating paradigm}.
In contrast with traditional gradient-based approaches that conduct
optimization on all input features, the proposed method iteratively activates a
group of features and solves a sequence of multiple kernel learning (MKL)
subproblems of much reduced scale. To further speed up the training, we propose
to solve the MKL subproblems in their primal forms through a modified
accelerated proximal gradient approach. This optimization scheme also enables
some efficient caching techniques. The feature generating
paradigm can guarantee that the solution converges globally under mild
conditions and achieve lower feature selection bias. Moreover, the proposed
method can tackle two challenging tasks in feature selection: 1) group-based
feature selection with complex structures and 2) nonlinear feature selection
with explicit feature mappings. Comprehensive experiments on a wide range of
synthetic and real-world datasets containing tens of millions of data points
with extremely high-dimensional features demonstrate the competitive performance of the proposed
method over state-of-the-art feature selection methods in terms of
generalization performance and training efficiency.
Comment: 61 pages
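A hedged sketch of a feature-generating loop: inactive features are scored by the gradient of the loss, the most violated group is activated, and a small subproblem restricted to the active set is re-solved. A ridge least-squares subproblem stands in for the paper's MKL subproblems; the group size and number of rounds are arbitrary.

```python
# Hedged sketch: iterative "feature generating" active-set loop.
import numpy as np

def feature_generating(X, y, group_size=10, rounds=5, lam=1e-2):
    n, d = X.shape
    active = np.array([], dtype=int)
    w_active = np.zeros(0)
    for _ in range(rounds):
        residual = X[:, active] @ w_active - y if active.size else -y
        scores = np.abs(X.T @ residual) / n           # violation score per feature
        scores[active] = -np.inf                      # skip already-active features
        new = np.argsort(scores)[::-1][:group_size]   # activate the most violated group
        active = np.concatenate([active, new])
        Xa = X[:, active]
        # small subproblem on the active features only (ridge least squares)
        w_active = np.linalg.solve(Xa.T @ Xa / n + lam * np.eye(active.size),
                                   Xa.T @ y / n)
    return active, w_active
```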
Classification of Diabetes Mellitus using Modified Particle Swarm Optimization and Least Squares Support Vector Machine
Diabetes Mellitus is a major health problem all over the world. Many
classification algorithms have been applied for its diagnosis and treatment. In
this paper, a hybrid algorithm of Modified Particle Swarm Optimization and
Least Squares Support Vector Machine is proposed for the classification of
type II DM patients. The LS-SVM algorithm is used for classification by finding
the optimal hyperplane which separates the various classes. Since LS-SVM is
highly sensitive to changes in its parameter values, the Modified-PSO algorithm
is used as an optimization technique for the LS-SVM parameters, which guarantees
the robustness of the hybrid algorithm by searching for the optimal values of
the LS-SVM parameters. The proposed algorithm is implemented and evaluated using
the Pima Indians Diabetes data set from the UCI repository of machine learning
databases. It is also compared with different classifier algorithms that were
applied to the same database. The experimental results showed the superiority
of the proposed algorithm, which achieved an average classification accuracy
of 97.833%.
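A hedged sketch combining the two ingredients: LS-SVM classification by solving its KKT linear system, and a plain particle swarm search over the regularization constant and RBF kernel width scored by validation accuracy. The paper's specific PSO modification is not reproduced, and all constants below are illustrative.

```python
# Hedged sketch: LS-SVM classifier (KKT linear system) tuned by basic PSO.
import numpy as np
from scipy.spatial.distance import cdist

def lssvm_fit(X, y, gamma, sigma):
    n = len(y)
    K = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma ** 2))
    Omega = (y[:, None] * y[None, :]) * K
    A = np.block([[np.zeros((1, 1)), y[None, :]],
                  [y[:, None], Omega + np.eye(n) / gamma]])
    sol = np.linalg.solve(A, np.concatenate([[0.0], np.ones(n)]))
    return sol[0], sol[1:]                       # bias b, dual coefficients alpha

def lssvm_predict(X_tr, y_tr, b, alpha, X_te, sigma):
    K = np.exp(-cdist(X_te, X_tr, "sqeuclidean") / (2 * sigma ** 2))
    return np.sign(K @ (alpha * y_tr) + b)

def pso_tune(X_tr, y_tr, X_val, y_val, n_particles=10, iters=20):
    rng = np.random.default_rng(0)
    pos = rng.uniform(-2, 2, (n_particles, 2))   # particles in (log10 gamma, log10 sigma)
    vel = np.zeros_like(pos)
    def fitness(p):
        gamma, sigma = 10.0 ** p
        b, alpha = lssvm_fit(X_tr, y_tr, gamma, sigma)
        return np.mean(lssvm_predict(X_tr, y_tr, b, alpha, X_val, sigma) == y_val)
    pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmax(pbest_fit)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, 2))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = pos + vel
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[np.argmax(pbest_fit)].copy()
    return 10.0 ** gbest                         # best (gamma, sigma) found
```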
Max-Margin Feature Selection
Many machine learning applications, such as those in vision, biology and social
networking, deal with high-dimensional data. Feature selection is typically
employed to select a subset of features which improves generalization
accuracy as well as reduces the computational cost of learning the model. One
of the criteria used for feature selection is to jointly minimize the
redundancy and maximize the relevance of the selected features. In this
paper, we formulate the task of feature selection as a one-class SVM problem in
a space where features correspond to the data points and instances correspond
to the dimensions. The goal is to look for a representative subset of the
features (support vectors) which describes the boundary of the region where
the set of features (data points) lies. This leads to a joint
optimization of relevance and redundancy in a principled max-margin framework.
Additionally, our formulation enables us to leverage existing techniques for
optimizing the SVM objective resulting in highly computationally efficient
solutions for the task of feature selection. Specifically, we employ the dual
coordinate descent algorithm (Hsieh et al., 2008), originally proposed for
SVMs, for our formulation. We use a sparse representation to deal with data in
very high dimensions. Experiments on seven publicly available benchmark
datasets from a variety of domains show that our approach results in orders of
magnitude faster solutions even while retaining the same level of accuracy
compared to state-of-the-art feature selection techniques.
Comment: Submitted to PR Letters
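A minimal sketch of the formulation, assuming scikit-learn's OneClassSVM as the solver instead of the dual coordinate descent algorithm used in the paper: the data matrix is transposed so that features become the training points, and the returned support vectors are the selected features.

```python
# Hedged sketch: feature selection via a one-class SVM on the transposed data.
import numpy as np
from sklearn.svm import OneClassSVM

def max_margin_feature_selection(X, nu=0.1):
    # rows of X_t are features, each described by its values across all instances
    X_t = X.T
    ocsvm = OneClassSVM(kernel="linear", nu=nu).fit(X_t)
    return ocsvm.support_                 # indices of support-vector features

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
print(max_margin_feature_selection(X, nu=0.2))
```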
Componentwise Least Squares Support Vector Machines
This chapter describes componentwise Least Squares Support Vector Machines
(LS-SVMs) for the estimation of additive models consisting of a sum of
nonlinear components. The primal-dual derivations characterizing LS-SVMs for
the estimation of the additive model result in a single set of linear equations
whose size grows with the number of data points. The derivation is elaborated
for the classification as well as the regression case. Furthermore, different
techniques are proposed to discover structure in the data by looking for sparse
components in the model based on dedicated regularization schemes on the one
hand and fusion of the componentwise LS-SVMs training with a validation
criterion on the other hand. (keywords: LS-SVMs, additive models,
regularization, structure detection)
Comment: 22 pages. Accepted for publication in Support Vector Machines: Theory
and Applications, ed. L. Wang, 200
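A hedged sketch of the regression case: the additive-model kernel is the sum of per-feature RBF kernels, a single LS-SVM linear system is solved, and each nonlinear component can then be evaluated from its own kernel block. The kernel width and regularization constant are illustrative.

```python
# Hedged sketch: componentwise LS-SVM regression for an additive model.
import numpy as np

def component_kernels(X, Z, sigma=1.0):
    # one RBF kernel per input feature, evaluated on that feature alone
    return [np.exp(-(X[:, [d]] - Z[:, [d]].T) ** 2 / (2 * sigma ** 2))
            for d in range(X.shape[1])]

def componentwise_lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    K = sum(component_kernels(X, X, sigma))       # additive-model kernel
    n = len(y)
    A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                  [np.ones((n, 1)), K + np.eye(n) / gamma]])
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]                        # bias, dual coefficients

def component_contribution(X_tr, alpha, X_te, d, sigma=1.0):
    # the d-th nonlinear component of the additive model at the test points
    Kd = np.exp(-(X_te[:, [d]] - X_tr[:, [d]].T) ** 2 / (2 * sigma ** 2))
    return Kd @ alpha
```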
Managing Randomization in the Multi-Block Alternating Direction Method of Multipliers for Quadratic Optimization
The Alternating Direction Method of Multipliers (ADMM) has gained a lot of
attention for solving large-scale, objective-separable constrained
optimization problems. However, the two-block variable structure of the ADMM still
limits the practical computational efficiency of the method, because at least one
large matrix factorization is needed even for linear and convex
quadratic programming. This drawback may be overcome by enforcing a multi-block
structure of the decision variables in the original optimization problem.
Unfortunately, the multi-block ADMM, with more than two blocks, is not
guaranteed to be convergent. On the other hand, two positive developments have
been made: first, if in each cyclic loop one randomly permutes the updating
order of the multiple blocks, then the method converges in expectation for
solving any system of linear equations with any number of blocks. Second,
such a randomly permuted ADMM also works for equality-constrained convex
quadratic programming even when the objective function is not separable. The
goal of this paper is twofold. First, we add more randomness into the ADMM by
developing a randomly assembled cyclic ADMM (RAC-ADMM) where the decision
variables in each block are randomly assembled. We discuss the theoretical
properties of RAC-ADMM and show when random assembling helps and when it hurts,
and develop a criterion to guarantee that it converges almost surely. Secondly,
using the theoretical guidance on RAC-ADMM, we conduct multiple numerical tests
on solving both randomly generated and large-scale benchmark quadratic
optimization problems, which include continuous problems, binary graph-partition
and quadratic-assignment problems, and selected machine learning problems. Our numerical
tests show that RAC-ADMM, with a variable-grouping strategy, can significantly
improve the computational efficiency of solving most quadratic
optimization problems.
Comment: Expanded and streamlined theoretical sections. Added comparisons with
other multi-block ADMM variants. Updated Computational Studies Section on
continuous problems -- reporting primal and dual residuals instead of
objective value gap. Added selected machine learning problems
(ElasticNet/Lasso and Support Vector Machine) to Computational Studies
Section
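A hedged sketch of one way to run a randomly assembled cyclic ADMM sweep for an equality-constrained convex QP: each sweep shuffles the variables, partitions them into blocks, minimizes the augmented Lagrangian block by block, and then takes a dual step. The convergence safeguards and grouping strategies analyzed in the paper are omitted, and the block count and penalty parameter are arbitrary.

```python
# Hedged sketch: randomly assembled multi-block ADMM for
#   min 1/2 x'Hx + c'x  subject to  Ax = b.
import numpy as np

def rac_admm_qp(H, c, A, b, n_blocks=4, beta=1.0, sweeps=200, seed=0):
    rng = np.random.default_rng(seed)
    n = len(c)
    x = np.zeros(n)
    mu = np.zeros(len(b))
    for _ in range(sweeps):
        perm = rng.permutation(n)                     # randomly assemble the blocks
        for S in np.array_split(perm, n_blocks):
            T = np.setdiff1d(np.arange(n), S)         # the remaining variables
            AS, AT = A[:, S], A[:, T]
            # minimize the augmented Lagrangian over the block x_S only
            M = H[np.ix_(S, S)] + beta * AS.T @ AS
            rhs = -(c[S] + H[np.ix_(S, T)] @ x[T]
                    + AS.T @ (mu + beta * (AT @ x[T] - b)))
            x[S] = np.linalg.solve(M, rhs)
        mu = mu + beta * (A @ x - b)                  # dual (multiplier) update
    return x
```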
Convex Optimization for Binary Classifier Aggregation in Multiclass Problems
Multiclass problems are often decomposed into multiple binary problems that
are solved by individual binary classifiers whose results are integrated into a
final answer. Various methods, including all-pairs (APs), one-versus-all (OVA),
and error correcting output code (ECOC), have been studied to decompose
multiclass problems into binary problems. However, little work has been done on
optimally aggregating the binary classifiers' outputs to determine a final answer to the
multiclass problem. In this paper we present a convex optimization method for
an optimal aggregation of binary classifiers to estimate class membership
probabilities in multiclass problems. We model the class membership probability
as a softmax function that takes, as input, a conic combination of discrepancies
induced by the individual binary classifiers. With this model, we formulate
the regularized maximum likelihood estimation as a convex optimization problem,
which is solved by the primal-dual interior point method. Connections of our
method to large margin classifiers are presented, showing that the large margin
formulation can be considered as a limiting case of our convex formulation.
Numerical experiments on synthetic and real-world data sets demonstrate that
our method outperforms existing aggregation methods as well as direct methods,
in terms of the classification accuracy and the quality of class membership
probability estimates.
Comment: Appeared in Proceedings of the 2014 SIAM International Conference on
Data Mining (SDM 2014).
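A hedged sketch of the model: nonnegative weights of a conic combination of per-class discrepancies are fitted under a softmax likelihood with an L2 penalty. A bound-constrained quasi-Newton solver stands in for the paper's primal-dual interior point method, and the layout of the discrepancy tensor is an assumption for illustration.

```python
# Hedged sketch: softmax aggregation of binary classifier discrepancies with
# nonnegative (conic) weights, fitted by regularized maximum likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def fit_aggregation(D, y, lam=1e-2):
    """D: (n, K, M) discrepancies of M binary classifiers toward K classes."""
    n, K, M = D.shape
    def neg_log_lik(w):
        scores = D @ w                                 # (n, K) class scores
        ll = scores[np.arange(n), y] - logsumexp(scores, axis=1)
        return -ll.sum() + lam * w @ w
    res = minimize(neg_log_lik, x0=np.ones(M) / M, method="L-BFGS-B",
                   bounds=[(0.0, None)] * M)           # conic (nonnegative) weights
    return res.x

def predict_proba(D, w):
    scores = D @ w
    return np.exp(scores - logsumexp(scores, axis=1, keepdims=True))
```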
Primal-Dual Rates and Certificates
We propose an algorithm-independent framework to equip existing optimization
methods with primal-dual certificates. Such certificates and the corresponding
convergence-rate guarantees are important for practitioners to diagnose progress,
in particular in machine learning applications. We obtain new primal-dual
convergence rates, e.g., for the Lasso as well as many L1, Elastic Net, group
Lasso and TV-regularized problems. The theory applies to any norm-regularized
generalized linear model. Our approach provides efficiently computable duality
gaps which are globally defined, without modifying the original problems in the
region of interest.
Comment: Appearing at ICML 2016 - Proceedings of the 33rd International
Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 4
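As a concrete instance of such a certificate, here is a hedged sketch of a globally defined duality gap for the Lasso, obtained by rescaling the residual into a dual-feasible point; it can be evaluated at any iterate of any Lasso solver.

```python
# Hedged sketch: duality-gap certificate for the Lasso,
#   P(w) = 1/2 ||Xw - y||^2 + lam ||w||_1.
import numpy as np

def lasso_duality_gap(X, y, w, lam):
    residual = y - X @ w
    primal = 0.5 * residual @ residual + lam * np.abs(w).sum()
    # scale the residual so the dual constraint ||X' theta||_inf <= lam holds
    scale = min(1.0, lam / np.max(np.abs(X.T @ residual)))
    theta = scale * residual
    dual = 0.5 * y @ y - 0.5 * np.sum((y - theta) ** 2)
    return primal - dual        # nonnegative; zero only at the optimum

# Usage: check the progress of any Lasso solver at an iterate w.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 20)), rng.normal(size=100)
print(lasso_duality_gap(X, y, np.zeros(20), lam=0.5))
```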