1,721 research outputs found

    Laplacian Support Vector Machines Trained in the Primal

    In the last few years, due to the growing ubiquity of unlabeled data, much effort has been spent by the machine learning community to develop a better understanding of classifiers that exploit unlabeled data and to improve their quality. Following the manifold regularization approach, Laplacian Support Vector Machines (LapSVMs) have shown state-of-the-art performance in semi-supervised classification. In this paper we present two strategies to solve the primal LapSVM problem, in order to overcome some issues of the original dual formulation. Whereas training a LapSVM in the dual requires two steps, using the primal form allows us to collapse training to a single step. Moreover, the computational complexity of the training algorithm is reduced from O(n^3) to O(n^2) using preconditioned conjugate gradient, where n is the combined number of labeled and unlabeled examples. We speed up training by using an early stopping strategy based on the prediction on unlabeled data or, if available, on labeled validation examples. This allows the algorithm to quickly compute approximate solutions with roughly the same classification accuracy as the optimal ones, considerably reducing the training time. Due to its simplicity, training LapSVM in the primal can be the starting point for additional enhancements of the original LapSVM formulation, such as those for dealing with large datasets. We present an extensive experimental evaluation on real world data showing the benefits of the proposed approach. Comment: 39 pages, 14 figures

    Dual SVM Training on a Budget

    We present a dual subspace ascent algorithm for support vector machine training that respects a budget constraint limiting the number of support vectors. Budget methods are effective for reducing the training time of kernel SVMs while retaining high accuracy. To date, budget training is available only for primal (SGD-based) solvers. Dual subspace ascent methods like sequential minimal optimization are attractive for their good adaptation to the problem structure, their fast convergence rate, and their practical speed. By incorporating a budget constraint into a dual algorithm, our method enjoys the best of both worlds. We demonstrate considerable speed-ups over primal budget training methods.

    An Efficient Primal-Dual Prox Method for Non-Smooth Optimization

    We study the non-smooth optimization problems in machine learning, where both the loss function and the regularizer are non-smooth functions. Previous studies on efficient empirical loss minimization assume either a smooth loss function or a strongly convex regularizer, making them unsuitable for non-smooth optimization. We develop a simple yet efficient method for a family of non-smooth optimization problems where the dual form of the loss function is bilinear in primal and dual variables. We cast a non-smooth optimization problem into a minimax optimization problem, and develop a primal dual prox method that solves the minimax optimization problem at a rate of O(1/T) (assuming that the proximal step can be efficiently solved), significantly faster than a standard subgradient descent method, which has an O(1/\sqrt{T}) convergence rate. Our empirical study verifies the efficiency of the proposed method for various non-smooth optimization problems that arise ubiquitously in machine learning by comparing it to the state-of-the-art first order methods.
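As a standard illustration of a loss whose dual form is bilinear in the primal and dual variables (this example is not taken from the paper), consider the hinge loss. Here R(w) stands for any non-smooth regularizer such as the L1 norm, and the symbols \lambda, \alpha_i and n are notation assumed for this sketch:

```latex
\ell(w; x_i, y_i) = \max\bigl(0,\; 1 - y_i w^\top x_i\bigr)
                  = \max_{\alpha_i \in [0,1]} \alpha_i \bigl(1 - y_i w^\top x_i\bigr),
\qquad
\min_{w} \; \lambda R(w) + \frac{1}{n} \sum_{i=1}^{n} \ell(w; x_i, y_i)
  \;=\; \min_{w} \max_{\alpha \in [0,1]^n} \; \lambda R(w) + \frac{1}{n} \sum_{i=1}^{n} \alpha_i \bigl(1 - y_i w^\top x_i\bigr).
```

The inner term is linear in w for fixed \alpha and linear in \alpha for fixed w, which is the structure a primal-dual prox scheme can exploit by alternating proximal updates on the two blocks.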

    Towards Ultrahigh Dimensional Feature Selection for Big Data

    In this paper, we present a new adaptive feature scaling scheme for ultrahigh-dimensional feature selection on Big Data. To solve this problem effectively, we first reformulate it as a convex semi-infinite programming (SIP) problem and then propose an efficient feature generating paradigm. In contrast with traditional gradient-based approaches that conduct optimization on all input features, the proposed method iteratively activates a group of features and solves a sequence of multiple kernel learning (MKL) subproblems of much reduced scale. To further speed up the training, we propose to solve the MKL subproblems in their primal forms through a modified accelerated proximal gradient approach. Due to such an optimization scheme, some efficient cache techniques are also developed. The feature generating paradigm can guarantee that the solution converges globally under mild conditions and achieve lower feature selection bias. Moreover, the proposed method can tackle two challenging tasks in feature selection: 1) group-based feature selection with complex structures and 2) nonlinear feature selection with explicit feature mappings. Comprehensive experiments on a wide range of synthetic and real-world datasets containing tens of millions of data points with O(10^{14}) features demonstrate the competitive performance of the proposed method over state-of-the-art feature selection methods in terms of generalization performance and training efficiency. Comment: 61 pages

    Classification of Diabetes Mellitus using Modified Particle Swarm Optimization and Least Squares Support Vector Machine

    Diabetes Mellitus (DM) is a major health problem all over the world. Many classification algorithms have been applied for its diagnosis and treatment. In this paper, a hybrid algorithm of Modified Particle Swarm Optimization (Modified-PSO) and Least Squares Support Vector Machine (LS-SVM) is proposed for the classification of type II DM patients. The LS-SVM algorithm is used for classification by finding the optimal hyperplane that separates the classes. Since LS-SVM is highly sensitive to changes in its parameter values, the Modified-PSO algorithm is used as an optimization technique for the LS-SVM parameters. This guarantees the robustness of the hybrid algorithm by searching for the optimal values of the LS-SVM parameters. The proposed algorithm is implemented and evaluated using the Pima Indians Diabetes Data set from the UCI repository of machine learning databases. It is also compared with different classifier algorithms that were applied to the same database. The experimental results showed the superiority of the proposed algorithm, which achieved an average classification accuracy of 97.833%.

    Max-Margin Feature Selection

    Many machine learning applications such as in vision, biology and social networking deal with data in high dimensions. Feature selection is typically employed to select a subset of features which improves generalization accuracy as well as reduces the computational cost of learning the model. One of the criteria used for feature selection is to jointly minimize the redundancy and maximize the relevance of the selected features. In this paper, we formulate the task of feature selection as a one class SVM problem in a space where features correspond to the data points and instances correspond to the dimensions. The goal is to look for a representative subset of the features (support vectors) which describes the boundary for the region where the set of the features (data points) exists. This leads to a joint optimization of relevance and redundancy in a principled max-margin framework. Additionally, our formulation enables us to leverage existing techniques for optimizing the SVM objective resulting in highly computationally efficient solutions for the task of feature selection. Specifically, we employ the dual coordinate descent algorithm (Hsieh et al., 2008), originally proposed for SVMs, for our formulation. We use a sparse representation to deal with data in very high dimensions. Experiments on seven publicly available benchmark datasets from a variety of domains show that our approach results in orders of magnitude faster solutions even while retaining the same level of accuracy compared to the state of the art feature selection techniques. Comment: submitted to PR Letters
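The dual coordinate descent update referred to above can be sketched for the ordinary binary linear SVM with hinge loss and no bias term. This is a minimal illustration of the generic technique, not the paper's one-class feature-selection formulation; the function name, the toy data, and the parameters `C`, `epochs` and `seed` are assumptions made for the example.

```python
import random


def dual_coordinate_descent(X, y, C=1.0, epochs=50, seed=0):
    """Linear L1-loss SVM (no bias) trained by dual coordinate descent.

    Solves  min_a 0.5 * a^T Q a - sum(a)  subject to 0 <= a_i <= C,
    where Q_ij = y_i * y_j * <x_i, x_j>, while maintaining
    w = sum_i a_i * y_i * x_i so each update costs O(d).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    alpha = [0.0] * n
    w = [0.0] * d
    q_diag = [sum(v * v for v in x) for x in X]  # Q_ii = <x_i, x_i>
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)  # random permutation of coordinates each epoch
        for i in order:
            if q_diag[i] == 0.0:
                continue
            # Gradient of the dual objective with respect to alpha_i.
            grad = y[i] * sum(wj * xj for wj, xj in zip(w, X[i])) - 1.0
            # Exact one-dimensional minimizer, clipped to the box [0, C].
            new_alpha = min(max(alpha[i] - grad / q_diag[i], 0.0), C)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                alpha[i] = new_alpha
                w = [wj + delta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w, alpha


# Toy data (hypothetical), linearly separable through the origin.
X = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]]
y = [1, 1, -1, -1]
w, alpha = dual_coordinate_descent(X, y)
predictions = [1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1 for x in X]
```

Keeping `w` up to date alongside `alpha` is what makes each coordinate step cheap: the gradient needs one inner product with `w` rather than a full row of the kernel matrix.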

    Componentwise Least Squares Support Vector Machines

    This chapter describes componentwise Least Squares Support Vector Machines (LS-SVMs) for the estimation of additive models consisting of a sum of nonlinear components. The primal-dual derivations characterizing LS-SVMs for the estimation of the additive model result in a single set of linear equations with size growing in the number of data-points. The derivation is elaborated for the classification as well as the regression case. Furthermore, different techniques are proposed to discover structure in the data by looking for sparse components in the model based on dedicated regularization schemes on the one hand and fusion of the componentwise LS-SVMs training with a validation criterion on the other hand. (keywords: LS-SVMs, additive models, regularization, structure detection) Comment: 22 pages. Accepted for publication in Support Vector Machines: Theory and Applications, ed. L. Wang, 200
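For orientation, the standard LS-SVM regression dual is a single linear system in the bias b and the dual variables \alpha. The sketch below assumes that the additive model is encoded by a sum of P component kernels K_p, each acting on one input component x^{(p)}; the notation (N, \gamma, \Omega, K_p) is chosen for this illustration and may differ from the chapter's.

```latex
\begin{bmatrix} 0 & \mathbf{1}_N^\top \\ \mathbf{1}_N & \Omega + \gamma^{-1} I_N \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ y \end{bmatrix},
\qquad
\Omega_{ij} = \sum_{p=1}^{P} K_p\bigl(x_i^{(p)}, x_j^{(p)}\bigr),
\qquad
\hat{f}(x) = \sum_{i=1}^{N} \alpha_i \sum_{p=1}^{P} K_p\bigl(x_i^{(p)}, x^{(p)}\bigr) + b .
```

The system has N + 1 unknowns, so its size grows with the number of data points and not with the number of components.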

    Managing Randomization in the Multi-Block Alternating Direction Method of Multipliers for Quadratic Optimization

    The Alternating Direction Method of Multipliers (ADMM) has gained a lot of attention for solving large-scale and objective-separable constrained optimization. However, the two-block variable structure of the ADMM still limits the practical computational efficiency of the method, because one big matrix factorization is needed at least once even for linear and convex quadratic programming. This drawback may be overcome by enforcing a multi-block structure of the decision variables in the original optimization problem. Unfortunately, the multi-block ADMM, with more than two blocks, is not guaranteed to be convergent. On the other hand, two positive developments have been made: first, if in each cyclic loop one randomly permutes the updating order of the multiple blocks, then the method converges in expectation for solving any system of linear equations with any number of blocks. Secondly, such a randomly permuted ADMM also works for equality-constrained convex quadratic programming even when the objective function is not separable. The goal of this paper is twofold. First, we add more randomness into the ADMM by developing a randomly assembled cyclic ADMM (RAC-ADMM) where the decision variables in each block are randomly assembled. We discuss the theoretical properties of RAC-ADMM and show when random assembling helps and when it hurts, and develop a criterion to guarantee that it converges almost surely. Secondly, using the theoretical guidance on RAC-ADMM, we conduct multiple numerical tests on solving both randomly generated and large-scale benchmark quadratic optimization problems, which include continuous problems, binary graph-partition and quadratic assignment problems, and selected machine learning problems. Our numerical tests show that the RAC-ADMM, with a variable-grouping strategy, could significantly improve the computation efficiency on solving most quadratic optimization problems. Comment: Expanded and streamlined theoretical sections. Added comparisons with other multi-block ADMM variants. Updated Computational Studies Section on continuous problems -- reporting primal and dual residuals instead of objective value gap. Added selected machine learning problems (ElasticNet/Lasso and Support Vector Machine) to Computational Studies Section

    Convex Optimization for Binary Classifier Aggregation in Multiclass Problems

    Multiclass problems are often decomposed into multiple binary problems that are solved by individual binary classifiers whose results are integrated into a final answer. Various methods, including all-pairs (APs), one-versus-all (OVA), and error correcting output code (ECOC), have been studied to decompose multiclass problems into binary problems. However, little work has been done on optimally aggregating the binary results to determine a final answer to the multiclass problem. In this paper we present a convex optimization method for an optimal aggregation of binary classifiers to estimate class membership probabilities in multiclass problems. We model the class membership probability as a softmax function which takes as input a conic combination of discrepancies induced by individual binary classifiers. With this model, we formulate the regularized maximum likelihood estimation as a convex optimization problem, which is solved by the primal-dual interior point method. Connections of our method to large margin classifiers are presented, showing that the large margin formulation can be considered as a limiting case of our convex formulation. Numerical experiments on synthetic and real-world data sets demonstrate that our method outperforms existing aggregation methods as well as direct methods, in terms of the classification accuracy and the quality of class membership probability estimates. Comment: Appeared in Proceedings of the 2014 SIAM International Conference on Data Mining (SDM 2014)

    Primal-Dual Rates and Certificates

    We propose an algorithm-independent framework to equip existing optimization methods with primal-dual certificates. Such certificates and corresponding rate of convergence guarantees are important for practitioners to diagnose progress, in particular in machine learning applications. We obtain new primal-dual convergence rates, e.g., for the Lasso as well as many L1, Elastic Net, group Lasso and TV-regularized problems. The theory applies to any norm-regularized generalized linear model. Our approach provides efficiently computable duality gaps which are globally defined, without modifying the original problems in the region of interest. Comment: appearing at ICML 2016 - Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 4
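To make the idea of a duality gap as a progress certificate concrete, here is a minimal sketch for the Lasso. It uses the textbook construction of a dual feasible point by rescaling the residual, which is a swapped-in technique and not the framework proposed in the paper; the function names, the toy data, and the choice of coordinate descent as the solver are assumptions for illustration.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |.| applied to the scalar z."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_duality_gap(X, y, w, lam):
    """Gap between P(w) = 0.5*||y - Xw||^2 + lam*||w||_1 and a dual value.

    The dual point is theta = s * r with r = y - Xw, where s shrinks r
    until ||X^T theta||_inf <= lam, and D(theta) = <theta, y> - 0.5*||theta||^2.
    Weak duality gives P(w) - D(theta) >= 0 for every w.
    """
    n, d = len(X), len(w)
    r = [y[i] - sum(X[i][j] * w[j] for j in range(d)) for i in range(n)]
    primal = 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in w)
    corr = max(abs(sum(X[i][j] * r[i] for i in range(n))) for j in range(d))
    scale = 1.0 if corr <= lam else lam / corr
    theta = [scale * v for v in r]
    dual = sum(t * yi for t, yi in zip(theta, y)) - 0.5 * sum(t * t for t in theta)
    return primal - dual


def coordinate_descent_lasso(X, y, lam, tol=1e-8, max_epochs=1000):
    """Cyclic coordinate descent that stops once the duality gap is below tol."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(d)]
    gaps = []
    for _ in range(max_epochs):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(d) if k != j))
                for i in range(n)
            )
            w[j] = soft_threshold(rho, lam) / col_sq[j]
        gaps.append(lasso_duality_gap(X, y, w, lam))
        if gaps[-1] <= tol:
            break
    return w, gaps


# Toy problem (hypothetical data).
X = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0], [0.0, 1.0]]
y = [1.0, 2.0, 2.5, 1.5]
w, gaps = coordinate_descent_lasso(X, y, lam=0.1)
```

Because the gap upper-bounds the primal suboptimality, the stopping test does not depend on which solver produced `w`; any other Lasso solver could call `lasso_duality_gap` in the same way.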