
    Training Support Vector Machines Using Frank-Wolfe Optimization Methods

    Training a Support Vector Machine (SVM) requires the solution of a quadratic programming problem (QP) whose computational complexity becomes prohibitively expensive for large-scale datasets. Traditional optimization methods cannot be directly applied in these cases, mainly due to memory restrictions. By adopting a slightly different objective function and under mild conditions on the kernel used within the model, efficient algorithms to train SVMs have been devised under the name of Core Vector Machines (CVMs). This framework exploits the equivalence of the resulting learning problem with the task of solving a Minimal Enclosing Ball (MEB) problem in a feature space, into which the data are implicitly embedded by a kernel function. In this paper, we improve on the CVM approach by proposing two novel methods to build SVMs based on the Frank-Wolfe algorithm, recently revisited as a fast method to approximate the solution of an MEB problem. In contrast to CVMs, our algorithms do not require computing the solutions of a sequence of increasingly complex QPs and are defined using only analytic optimization steps. Experiments on a large collection of datasets show that our methods scale better than CVMs in most cases, sometimes at the price of slightly lower accuracy. Like CVMs, the proposed methods can be easily extended to machine learning problems other than binary classification. Unlike CVMs, however, effective classifiers are also obtained using kernels that do not satisfy the condition required by CVMs, so the proposed methods can be applied to a wider set of problems.
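    To make the "analytic optimization steps" concrete, the following is a minimal sketch (Python/NumPy, illustrative rather than the authors' code) of the standard Frank-Wolfe iteration on the MEB dual, maximizing diag(K)^T a - a^T K a over the probability simplex: the linear minimization oracle reduces to a coordinate argmax of the gradient, and the line search has a closed form. The function name and the toy Gaussian-kernel data are assumptions made for illustration.

```python
import numpy as np

def frank_wolfe_meb(K, n_iters=200):
    """Frank-Wolfe on the MEB dual: max_{alpha in simplex}
    diag(K)^T alpha - alpha^T K alpha, with an analytic line search."""
    n = K.shape[0]
    alpha = np.full(n, 1.0 / n)          # start at the simplex barycenter
    c = np.diag(K)
    for _ in range(n_iters):
        grad = c - 2.0 * K @ alpha       # gradient of the concave objective
        i_star = int(np.argmax(grad))    # LMO over the simplex: best vertex
        d = -alpha
        d[i_star] += 1.0                 # FW direction e_{i*} - alpha
        curv = d @ K @ d
        if curv <= 1e-12:
            break
        gamma = np.clip((grad @ d) / (2.0 * curv), 0.0, 1.0)  # closed-form step
        alpha += gamma * d
    return alpha                          # nonzeros correspond to "core"/support points

# toy usage: Gaussian kernel on random 2-D points
X = np.random.default_rng(0).normal(size=(50, 2))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
alpha = frank_wolfe_meb(np.exp(-sq_dists))
```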

    Modified Frank-Wolfe Algorithm for Enhanced Sparsity in Support Vector Machine Classifiers

    This work proposes a new algorithm for training a re-weighted L2 Support Vector Machine (SVM), inspired by the re-weighted Lasso algorithm of Candès et al. and by the equivalence between Lasso and SVM shown recently by Jaggi. In particular, the margin required for each training vector is set independently, defining a new weighted SVM model. These weights are selected to be binary, and they are automatically adapted during the training of the model, resulting in a variation of the Frank-Wolfe optimization algorithm with essentially the same computational complexity as the original algorithm. As shown experimentally, this algorithm is computationally cheaper to apply since it requires fewer iterations to converge, and it produces models with a sparser representation in terms of support vectors that are also more stable with respect to the selection of the regularization hyper-parameter.
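    For context on the Lasso side of the Lasso-SVM equivalence referenced above, here is a sketch (illustrative only, not the paper's re-weighted variant) of the plain Frank-Wolfe algorithm for l1-ball constrained least squares: the linear minimization oracle returns a single signed, scaled coordinate vector, which is what makes Frank-Wolfe iterates sparse in the first place.

```python
import numpy as np

def frank_wolfe_lasso(A, b, radius, n_iters=300):
    """Plain Frank-Wolfe for min ||Ax - b||^2 s.t. ||x||_1 <= radius.
    The LMO over the l1-ball is a signed, scaled coordinate vector,
    so at most one new nonzero is added per iteration."""
    n = A.shape[1]
    x = np.zeros(n)
    for k in range(n_iters):
        grad = 2.0 * A.T @ (A @ x - b)
        i = int(np.argmax(np.abs(grad)))          # LMO: best l1-ball vertex
        s = np.zeros(n)
        s[i] = -radius * np.sign(grad[i])
        gamma = 2.0 / (k + 2.0)                    # standard FW step size
        x = (1.0 - gamma) * x + gamma * s
    return x
```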

    A Distributed Frank-Wolfe Algorithm for Communication-Efficient Sparse Learning

    Learning sparse combinations is a frequent theme in machine learning. In this paper, we study its associated optimization problem in the distributed setting, where the elements to be combined are not centrally located but spread over a network. We address the key challenge of balancing communication costs and optimization errors. To this end, we propose a distributed Frank-Wolfe (dFW) algorithm. We obtain theoretical guarantees on the optimization error ε and the communication cost that do not depend on the total number of combining elements. We further show that the communication cost of dFW is optimal by deriving a lower bound on the communication cost required to construct an ε-approximate solution. We validate our theoretical analysis with empirical studies on synthetic and real-world data, which demonstrate that dFW outperforms both baselines and competing methods. We also study the performance of dFW when the conditions of our analysis are relaxed, and show that dFW is fairly robust. (Extended version of the SIAM Data Mining 2015 paper.)
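    A rough single-process simulation of the communication pattern described in the abstract is sketched below; the function name, squared loss, and column partitioning are assumptions for illustration, not details taken from the paper. Each "node" scores only its own atoms against the current gradient and proposes a single candidate, and only the globally selected atom is shared per round, so per-iteration communication does not grow with the total number of atoms.

```python
import numpy as np

def simulated_dfw(D_parts, b, n_iters=100):
    """Simulation of a distributed-FW round for min ||D a - b||^2 with a on
    the simplex, where the columns (atoms) of D are split across nodes.
    Each node proposes its best local atom; one atom per round is 'communicated'."""
    dim = b.shape[0]
    w = np.zeros(dim)                      # current combination D @ a, tracked directly
    for k in range(n_iters):
        grad_w = 2.0 * (w - b)             # gradient of the loss w.r.t. w
        proposals = []
        for D_local in D_parts:            # each node scores only its own atoms
            scores = D_local.T @ grad_w
            j = int(np.argmin(scores))     # local LMO over the node's atoms
            proposals.append((scores[j], D_local[:, j]))
        _, atom = min(proposals, key=lambda p: p[0])   # coordinator picks globally best
        gamma = 2.0 / (k + 2.0)
        w = (1.0 - gamma) * w + gamma * atom           # FW update in output space
    return w
```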

    A Multi-Plane Block-Coordinate Frank-Wolfe Algorithm for Training Structural SVMs with a Costly max-Oracle

    Structural support vector machines (SSVMs) are amongst the best performing models for structured computer vision tasks, such as semantic image segmentation or human pose estimation. Training SSVMs, however, is computationally costly, because it requires repeated calls to a structured prediction subroutine (called the max-oracle), which has to solve an optimization problem itself, e.g. a graph cut. In this work, we introduce a new algorithm for SSVM training that is more efficient than earlier techniques when the max-oracle is computationally expensive, as is frequently the case in computer vision tasks. The main idea is to (i) combine the recent stochastic Block-Coordinate Frank-Wolfe algorithm with efficient hyperplane caching, and (ii) use an automatic selection rule to decide whether to call the exact max-oracle or to rely on an approximate one based on the cached hyperplanes. We show experimentally that this strategy leads to faster convergence to the optimum with respect to the number of required oracle calls, and that this translates into faster convergence with respect to the total runtime when the max-oracle is slow compared to the other steps of the algorithm. A publicly available C++ implementation is provided at http://pub.ist.ac.at/~vnk/papers/SVM.html
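    The caching idea can be sketched schematically as follows; the gap threshold and the helper names are illustrative assumptions, not the paper's exact selection rule. Before paying for the expensive exact max-oracle, one checks whether a previously cached labeling already violates the margin by enough to make progress.

```python
# Schematic oracle caching for SSVM training. Vectors (w, joint features) are
# assumed to be NumPy arrays; exact_oracle, joint_feature, and loss are
# user-supplied, problem-specific callables.

def loss_augmented_score(w, x, y_true, y, joint_feature, loss):
    """Margin violation of labeling y: loss term plus score difference."""
    return loss(y_true, y) + w @ (joint_feature(x, y) - joint_feature(x, y_true))

def pick_oracle_answer(w, x, y_true, cache, exact_oracle,
                       joint_feature, loss, gap_threshold=1e-3):
    # cheap approximate oracle: best labeling among the cached ones
    if cache:
        y_cached = max(cache, key=lambda y: loss_augmented_score(
            w, x, y_true, y, joint_feature, loss))
        if loss_augmented_score(w, x, y_true, y_cached,
                                joint_feature, loss) > gap_threshold:
            return y_cached                 # cached hyperplane is good enough
    # otherwise pay for the exact max-oracle and remember its answer
    y_new = exact_oracle(w, x, y_true)
    cache.append(y_new)
    return y_new
```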

    A PARTAN-Accelerated Frank-Wolfe Algorithm for Large-Scale SVM Classification

    Frank-Wolfe algorithms have recently regained the attention of the Machine Learning community. Their solid theoretical properties and sparsity guarantees make them a suitable choice for a wide range of problems in this field. In addition, several variants of the basic procedure exist that improve its theoretical properties and practical performance. In this paper, we investigate the application of some of these techniques to Machine Learning, focusing in particular on a Parallel Tangent (PARTAN) variant of the FW algorithm that has not previously been suggested or studied for this class of problems. We provide experiments both in a standard setting and using a stochastic speed-up technique, showing that the considered algorithms obtain promising results on several medium and large-scale benchmark datasets for SVM classification.
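    A rough sketch of the parallel-tangent idea, applied to the same MEB dual as in the first sketch above, is given below. The closed-form line searches and the feasibility cap are standard for a concave quadratic over the simplex, but this is not the authors' implementation: each ordinary FW step is followed by an extrapolation along the direction from the previous iterate through the new point, with the step size capped so the iterate stays feasible.

```python
import numpy as np

def partan_fw_meb(K, n_iters=200):
    """PARTAN-style FW sketch on the MEB dual: alternate an ordinary FW step
    with an extrapolation from the previous iterate through the new point."""
    n = K.shape[0]
    c = np.diag(K)

    def grad(a):
        return c - 2.0 * K @ a

    def quad_step(a, d, t_max):
        curv = d @ K @ d                     # closed-form step for the quadratic
        if curv <= 1e-12:
            return 0.0
        return float(np.clip((grad(a) @ d) / (2.0 * curv), 0.0, t_max))

    alpha = np.full(n, 1.0 / n)
    alpha_prev = alpha.copy()
    for _ in range(n_iters):
        d = -alpha
        d[int(np.argmax(grad(alpha)))] += 1.0
        z = alpha + quad_step(alpha, d, 1.0) * d       # ordinary FW step
        d2 = z - alpha_prev                            # PARTAN direction
        shrink = d2 < 0                                # cap step to stay on the simplex
        t_max = np.min(-z[shrink] / d2[shrink]) if shrink.any() else np.inf
        alpha_prev, alpha = alpha, z + quad_step(z, d2, t_max) * d2
    return alpha
```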

    Optimization Methods for Semi-Supervised Learning

    The goal of this thesis is to provide efficient optimization algorithms for some semi-supervised learning (SSL) tasks in machine learning. For many machine learning tasks, training a classifier requires a large amount of labeled data; however, providing labels typically requires costly manual annotation. Fortunately, there is typically an abundance of unlabeled data that can be easily collected for many domains. In this thesis, we focus on problems where an underlying structure allows us to leverage large amounts of unlabeled data while only requiring small amounts of labeled data. In particular, we consider low-rank matrix completion problems with applications to recommender systems, and semi-supervised support vector machines (S3VM) to solve binary classification problems such as digit recognition or disease classification.

    For the first class of problems, we study convex approximations to the low-rank matrix completion problem. Instead of restricting the solution space to low-rank matrices, we use the trace norm as a convex surrogate. Unfortunately, many trace norm minimization algorithms scale very poorly in practice since they require a full singular value decomposition (SVD) at each iteration. Recently, there has been renewed interest in the trace norm constrained problem utilizing the Frank-Wolfe algorithm, which only requires calculating the leading singular vector pair, providing an order of magnitude improvement in iteration complexity. However, the Frank-Wolfe algorithm empirically exhibits very slow convergence and in practice yields high-rank solutions, which greatly increases computational costs. To address this issue, we investigate a rank-drop step for Frank-Wolfe, which solves a subproblem specifically designed to decrease the rank of the iterate, ensuring that the Frank-Wolfe algorithm converges along a low-rank path. We show that this rank-drop subproblem can be decomposed into two cases, each of which can be solved efficiently, and we guarantee that the iterates remain feasible, preserving the projection-free property of Frank-Wolfe.

    Next, we show that these ideas can be used to provide scalable algorithms for simultaneously sparse and low-rank matrix completion problems. We extend the Frank-Wolfe analysis to accommodate nonsmooth objectives, which can be used to solve the simultaneously sparse and low-rank problem. We replace the traditional linear approximation used in Frank-Wolfe by a uniform affine approximation to better address poor local approximations given by the first-order Taylor approximation. We show that this naturally leads to a sequence of smooth functions that uniformly converges to the original nonsmooth objective, allowing for a careful balance between approximation quality and convergence that is closely related to the step sizes of the Frank-Wolfe algorithm. We apply this algorithm to sparse covariance estimation, graph link prediction, and robust matrix completion problems.

    Finally, we propose a variant of self-training for the semi-supervised binary classification problem by leveraging ideas from S3VM. To address common issues associated with self-training, such as error propagation and label imbalances, we propose an adaptive scheme that uses the functional margin of S3VM to construct a confidence measure. The confidence score is used to create rules that adapt the optimization problems to incorporate label uncertainty and class imbalances. Moreover, we show that the incremental training approach leverages warm-starts very well, leading to much faster training than standard S3VM methods alone, with much stronger empirical performance on imbalanced datasets.
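    The thesis builds on the vanilla Frank-Wolfe scheme for trace-norm constrained matrix completion, which is worth seeing explicitly because it shows both the appeal (the oracle needs only a leading singular vector pair) and the drawback (one rank-1 atom is added per iteration, so iterates can become high-rank) that the rank-drop step addresses. The sketch below is this standard baseline, not the thesis's rank-drop or nonsmooth variants; the function name and parameters are illustrative.

```python
import numpy as np
from scipy.sparse.linalg import svds

def fw_trace_norm_completion(M_obs, mask, delta, n_iters=100):
    """Vanilla Frank-Wolfe for matrix completion under ||X||_* <= delta.
    The LMO needs only the leading singular vector pair of the gradient,
    and each iterate is a convex combination of rank-1 atoms."""
    X = np.zeros_like(M_obs)
    for k in range(n_iters):
        G = mask * (X - M_obs)                 # gradient of 0.5 * ||P_Omega(X - M)||^2
        u, _, vt = svds(-G, k=1)               # leading singular pair of -gradient
        S = delta * np.outer(u[:, 0], vt[0])   # rank-1 LMO atom on the nuclear-norm ball
        gamma = 2.0 / (k + 2.0)                # standard FW step size
        X = (1.0 - gamma) * X + gamma * S
    return X
```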