
    Sparse kernel SVMs via cutting-plane training

    Appeal from a decision of the Board of Review, Industrial Commission of Utah on February 7, 1989

    Efficient Multi-Template Learning for Structured Prediction

    Conditional random fields (CRFs) and Structural Support Vector Machines (Structural SVMs) are two state-of-the-art methods for structured prediction that capture the interdependencies among output variables. The success of these methods is attributed to the fact that their discriminative models can account for overlapping features on the whole input observation. These features are usually generated by applying a given set of templates to labeled data, but improper templates may lead to degraded performance. To alleviate this issue, in this paper we propose a novel multiple template learning paradigm that learns the structured prediction model and the importance of each template simultaneously, so that hundreds of arbitrary templates can be added to the learning model without concern. This paradigm can be formulated as a special multiple kernel learning problem with an exponential number of constraints. We then introduce an efficient cutting-plane algorithm to solve this problem in the primal, and its convergence is presented. We also evaluate the proposed learning paradigm on two widely studied structured prediction tasks, namely sequence labeling and dependency parsing. Extensive experimental results show that the proposed method outperforms CRFs and Structural SVMs by exploiting the importance of each template. Our complexity analysis and empirical results also show that the proposed method is more efficient than OnlineMKL on very sparse and high-dimensional data. We further extend this paradigm to structured prediction with generalized $p$-block norm regularization for $p>1$, and experiments show competitive performance when $p \in [1,2)$.
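
    The abstract does not define the $p$-block norm. A common definition, which we assume here, groups the weights by template $t$ and takes $\|w\|_{2,p} = (\sum_t \|w_t\|_2^p)^{1/p}$. The sketch below (the helper name `block_norm` is hypothetical) computes it: with $p=1$ it is a sum of per-template $\ell_2$ norms, which as a regularizer tends to switch whole templates off, while with $p=2$ it reduces to the ordinary $\ell_2$ norm of all weights.

```python
import math
from typing import List


def block_norm(blocks: List[List[float]], p: float) -> float:
    """Assumed p-block norm: (sum_t ||w_t||_2 ** p) ** (1/p), one block per template."""
    if p < 1:
        raise ValueError("p must be >= 1 for this to be a norm")
    per_template = [math.sqrt(sum(v * v for v in blk)) for blk in blocks]  # ||w_t||_2
    return sum(x ** p for x in per_template) ** (1.0 / p)


w = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]  # three hypothetical templates
print(block_norm(w, 1))  # 15.0 (= 5 + 0 + 10)
print(block_norm(w, 2))  # sqrt(125), the plain l2 norm of all six weights
```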

    A Feature Selection Method for Multivariate Performance Measures

    Feature selection with specific multivariate performance measures is key to the success of many applications, such as image retrieval and text classification. Existing feature selection methods are usually designed for classification error. In this paper, we propose a generalized sparse regularizer. Based on the proposed regularizer, we present a unified feature selection framework for general loss functions. In particular, we study a novel feature selection paradigm that optimizes multivariate performance measures. The resulting formulation is challenging to solve for high-dimensional data. Hence, we propose a two-layer cutting-plane algorithm to solve it, and its convergence is presented. In addition, we adapt the proposed method to optimize multivariate measures for multiple-instance learning problems. Analyses comparing it with state-of-the-art feature selection methods show that the proposed method is superior to them. Extensive experiments on large-scale, high-dimensional real-world datasets show that the proposed method outperforms $l_1$-SVM and SVM-RFE when choosing a small subset of features, and achieves significantly better $F_1$-scores than SVM$^{perf}$.
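
    Why a multivariate measure needs its own treatment can be seen with $F_1$: it is computed from counts over the whole prediction vector, so it does not decompose into a sum of per-example losses. The small illustration below uses a hypothetical helper `f1_score` (not any library's function of that name) and assumes the convention that $F_1 = 0$ when there are no true positives.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2TP / (2TP + FP + FN) over the whole set; 0.0 when TP == 0 (assumed convention)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


y_true = [1, 0, 0, 0]
y_pred = [1, 1, 0, 0]
whole = f1_score(y_true, y_pred)  # 2/3 (TP=1, FP=1, FN=0)
# Averaging over two halves gives (2/3 + 0) / 2 = 1/3, not the whole-set value.
halves = (f1_score(y_true[:2], y_pred[:2]) + f1_score(y_true[2:], y_pred[2:])) / 2
print(whole, halves)
```

    Because the whole-set value differs from any average of per-subset values, a learner that optimizes $F_1$ directly must reason about the joint labeling, which is what makes the resulting formulation harder than error-rate minimization.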

    Block-Coordinate Frank-Wolfe Optimization for Structural SVMs

    We propose a randomized block-coordinate variant of the classic Frank-Wolfe algorithm for convex optimization with block-separable constraints. Despite its lower iteration cost, we show that it achieves a convergence rate in duality gap similar to that of the full Frank-Wolfe algorithm. We also show that, when applied to the dual structural support vector machine (SVM) objective, this yields an online algorithm that has the same low iteration complexity as primal stochastic subgradient methods. However, unlike stochastic subgradient methods, the block-coordinate Frank-Wolfe algorithm allows us to compute the optimal step-size and yields a computable duality gap guarantee. Our experiments indicate that this simple algorithm outperforms competing structural SVM solvers. Comment: Appears in Proceedings of the 30th International Conference on Machine Learning (ICML 2013). 9 pages main text + 22 pages appendix. Changes from v3 to v4: 1) Re-organized appendix; improved & clarified duality gap proofs; re-drew all plots; 2) Changed convention for $C_f$ definition; 3) Added weighted averaging experiments + convergence results; 4) Clarified main text and relationship with appendix.
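
    To make the block-coordinate idea concrete, here is a minimal sketch under stated assumptions. Instead of the structural SVM dual, it minimizes a toy least-squares objective $f(a) = \frac{1}{2}\|Aa - b\|^2$ over a product of simplices (one simplex per block, mirroring one block of dual variables per training example). Each step picks one block uniformly at random, calls the linear oracle on that block only (the simplex vertex with the smallest partial gradient), and takes the exactly optimal step size, which has a closed form for a quadratic. The function name `bcfw` and the toy problem are hypothetical, not the authors' code.

```python
import random
from typing import List, Tuple


def bcfw(A: List[List[float]], b: List[float], n_blocks: int, k: int,
         iters: int, seed: int = 0) -> Tuple[List[float], float, float]:
    """Minimize f(a) = 0.5 * ||A a - b||^2 over n_blocks simplices of size k
    (toy stand-in for the structural SVM dual). A is d x (n_blocks * k).
    Returns (a, f(a), summed block duality gap)."""
    rng = random.Random(seed)
    d, n = len(A), n_blocks * k
    a = [1.0 if c % k == 0 else 0.0 for c in range(n)]  # first vertex of each simplex
    # Residual r = A a - b, updated incrementally (analogous to keeping w in the SSVM dual).
    r = [sum(A[row][c] * a[c] for c in range(n)) - b[row] for row in range(d)]

    def block_grad(blk: int) -> List[float]:
        # Partial gradient A_blk^T r for the k coordinates of block blk.
        return [sum(A[row][blk * k + t] * r[row] for row in range(d)) for t in range(k)]

    def block_gap(g: List[float], cur: List[float]) -> float:
        # <g, cur - s> with s the best simplex vertex; always >= 0.
        return sum(gt * ct for gt, ct in zip(g, cur)) - min(g)

    for _ in range(iters):
        blk = rng.randrange(n_blocks)             # one block, uniformly at random
        g = block_grad(blk)
        best = min(range(k), key=lambda t: g[t])  # linear oracle on this block only
        cur = a[blk * k:(blk + 1) * k]
        step = [(1.0 if t == best else 0.0) - cur[t] for t in range(k)]  # s - cur
        gap = block_gap(g, cur)
        Ad = [sum(A[row][blk * k + t] * step[t] for t in range(k)) for row in range(d)]
        denom = sum(v * v for v in Ad)
        if gap <= 0.0 or denom == 0.0:
            continue                              # block already optimal
        gamma = min(1.0, gap / denom)             # exact line search for a quadratic
        for t in range(k):
            a[blk * k + t] += gamma * step[t]
        for row in range(d):
            r[row] += gamma * Ad[row]

    total_gap = sum(block_gap(block_grad(bl), a[bl * k:(bl + 1) * k])
                    for bl in range(n_blocks))
    return a, 0.5 * sum(v * v for v in r), total_gap
```

    Because $f$ is convex, the returned summed gap upper-bounds $f(a) - f(\beta)$ for every feasible $\beta$, which is the kind of computable certificate the abstract contrasts with stochastic subgradient methods.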

    Structured Learning of Tree Potentials in CRF for Image Segmentation

    We propose a new approach to image segmentation that exploits the advantages of both conditional random fields (CRFs) and decision trees. In the literature, the potential functions of CRFs are mostly defined as a linear combination of some pre-defined parametric models, and methods such as structured support vector machines (SSVMs) are then applied to learn the linear coefficients. We instead formulate the unary and pairwise potentials as nonparametric forests (ensembles of decision trees), and learn the ensemble parameters and the trees in a unified optimization problem within the large-margin framework. In this fashion, we easily achieve nonlinear learning of potential functions on both the unary and pairwise terms of CRFs. Moreover, we learn class-wise decision trees for each object that appears in the image. Owing to the rich structure and flexibility of decision trees, our approach is well suited to modelling complex data likelihoods and label relationships. The resulting optimization problem is very challenging because it can have exponentially many variables and constraints. We show that it can be solved efficiently by combining modified column-generation and cutting-plane techniques. Experimental results on both binary (Graz-02, Weizmann horse, Oxford flower) and multi-class (MSRC-21, PASCAL VOC 2012) segmentation datasets demonstrate the power of the learned nonlinear nonparametric potentials. Comment: 10 pages. Appearing in IEEE Transactions on Neural Networks and Learning Systems.

    Training linear ranking SVMs in linearithmic time using red-black trees

    We introduce an efficient method for training the linear ranking support vector machine. The method combines cutting-plane optimization with a red-black tree based approach to subgradient calculations, and has $O(ms + m\log m)$ time complexity, where $m$ is the number of training examples and $s$ the average number of non-zero features per example. The best previously known training algorithms achieve the same efficiency only for restricted special cases, whereas the proposed approach allows any real-valued utility scores in the training data. Experiments demonstrate the superior scalability of the proposed approach compared to the fastest existing RankSVM implementations. Comment: 20 pages, 4 figures.
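
    The abstract does not spell out the subgradient computation, so below is a sketch of one way to obtain the pairwise hinge loss and its subgradient coefficients in $O(m\log m)$ once the predictions $p = Xw$ are available (computing $p$ is the $O(ms)$ part). It substitutes a Fenwick (binary indexed) tree over utility ranks for the paper's red-black tree, so treat it as an illustrative variant rather than the published algorithm. The assumed (unnormalized) loss is $\sum_{y_i > y_j} \max(0, 1 - p_i + p_j)$, with subgradient $\sum_k c_k x_k$; the names `rank_hinge` and `Fenwick` are hypothetical.

```python
from typing import List, Tuple


class Fenwick:
    """Binary indexed tree holding a count and a sum per position (0-based API)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.cnt = [0] * (n + 1)
        self.tot = [0.0] * (n + 1)

    def add(self, pos: int, value: float) -> None:
        i = pos + 1  # internal arrays are 1-based
        while i <= self.n:
            self.cnt[i] += 1
            self.tot[i] += value
            i += i & -i

    def prefix(self, pos: int) -> Tuple[int, float]:
        """Count and sum of entries stored at positions 0 .. pos-1."""
        c, s, i = 0, 0.0, pos
        while i > 0:
            c += self.cnt[i]
            s += self.tot[i]
            i -= i & -i
        return c, s


def rank_hinge(p: List[float], y: List[float]) -> Tuple[float, List[float]]:
    """Loss sum_{y_i > y_j} max(0, 1 - p_i + p_j) and coefficients c such that
    the subgradient is sum_k c[k] * x_k. Pair (i, j) is violated iff p_j > p_i - 1."""
    m = len(p)
    levels = sorted(set(y))
    rank = {v: idx for idx, v in enumerate(levels)}
    r = [rank[v] for v in y]  # utility rank of each example
    coef = [0.0] * m
    loss = 0.0

    # Pass 1 (higher-utility member i): visit i by decreasing p_i and insert every
    # j with p_j > p_i - 1; those j always form a prefix of this order.
    desc = sorted(range(m), key=lambda idx: -p[idx])
    tree, ptr = Fenwick(len(levels)), 0
    for i in desc:
        while ptr < m and p[desc[ptr]] > p[i] - 1:
            tree.add(r[desc[ptr]], p[desc[ptr]])
            ptr += 1
        cnt, psum = tree.prefix(r[i])  # partners with strictly lower utility
        loss += cnt * (1 - p[i]) + psum
        coef[i] -= cnt

    # Pass 2 (lower-utility member j): visit j by increasing p_j and insert every
    # i with p_i < p_j + 1; count those with strictly higher utility.
    asc = desc[::-1]
    tree, ptr = Fenwick(len(levels)), 0
    for j in asc:
        while ptr < m and p[asc[ptr]] < p[j] + 1:
            tree.add(r[asc[ptr]], p[asc[ptr]])
            ptr += 1
        total, _ = tree.prefix(len(levels))
        not_above, _ = tree.prefix(r[j] + 1)
        coef[j] += total - not_above
    return loss, coef


print(rank_hinge([0.0, 0.5], [0, 1]))  # (0.5, [1.0, -1.0])
```

    Each pass sorts once and does $O(m)$ tree operations of cost $O(\log m)$, so the whole computation stays within the $O(m\log m)$ term of the stated complexity.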