
    Block-Coordinate Frank-Wolfe Optimization for Structural SVMs

    We propose a randomized block-coordinate variant of the classic Frank-Wolfe algorithm for convex optimization with block-separable constraints. Despite its lower iteration cost, we show that it achieves a similar convergence rate in duality gap as the full Frank-Wolfe algorithm. We also show that, when applied to the dual structural support vector machine (SVM) objective, this yields an online algorithm that has the same low iteration complexity as primal stochastic subgradient methods. However, unlike stochastic subgradient methods, the block-coordinate Frank-Wolfe algorithm allows us to compute the optimal step size and yields a computable duality gap guarantee. Our experiments indicate that this simple algorithm outperforms competing structural SVM solvers. Comment: Appears in Proceedings of the 30th International Conference on Machine Learning (ICML 2013). 9 pages main text + 22 pages appendix. Changes from v3 to v4: 1) Re-organized appendix; improved & clarified duality gap proofs; re-drew all plots; 2) Changed convention for Cf definition; 3) Added weighted averaging experiments + convergence results; 4) Clarified main text and relationship with appendix.
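
    As a rough illustration of the update described above, the sketch below performs one random block (one training example) per iteration, forms the Frank-Wolfe corner from the loss-augmented decoding result, and takes the closed-form optimal step size clipped to [0, 1]. It is a minimal sketch, not a drop-in implementation: the `max_oracle` callable and the joint feature-map convention are problem-specific placeholders supplied by the caller.

```python
import numpy as np

def bcfw_ssvm(data, max_oracle, d, lam=0.01, n_iters=1000, rng=None):
    """Sketch of a block-coordinate Frank-Wolfe update for the structural SVM dual.

    `data` is a list of (x_i, y_i) pairs; `max_oracle(w, x, y)` is assumed to
    perform loss-augmented decoding and return the pair
    (psi = phi(x, y) - phi(x, y_star), loss = L(y, y_star)) for the maximizer
    y_star -- both the feature map and the oracle are problem-specific.
    """
    rng = rng or np.random.default_rng(0)
    n = len(data)
    w = np.zeros(d)                 # global primal iterate
    w_blocks = np.zeros((n, d))     # per-example contributions w_i
    ell = 0.0                       # running loss term
    ell_blocks = np.zeros(n)

    for _ in range(n_iters):
        i = rng.integers(n)         # pick one training example at random
        x_i, y_i = data[i]
        psi, loss = max_oracle(w, x_i, y_i)

        # Frank-Wolfe corner for this block.
        w_s = psi / (lam * n)
        ell_s = loss / n

        # Closed-form optimal step size, clipped to [0, 1].
        diff = w_blocks[i] - w_s
        denom = lam * diff.dot(diff)
        numer = lam * diff.dot(w) - ell_blocks[i] + ell_s
        gamma = numer / denom if denom > 0 else (1.0 if numer > 0 else 0.0)
        gamma = min(max(gamma, 0.0), 1.0)

        # Update the block and the global iterate.
        w_new = (1 - gamma) * w_blocks[i] + gamma * w_s
        ell_new = (1 - gamma) * ell_blocks[i] + gamma * ell_s
        w += w_new - w_blocks[i]
        ell += ell_new - ell_blocks[i]
        w_blocks[i], ell_blocks[i] = w_new, ell_new
    return w
```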

    A Convex Surrogate Operator for General Non-Modular Loss Functions

    Empirical risk minimization frequently employs convex surrogates of underlying discrete loss functions in order to achieve computational tractability during optimization. However, classical convex surrogates can only tightly bound modular, submodular, or supermodular loss functions separately while maintaining polynomial-time computation. In this work, a novel generic convex surrogate for general non-modular loss functions is introduced, which provides for the first time a tractable solution for loss functions that are neither supermodular nor submodular. This convex surrogate is based on a submodular-supermodular decomposition whose existence and uniqueness are proven in this paper. It takes the sum of two convex surrogates that separately bound the supermodular component and the submodular component using slack rescaling and the Lovász hinge, respectively. It is further proven that this surrogate is convex, piecewise linear, an extension of the loss function, and that its subgradient can be computed in polynomial time. Empirical results are reported on a non-submodular loss based on the Sørensen-Dice difference function and on a real-world face track dataset with tens of thousands of frames, demonstrating the improved performance, efficiency, and scalability of the novel convex surrogate.
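
    The submodular side of such a surrogate builds on the Lovász extension of a set-valued loss. The sketch below shows only that building block, assuming a caller-supplied `set_loss` with set_loss(set()) == 0; it is not the paper's full surrogate, which additionally handles the supermodular component via slack rescaling.

```python
import numpy as np

def lovasz_extension(set_loss, s):
    """Minimal sketch of the Lovász extension of a set function.

    `set_loss(subset)` is assumed to map a Python set of indices to a real
    value (submodular in the intended use).  The extension is evaluated at the
    real vector `s`, e.g. per-label margin violations clipped at zero when
    building a Lovász-hinge-style surrogate.
    """
    s = np.asarray(s, dtype=float)
    order = np.argsort(-s)               # sort coordinates in decreasing order
    value, prev, chosen = 0.0, 0.0, set()
    for j in order:
        chosen.add(int(j))
        cur = set_loss(chosen)
        value += s[j] * (cur - prev)     # marginal gain weighted by coordinate
        prev = cur
    return value

# Toy usage with a (modular) Hamming-style set loss; a genuinely submodular
# loss, e.g. one derived from the Sørensen-Dice difference, would be plugged
# in instead.
hamming = lambda subset: float(len(subset))
violations = np.maximum(0.0, 1.0 - np.array([0.3, -0.2, 1.4]))  # hinge-style margins
print(lovasz_extension(hamming, violations))
```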

    Optimizing different loss functions in multilabel classifications

    Multilabel classification (ML) aims to assign a set of labels to an instance. This generalization of multiclass classification leads to a redefinition of the loss functions, and the learning tasks become harder. The objective of this paper is to gain insights into the relations between optimization aims and some of the most popular performance measures: subset (or 0/1) loss, Hamming loss, and the example-based F-measure. To make a fair comparison, we implemented three ML learners that explicitly optimize each of these measures in a common framework. This can be done by considering a subset of labels as a structured output. Then, we use structured output support vector machines tailored to optimize a given loss function. The paper includes an exhaustive experimental comparison. The conclusion is that in most cases the optimization of the Hamming loss produces the best or competitive scores. This is a practical result, since the Hamming loss can be minimized using a set of binary classifiers, one for each label separately, making it a scalable and fast method for learning ML tasks. Additionally, we observe that in noise-free learning tasks optimizing the subset loss is the best option, but the differences are very small. We have also noticed that the biggest room for improvement can be found when the goal is to optimize an F-measure in noisy learning tasks.
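
    As a rough illustration of why the Hamming-loss result is practical, the sketch below trains the binary-relevance baseline the abstract refers to (one independent linear classifier per label) with scikit-learn and reports the three measures discussed. The synthetic dataset, LinearSVC choice, and hyperparameters are placeholders; this is not the structured-output SVM machinery evaluated in the paper.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import hamming_loss, zero_one_loss, f1_score

# Synthetic multilabel data stands in for a real benchmark.
X, Y = make_multilabel_classification(n_samples=2000, n_features=20,
                                      n_classes=5, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)

# Binary relevance: one independent linear SVM per label.
clf = OneVsRestClassifier(LinearSVC(C=1.0)).fit(X_tr, Y_tr)
Y_hat = clf.predict(X_te)

print("Hamming loss    :", hamming_loss(Y_te, Y_hat))
print("Subset 0/1 loss :", zero_one_loss(Y_te, Y_hat))          # 1 - exact-match rate
print("Example-based F1:", f1_score(Y_te, Y_hat, average="samples"))
```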

    Scalable large margin pairwise learning algorithms

    Classification is a major task in machine learning and data mining applications. Many of these applications involve building a classification model using a large volume of imbalanced data. In such an imbalanced learning scenario, the area under the ROC curve (AUC) has proven to be a reliable performance measure for evaluating a classifier. Therefore, it is desirable to develop scalable learning algorithms that maximize the AUC metric directly. Kernelized AUC maximization machines have established superior generalization ability compared to linear AUC machines; however, the computational cost of the kernelized machines hinders their scalability. To address this problem, we propose a large-scale nonlinear AUC maximization algorithm that learns a batch linear classifier on an approximate feature space computed via the k-means Nyström method. The proposed algorithm is shown empirically to achieve AUC classification performance comparable to, or even better than, that of kernel AUC machines, while its training time is faster by several orders of magnitude. However, the computational complexity of the linear batch model compromises its scalability when training sizable datasets. Hence, we develop second-order online AUC maximization algorithms based on a confidence-weighted model. The proposed algorithms exploit second-order information to improve the convergence rate and implement a fixed-size buffer to address the multivariate nature of the AUC objective function. We also extend our online linear algorithms to consider an approximate feature map constructed using random Fourier features in an online setting. The results show that our proposed algorithms outperform, or are at least comparable to, the competing online AUC maximization methods. Despite their scalability, we notice that first- and second-order online AUC maximization methods are prone to suboptimal convergence, which can be attributed to the limitation of the hypothesis space. A potential improvement can be attained by learning stochastic online variants; however, the vanilla stochastic methods also suffer from slow convergence because of the high variance introduced by the stochastic process. We address the problem of slow convergence by developing a fast-converging stochastic AUC maximization algorithm, accelerated using a unique combination of scheduled regularization updates and scheduled averaging. The experimental results show that the proposed algorithm performs better than the state-of-the-art online and stochastic AUC maximization methods in terms of AUC classification accuracy. Moreover, we develop a proximal variant of our accelerated stochastic AUC maximization algorithm. The proposed method applies the proximal operator to the hinge loss function and therefore evaluates the gradient of the loss function at the approximated weight vector. Experiments on several benchmark datasets show that our proximal algorithm converges to the optimal solution faster than the previous AUC maximization algorithms.
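
    All of the algorithms above optimize a pairwise surrogate of the AUC. As a minimal sketch of that underlying objective only, the code below runs plain SGD on a pairwise hinge loss over sampled positive-negative pairs; the thesis's specific contributions (k-means Nyström features, confidence-weighted second-order updates, fixed-size buffers, scheduled regularization/averaging, and the proximal variant) are not reproduced here, and the step-size schedule is an assumption. For a rough nonlinear variant, the inputs could first be mapped through an approximate kernel feature map such as sklearn.kernel_approximation.Nystroem, though its default landmark sampling differs from the k-means selection described above.

```python
import numpy as np

def pairwise_hinge_auc_sgd(X, y, lam=1e-4, n_iters=50000, eta0=1.0, rng=None):
    """Sketch: linear AUC maximization with a pairwise hinge surrogate.

    Each step samples one positive and one negative example and applies an SGD
    update to max(0, 1 - w.(x_pos - x_neg)) + (lam/2)*||w||^2.
    """
    rng = rng or np.random.default_rng(0)
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y != 1)
    w = np.zeros(X.shape[1])
    for t in range(1, n_iters + 1):
        xi = X[rng.choice(pos)] - X[rng.choice(neg)]   # random positive-negative pair
        eta = eta0 / (1.0 + lam * eta0 * t)            # decaying step size (assumed schedule)
        w *= 1.0 - eta * lam                           # regularization shrinkage
        if w.dot(xi) < 1.0:                            # hinge active: margin violated
            w += eta * xi
    return w

def auc(w, X, y):
    """Empirical AUC of the linear scorer w on (X, y), counting ties as 1/2."""
    s = X @ w
    sp, sn = s[y == 1], s[y != 1]
    return float(np.mean(sp[:, None] > sn[None, :]) +
                 0.5 * np.mean(sp[:, None] == sn[None, :]))
```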