29,742 research outputs found

    A nested heuristic for parameter tuning in Support Vector Machines

    The default approach for tuning the parameters of a Support Vector Machine (SVM) is a grid search in the parameter space. Different metaheuristics have recently been proposed as a more efficient alternative, but they have only been shown to be useful in models with a low number of parameters. Complex models, involving many parameters, can be seen as extensions of simpler and easy-to-tune models, yielding a nested sequence of models of increasing complexity. In this paper we propose an algorithm which successfully exploits this nested property, with two main advantages over the state of the art. First, our framework is general enough to allow one to address, with the very same method, several popular SVM parameter models encountered in the literature. Second, as algorithmic requirements we only need either an SVM library or any routine for the minimization of convex quadratic functions under linear constraints. In the computational study, we address Multiple Kernel Learning tuning problems for which grid search would clearly be infeasible, while our classification accuracy is comparable to that of ad-hoc model-dependent benchmark tuning methods.
    Funding: Ministerio de Ciencia e Innovación; Junta de Andalucía; European Development Fund
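    Below is a minimal sketch of the nested-tuning idea, not the paper's algorithm: a simple one-parameter RBF model is tuned first, and its solution then seeds a richer anisotropic-kernel model whose extra parameters are adjusted only locally. The use of scikit-learn, the load_wine data, the anisotropic_kernel helper, and the tiny random local search are illustrative assumptions.

        import numpy as np
        from sklearn.datasets import load_wine
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        X, y = load_wine(return_X_y=True)

        # Stage 1: simple model -- a single RBF width gamma, tuned by a coarse search.
        gammas = np.logspace(-6, -1, 6)
        stage1 = [cross_val_score(SVC(C=1.0, gamma=g), X, y, cv=5).mean() for g in gammas]
        best_gamma = gammas[int(np.argmax(stage1))]

        # Hypothetical anisotropic RBF kernel: one width per feature.
        def anisotropic_kernel(A, B, widths):
            sq = (A[:, None, :] - B[None, :, :]) ** 2
            return np.exp(-np.tensordot(sq, widths, axes=([2], [0])))

        # Stage 2: richer model initialised from the simpler one and refined by a
        # tiny random local search around that solution (illustration only).
        rng = np.random.default_rng(0)
        widths = np.full(X.shape[1], best_gamma)
        best = cross_val_score(SVC(C=1.0, kernel=lambda A, B: anisotropic_kernel(A, B, widths)),
                               X, y, cv=5).mean()
        for _ in range(10):
            cand = widths * np.exp(0.3 * rng.standard_normal(widths.shape))
            score = cross_val_score(SVC(C=1.0, kernel=lambda A, B: anisotropic_kernel(A, B, cand)),
                                    X, y, cv=5).mean()
            if score > best:
                best, widths = score, cand
        print(f"stage-1 gamma={best_gamma:.1e}, stage-2 CV accuracy={best:.3f}")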

    Semi-Supervised and Unsupervised Novelty Detection using Nested Support Vector Machines

    Very often in change detection only a few labels, or even none, are available. In order to perform change detection in these extreme scenarios, they can be treated as novelty detection problems: semi-supervised (SSND) if some labels are available, and unsupervised (UND) otherwise. SSND can be seen as an unbalanced classification between labeled and unlabeled samples using the Cost-Sensitive Support Vector Machine (CS-SVM). UND assumes that novelties lie in low-density regions and can be approached using the One-Class SVM (OC-SVM). We propose here to use nested entire-solution-path algorithms for the OC-SVM and CS-SVM in order to accelerate parameter selection and alleviate the dependency on labeled ``changed'' samples. Experiments are performed on two multitemporal change detection datasets (flood and fire detection), and the performance of the two proposed methods is compared.
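    A minimal sketch of the two detectors referred to above, using plain estimators rather than the nested solution-path algorithms of the paper: the unsupervised case with a one-class SVM, and the semi-supervised case as an unbalanced classification with class-specific costs. scikit-learn and the synthetic X_unlabeled / X_changed arrays are assumptions standing in for multitemporal difference features.

        import numpy as np
        from sklearn.svm import SVC, OneClassSVM

        rng = np.random.default_rng(0)
        X_unlabeled = rng.normal(0.0, 1.0, size=(500, 2))   # mostly "no change" samples
        X_changed = rng.normal(3.0, 0.5, size=(20, 2))      # few labeled "changed" samples

        # UND: one-class SVM on the unlabeled pool; novelties are assumed to lie
        # in low-density regions of the feature space.
        oc = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_unlabeled)
        und_scores = -oc.decision_function(X_unlabeled)      # higher = more novel

        # SSND: unbalanced classification between labeled "changed" samples and the
        # unlabeled pool, with a larger misclassification cost on the minority class.
        X = np.vstack([X_unlabeled, X_changed])
        y = np.r_[np.zeros(len(X_unlabeled)), np.ones(len(X_changed))]
        cs = SVC(kernel="rbf", C=1.0, gamma="scale",
                 class_weight={0: 1.0, 1: len(X_unlabeled) / len(X_changed)}).fit(X, y)
        ssnd_scores = cs.decision_function(X_unlabeled)      # higher = more change-like

        print(und_scores[:3].round(3), ssnd_scores[:3].round(3))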

    A study of hierarchical and flat classification of proteins

    Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined hierarchies of protein classes exist and can potentially be exploited to improve classification performance. In this article we investigate empirically whether this is the case for two such hierarchies. We compare multi-class classification techniques that exploit the information in those class hierarchies and those that do not, using logistic regression, decision trees, bagged decision trees, and support vector machines as the underlying base learners. In particular, we compare hierarchical and flat variants of ensembles of nested dichotomies. The latter have been shown to deliver strong classification performance in multi-class settings. We present experimental results for synthetic, fold recognition, enzyme classification, and remote homology detection data. Our results show that exploiting the class hierarchy improves performance on the synthetic data, but not in the case of the protein classification problems. Based on this, we recommend that strong flat multi-class methods be used as a baseline to establish the benefit of exploiting class hierarchies in this area.
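    The sketch below shows a single randomly generated nested dichotomy (the article evaluates ensembles of them, flat and hierarchy-guided): the class set is split in two recursively, a binary model is fitted at each split, and branch probabilities are multiplied along each root-to-leaf path. scikit-learn, logistic regression as the base learner, and the load_wine data are assumptions for illustration.

        import numpy as np
        from sklearn.datasets import load_wine
        from sklearn.linear_model import LogisticRegression

        def build_dichotomy(classes, X, y, rng):
            # Recursively split the class set in two and fit a binary model per split.
            if len(classes) == 1:
                return {"leaf": classes[0]}
            perm = rng.permutation(classes)
            left, right = set(perm[: len(perm) // 2]), set(perm[len(perm) // 2 :])
            mask = np.isin(y, list(left | right))
            target = np.isin(y[mask], list(left)).astype(int)   # 1 = left branch
            clf = LogisticRegression(max_iter=5000).fit(X[mask], target)
            return {"clf": clf,
                    "left": build_dichotomy(sorted(left), X, y, rng),
                    "right": build_dichotomy(sorted(right), X, y, rng)}

        def class_probabilities(node, x, p=1.0, out=None):
            # Multiply branch probabilities along each root-to-leaf path.
            out = {} if out is None else out
            if "leaf" in node:
                out[node["leaf"]] = p
                return out
            p_left = node["clf"].predict_proba(x.reshape(1, -1))[0, 1]
            class_probabilities(node["left"], x, p * p_left, out)
            class_probabilities(node["right"], x, p * (1.0 - p_left), out)
            return out

        X, y = load_wine(return_X_y=True)
        tree = build_dichotomy(sorted(set(y)), X, y, np.random.default_rng(0))
        print(max(class_probabilities(tree, X[0]).items(), key=lambda kv: kv[1]))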

    Geometric Approach to Support Vector Machines Learning for Large Datasets

    The dissertation introduces Sphere Support Vector Machines (SphereSVM) and Minimal Norm Support Vector Machines (MNSVM) as new fast classification algorithms that use geometric properties of the underlying classification problems to efficiently obtain models describing the training data. SphereSVM combines a minimal enclosing ball approach, state-of-the-art nearest-point-problem solvers, and probabilistic techniques. The blending of the three speeds up the training phase of SVMs significantly and reaches similar (i.e., practically the same) accuracy as other classification models over several large real datasets within the strict validation frame of a double (nested) cross-validation (CV). MNSVM is a further simplification of the SphereSVM algorithm: the relatively complex classification task is converted into one of the simplest geometric problems, the minimal norm problem. This results in an additional speedup compared to SphereSVM. The results presented promote both SphereSVM and MNSVM as outstanding alternatives for handling large and ultra-large datasets in a reasonable time without switching to the various parallelization schemes for SVM algorithms proposed recently. Variants of both algorithms that work without an explicit bias term are also presented. In addition, other techniques aiming to improve time efficiency are discussed (such as over-relaxation and an improved support vector selection scheme). Finally, the accuracy and performance of all these modifications are carefully analyzed, and results based on a nested cross-validation procedure are reported.
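    As an illustration of the geometric primitive behind sphere-type solvers, the sketch below computes a (1 + eps)-approximate minimal enclosing ball with the simple Badoiu-Clarkson update; it is not the SphereSVM or MNSVM training procedure itself. NumPy and the random point cloud are assumptions.

        import numpy as np

        def approx_minimal_enclosing_ball(points, eps=0.05):
            # Badoiu-Clarkson style update: repeatedly step toward the furthest point
            # with a shrinking step size; yields a (1 + eps)-approximate MEB.
            center = points[0].astype(float)
            for t in range(1, int(np.ceil(1.0 / eps ** 2)) + 1):
                dists = np.linalg.norm(points - center, axis=1)
                furthest = points[np.argmax(dists)]
                center = center + (furthest - center) / (t + 1)
            radius = np.linalg.norm(points - center, axis=1).max()
            return center, radius

        rng = np.random.default_rng(0)
        pts = rng.normal(size=(1000, 5))
        center, radius = approx_minimal_enclosing_ball(pts)
        print(center.round(3), round(radius, 3))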

    Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values

    This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries, and with imbalance in the classes of interest, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for the development of specialized techniques for data preprocessing and classification. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. It is based on a multilevel framework of the cost-sensitive SVM and an expectation-maximization imputation method for missing values, which relies on iterated regression analyses. We compare the classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values, as well as real data in health applications, and show that our multilevel SVM-based method produces faster, more accurate, and more robust classification results.
    Comment: arXiv admin note: substantial text overlap with arXiv:1503.0625
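    A minimal sketch of the two ingredients named above, iterated-regression imputation followed by a cost-sensitive SVM, without the multilevel coarsening framework of the paper. scikit-learn's IterativeImputer (a regression-based imputer) and the synthetic imbalanced dataset with randomly removed entries are assumptions for illustration.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.experimental import enable_iterative_imputer  # noqa: F401
        from sklearn.impute import IterativeImputer
        from sklearn.metrics import balanced_accuracy_score
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        # Imbalanced synthetic data with entries removed at random to mimic
        # missing values in medical records.
        X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1],
                                   random_state=0)
        rng = np.random.default_rng(0)
        X[rng.random(X.shape) < 0.1] = np.nan

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

        # Iterated-regression imputation, then a cost-sensitive SVM: the minority
        # class gets a proportionally larger misclassification cost.
        model = make_pipeline(IterativeImputer(max_iter=10, random_state=0),
                              StandardScaler(),
                              SVC(kernel="rbf", C=1.0, class_weight="balanced"))
        model.fit(X_tr, y_tr)
        print(balanced_accuracy_score(y_te, model.predict(X_te)))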