465 research outputs found

    SLIMSVM: a simple implementation of support vector machine for analysis of microarray data

    Support Vector Machine (SVM) is a supervised machine learning technique widely used in multiple areas of biological analysis, including microarray data analysis. SlimSVM has been developed with the intention of replacing OSU SVM as the classification component of GenoIterSVM in order to make it independent of other SVM packages. GenoIterSVM, developed by Dr. Marc Ma, is an SVM implementation with an iterative refinement algorithm for improved accuracy of classification of genotype microarray data. SlimSVM is an object-oriented, modular, and easy-to-use implementation written in C++. It supports dot (linear) and polynomial (non-linear) kernels. The program has been tested with artificial non-biological data and with microarray data. Testing with microarray data was performed to observe how SlimSVM handles medium-sized data files (containing thousands of data points), since it would ultimately be used to analyze them. The results were compared to those of LIBSVM, a leading SVM library, and the comparison demonstrates that SlimSVM was implemented accurately.
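    The two kernels SlimSVM supports, together with the standard SVM decision function, can be sketched in a few lines. This is a generic illustration of those textbook formulas, not SlimSVM's actual C++ API; the function names and parameter defaults are ours:

```python
def linear_kernel(x, z):
    """Dot-product (linear) kernel."""
    return sum(a * b for a, b in zip(x, z))

def poly_kernel(x, z, degree=3, coef0=1.0):
    """Polynomial (non-linear) kernel: (x.z + coef0)^degree."""
    return (linear_kernel(x, z) + coef0) ** degree

def decision(x, support_vectors, alphas, labels, bias, kernel):
    """SVM decision value f(x) = sum_i alpha_i * y_i * K(x_i, x) + b;
    the sign of f(x) gives the predicted class."""
    return sum(a * y * kernel(sv, x)
               for sv, a, y in zip(support_vectors, alphas, labels)) + bias
```

    Swapping `kernel=linear_kernel` for `kernel=poly_kernel` is the only change needed to move between the two kernel types the abstract mentions.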

    Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data

    Background: Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection, and therefore a number of feature selection procedures have been developed. Regularisation approaches extend the SVM to a feature selection method in a flexible way using penalty functions such as LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm which, in comparison to a fixed grid search, finds a globally optimal solution faster and more precisely.
    Results: Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of the median number of features selected than Elastic Net SVM, and often predicted better than Elastic Net SVM in terms of misclassification error. Finally, we applied the penalization methods described above to four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in both sparse and non-sparse situations.
    Conclusions: The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids the sparsity limitations for non-sparse data. We were the first to demonstrate that integrating the interval search algorithm with penalized SVM classification provides fast optimization of tuning parameters. The penalized SVM classification algorithms, as well as fixed grid and interval search for finding appropriate tuning parameters, are implemented in our freely available R package 'penalizedSVM'. We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks on high-dimensional data such as microarray data sets.
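    The SCAD penalty itself is a standard published formula (Fan & Li, 2001), and adding a ridge term to it mirrors the Elastic SCAD idea described above. The sketch below is illustrative only, not the 'penalizedSVM' implementation; the parameter names and the conventional default a = 3.7 are our choices:

```python
def scad(beta, lam=1.0, a=3.7):
    """SCAD penalty of Fan & Li (2001): behaves like the LASSO near zero
    and levels off to a constant for large coefficients, which reduces
    the estimation bias the LASSO incurs on large effects."""
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2

def elastic_scad(beta, lam1=1.0, lam2=0.5, a=3.7):
    """Elastic SCAD: SCAD plus a ridge term, combining sparsity with the
    stability of the quadratic penalty for non-sparse data."""
    return scad(beta, lam1, a) + lam2 * beta * beta
```

    The three branches of `scad` meet continuously at |beta| = lam and |beta| = a*lam, which is why the penalty avoids the hard thresholding artifacts of a piecewise-constant alternative.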

    Modification of Support Vector Machine for Microarray Data Analysis

    Data analysis plays a significant role in selecting genes whose activity levels differ between conditions of interest, i.e., diseased versus normal genes. Nowadays it has become standard in gene analysis that DNA microarray preparation is a crucial step before classification and other biological analyses. We consider the problem of constructing an accurate prediction rule for separating the different labels of genes in microarray gene expression data. The use of SVM in such data analysis is not new, but its accuracy is not yet up to the mark we desire. In this manuscript, we have therefore tried to modify the Support Vector Machine (SVM) for better accuracy in classifying cancer genes. First, we have modified the SVM to account for gene redundancy and keep it in check. In the other approach, instead of keeping the bias constant in the SVM, we have modified the SVM by varying the bias, an approach we call the Orthogonal Vertical Permutator (OVP).
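    The abstract does not specify how OVP varies the bias, so no faithful implementation is possible from the text alone. As background, the sketch below shows the standard baseline the authors start from: sub-gradient descent on the soft-margin SVM objective in which the bias b is an ordinary trained variable rather than a fixed constant. Everything here is textbook SVM; nothing below is the authors' OVP:

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Sub-gradient descent on the soft-margin objective
    0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b)).
    The bias b is updated alongside w rather than held constant."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw = list(w)            # gradient of the ridge term 0.5*||w||^2
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:      # hinge-loss sub-gradient is active
                gw = [g - C * yi * xj for g, xj in zip(gw, xi)]
                gb -= C * yi
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b
```

    Any bias-variation scheme such as OVP would replace the single scalar update of b in the last line with its own rule.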

    Sparse Subspace Clustering: Algorithm, Theory, and Applications

    In many real-world problems, we are dealing with collections of high-dimensional data, such as images, videos, text and web documents, DNA microarray data, and more. Often, high-dimensional data lie close to low-dimensional structures corresponding to the several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called Sparse Subspace Clustering (SSC), to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among the infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of subspaces and the distribution of data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm can be solved efficiently and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal with data nuisances, such as noise, sparse outlying entries, and missing entries, directly by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.
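    The key idea, that a sparse representation of a point tends to select other points from the same subspace, can be illustrated on a toy union of two lines through the origin. For clarity this sketch restricts each point to a 1-sparse representation (its single best-representing neighbour) and replaces spectral clustering with connected components of the resulting affinity graph; the real SSC solves an l1-minimization program and runs spectral clustering on the full coefficient matrix. All function names here are ours:

```python
import math
from collections import defaultdict

def best_single_atom(x, atoms):
    """Index of the atom that best represents x after optimal scaling,
    i.e. the best 1-sparse representation of x in terms of the atoms."""
    best_j, best_res = None, float("inf")
    for j, a in enumerate(atoms):
        na = sum(t * t for t in a)
        if na == 0:
            continue
        c = sum(p * q for p, q in zip(x, a)) / na   # least-squares coefficient
        res = math.sqrt(sum((p - c * q) ** 2 for p, q in zip(x, a)))
        if res < best_res:
            best_res, best_j = res, j
    return best_j

def ssc_toy(points):
    """Cluster points lying on a union of lines through the origin."""
    n = len(points)
    adj = defaultdict(set)
    for i, x in enumerate(points):
        atoms = points[:i] + points[i + 1:]         # exclude the point itself
        j = best_single_atom(x, atoms)
        j = j if j < i else j + 1                   # map back to original index
        adj[i].add(j)
        adj[j].add(i)                               # symmetrised affinity graph
    # connected components stand in for spectral clustering on this toy graph
    labels, seen, lab = {}, set(), 0
    for s in range(n):
        if s in seen:
            continue
        stack = [s]
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            labels[v] = lab
            stack.extend(adj[v])
        lab += 1
    return [labels[i] for i in range(n)]
```

    On points drawn from two distinct lines, each point's best 1-sparse representation picks a point on its own line (zero residual), so the affinity graph splits into one component per subspace.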