465 research outputs found

    SLIMSVM: a simple implementation of support vector machine for analysis of microarray data

    Support Vector Machine (SVM) is a supervised machine learning technique widely used in multiple areas of biological analysis, including microarray data analysis. SlimSVM has been developed with the intention of replacing OSU SVM as the classification component of GenoIterSVM in order to make it independent of other SVM packages. GenoIterSVM, developed by Dr. Marc Ma, is an SVM implementation with an iterative refinement algorithm for improved accuracy of classification of genotype microarray data. SlimSVM is an object-oriented, modular, and easy-to-use implementation written in C++. It supports dot (linear) and polynomial (non-linear) kernels. The program has been tested with artificial non-biological data and with microarray data. Testing with microarray data was performed to observe how SlimSVM handles medium-sized data files (containing thousands of data points), since it would ultimately be used to analyze them. The results were compared to those of LIBSVM, a leading SVM library, and the comparison demonstrates that SlimSVM was implemented accurately.
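    The two kernels SlimSVM supports, together with the standard SVM decision function, can be sketched in a few lines. This is a generic illustration of those textbook formulas, not SlimSVM's actual C++ API; the function names and parameter defaults are ours:

```python
def linear_kernel(x, z):
    """Dot-product (linear) kernel."""
    return sum(a * b for a, b in zip(x, z))

def poly_kernel(x, z, degree=3, coef0=1.0):
    """Polynomial (non-linear) kernel: (x.z + coef0)^degree."""
    return (linear_kernel(x, z) + coef0) ** degree

def decision(x, support_vectors, alphas, labels, bias, kernel):
    """SVM decision value f(x) = sum_i alpha_i * y_i * K(x_i, x) + b;
    the sign of f(x) gives the predicted class."""
    return sum(a * y * kernel(sv, x)
               for sv, a, y in zip(support_vectors, alphas, labels)) + bias
```

    Swapping `kernel=linear_kernel` for `kernel=poly_kernel` is the only change needed to move between the two kernel types the abstract mentions.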

    Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data

    Background: Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection, and therefore a number of feature selection procedures have been developed. Regularisation approaches extend the SVM to a feature selection method in a flexible way using penalty functions such as LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm which, in comparison to a fixed grid search, finds a globally optimal solution faster and more precisely.
    Results: Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of the median number of features selected than Elastic Net SVM, and often predicted better than Elastic Net SVM in terms of misclassification error. Finally, we applied the penalization methods described above to four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in both sparse and non-sparse situations.
    Conclusions: The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids the sparsity limitations for non-sparse data. We were the first to demonstrate that integrating the interval search algorithm with penalized SVM classification provides fast optimization of tuning parameters. The penalized SVM classification algorithms, as well as fixed grid and interval search for finding appropriate tuning parameters, are implemented in our freely available R package 'penalizedSVM'. We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks on high-dimensional data such as microarray data sets.
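    The SCAD penalty itself is a standard published formula (Fan & Li, 2001), and adding a ridge term to it mirrors the Elastic SCAD idea described above. The sketch below is illustrative only, not the 'penalizedSVM' implementation; the parameter names and the conventional default a = 3.7 are our choices:

```python
def scad(beta, lam=1.0, a=3.7):
    """SCAD penalty of Fan & Li (2001): behaves like the LASSO near zero
    and levels off to a constant for large coefficients, which reduces
    the estimation bias the LASSO incurs on large effects."""
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2

def elastic_scad(beta, lam1=1.0, lam2=0.5, a=3.7):
    """Elastic SCAD: SCAD plus a ridge term, combining sparsity with the
    stability of the quadratic penalty for non-sparse data."""
    return scad(beta, lam1, a) + lam2 * beta * beta
```

    The three branches of `scad` meet continuously at |beta| = lam and |beta| = a*lam, which is why the penalty avoids the hard thresholding artifacts of a piecewise-constant alternative.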

    Modification of Support Vector Machine for Microarray Data Analysis

    Data analysis plays a significant role in selecting genes whose activity levels differ between conditions of interest, i.e., diseased versus normal genes. Nowadays it has become standard in gene analysis that DNA microarray preparation is a crucial step before classification and other biological analyses. We consider the problem of constructing an accurate prediction rule for separating the different labels of genes in microarray gene expression data. The use of SVM in such data analysis is not new, but its accuracy is not yet up to the mark we desire. In this manuscript, we have therefore tried to modify the Support Vector Machine (SVM) for better accuracy in classifying cancer genes. First, we have modified the SVM to account for gene redundancy and keep it in check. In the other approach, instead of keeping the bias constant in the SVM, we have modified the SVM by varying the bias, an approach we call the Orthogonal Vertical Permutator (OVP).
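    The abstract does not specify how OVP varies the bias, so no faithful implementation is possible from the text alone. As background, the sketch below shows the standard baseline the authors start from: sub-gradient descent on the soft-margin SVM objective in which the bias b is an ordinary trained variable rather than a fixed constant. Everything here is textbook SVM; nothing below is the authors' OVP:

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Sub-gradient descent on the soft-margin objective
    0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b)).
    The bias b is updated alongside w rather than held constant."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw = list(w)            # gradient of the ridge term 0.5*||w||^2
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:      # hinge-loss sub-gradient is active
                gw = [g - C * yi * xj for g, xj in zip(gw, xi)]
                gb -= C * yi
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b
```

    Any bias-variation scheme such as OVP would replace the single scalar update of b in the last line with its own rule.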

    Sparse Subspace Clustering: Algorithm, Theory, and Applications

    In many real-world problems, we are dealing with collections of high-dimensional data, such as images, videos, text and web documents, DNA microarray data, and more. Often, high-dimensional data lie close to low-dimensional structures corresponding to the several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called Sparse Subspace Clustering (SSC), to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among the infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of subspaces and the distribution of data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm can be solved efficiently and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal with data nuisances, such as noise, sparse outlying entries, and missing entries, directly by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.
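    The key idea, that a sparse representation of a point tends to select other points from the same subspace, can be illustrated on a toy union of two lines through the origin. For clarity this sketch restricts each point to a 1-sparse representation (its single best-representing neighbour) and replaces spectral clustering with connected components of the resulting affinity graph; the real SSC solves an l1-minimization program and runs spectral clustering on the full coefficient matrix. All function names here are ours:

```python
import math
from collections import defaultdict

def best_single_atom(x, atoms):
    """Index of the atom that best represents x after optimal scaling,
    i.e. the best 1-sparse representation of x in terms of the atoms."""
    best_j, best_res = None, float("inf")
    for j, a in enumerate(atoms):
        na = sum(t * t for t in a)
        if na == 0:
            continue
        c = sum(p * q for p, q in zip(x, a)) / na   # least-squares coefficient
        res = math.sqrt(sum((p - c * q) ** 2 for p, q in zip(x, a)))
        if res < best_res:
            best_res, best_j = res, j
    return best_j

def ssc_toy(points):
    """Cluster points lying on a union of lines through the origin."""
    n = len(points)
    adj = defaultdict(set)
    for i, x in enumerate(points):
        atoms = points[:i] + points[i + 1:]         # exclude the point itself
        j = best_single_atom(x, atoms)
        j = j if j < i else j + 1                   # map back to original index
        adj[i].add(j)
        adj[j].add(i)                               # symmetrised affinity graph
    # connected components stand in for spectral clustering on this toy graph
    labels, seen, lab = {}, set(), 0
    for s in range(n):
        if s in seen:
            continue
        stack = [s]
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            labels[v] = lab
            stack.extend(adj[v])
        lab += 1
    return [labels[i] for i in range(n)]
```

    On points drawn from two distinct lines, each point's best 1-sparse representation picks a point on its own line (zero residual), so the affinity graph splits into one component per subspace.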