428 research outputs found
Regularized Least Squares Cancer Classifiers from DNA microarray data
BACKGROUND: The advent of DNA microarray technology marks an epochal change in the classification and discovery of different types of cancer, because the information provided by DNA microarrays allows the problem of cancer analysis to be approached from a quantitative rather than a qualitative point of view. Cancer classification requires well-founded mathematical methods that can predict the status of new specimens with high significance levels starting from a limited amount of data. In this paper we assess the performance of Regularized Least Squares (RLS) classifiers, originally proposed in regularization theory, by comparing them with Support Vector Machines (SVM), the state-of-the-art supervised learning technique for cancer classification from DNA microarray data. The performance of both approaches has also been investigated with respect to the number of selected genes and different gene selection strategies. RESULTS: We show that RLS classifiers perform comparably to SVM classifiers, as the Leave-One-Out (LOO) error evaluated on three different data sets shows. The main advantage of RLS machines is that, to solve a classification problem, they use a linear system of order equal to either the number of features or the number of training examples. Moreover, RLS machines allow an exact measure of the LOO error to be obtained with just one training. CONCLUSION: RLS classifiers are a valuable alternative to SVM classifiers for the problem of cancer classification from gene expression data, due to their simplicity and low computational complexity. Moreover, RLS classifiers show generalization ability comparable to that of SVM classifiers even when the classification of new specimens involves very few gene expression levels
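The claim that RLS gives an exact LOO error from a single training can be illustrated with a minimal sketch. It assumes a linear kernel, no bias term, and the dual formulation (a linear system of order equal to the number of training examples); the function names `rls_fit_loo` and `brute_force_loo` and the synthetic data are hypothetical and not taken from the paper. The closed form used is the standard identity for regularized least squares, where the LOO residual equals the dual coefficient divided by the matching diagonal entry of the inverse regularized Gram matrix.

```python
import numpy as np

def rls_fit_loo(X, y, lam):
    """Dual RLS with a linear kernel; returns coefficients and exact LOO predictions."""
    K = X @ X.T                                  # n x n Gram matrix
    n = K.shape[0]
    G_inv = np.linalg.inv(K + lam * np.eye(n))   # one n x n linear system
    c = G_inv @ y                                # dual coefficients
    # Closed form: y_i - f_{-i}(x_i) = c_i / (G_inv)_{ii}
    loo_pred = y - c / np.diag(G_inv)
    return c, loo_pred

def brute_force_loo(X, y, lam):
    """Reference: retrain n times, each time leaving one sample out."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        K = X[keep] @ X[keep].T
        c = np.linalg.solve(K + lam * np.eye(n - 1), y[keep])
        preds[i] = (X[i] @ X[keep].T) @ c
    return preds

# Synthetic stand-in for expression data: few samples, many "genes".
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 50))
y = np.where(rng.normal(size=12) > 0, 1.0, -1.0)

_, loo_pred = rls_fit_loo(X, y, lam=1.0)
loo_error = float(np.mean(np.sign(loo_pred) != y))  # LOO misclassification rate
```

The brute-force routine is included only to check the closed form; in practice the single-solve version is the one that would be used.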
Comparing Prediction Accuracy for Supervised Techniques in Gene Expression Data
Classification is one of the most important tasks in applications such as text categorization, tone recognition, image classification, microarray gene expression analysis, protein structure prediction, and data classification. Microarray-based gene expression profiling has emerged as an efficient technique for cancer classification, as well as for diagnosis, prognosis, and treatment purposes. The classification of different tumor types is of great significance in cancer diagnosis and drug innovation. One challenging area in the study of gene expression data is the classification of different types of tumors into the correct classes. Diagonal discriminant analysis, regularized discriminant analysis, support vector machines and k-nearest neighbor have been suggested as among the best methods for small sample size situations. The methods are applied to datasets from four recently published cancer gene expression studies. The four publicly available microarray data sets are Leukemia, Lymphoma, SRBCT and Prostate. The performance of the classification techniques has been evaluated according to the percentage of misclassification through hold-out cross validation
A Regularized Method for Selecting Nested Groups of Relevant Genes from Microarray Data
Gene expression analysis aims at identifying the genes able to accurately
predict biological parameters like, for example, disease subtyping or
progression. While accurate prediction can be achieved by means of many
different techniques, gene identification, due to gene correlation and the
limited number of available samples, is a much more elusive problem. Small
changes in the expression values often produce different gene lists, and
solutions which are both sparse and stable are difficult to obtain. We propose
a two-stage regularization method able to learn linear models characterized by
a high prediction performance. By varying a suitable parameter, these linear
models allow sparsity to be traded for the inclusion of correlated genes and
produce gene lists which are almost perfectly nested. Experimental results on
synthetic and microarray data confirm the interesting properties of the
proposed method and its potential as a starting point for further biological
investigations.
Comment: 17 pages, 8 Post-script figure
Comparing Prediction Accuracy for Machine Learning and Other Classical Approaches in Gene Expression Data
Microarray-based gene expression profiling has emerged as an efficient technique for cancer classification, as well as for diagnosis, prognosis, and treatment purposes. The classification of different tumor types is of great significance in cancer diagnosis and drug innovation. Using a large number of genes to classify samples based on a small number of microarrays remains a difficult problem. Feature selection techniques can be used to extract the marker genes that influence classification accuracy by eliminating unwanted noisy and redundant genes. Quite a number of methods have been proposed in recent years with promising results, but many issues still need to be addressed and understood. Diagonal discriminant analysis, regularized discriminant analysis, support vector machines and k-nearest neighbor have been suggested as among the best methods for small sample size situations. In this paper, we compare the performance of different discrimination methods for the classification of tumors based on gene expression data. The methods are applied to datasets from four recently published cancer gene expression studies. The performance of the classification techniques has been evaluated for varying numbers of selected features in terms of misclassification rate using hold-out cross validation. Our study shows that KNN, RDA and SVM with a linear kernel have lower misclassification rates than the other algorithms. Keywords: microarray, gene expression, KNN, DLDA, RDA, SV
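The hold-out misclassification rate used as the evaluation criterion above can be sketched as follows. This is only an illustration with a plain k-nearest-neighbor classifier; the abstract does not state the split ratio, the value of k, or the distance measure, so the one-third test fraction, `k=3`, Euclidean distance, and the function names are all assumptions, and the data are synthetic.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]       # indices of the k closest samples
    votes = y_train[nearest]                     # shape (n_test, k)
    return np.array([np.bincount(v).argmax() for v in votes])

def holdout_misclassification(X, y, test_fraction=1 / 3, k=3, seed=0):
    """Split once at random, train on one part, report the error on the other."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = max(1, int(round(test_fraction * len(y))))
    test, train = idx[:n_test], idx[n_test:]
    y_hat = knn_predict(X[train], y[train], X[test], k)
    return float(np.mean(y_hat != y[test]))

# Synthetic two-class data standing in for a gene expression matrix.
# Labels must be non-negative integers for np.bincount.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(15, 20)),
               rng.normal(10.0, 1.0, size=(15, 20))])
y = np.array([0] * 15 + [1] * 15)

error_rate = holdout_misclassification(X, y)
```

Repeating the call over several seeds and averaging would reduce the variance that comes from relying on a single random split.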
A cDNA Microarray Gene Expression Data Classifier for Clinical Diagnostics Based on Graph Theory
Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarray data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performance, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives, which is unacceptable in real diagnostic applications. To address this problem, this paper presents a new cDNA microarray data classification algorithm that is based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithm
Inverse Projection Representation and Category Contribution Rate for Robust Tumor Recognition
Sparse representation based classification (SRC) methods have achieved
remarkable results. SRC methods, however, still suffer from the need for enough
training samples, insufficient use of test samples and instability of representation. In
this paper, a stable inverse projection representation based classification
(IPRC) is presented to tackle these problems by effectively using test samples.
An IPR is first proposed and its feasibility and stability are analyzed. A
classification criterion named category contribution rate is constructed to
match the IPR and complete classification. Moreover, a statistical measure is
introduced to quantify the stability of representation-based classification
methods. Based on the IPRC technique, a robust tumor recognition framework is
presented by interpreting microarray gene expression data, where a two-stage
hybrid gene selection method is introduced to select informative genes.
Finally, a functional analysis of the candidate pathogenicity-related genes is
given. Extensive experiments on six public tumor microarray gene expression
datasets demonstrate that the proposed technique is competitive with
state-of-the-art methods.
Comment: 14 pages, 19 figures, 10 table
A new regularized least squares support vector regression for gene selection
BACKGROUND: Selection of influential genes with microarray data often faces the difficulties of a large number of genes and a relatively small group of subjects. In addition to the curse of dimensionality, many gene selection methods weight the contribution from each individual subject equally. This equal-contribution assumption cannot account for the possible dependence among subjects who associate similarly to the disease, and may restrict the selection of influential genes. RESULTS: A novel approach to gene selection is proposed based on kernel similarities and kernel weights. We do not assume uniformity for subject contribution. Weights are calculated via regularized least squares support vector regression (RLS-SVR) of class levels on kernel similarities and are used to weight subject contribution. The cumulative sums of weighted expression levels are then ranked to select responsible genes. These procedures also work for multiclass classification. We demonstrate this algorithm on acute leukemia, colon cancer, small, round blue cell tumors of childhood, breast cancer, and lung cancer studies, using kernel Fisher discriminant analysis and support vector machines as classifiers. Other procedures are compared as well. CONCLUSION: This approach is easy to implement and fast in computation for both binary and multiclass problems. The gene set provided by the RLS-SVR weight-based approach contains fewer genes and achieves a higher accuracy than other procedures.
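One possible reading of the weighting and ranking steps described in this abstract is sketched below. It is not the authors' implementation: the Gaussian kernel, its width `gamma`, the regularization value `lam`, the use of a kernel ridge solve for the subject weights, and the function name `rank_genes_by_kernel_weights` are all assumptions made for illustration, and the data are synthetic.

```python
import numpy as np

def rank_genes_by_kernel_weights(X, y, lam=1.0, gamma=None):
    """Rank genes by the absolute subject-weighted sum of their expression levels."""
    n, p = X.shape
    if gamma is None:
        gamma = 1.0 / p                              # assumed default kernel width
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-gamma * sq)                          # kernel similarities between subjects
    # Regularized least squares regression of class labels on the similarities
    # yields one weight per subject instead of equal contributions.
    alpha = np.linalg.solve(K + lam * np.eye(n), y)
    scores = np.abs(alpha @ X)                       # weighted expression sum per gene
    order = np.argsort(-scores)                      # most influential gene first
    return order, scores

# Synthetic binary example: gene 0 carries the class signal, the rest are noise.
rng = np.random.default_rng(0)
y = np.array([1.0] * 10 + [-1.0] * 10)
X = rng.normal(size=(20, 30))
X[:, 0] += 5.0 * y

order, scores = rank_genes_by_kernel_weights(X, y)
top_genes = order[:5]                                # indices of the five highest-ranked genes
```

For a multiclass problem, one option would be to repeat the regression for each class with one-versus-rest labels and combine the per-class scores, though the abstract does not say how the authors handle that case.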