Search CORE

24,041 research outputs found

Comparison of Two Output-Coding Strategies for Multi-Class Tumor Classification Using Gene Expression Data and Latent Variable Model as Binary Classifier

Author: Joseph Sandeep J.
Rekaya Romdhane
Robbins Kelly R.
Zhang Wensheng
Publication venue: Libertas Academica
Publication date: 01/03/2010
Field of study

Multi-class cancer classification based on microarray data is described. A generalized output-coding scheme based on One Versus One (OVO) combined with Latent Variable Model (LVM) is used. Results from the proposed One Versus One (OVO) outputcoding strategy is compared with the results obtained from the generalized One Versus All (OVA) method and their efficiencies of using them for multi-class tumor classification have been studied. This comparative study was done using two microarray gene expression data: Global Cancer Map (GCM) dataset and brain cancer (BC) dataset. Primary feature selection was based on fold change and penalized t-statistics. Evaluation was conducted with varying feature numbers. The OVO coding strategy worked quite well with the BC data, while both OVO and OVA results seemed to be similar for the GCM data. The selection of output coding methods for combining binary classifiers for multi-class tumor classification depends on the number of tumor types considered, the discrepancies between the tumor samples used for training as well as the heterogeneity of expression within the cancer subtypes used as training data

Directory of Open Access Journals

PubMed Central

Classification of microarrays; synergistic effects between normalization, gene selection and machine learning

Author: Freyhult Eva
Hvidsten Torgeir R
Landfors Mattias
Rydén Patrik
Önskog Jenny
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Abstract Background Machine learning is a powerful approach for describing and predicting classes in microarray data. Although several comparative studies have investigated the relative performance of various machine learning methods, these often do not account for the fact that performance (e.g. error rate) is a result of a series of analysis steps of which the most important are data normalization, gene selection and machine learning. Results In this study, we used seven previously published cancer-related microarray data sets to compare the effects on classification performance of five normalization methods, three gene selection methods with 21 different numbers of selected genes and eight machine learning methods. Performance in term of error rate was rigorously estimated by repeatedly employing a double cross validation approach. Since performance varies greatly between data sets, we devised an analysis method that first compares methods within individual data sets and then visualizes the comparisons across data sets. We discovered both well performing individual methods and synergies between different methods. Conclusion Support Vector Machines with a radial basis kernel, linear kernel or polynomial kernel of degree 2 all performed consistently well across data sets. We show that there is a synergistic relationship between these methods and gene selection based on the T-test and the selection of a relatively high number of genes. Also, we find that these methods benefit significantly from using normalized data, although it is hard to draw general conclusions about the relative performance of different normalization procedures.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Publikationer från Uppsala Universitet

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line

PLS dimension reduction for classification of microarray data

Author: Boulesteix Anne-Laure
Publication venue
Publication date: 01/01/2004
Field of study

PLS dimension reduction is known to give good prediction accuracy in the context of classification with high-dimensional microarray data. In this paper, PLS is compared with some of the best state-of-the-art classification methods. In addition, a simple procedure to choose the number of components is suggested. The connection between PLS dimension reduction and gene selection is examined and a property of the first PLS component for binary classification is proven. PLS can also be used as a visualization tool for high-dimensional data in the classification framework. The whole study is based on 9 real microarray cancer data sets

CiteSeerX

Open Access LMU

Elephant Search with Deep Learning for Microarray Data Analysis

Author: Panda Mrutyunjaya
Publication venue
Publication date: 12/07/2017
Field of study

Even though there is a plethora of research in Microarray gene expression data analysis, still, it poses challenges for researchers to effectively and efficiently analyze the large yet complex expression of genes. The feature (gene) selection method is of paramount importance for understanding the differences in biological and non-biological variation between samples. In order to address this problem, a novel elephant search (ES) based optimization is proposed to select best gene expressions from the large volume of microarray data. Further, a promising machine learning method is envisioned to leverage such high dimensional and complex microarray dataset for extracting hidden patterns inside to make a meaningful prediction and most accurate classification. In particular, stochastic gradient descent based Deep learning (DL) with softmax activation function is then used on the reduced features (genes) for better classification of different samples according to their gene expression levels. The experiments are carried out on nine most popular Cancer microarray gene selection datasets, obtained from UCI machine learning repository. The empirical results obtained by the proposed elephant search based deep learning (ESDL) approach are compared with most recent published article for its suitability in future Bioinformatics research.Comment: 12 pages, 5 Tabl

arXiv.org e-Print Archive

Genomic and proteomic profiling for cancer diagnosis in dogs

Author: Morris Joanna S.
Publication venue: 'Elsevier BV'
Publication date: 01/09/2016
Field of study

Global gene expression, whereby tumours are classified according to similar gene expression patterns or ‘signatures’ regardless of cell morphology or tissue characteristics, is being increasingly used in both the human and veterinary fields to assist in cancer diagnosis and prognosis. Many studies on canine tumours have focussed on RNA expression using techniques such as microarrays or next generation sequencing. However, proteomic studies combining two-dimensional polyacrylamide gel electrophoresis or two-dimensional differential gel electrophoresis with mass spectrometry have also provided a wealth of data on gene expression in tumour tissues. In addition, proteomics has been instrumental in the search for tumour biomarkers in blood and other body fluids

Enlighten

Recommended from our members

The robust selection of predictive genes via a simple classifier

Author: Kellum P
Liu X
Tucker A
Vinciotti V
Publication venue: Adis International
Publication date: 01/01/2006
Field of study

Identifying genes that direct the mechanism of a disease from expression data is extremely useful in understanding how that mechanism works. This in turn may lead to better diagnoses and potentially can lead to a cure for that disease. This task becomes extremely challenging when the data are characterised by only a small number of samples and a high number of dimensions, as it is often the case with gene expression data. Motivated by this challenge, we present a general framework that focuses on simplicity and data perturbation. These are the keys for the robust identification of the most predictive features in such data. Within this framework, we propose a simple selective na¨ıve Bayes classifier discovered using a global search technique, and combine it with data perturbation to increase its robustness to small sample sizes. An extensive validation of the method was carried out using two applied datasets from the field of microarrays and a simulated dataset, all confounded by small sample sizes and high dimensionality. The method has been shown capable of identifying genes previously confirmed or associated with prostate cancer and viral infections

Brunel University Research Archive