Search CORE

60 research outputs found

Multiclass microarray gene expression classification based on fusion of correlation features

Author: Chetty Girija
Chetty Madhu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010
Field of study

In this paper, we propose novel algorithmic models based on fusion of independent and correlated gene features for multiclass microarray gene expression classification. It is possible for genes to get co-expressed via different pathways. Moreover, a gene may or may not be co-active for all samples. In this paper, we approach this problem with a optimal feature selection technique using analysis based on statistical techniques to model the complex interactions between genes. The two different types of correlation modelling techniques based on the cross modal factor analysis (CFA) and canonical correlation analysis (CCA) were examined. The subsequent fusion of CCA/CFA features with principal component analysis (PCA) features at feature-level, and at score-level result in significant enhancement in classification accuracy for different data sets corresponding to multiclass microarray gene expression data

University of Canberra Research Repository

Federation ResearchOnline

Gene selection for classification of microarray data based on the Bayes error

Author: A Ben-Dor
A Statnikov
AA Alizadeh
AL Blum
AR Webb
C Ambroise
C Ding
C Gentile
C Lai
C Lee
CF Aliferis
CH Ooi
D Singh
E Xing
EK Tang
F Goudail
G Carneiro
G Kohavi
GR Xuan
HC Peng
Hong-Wen Deng
I Tssamardinos
J Hua
J Khan
J Weston
Ji-Gang Zhang
JW Lee
K Fukunaga
K Tumer
K Yang
KY Yeung
L Devroye
L Yu
M Chow
M Dash
M Dettling
M Dettling
M Wang
M Xiong
MA Shipp
P Baldi
PA Devijver
R Blanco
R Diaz-Uriarte
R Diaz-Uriarte
R Schalkhoff
RO Duda
S Dudoit
S Mukherjee
S Singh
S Varma
T Golub
T Jirapech-Umpai
T Li
TH Bo
U Alon
X Liu
Y Lee
Y Li
ZY Wang
Publication venue: BioMed Central
Publication date: 01/01/2007
Field of study

Abstract Background With DNA microarray data, selecting a compact subset of discriminative genes from thousands of genes is a critical step for accurate classification of phenotypes for, e.g., disease diagnosis. Several widely used gene selection methods often select top-ranked genes according to their individual discriminative power in classifying samples into distinct categories, without considering correlations among genes. A limitation of these gene selection methods is that they may result in gene sets with some redundancy and yield an unnecessary large number of candidate genes for classification analyses. Some latest studies show that incorporating gene to gene correlations into gene selection can remove redundant genes and improve classification accuracy. Results In this study, we propose a new method, Based Bayes error Filter (BBF), to select relevant genes and remove redundant genes in classification analyses of microarray data. The effectiveness and accuracy of this method is demonstrated through analyses of five publicly available microarray datasets. The results show that our gene selection method is capable of achieving better accuracies than previous studies, while being able to effectively select relevant genes, remove redundant genes and obtain efficient and small gene sets for sample classification purposes. Conclusion The proposed method can effectively identify a compact set of genes with high classification accuracy. This study also indicates that application of the Bayes error is a feasible and effective wayfor removing redundant genes in gene selection.</p

University of Missouri: MOspace

Springer - Publisher Connector

Directory of Open Access Journals

Machine Learning Approaches for Breast Cancer Survivability Prediction

Author: Pham Quang Huy
Publication venue: 'University of Windsor Leddy Library'
Publication date: 07/07/2020
Field of study

Breast cancer is one of the leading causes of cancer death in women. If not diagnosed early, the 5-year survival rate of patients is just about 26\%. Furthermore, patients with similar phenotypes can respond differently to the same therapies, which means the therapies might not work well for some of them. Identifying biomarkers that can help predict a cancer class with high accuracy is at the heart of breast cancer studies because they are targets of the treatments and drug development. Genomics data have been shown to carry useful information for breast cancer diagnosis and prognosis, as well as uncovering the disease’s mechanism. Machine learning methods are powerful tools to find such information. Feature selection methods are often utilized in supervised learning and unsupervised learning tasks to deal with data containing a large number of features in which only a small portion of them are useful to the classification task. On the other hand, analyzing only one type of data, without reference to the existing knowledge about the disease and the therapies, might mislead the findings. Effective data integration approaches are necessary to uncover this complex disease. In this thesis, we apply and develop machine learning methods to identify meaningful biomarkers for breast cancer survivability prediction after a certain treatment. They include applying feature selection methods on gene-expression data to derived gene-signatures, where the initial genes are collected concerning the mechanism of some drugs used breast cancer therapies. We also propose a new feature selection method, named PAFS, and apply it to discover accurate biomarkers. In addition, it has been increasingly supported that, sub-network biomarkers are more robust and accurate than gene biomarkers. We proposed two network-based approaches to identify sub-network biomarkers for breast cancer survivability prediction after a treatment. They integrate gene-expression data with protein-protein interactions during the optimal sub-network searching process and use cancer-related genes and pathways to prioritize the extracted sub-networks. The sub-network search space is usually huge and many proteins interact with thousands of other proteins. Thus, we apply some heuristics to avoid generating and evaluating redundant sub-networks