
17 research outputs found

A multi-filter enhanced genetic ensemble system for gene selection and sample classification of microarray data

Author: A Blum
A Tsymbal
Albert Y Zomaya
B Liu
Bing B Zhou
C Ding
C Ooi
D Ruta
G Bontempi
I Inza
IH Witten
J Hua
J Liu
JR Quinlan
JR Quinlan
L Lam
L Li
M Hassan
M Kudo
M Robnik-Šikonja
P Jafari
Pengyi Yang
R Kohavi
RL Somorjai
S Armstrong
S Dudoit
T Golub
T Jirapech-Umpai
T Mitchell
TG Dietterich
U Alon
W Li
X Chen
Y Saeys
Y Saeys
Y Su
Y Wang
YH Yang
Z Zhang
Z Zhang
Zili Zhang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Feature selection techniques are critical to the analysis of high-dimensional datasets. This is especially true of gene selection from microarray data, which commonly have an extremely high feature-to-sample ratio. Beyond its essential objectives of reducing data noise and redundancy and improving sample classification accuracy and model generalization, feature selection also helps biologists focus on the selected genes to further validate their biological hypotheses. Results: In this paper we describe an improved hybrid system for gene selection. It is based on a recently proposed genetic ensemble (GE) system. To enhance the generalization property of the selected genes or gene subsets and to overcome the overfitting problem of the GE system, we devised a mapping strategy to fuse the goodness information of each gene provided by multiple filtering algorithms. This information is then used for the initialization and mutation operations of the genetic ensemble system. Conclusion: We used four benchmark microarray datasets (including both binary-class and multi-class classification problems) for proof of concept and model evaluation. The experimental results indicate that the proposed multi-filter enhanced genetic ensemble (MF-GE) system is able to improve sample classification accuracy, generate more compact gene subsets, and converge to the selection results more quickly. The MF-GE system is very flexible, as various combinations of multiple filters and classifiers can be incorporated based on the data characteristics and the user preferences.
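
The abstract above describes fusing per-gene goodness scores from several filters and using the result to seed a genetic search. The abstract does not give the mapping itself, so the following is only a minimal sketch under stated assumptions: min-max rescaling and plain averaging stand in for the paper's mapping strategy, and the filter names, scores, and function names are all hypothetical.

```python
import random

def min_max(scores):
    """Rescale one filter's scores to [0, 1]; constant scores map to 0.5."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_filter_scores(score_lists):
    """Average the rescaled scores of several filters, gene by gene.

    Assumption: averaging is a stand-in for the paper's mapping strategy.
    """
    scaled = [min_max(s) for s in score_lists]
    n_genes = len(scaled[0])
    return [sum(col[g] for col in scaled) / len(scaled) for g in range(n_genes)]

def init_chromosome(fused, subset_size, rng):
    """Draw a gene subset without replacement, favouring high fused scores."""
    genes = list(range(len(fused)))
    weights = [f + 1e-9 for f in fused]  # keep every gene selectable
    chosen = []
    for _ in range(subset_size):
        pick = rng.choices(range(len(genes)), weights=weights, k=1)[0]
        chosen.append(genes.pop(pick))
        weights.pop(pick)
    return sorted(chosen)

# Hypothetical scores for five genes from two filters (made-up numbers).
chi_square = [10.0, 2.0, 6.0, 0.0, 8.0]
info_gain = [0.9, 0.1, 0.5, 0.0, 0.7]
fused = fuse_filter_scores([chi_square, info_gain])
subset = init_chromosome(fused, 3, random.Random(0))
```

The same fused weights could in principle bias a mutation operator toward higher-scoring genes; how the MF-GE system actually does this is not stated in the abstract.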

Deakin Research Online

Crossref

Springer - Publisher Connector

PubMed Central

A feature selection method for classification within functional genomics experiments based on the proportional overlapping score

Author: A Kikuchi
A Statnikov
A Ultsch
Andrew Harrison
Aris Perperoglou
Asma Gul
B Lausen
Berthold Lausen
C Cortes
C Ding
C Ma
C Müssel
C Zou
D Apiletti
D Apiletti
DA Notterman
DeAndresSA Díaz‐Uriarte R
DG Altman
E Baralis
GJ Gordon
H Peng
H‐C Liu
J Fan
J Fan
J Lu
K‐H Chen
L Breiman
L Breiman
L Lausser
M Dramiński
M Marczyk
Metodi V Metodiev
N De Jay
Osama Mahmoud
P Alhopuro
P Laiho
RN Jorissen
RS Croner
RS Croner
S Chiaretti
S Michiels
T Cover
T Jirapech‐Umpai
TR Golub
VG Tusher
W Talloen
Y Saeys
Y Su
Zardad Khan
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Background: Microarray technology, as well as other functional genomics experiments, allows simultaneous measurement of thousands of genes within each sample. Both the prediction accuracy and the interpretability of a classifier could be enhanced by performing the classification based only on selected discriminative genes. We propose a statistical method for selecting genes based on overlapping analysis of expression data across classes. This method results in a novel measure, called the proportional overlapping score (POS), of a feature's relevance to a classification task. Results: We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. The experimental results of classification error rates computed using the Random Forest, k Nearest Neighbor and Support Vector Machine classifiers show that POS achieves a better performance. Conclusions: A novel gene selection method, POS, is proposed. POS analyzes the expression overlap across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene that allows it to minimize the effect of expression outliers. The constructed masks, along with a novel gene score, are exploited to produce the selected subset of genes.
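
To make the idea of scoring a gene by the proportion of samples in the overlap between classes concrete, here is a deliberately simplified sketch. It is not the published POS: it uses plain minimum-maximum class ranges and no outlier masks, and the function name and expression values are hypothetical.

```python
def overlap_proportion(expr_a, expr_b):
    """Fraction of all samples whose expression lies in the overlap
    of the two classes' [min, max] ranges (0.0 means fully separated).

    Assumption: raw min/max ranges; the paper's robust masks are omitted.
    """
    lo = max(min(expr_a), min(expr_b))
    hi = min(max(expr_a), max(expr_b))
    if lo > hi:  # the class ranges do not intersect
        return 0.0
    inside = sum(lo <= v <= hi for v in expr_a + expr_b)
    return inside / (len(expr_a) + len(expr_b))

# Hypothetical expression values for two genes across two classes.
separated = overlap_proportion([1.0, 2.0, 3.0], [5.0, 6.0, 7.0])
mixed = overlap_proportion([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0])
```

Under this simplification, a lower value suggests a more discriminative gene; a single outlier can widen a class range considerably, which is the kind of effect the abstract says the masks are meant to limit.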

University of Essex Research Repository

Crossref

Springer - Publisher Connector

PubMed Central

Explore Bristol Research

Gene selection for classification of microarray data based on the Bayes error

Author: A Ben-Dor
A Statnikov
AA Alizadeh
AL Blum
AR Webb
C Ambroise
C Ding
C Gentile
C Lai
C Lee
CF Aliferis
CH Ooi
D Singh
E Xing
EK Tang
F Goudail
G Carneiro
G Kohavi
GR Xuan
HC Peng
Hong-Wen Deng
I Tsamardinos
J Hua
J Khan
J Weston
Ji-Gang Zhang
JW Lee
K Fukunaga
K Tumer
K Yang
KY Yeung
L Devroye
L Yu
M Chow
M Dash
M Dettling
M Dettling
M Wang
M Xiong
MA Shipp
P Baldi
PA Devijver
R Blanco
R Diaz-Uriarte
R Diaz-Uriarte
R Schalkhoff
RO Duda
S Dudoit
S Mukherjee
S Singh
S Varma
T Golub
T Jirapech-Umpai
T Li
TH Bo
U Alon
X Liu
Y Lee
Y Li
ZY Wang
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: With DNA microarray data, selecting a compact subset of discriminative genes from thousands of genes is a critical step for accurate classification of phenotypes for, e.g., disease diagnosis. Several widely used gene selection methods select top-ranked genes according to their individual discriminative power in classifying samples into distinct categories, without considering correlations among genes. A limitation of these gene selection methods is that they may result in gene sets with some redundancy and yield an unnecessarily large number of candidate genes for classification analyses. Some recent studies show that incorporating gene-to-gene correlations into gene selection can remove redundant genes and improve classification accuracy. Results: In this study, we propose a new method, Based Bayes error Filter (BBF), to select relevant genes and remove redundant genes in classification analyses of microarray data. The effectiveness and accuracy of this method are demonstrated through analyses of five publicly available microarray datasets. The results show that our gene selection method is capable of achieving better accuracies than previous studies, while being able to effectively select relevant genes, remove redundant genes and obtain efficient and small gene sets for sample classification purposes. Conclusion: The proposed method can effectively identify a compact set of genes with high classification accuracy. This study also indicates that application of the Bayes error is a feasible and effective way of removing redundant genes in gene selection.

University of Missouri: MOspace

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Identification of Single- and Multiple-Class Specific Signature Genes from Gene Expression Profiles by Group Marker Index

Author: A Bhattacharjee
A Rocchi
A Yoshimura
AM Martoglio
AM Patel
AS Kostyukova
BJ McHugh
C Han
CH Ooi
CT Yap
D Nadano
DH Campbell
DW Huang
DW Huang
E Davicioni
EB Huerta
F Chiarini
G Agatha
G Gerlitz
H Ishii
H Watanabe
I Aifantis
I Guyon
I-Fang Chung
IM Depaz
J Khan
J Khan
JA Cancelas
JR Downing
K Baird
K Kuroda
K Mengubas
K Scotlandi
Kripamoy Aguan
L Li
L Martins
L Sun
L Zhang
M Bustin
M Kanehisa
M Kanehisa
M Kanehisa
M Linial
M Maekawa
M Salagierski
M Wang
M Yousef
M Yousef
ME Atz
MS Lan
N Yamashita
NH Bishopric
Nikhil R. Pal
NK Mukhopadhyay
NR Pal
P Pavlidis
PA Zweidler-McKay
Q Liu
R Fernández-Chacón
R Fiancette
R Hulshizer
R Nahar
R Opgen-Rhein
RJ van Alphen
S Dudoit
S Niijima
S Ocak
S Seo
S Tavor
SA Armstrong
SL Pomeroy
Sumitra Deb
T Jirapech-Umpai
T Tian
TR Golub
V Cerisano
V Zuber
VG Tusher
VI Taylor JG
WD Liu
WG Dilley
WZ Ren
X Zhou
XX Liu
Y Gu
Y Gu
Y Saeys
Y Yu
YS Tsai
Yu-Shuen Tsai
Ø Bruserud
Publication venue: Public Library of Science
Publication date: 01/09/2011

Informative genes from microarray data can be used to construct prediction models and investigate biological mechanisms. Differentially expressed genes, the main targets of most gene selection methods, can be classified as single- and multiple-class specific signature genes. Here, we present a novel gene selection algorithm based on a Group Marker Index (GMI), which is intuitive, of low computational complexity, and efficient in identifying both types of genes. Most gene selection methods identify only single-class specific signature genes and cannot easily identify multiple-class specific signature genes. Our algorithm can detect de novo certain conditions of multiple-class specificity of a gene and makes use of a novel non-parametric indicator to assess the discrimination ability between classes. Our method is effective even when the sample size is small and when the class sizes are significantly different. To compare effectiveness and robustness, we formulate an intuitive template-based method and use four well-known datasets. We demonstrate that our algorithm outperforms the template-based method in difficult cases with unbalanced distribution. Moreover, the multiple-class specific genes are good biomarkers and play important roles in biological pathways. Our literature survey supports that the proposed method identifies unique multiple-class specific marker genes (not reported earlier to be related to cancer) in the Central Nervous System data. It also discovers unique biomarkers indicating the intrinsic difference between subtypes of lung cancer. We also associate the pathway information with the multiple-class specific signature genes and cross-reference to published studies. We find that the identified genes participate in the pathways directly involved in cancer development in leukemia data.
Our method gives a promising way to find genes that may be involved in pathways of multiple diseases and hence opens up the possibility of using an existing drug on other diseases as well as designing a single drug for multiple diseases.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

A comparative study of improvements Pre-filter methods bring on feature selection using microarray data

Author: A Assawamakin
A Linden
AE Teschendorff
B Duval
Biomarkers Definitions Working Group
C Li
CF Schaefer
D Croft
D Nishimura
D Sayed
E Martinez
F Azuaje
F Rapaport
HW Lee
I Arisi
I Eisenberg
I Inza
JM Predmore
KG Becker
L Cao
M Ashburner
M Hilario
M Kanehisa
M Rebhan
M Watson
MA Schaub
MA Shi
N Bandyopadhyay
P Baldi
P Jafari
P Malekar
P Wei
Q Wang
R Diaz-Uriarte
R Edgar
S Aerts
S Hibino
S Ma
SB Cho
SD Hsu
T Jirapech-Umpai
U Maulik
V Aguiar-Pulido
W Zhou
X He
X Ma
Y Cai
Y Fang
Y Saeys
Y Wang
YQ Qiu
Z Wei
Publication venue: Springer Science and Business Media LLC

Crossref

A novel information theory method for filter feature selection

Author: A. Hyvarinen
A. Mokkadem
A.O. Hero
A.O. Hero
C. Sima
D.J. Bertsimas
E. Beirlant
I. Guyon
K. Zyczkowski
P. Viola
T. Cover
T. Jirapech-Umpai
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2007

In this paper, we propose a novel filter for feature selection. The filter relies on estimating the mutual information between features and classes. We bypass estimation of the probability density function by using the entropic-graph approximation of Rényi entropy and, from it, an approximation of the Shannon entropy. The complexity of this bypass depends not on the number of dimensions but on the number of patterns (samples), and thus the curse of dimensionality is circumvented. We show that it is then possible to outperform a greedy algorithm based on the maximal-relevance, minimal-redundancy criterion. We successfully test our method in the contexts of both image classification and microarray data classification. This research is funded by project DPI2005-01280 from the Spanish Government.
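
For orientation, the quantity this filter ranks features by is the mutual information between a feature and the class label. The sketch below uses the simple plug-in (frequency-count) estimator for a discrete feature, which is exactly the kind of density-based estimate the paper sets out to avoid; it is shown only to illustrate the criterion, not the entropic-graph method. The function name and data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Plug-in estimate of I(feature; labels) in bits for discrete values.

    Assumption: frequency counts replace the paper's entropic-graph
    estimator, so this only works for discrete, low-cardinality features.
    """
    n = len(labels)
    joint = Counter(zip(feature, labels))
    p_x = Counter(feature)
    p_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with counts
        mi += (count / n) * math.log2(count * n / (p_x[x] * p_y[y]))
    return mi

# Hypothetical discretised features for four samples in two classes.
labels = ["a", "a", "b", "b"]
informative = mutual_information([0, 0, 1, 1], labels)
uninformative = mutual_information([0, 1, 0, 1], labels)
```

A filter would compute such a score for every feature (or feature subset) and keep the highest-scoring ones; with many dimensions and few samples, count-based estimates become unreliable, which is the motivation the abstract gives for a sample-driven estimator.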

Repositorio Institucional de la Universidad de Alicante

Crossref

On gene selection and classification for cancer microarray data using multi-step clustering and sparse representation

Author: Attwood T.
Bolmont C.
De R.
Golub T.
Jirapech-Umpai T.
Kishino H.
Liping Jing
Michael K. Ng
Mitchell T.
Scholkopf B.
Shakhnarovich G.
Tibshirani R.
Tieyong Zeng
Unger G.
Publication venue: World Scientific Pub Co Pte Lt

Crossref

Advances in metaheuristics for gene selection and classification of microarray data

Author: Alizadeh
Alon
B. Duval
Braga-Neto
Butterfield
Dimaggio
Furey
Golub
J.-K. Hao
Jha
Jirapech-Umpai
Li
Madeira
Ooi
Peng
Petricoin
Pomeroy
Rapaport
Simon
Singh
Stoughton
Su
van 't Veer
Publication venue: Oxford University Press (OUP)

Crossref

Discovery of significant porcine SNPs for swine breed identification by a hybrid of information gain, genetic algorithm, and frequency feature selection technique

Author: AM Porras
C Lazar
C Shang
Douglas Rodrigues
G Chandrashekar
G Mahdevar
H Uğuz
İ İlhan
L Gao
L Li
LY Chuang
N Kwak
Olympia Roeva
P Larrañaga
RF Harrison
RF Harrison
S Chen
S Ishihara
S Ponsuksili
T Jirapech-Umpai
W Burgos-Paz
W Rathasamuth
Y Saeys
Y Wang
Y Xu
YS Lee
Zena M. Hira
Publication venue: Springer Science and Business Media LLC

Crossref

Using rule-based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data

Author: A Boulesteix
A Boulesteix
A DiLella
A Jayasurya
A Naderi
A Tan
AA Shabalin
AL Boulesteix
B Chetaille
B Lin
B Lin
C Ambroise
C Dani
C Gini
C Ha
C Stephan
C Zhang
Christos A. Ouzounis
D Gerhold
D Singh
E Glaab
E Glaab
E Glaab
E Glaab
E Glaab
E Mosca
Enrico Glaab
F Divina
G Alexe
G Bassel
G Bray
G Dennis Jr
G Fasola
G Vanpoucke
G Venturini
GW Bassel
H Acuff
H Li
H Wold
H Zhang
HO Habashy
HO Habashy
I Guyon
I Wood
J Aguilar-Ruiz
J Bacardit
J Bacardit
J Bacardit
J Bacardit
J Bacardit
J Bacardit
J Bacardit
J Demšar
J Flier
J Graham
J Habeshaw
J Holland
J Lee
J Liu
J Luo
J Luo
J Magee
J Moore
J Rissanen
J Shu
Jaume Bacardit
JM Deutsch
Jonathan M. Garibaldi
K Birkenkamp-Demtroder
K Moestrup
K Nicodemus
L Breiman
L Coussens
L Goh
L Jeffrey Medeiros
L Li
L Shen
L Wessels
M Calle
M Lecocke
M Nakashima
M Rebhan
M Shipp
M Stojanov
M Stout
M Stout
M Toyota
M Zervakis
MA Hall
MB Kursa
Natalio Krasnogor
O Klezovitch
P Kamper
P Ma
P Warnat
R Bende
R Blanco
R Diaz-Uriarte
R Fano
R Gilbert
R Kuefer
R Rivest
R Tibshirani
R Wolfinger
S Chin
S de Jong
S Dhanasekaran
S Esseghir
S Ferdinandusse
S Li
S Pileri
S Theocharis
T Daniels
T Dietterich
T Furey
T Hastie
T Jirapech-Umpai
T Nielsen
T Paul
U Braga-Neto
V Popovici
V Srikantan
V Vapnik
W Chu
W Li
W Sheng
WJ Conover
Y Hu
Y Saeys
Y Sun
Y Tokuda
Y Tsuruoka
Z Chen
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2012

Microarray data analysis has been shown to provide an effective tool for studying cancer and genetic diseases. Although classical machine learning techniques have successfully been applied to find informative genes and to predict class labels for new samples, common restrictions of microarray analysis such as small sample sizes, a large attribute space and high noise levels still limit its scientific and clinical applications. Increasing the interpretability of prediction models while retaining a high accuracy would help to exploit the information content in microarray data more effectively. For this purpose, we evaluate our rule-based evolutionary machine learning systems, BioHEL and GAssist, on three public microarray cancer datasets, obtaining simple rule-based models for sample classification. A comparison with other benchmark microarray sample classifiers based on three diverse feature selection algorithms suggests that these evolutionary learning techniques can compete with state-of-the-art methods like support vector machines. The obtained models reach accuracies above 90% in two-level external cross-validation, with the added value of facilitating interpretation by using only combinations of simple if-then-else rules. As a further benefit, a literature mining analysis reveals that prioritizations of informative genes extracted from BioHEL's classification rule sets can outperform gene rankings obtained from a conventional ensemble feature selection in terms of the pointwise mutual information between relevant disease terms and the standardized names of top-ranked genes

Nottingham ePrints

CiteSeerX

Public Library of Science (PLOS)

Nottingham eTheses

Crossref

Repository@Nottingham

Directory of Open Access Journals

PubMed Central

Open Repository and Bibliography - Luxembourg

FigShare