Search CORE

69 research outputs found

Recursive Cluster Elimination (RCE) for classification and feature selection from gene expression data

Author: Jung Segun
Showe Louise C
Showe Michael K
Yousef Malik
Publication venue: BioMed Central
Publication date: 01/05/2007

Abstract. Background: Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. Selecting the genes that are important for distinguishing the sample classes being compared poses a challenging problem in high-dimensional data analysis. We describe a new procedure for selecting significant genes based on recursive cluster elimination (RCE) rather than recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures that use RFE. Results: We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method, to identify correlated gene clusters, with Support Vector Machines (SVMs), a supervised machine learning classification method, to identify and score (rank) those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE) is then applied to iteratively remove the clusters of genes that contribute least to classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Using gene clusters, rather than individual genes, improves the supervised classification accuracy on the same data compared with either SVM or Penalized Discriminant Analysis (PDA) with recursive feature elimination (SVM-RFE and PDA-RFE), which remove genes based on their individual discriminant weights. Conclusion: SVM-RCE provides improved classification accuracy on complex microarray datasets compared with the accuracy obtained on the same datasets using either SVM-RFE or PDA-RFE.
SVM-RCE identifies clusters of correlated genes that, when considered together, provide greater insight into the structure of the microarray data. Clustering genes for classification appears to result in some concomitant clustering of samples into subgroups. Our present implementation of SVM-RCE groups genes using the correlation metric. The success of the SVM-RCE method in classification suggests that gene interaction networks or other biologically relevant metrics that group genes based on functional parameters might also be useful.
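
The cluster-then-eliminate loop described in this abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' SVM-RCE code: genes are z-scored so that Euclidean k-means groups them by Pearson correlation (in line with the correlation metric mentioned above), k-means is seeded deterministically (farthest-first), and, to stay dependency-free, the SVM cluster score is swapped for a hypothetical leave-one-out nearest-centroid accuracy. The names `recursive_cluster_elimination`, `n_clusters`, `drop_fraction` and `min_genes` are illustrative choices.

```python
import statistics
from typing import Dict, List

Matrix = Dict[str, List[float]]  # gene name -> expression value per sample


def zscore(values: List[float]) -> List[float]:
    """Standardize a gene profile; squared distance between z-scores is 2n(1 - Pearson r)."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values) or 1.0  # constant genes keep an all-zero profile
    return [(v - mu) / sd for v in values]


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(genes: Matrix, k: int, iters: int = 20) -> List[List[str]]:
    """Lloyd's k-means on z-scored gene profiles with farthest-first seeding."""
    names = list(genes)
    prof = {g: zscore(genes[g]) for g in names}
    centers = [prof[names[0]]]
    while len(centers) < min(k, len(names)):
        far = max(names, key=lambda g: min(sq_dist(prof[g], c) for c in centers))
        centers.append(prof[far])
    clusters: List[List[str]] = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for g in names:
            dists = [sq_dist(prof[g], c) for c in centers]
            clusters[dists.index(min(dists))].append(g)
        centers = [
            [statistics.fmean(col) for col in zip(*(prof[g] for g in cl))] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]


def loo_centroid_accuracy(genes: Matrix, cluster: List[str], labels: List[int]) -> float:
    """Hypothetical stand-in for the SVM score: leave-one-out nearest-centroid accuracy.
    Assumes every class has at least two samples."""
    n = len(labels)
    correct = 0
    for i in range(n):
        centroids = {}
        for cls in set(labels):
            idx = [j for j in range(n) if j != i and labels[j] == cls]
            centroids[cls] = [statistics.fmean(genes[g][j] for j in idx) for g in cluster]
        x = [genes[g][i] for g in cluster]
        pred = min(centroids, key=lambda c: sq_dist(x, centroids[c]))
        correct += pred == labels[i]
    return correct / n


def recursive_cluster_elimination(genes: Matrix, labels: List[int], n_clusters: int = 10,
                                  drop_fraction: float = 0.3, min_genes: int = 5) -> List[str]:
    """Cluster the surviving genes, score each cluster, drop the weakest, and repeat."""
    current = dict(genes)
    while len(current) > min_genes:
        clusters = kmeans(current, n_clusters)
        if len(clusters) < 2:
            break
        ranked = sorted(clusters, key=lambda cl: loo_centroid_accuracy(current, cl, labels))
        n_drop = max(1, int(len(ranked) * drop_fraction))
        survivors = [g for cl in ranked[n_drop:] for g in cl]
        if len(survivors) < min_genes:
            break
        current = {g: current[g] for g in survivors}
    return sorted(current)
```

Re-clustering only the surviving genes at each round mirrors the "iteratively remove clusters" description; the scorer is the part one would replace with a cross-validated linear SVM to move closer to SVM-RCE.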

Sources: Springer - Publisher Connector; Directory of Open Access Journals; PubMed Central

Learning from positive examples when the negative class is undetermined – microRNA gene identification

Author: Jung Segun
Showe Louise C
Showe Michael K
Yousef Malik
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: The application of machine learning to classification problems that depend only on positive examples is gaining attention in the computational biology community. We and others have described the use of two-class machine learning to identify novel miRNAs. These methods require the generation of an artificial negative class. However, designating the negative class can be problematic, and if it is not done properly it can dramatically affect the performance of the classifier and/or yield a biased estimate of performance. We present a study using one-class machine learning for microRNA (miRNA) discovery and compare one-class to two-class approaches using naïve Bayes and Support Vector Machines. These results are compared to published two-class miRNA prediction approaches. We also examine the ability of the one-class and two-class techniques to identify miRNAs in newly sequenced species. Results: Of all methods tested, we found that two-class naïve Bayes and Support Vector Machines gave the best accuracy using our selected features and optimally chosen negative examples. One-class methods showed average accuracies of 70–80% versus 90% for the two two-class methods on the same feature sets. However, some one-class methods outperform some recently published two-class approaches that use different selected features. Using the EBV genome as an external validation of the method, we found one-class machine learning to work as well as or better than a two-class approach in identifying true miRNAs as well as predicting new miRNAs. Conclusion: One-class and two-class methods can both give useful classification accuracies when the negative class is well characterized. The advantage of one-class methods is that they eliminate guessing at the optimal features for the negative class when it is not well defined. In these cases, one-class methods can be superior to two-class methods, provided the features chosen as representative of the positive class are well defined.
Availability: The OneClassmiRNA program is available at [1].
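
One-class learning fits a model to positive examples only and rejects inputs that look unlike them, so no artificial negative class is needed. The sketch below illustrates the idea with a Gaussian naive-Bayes-style density (features assumed independent) and a rejection threshold placed at a chosen quantile of the training scores. It is a hedged illustration, not the OneClassmiRNA program; the class name, `reject_fraction` and the example feature vectors are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


class OneClassGaussianNB:
    """Fit on positive examples only; reject points whose likelihood is unusually low."""

    def __init__(self, reject_fraction: float = 0.1):
        self.reject_fraction = reject_fraction  # share of training positives allowed below threshold
        self.means: List[float] = []
        self.vars: List[float] = []
        self.threshold = -math.inf

    def log_likelihood(self, x: Sequence[float]) -> float:
        # Naive Bayes assumption: each feature is an independent Gaussian.
        return sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, self.means, self.vars)
        )

    def fit(self, positives: List[Sequence[float]]) -> "OneClassGaussianNB":
        columns = list(zip(*positives))
        self.means = [statistics.fmean(c) for c in columns]
        # A small variance floor keeps a constant feature from dividing by zero.
        self.vars = [max(statistics.pvariance(c), 1e-6) for c in columns]
        scores = sorted(self.log_likelihood(x) for x in positives)
        self.threshold = scores[int(self.reject_fraction * len(scores))]
        return self

    def predict(self, x: Sequence[float]) -> bool:
        """True if x looks like a member of the positive (e.g. true miRNA) class."""
        return self.log_likelihood(x) >= self.threshold
```

The threshold plays the role that the negative class plays in a two-class model: raising `reject_fraction` trades sensitivity for fewer false positives.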

Sources: Crossref; Springer - Publisher Connector; Directory of Open Access Journals; PubMed Central

Classification and Prediction of Survival in Patients with the Leukemic Phase of Cutaneous T Cell Lymphoma

Author: Chang Celia
Horng Wen-Hwai
Johnston James
Kari Laszlo
Loboda Andrey
Nebozhyn Michael
Nichols Calen
Rook Alain H.
Showe Louise C.
Showe Michael K.
Virok Dezso
Vonderheid Eric C.
Wysocka Maria
Publication venue: The Rockefeller University Press
Publication date: 01/01/2003

We have used cDNA arrays to investigate gene expression patterns in peripheral blood mononuclear cells from patients with leukemic forms of cutaneous T cell lymphoma, primarily Sezary syndrome (SS). When expression data for patients with high blood tumor burden (Sezary cells >60% of the lymphocytes) are compared with those of healthy controls by Student's t test at P < 0.01, we find 385 differentially expressed genes. Highly overexpressed genes include the Th2 cell-specific transcription factors Gata-3 and Jun B, as well as integrin β1, proteoglycan 2, the RhoB oncogene, and dual specificity phosphatase 1. Highly underexpressed genes include CD26, Stat-4, and the IL-1 receptors. Message for plastin-T, which is not normally expressed in lymphoid tissue, is detected only in patient samples and may provide a new marker for diagnosis. Using penalized discriminant analysis, we have identified a panel of eight genes that can distinguish SS in patients with as few as 5% circulating tumor cells. This suggests that, even in early disease, Sezary cells produce chemokines and cytokines that induce an expression profile in the peripheral blood distinctive to SS. Finally, we show that using 10 genes, we can identify a class of patients who will succumb within six months of sampling regardless of their tumor burden.

Sources: CiteSeerX; Crossref; PubMed Central

A Novel Cross-Disciplinary Multi-Institute Approach to Translational Cancer Research: Lessons Learned from Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC)

Author: Becich Michael J.
Beck J. Robert
Carver Joseph
Dhir Rajiv
Garcia Fernando U.
Gilbertson John R.
Herberman Ronald B.
Lazarus Andrea
Liebman Michael
London Jack W.
Ochs Michael F.
Parwani Anil V.
Patel Ashokkumar A.
Prichard Jeff
Ross Eric
Showe Louise C.
Wilkerson Myra
Publication venue: Libertas Academica
Publication date: 01/01/2007

Background: The Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC, http://www.pcabc.upmc.edu) is one of the first major project-based initiatives stemming from the Pennsylvania Cancer Alliance, and was funded for four years by the Department of Health of the Commonwealth of Pennsylvania. The objective of the project was to initiate a prototype biorepository and bioinformatics infrastructure with a robust data warehouse by developing (1) a statewide data model for bioinformatics and a repository of serum and tissue samples; (2) a data model for biomarker data storage; and (3) a public access website for disseminating research results and bioinformatics tools. The members of the Consortium cooperate closely, exploring opportunities for sharing clinical, genomic and other bioinformatics data on patient samples in oncology, in order to develop collaborative research programs across cancer research institutions in Pennsylvania. The Consortium’s intention was to establish a virtual repository of many clinical specimens residing in various centers across the state and to make them available for research. One of our primary goals was to facilitate the identification of cancer-specific biomarkers and encourage collaborative research efforts among the participating centers. Methods: The PCABC has developed unique partnerships so that every region of the state can effectively contribute and participate. It includes over 80 individuals from 14 organizations and plans to expand to partners outside the state. This has created a network of researchers, clinicians, bioinformaticians, cancer registrars, program directors, and executives from academic and community health systems, as well as external corporate partners, all working together to accomplish a common mission.
The various sub-committees have developed a common IRB protocol template, common data elements for standardizing data collection for three organ sites, intellectual property/tech transfer agreements, and material transfer agreements that have been approved by each of the member institutions. This foundational work led to the development of a centralized data warehouse that meets each of the institutions’ IRB/HIPAA standards. Results: Currently, this “virtual biorepository” has over 58,000 annotated samples from 11,467 cancer patients available for research purposes. Clinical annotation of tissue samples is done either manually over the internet or in semiautomated batch mode by mapping local data elements to PCABC common data elements. The database currently holds information on 7188 cases (associated with 9278 specimens and 46,666 annotated blocks and blood samples) of prostate cancer, 2736 cases (associated with 3796 specimens and 9336 annotated blocks and blood samples) of breast cancer, and 1543 cases (including 1334 specimens and 2671 annotated blocks and blood samples) of melanoma. These numbers continue to grow, and plans to integrate new tumor sites are in progress. The group has also developed a central web-based tool, hosted on the Consortium’s web site, that allows investigators to share data from translational (genomics/proteomics) experiments evaluating potential biomarkers. Conclusions: The technological achievements and the statewide informatics infrastructure established by the Consortium will enable robust and efficient studies of biomarkers and their relevance to the clinical course of cancer.
Studies resulting from the creation of the Consortium may allow for better classification of cancer types, more accurate assessment of disease prognosis, a better ability to identify the most appropriate individuals for clinical trial participation, and better surrogate markers of disease progression and/or response to therapy.
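
The semiautomated batch annotation mentioned above works by mapping each site's local data elements onto the shared common data elements. A minimal sketch of that idea follows; every field name here is hypothetical (the abstract does not list the PCABC elements), and the real system presumably also validates values, not just field names.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical site-specific column names mapped to hypothetical common data elements.
LOCAL_TO_CDE: Dict[str, str] = {
    "pt_age": "age_at_diagnosis",
    "dx_site": "primary_tumor_site",
    "gleason": "gleason_score",
}


def map_record(record: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Rename the fields that have a common data element; drop the rest."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def batch_map(records: List[Dict[str, str]],
              mapping: Dict[str, str]) -> Tuple[List[Dict[str, str]], List[str]]:
    """Map a batch of local records and list local fields that still need a manual mapping."""
    unmapped: Set[str] = set()
    mapped = []
    for record in records:
        mapped.append(map_record(record, mapping))
        unmapped.update(k for k in record if k not in mapping)
    return mapped, sorted(unmapped)
```

Returning the unmapped field names alongside the mapped records is one way to route the leftovers to the manual, over-the-internet annotation path.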

Sources: Directory of Open Access Journals; PubMed Central

Peripheral Immune Cell Gene Expression Predicts Survival of Patients with Non-Small Cell Lung Cancer

Author: A Chen
A Jemal
A Stachon
A Subramanian
AD Gregory
Andrew V. Kossenkov
Anil Vachani
AR Abbas
AV Kossenkov
C Diaz-Montero
D Pardoll
DI Gabrilovich
E van den Akker
G Chen
H Kadara
J Schmielau
J Shen
J Subramanian
J Varlotto
JD Storey
JJ Goeman
John C. Kucharczuk
Louise C. Showe
M Boeri
M Gonen
M Yamada
Michael K. Showe
MK Showe
Noor Dawany
QT Le
Rui Medeiros
S Dubey
S Trellakis
Steven M. Albelda
Tracey L. Evans
V Leone
VK Mootha
W Huang da
Z Hu
Publication venue: Public Library of Science
Publication date: 29/03/2012

Prediction of cancer recurrence in patients with non-small cell lung cancer (NSCLC) currently relies on the assessment of clinical characteristics, including age, tumor stage, and smoking history. Better identification of early-stage patients with poorer survival and late-stage patients with better survival is needed to design patient-tailored treatment protocols. We analyzed gene expression in RNA from peripheral blood mononuclear cells (PBMC) of NSCLC patients to identify signatures predictive of overall patient survival. We find that PBMC gene expression patterns from NSCLC patients, like patterns from tumors, carry information predictive of patient outcomes. We identify and validate a 26-gene prognostic panel that is independent of clinical stage. Many additional prognostic genes are specific to myeloid cells and are more highly expressed in patients with shorter survival. We also observe that significant numbers of prognostic genes change expression levels in PBMC collected after tumor resection. These post-surgery gene expression profiles may provide a means to re-evaluate prognosis over time. These studies further suggest that patient outcomes are not determined solely by tumor gene expression profiles but can also be influenced by the immune response, as reflected in peripheral immune cells.
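
The abstract does not say how the 26-gene panel is turned into a per-patient prediction. One common generic approach, shown here only as an assumed illustration, is a weighted sum of panel-gene expression followed by a cut point such as the median. The gene names and weights are hypothetical; a positive weight would correspond to a gene that is higher in patients with shorter survival.

```python
import statistics
from typing import Dict


def risk_scores(expr: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted sum of panel-gene expression for each patient (weights are hypothetical)."""
    return {pid: sum(w * genes[g] for g, w in weights.items()) for pid, genes in expr.items()}


def split_by_median(scores: Dict[str, float]) -> Dict[str, str]:
    """Label patients scoring above the median as higher risk."""
    cut = statistics.median(scores.values())
    return {pid: "high" if s > cut else "low" for pid, s in scores.items()}
```

In practice the weights and the cut point would be fixed on a training cohort and then applied unchanged to a validation cohort, which is what makes a validated panel meaningful.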

Sources: Public Library of Science (PLOS); Crossref; Directory of Open Access Journals; PubMed Central; FigShare

Classification and biomarker identification using gene network modules and support vector machines

Author: A Djebbari
A Spira
BS Srinivasan
D Kai-Bo
D Reiss
D Zhu
F Li
H Pang
I Guyon
I Inza
L Kari
Larry Manevitz
Louise C Showe
M Nebozhyn
M Yousef
Malik Yousef
Michael K Showe
Mohamed Ketany
PvS Eugene
R Bonneau
R Kohavi
RJ Critchley-Thorne
S Nacu
T Ideker
T Li
W Pan
X Yang
X Zhang
X-w Chen
Y Wang
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: Classification using microarray datasets is usually based on a small number of samples for which tens of thousands of gene expression measurements have been obtained. Selecting the genes most significant to the classification problem is a challenging issue in high-dimensional data analysis and interpretation. A previous study with SVM-RCE (Recursive Cluster Elimination) suggested that classification based on groups of correlated genes sometimes performs better than classification using single genes. Large databases of gene interaction networks provide an important resource for the analysis of genetic phenomena and for classification studies using interacting genes. We now demonstrate that an algorithm which integrates network information with SVM-based recursive feature elimination performs well and improves the biological interpretability of the results. We refer to the method as SVM with Recursive Network Elimination (SVM-RNE). Results: Initially, one thousand genes selected by t-test from a training set are filtered so that only genes that map to a gene network database remain. The Gene Expression Network Analysis Tool (GXNA) is applied to the remaining genes to form n clusters of genes that are highly connected in the network. A linear SVM is used to classify the samples using these clusters, and a weight is assigned to each cluster based on its importance to the classification. The least informative clusters are removed, and the remainder are retained for the next classification step. This process is repeated until an optimal classification is obtained. Conclusion: More than 90% accuracy can be obtained in classification of selected microarray datasets by integrating interaction network information with gene expression information from the microarrays. The Matlab version of SVM-RNE can be downloaded from http://web.macam.ac.il/~myousef
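
The first two SVM-RNE steps (keep only t-test-selected genes that map to the network, then group them into network-connected clusters) can be sketched as follows. This is a simplified stand-in, not GXNA: GXNA grows highly connected, high-scoring subnetworks, whereas this sketch just takes connected components of the network restricted to the kept genes. The gene names and edges are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def network_modules(selected: Iterable[str], edges: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Filter selected genes to those in the network, then return connected components."""
    edge_list = list(edges)
    network_genes = {g for edge in edge_list for g in edge}
    kept = set(selected) & network_genes  # drop genes absent from the network database
    adj: Dict[str, Set[str]] = {g: set() for g in kept}
    for a, b in edge_list:
        if a in kept and b in kept and a != b:
            adj[a].add(b)
            adj[b].add(a)
    modules: List[List[str]] = []
    seen: Set[str] = set()
    for start in sorted(kept):
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:  # breadth-first search over the induced subnetwork
            gene = queue.popleft()
            component.append(gene)
            for nb in sorted(adj[gene]):
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        modules.append(sorted(component))
    return modules
```

The resulting modules would then take the place of K-means clusters in the same score-and-eliminate loop that SVM-RCE uses.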

Sources: Crossref; Springer - Publisher Connector; Directory of Open Access Journals; PubMed Central

An integrative ChIP-chip and gene expression profiling to model SMAD regulatory modules

Author: Agosto-Perez Francisco J
Balch Curtis
Chan Michael WY
Cheng Alfred SL
Davuluri Ramana V
Huang Tim HM
Lin Huey-Jen
Liyanarachchi Sandya
Nephew Kenneth P
Nikonova Elena V
Potter Dustin
Qin Huaxia
Saltz Joel H
Showe Louise C
Souriraj Irene J
Yan Pearlly S
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: The TGF-β/SMAD pathway is part of a broader signaling network in which crosstalk between pathways occurs. While the molecular mechanisms of the TGF-β/SMAD signaling pathway have been studied in detail, the global networks downstream of SMAD remain largely unknown. The regulatory effect of the SMAD complex likely depends on transcriptional modules in which SMAD binding elements and partner transcription factor binding sites (SMAD modules) are present in a specific context. Results: To address this question and develop a computational model for SMAD modules, we simultaneously performed chromatin immunoprecipitation followed by microarray analysis (ChIP-chip) and mRNA expression profiling to identify TGF-β/SMAD-regulated and synchronously coexpressed gene sets in ovarian surface epithelium. Intersecting the ChIP-chip and gene expression data yielded 150 direct targets, of which 141 were grouped into 3 coexpressed gene sets (sustained up-regulated, transient up-regulated and down-regulated) based on their temporal changes in expression after TGF-β activation. We developed a data-mining method driven by the Random Forest algorithm to model SMAD transcriptional modules in the target sequences. The predicted SMAD modules contain a SMAD binding element and up to 2 of 7 other transcription factor binding sites (E2F, P53, LEF1, ELK1, COUPTF, PAX4 and DR1). Conclusion: Together, the computational results further the understanding of the interactions between SMAD and other transcription factors at specific target promoters, and provide the basis for more targeted experimental verification of the co-regulatory modules.
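
The intersect-then-group step above (ChIP-chip-bound genes intersected with expression data, then sorted into sustained up, transient up, and down sets by their time course) can be sketched like this. The log2 fold-change threshold and the exact pattern rules are hypothetical choices for illustration; the abstract does not state the criteria the authors used, and the Random Forest module-modeling step is not shown.

```python
from typing import Dict, List, Optional, Set


def temporal_pattern(log2fc: List[float], thr: float = 1.0) -> Optional[str]:
    """Assign a log2 fold-change time course (after TGF-β activation) to a pattern."""
    if any(v >= thr for v in log2fc):
        # Up at some time point: sustained if still up at the last one, otherwise transient.
        return "sustained_up" if log2fc[-1] >= thr else "transient_up"
    if any(v <= -thr for v in log2fc):
        return "down"
    return None  # no change large enough to call


def group_direct_targets(chip_bound: Set[str], timecourse: Dict[str, List[float]],
                         thr: float = 1.0) -> Dict[str, List[str]]:
    """Intersect bound genes with profiled genes, then group the overlap by pattern."""
    groups: Dict[str, List[str]] = {}
    for gene in sorted(chip_bound & set(timecourse)):
        pattern = temporal_pattern(timecourse[gene], thr) or "unassigned"
        groups.setdefault(pattern, []).append(gene)
    return groups
```

Keeping an "unassigned" bucket mirrors the abstract's observation that not every direct target (141 of 150) fell into one of the three coexpression sets.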

Sources: Crossref; Springer - Publisher Connector; Directory of Open Access Journals; PubMed Central

Modulation of gene expression in heart and liver of hibernating black bears (Ursus americanus)

Author: A Subramanian
AM Samarel
Anna V Goropashnaya
BB Boyer
BR Zeeberg
Brian M Barnes
C Shao
Celia Chang
CL Buck
DA Lundberg
DR Williams
EE Dupont-Versteegden
F Geiser
G Pertea
Haifang Wang
HJ Harlow
HV Carey
J Dresios
J Yan
JD Storey
Jun Yan
KJ Livak
KM Brauch
L Kari
Louise C Showe
Michael K Showe
MW Pfaffl
Nathan C Stewart
P Carninci
PD Watts
PS Barboza
R Hissa
RA Nelson
RL Rausch
S Zhao
T Kislinger
V Chauhan
Vadim B Fedorov
VB Fedorov
YY Zhu
Ø Tøien
Øivind Tøien
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Hibernation is an adaptive strategy to survive in highly seasonal or unpredictable environments. The molecular and genetic basis of hibernation physiology in mammals has only recently been studied using large-scale genomic approaches. We analyzed gene expression in the American black bear, Ursus americanus, using a custom 12,800 cDNA probe microarray to detect differences in expression that occur in heart and liver during winter hibernation in comparison to summer active animals. Results: We identified 245 genes in heart and 319 genes in liver that were differentially expressed between winter and summer. The expression of 24 genes was significantly elevated during hibernation in both heart and liver. These genes are mostly involved in lipid catabolism and protein biosynthesis and include RNA binding motif protein 3 (Rbm3), which enhances protein synthesis at mildly hypothermic temperatures. Elevated expression of protein biosynthesis genes suggests induction of translation that may be related to adaptive mechanisms reducing cardiac and muscle atrophy over extended periods of low metabolism and immobility during hibernation in bears. Coordinated reduction of transcription of genes involved in amino acid catabolism suggests redirection of amino acids from catabolic pathways to protein biosynthesis. In the liver, we identify transcriptional changes common to black bears and small mammalian hibernators, including induction of genes responsible for fatty acid β oxidation and carbohydrate synthesis and depression of genes involved in lipid biosynthesis, carbohydrate catabolism, cellular respiration and detoxification pathways. Conclusions: Our findings show that modulation of gene expression during winter hibernation represents a molecular mechanism of adaptation to extreme environments.

Sources: Crossref; Springer - Publisher Connector; Directory of Open Access Journals; PubMed Central

Genome-Wide Maps of Circulating miRNA Biomarkers for Ulcerative Colitis

Inflammatory Bowel Disease (IBD), comprising Crohn's Disease and Ulcerative Colitis (UC), is a complex, multi-factorial inflammatory disorder of the gastrointestinal tract. In this study we have explored the utility of naturally occurring circulating miRNAs as potential blood-based biomarkers for non-invasive prediction of UC incidence. Whole-genome maps of circulating miRNAs in micro-vesicles, Peripheral Blood Mononuclear Cells and platelets have been constructed from a cohort of 20 UC patients and 20 normal individuals. Through Significance Analysis of Microarrays, a signature of 31 differentially expressed platelet-derived miRNAs has been identified, and biomarker performance was estimated through non-probabilistic binary linear classification using Support Vector Machines. With this approach, classifier measurements reveal a predictive score of 92.8% accuracy, 96.2% specificity and 89.5% sensitivity in distinguishing UC patients from normal individuals. Additionally, the platelet-derived biomarker signature can be validated at 88% accuracy through qPCR assays, and a majority of the miRNAs in this panel can be shown to sub-stratify into 4 highly correlated intensity-based clusters. Analysis of predicted targets of these biomarkers reveals an enrichment of pathways associated with cytoskeleton assembly, transport, membrane permeability and regulation of transcription factors engaged in a variety of regulatory cascades that are consistent with a cell-mediated immune response model of intestinal inflammation. Interestingly, comparison of the miRNA biomarker panel with genetic loci implicated in IBD through genome-wide association studies identifies a physical linkage between hsa-miR-941 and a UC susceptibility locus located on Chr 20.
Taken together, analysis of these expression maps outlines a promising catalog of novel platelet-derived miRNA biomarkers of clinical utility and provides insight into the potential biological function of these candidates in disease pathogenesis.
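
Accuracy, specificity and sensitivity, as reported above, all come from a binary confusion matrix with UC as the positive class. The sketch below gives the standard definitions; the counts in the usage are hypothetical and are not meant to reproduce the published figures, which depend on the authors' own evaluation protocol.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Standard binary metrics, treating UC as the positive class."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # UC patients correctly classified as UC
        "specificity": tn / (tn + fp),  # normal individuals correctly classified as normal
    }


# Hypothetical counts: 9 of 10 UC patients and 10 of 10 controls classified correctly.
print(classification_metrics(tp=9, fn=1, tn=10, fp=0))
```

Because sensitivity and specificity are computed within each class, they stay interpretable even when the two groups differ in size, which a single accuracy figure does not.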

Sources: Crossref; Directory of Open Access Journals; PubMed Central; FigShare