HKU Scholars Hub

Analysis of cell proliferation and tissue remodelling uncovers a KLF4 activity score associated with poor prognosis in colorectal cancer

Author: A Anjomshoaa
A Loboda
A Subramanian
Alexei Vazquez
AM Newman
BV North
DVF Tauriello
EK Markert
Elke K. Markert
H Han
J Guinney
K Okita
K Takahashi
K Yoshihara
KL Jeffrey
M Pickup
MW Feinberg
PJ Diest van
R Stuart-Harris
Silvia Halim
Publication venue: Springer Science and Business Media LLC
Publication date: 02/10/2018

Human cancers can be classified based on gene signatures quantifying the degree of cell proliferation and tissue remodelling (PR). However, the specific factors that drive the increased tissue remodelling in tumours are not fully understood. Here we address this question using colorectal cancer as a case study. We reanalysed a reported cohort of colorectal cancer patients. The patients were stratified based on gene signatures of cell proliferation and tissue remodelling. Putative transcription factor activity was inferred using gene expression profiles and annotations of transcription factor targets as input. We demonstrate that the PR classification performs better than the currently adopted consensus molecular subtyping (CMS). Although CMS classification differentiates patients with a mesenchymal signature, it cannot distinguish the remaining patients based on survival. We demonstrate that the missing factor is cell proliferation, which is indicative of good prognosis. We also uncover a KLF4 transcription factor activity score associated with the tissue remodelling gene signature. We further show that the KLF4 activity score is significantly higher in colorectal tumours with predicted infiltration of cells from the myeloid lineage. The KLF4 activity score is associated with tissue remodelling, myeloid cell infiltration and poor prognosis in colorectal cancer.

Public Library of Science (PLOS)

Enlighten

Inferring Pathway Activity toward Precise Disease Classification

Author: A Agresti
A Bhattacharjee
A Subramanian
AA Alizadeh
AH Bild
B Tian
CL Banka
DG Beer
Doheon Lee
E Segal
EJ Yeoh
Eunjung Lee
Greg Tucker-Kellogg
GV Glinsky
Han-Yu Chuang
HY Chuang
J Chen
J Lapointe
JA Swets
Jong-Won Kim
JP Svensson
JP Vert
KM Mani
L Ein-Dor
L Tian
LJ van 't Veer
MJ van de Vijver
P Pavlidis
R Sharan
RA Fisher
RA Gatenby
S Draghici
S Efroni
S Ramaswamy
SA Tomlins
SS Gambhir
SW Doniger
T Breslin
T Ideker
TR Golub
Trey Ideker
VK Mootha
WF Symmans
Y Saeys
Y Wang
Z Guo
Publication venue: Public Library of Science
Publication date: 01/01/2008

The advent of microarray technology has made it possible to classify disease states based on gene expression profiles of patients. Typically, marker genes are selected by measuring the power of their expression profiles to discriminate among patients of different disease states. However, expression-based classification can be challenging in complex diseases due to factors such as cellular heterogeneity within a tissue sample and genetic heterogeneity across patients. A promising technique for coping with these challenges is to incorporate pathway information into the disease classification procedure in order to classify disease based on the activity of entire signaling pathways or protein complexes rather than on the expression levels of individual genes or proteins. We propose a new classification method based on pathway activities inferred for each patient. For each pathway, an activity level is summarized from the gene expression levels of its condition-responsive genes (CORGs), defined as the subset of genes in the pathway whose combined expression delivers optimal discriminative power for the disease phenotype. We show that classifiers using pathway activity achieve better performance than classifiers based on individual gene expression, for both simple and complex case-control studies including differentiation of perturbed from non-perturbed cells and subtyping of several different kinds of cancer. Moreover, the new method outperforms several previous approaches that use a static (i.e., non-conditional) definition of pathways. Within a pathway, the identified CORGs may facilitate the development of better diagnostic markers and the discovery of core alterations in human disease
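
The idea of summarizing a pathway by its condition-responsive genes can be sketched in a few lines. The abstract does not spell out the scoring details, so everything specific below is an assumption made for illustration: genes are z-scored across samples, a gene set's activity is the sum of z-scores divided by the square root of the set size, discriminative power is an absolute Welch t-statistic between two classes, and the subset is grown greedily. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def zscore(values):
    """Standardize one gene's expression across all samples."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def activity(z_by_gene, genes):
    """Assumed summary: sum of z-scores over the gene set, divided by sqrt(k)."""
    k = len(genes)
    n = len(z_by_gene[genes[0]])
    return [sum(z_by_gene[g][j] for g in genes) / math.sqrt(k) for j in range(n)]


def t_score(values, labels):
    """Absolute Welch t-statistic between class 1 and class 0 (assumed criterion)."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return abs(mean(a) - mean(b)) / se


def select_corgs(expr, labels):
    """Greedily grow the gene subset whose combined activity best separates the classes.

    expr maps gene name -> list of expression values (one per sample).
    Returns the chosen genes and the per-sample activity of that subset.
    """
    z = {g: zscore(v) for g, v in expr.items()}
    # Start from the single most discriminative gene.
    ranked = sorted(z, key=lambda g: t_score(z[g], labels), reverse=True)
    chosen = [ranked[0]]
    best = t_score(activity(z, chosen), labels)
    # Add genes in rank order; stop at the first one that does not improve the score.
    for g in ranked[1:]:
        score = t_score(activity(z, chosen + [g]), labels)
        if score <= best:
            break
        chosen, best = chosen + [g], score
    return chosen, activity(z, chosen)
```

The resulting per-sample activity values would then serve as features for a classifier in place of individual gene expression levels. The sketch assumes every gene varies across samples and within each class; constant profiles would need to be filtered out first.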

CiteSeerX

Delineation of prognostic biomarkers in prostate cancer

Author: A Tsuji
AA Alizadeh
C Abate-Shen
CM Perou
CR Pound
DF Gleason
E Ruijter
EE Perrone
J Elek
J Kononen
K Tomita
L Liotta
M Bittner
MA Rubin
MB Eisen
MJ Barry
MR Emmert-Buck
MS Shurbaji
R Buttyan
TR Golub
Publication venue: Springer Science and Business Media LLC
Publication date: 23/08/2001

Prostate cancer is the most frequently diagnosed cancer in American men(1,2). Screening for prostate-specific antigen (PSA) has led to earlier detection of prostate cancer(3), but elevated serum PSA levels may be present in non-malignant conditions such as benign prostatic hyperplasia (BPH). Characterization of gene-expression profiles that molecularly distinguish prostatic neoplasms may identify genes involved in prostate carcinogenesis, elucidate clinical biomarkers, and lead to an improved classification of prostate cancer(4-6). Using microarrays of complementary DNA, we examined gene-expression profiles of more than 50 normal and neoplastic prostate specimens and three common prostate-cancer cell lines. Signature expression profiles of normal adjacent prostate (NAP), BPH, localized prostate cancer, and metastatic, hormone-refractory prostate cancer were determined. Here we establish many associations between genes and prostate cancer. We assessed two of these genes (hepsin, a transmembrane serine protease, and pim-1, a serine/threonine kinase) at the protein level using tissue microarrays consisting of over 700 clinically stratified prostate-cancer specimens. Expression of hepsin and pim-1 proteins was significantly correlated with measures of clinical outcome. Thus, the integration of cDNA microarray, high-density tissue microarray, and linked clinical and pathology data is a powerful approach to molecular profiling of human cancer.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/62849/1/412822a0.pd

Deep Blue Documents at the University of Michigan

Integration of Machine Learning Methods to Dissect Genetically Imputed Transcriptomic Profiles in Alzheimer's Disease.

Author: Alzheimer’s Disease Neuroimaging Initiative
Azevedo Tiago
Borisov Oleg
Dimitri Giovanna Maria
Giansanti Valentina
Lió Pietro
Maj Carlo
Merelli Ivan
Spasov Simeon
Publication venue: Frontiers in Genetics
Publication date: 01/01/2019

The genetic component of many common traits is associated with gene expression, and several variants act as expression quantitative trait loci, regulating gene expression in a tissue-specific manner. In this work, we applied tissue-specific cis-eQTL gene expression prediction models on the genotype of 808 samples including controls, subjects with mild cognitive impairment, and patients with Alzheimer's Disease. We then dissected the imputed transcriptomic profiles by means of different unsupervised and supervised machine learning approaches to identify potential biological associations. Our analysis suggests that unsupervised and supervised methods can provide complementary information, which can be integrated for a better characterization of the underlying biological system. In particular, a variational autoencoder representation of the transcriptomic profiles, followed by a support vector machine classification, has been used for tissue-specific gene prioritizations. Interestingly, the achieved gene prioritizations can be efficiently integrated as a feature selection step for improving the accuracy of deep learning classifier networks. The identified gene-tissue information suggests a potential role for inflammatory and regulatory processes in gut-brain axis related tissues. In line with the expected low heritability that can be apportioned to eQTL variants, we were able to achieve only relatively low prediction capability with deep learning classification models. However, our analysis revealed that the classification power strongly depends on the network structure, with recurrent neural networks being the best performing network class. Interestingly, cross-tissue analysis suggests a potentially greater role for models trained in brain tissues, also when considering dementia-related endophenotypes.
Overall, the present analysis suggests that the combination of supervised and unsupervised machine learning techniques can be used for the evaluation of high dimensional omics data.
Includes EPSRC

Apollo (Cambridge)

Classification of Genes and Putative Biomarker Identification Using Distribution Metrics on Expression Profiles

Author: Daniel Jupiter
Hung-Chung Huang
Timothy Ravasi
Vincent VanBuren
Publication venue: Public Library of Science
Publication date: 01/01/2010

BACKGROUND: Identification of genes with switch-like properties will facilitate discovery of regulatory mechanisms that underlie these properties, and will provide knowledge for the appropriate application of Boolean networks in gene regulatory models. As switch-like behavior is likely associated with tissue-specific expression, these gene products are expected to be plausible candidates as tissue-specific biomarkers. METHODOLOGY/PRINCIPAL FINDINGS: In a systematic classification of genes and search for biomarkers, gene expression profiles (GEPs) of more than 16,000 genes from 2,145 mouse array samples were analyzed. Four distribution metrics (mean, standard deviation, kurtosis and skewness) were used to classify GEPs into four categories: predominantly-off, predominantly-on, graded (rheostatic), and switch-like genes. The arrays under study were also grouped and examined by tissue type. For example, arrays were categorized as 'brain group' and 'non-brain group'; the Kolmogorov-Smirnov distance and Pearson correlation coefficient were then used to compare GEPs between brain and non-brain for each gene. We were thus able to identify tissue-specific biomarker candidate genes. CONCLUSIONS/SIGNIFICANCE: The methodology employed here may be used to facilitate disease-specific biomarker discovery
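
The four distribution metrics and the Kolmogorov-Smirnov distance named in the abstract can be computed as in the minimal sketch below. It uses only the Python standard library. The use of population (rather than sample) moments and of excess kurtosis is an assumption, as are the function names. The abstract does not give the thresholds that map these metrics to the four gene categories, so the sketch stops at computing the metrics.

```python
from statistics import mean, pstdev


def distribution_metrics(profile):
    """Mean, standard deviation, skewness and excess kurtosis of one expression profile.

    Population moments are used here; this is an assumption, not the paper's stated choice.
    """
    m, s = mean(profile), pstdev(profile)
    n = len(profile)
    skew = sum(((x - m) / s) ** 3 for x in profile) / n
    kurt = sum(((x - m) / s) ** 4 for x in profile) / n - 3.0
    return {"mean": m, "sd": s, "skewness": skew, "kurtosis": kurt}


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance: largest gap between the empirical CDFs."""

    def ecdf(sample, x):
        # Fraction of the sample that is less than or equal to x.
        return sum(v <= x for v in sample) / len(sample)

    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)
```

For a gene, `ks_distance` would be called with its expression values in one tissue group (for example the brain arrays) and in the remaining arrays; a distance near 1 means the two groups barely overlap.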

Public Library of Science (PLOS)

International Institute for Science, Technology and Education (IISTE): E-Journals

Texas A&M Repository

Multiclass Sequential Feature Selection and Classification Method for Genomic Data

Author: Aremu G. T.
Garba M. K.
Yahya W. B.
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 11/01/2017

This paper presents an efficient multiclass sequential feature selection and classification (mk-SS) method using gene expression signatures. The development of this method employs 10-fold cross-validation to ensure stability. The efficiency of this method is assessed through the misclassification error rate and some other performance measures. The performance of mk-SS was compared with the classification results of Support Vector Machines (SVM) over five published multiclass microarray datasets. The results showed that the mk-SS method efficiently selects the informative gene biomarkers for proper classification of the biological groups of the tissue samples. This method competes favourably with SVM in terms of prediction accuracy and outperforms SVM in 80% of the cases considered. The quality of the features selected by the mk-SS algorithm was validated by hybridizing the feature selection scheme of mk-SS into the standard SVM algorithm, which significantly improves the predictive power of the standard SVM method. This work has shown that classification of various cancer types using gene expression profiles is feasible, especially when the endpoints are multi-category.
Keywords: k-SS, mk-SS, Support Vector Machines, Microarray, Misclassification error rate
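
The general pattern of sequential feature selection scored by cross-validated misclassification error can be illustrated as follows. This is not the mk-SS algorithm, whose details the abstract does not give. It is a generic greedy forward selection with a nearest-centroid classifier swapped in for illustration, with deterministic folds assigned by sample index. All function names are hypothetical.

```python
def cv_error(X, y, features, k=10):
    """k-fold cross-validated misclassification rate using only the given feature indices."""
    n = len(X)
    k = min(k, n)
    errors = 0
    for fold in range(k):
        # Deterministic folds: sample i belongs to fold i % k.
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        # One centroid per class, computed on the training samples only.
        centroids = {}
        for label in set(y[i] for i in train):
            rows = [X[i] for i in train if y[i] == label]
            centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
        for i in test:
            def dist(label):
                return sum((X[i][f] - c) ** 2 for f, c in zip(features, centroids[label]))
            # Ties go to the smallest label because candidates are visited in sorted order.
            pred = min(sorted(centroids), key=dist)
            errors += pred != y[i]
    return errors / n


def forward_select(X, y, max_features=5, k=10):
    """Add one feature at a time while the cross-validated error strictly decreases."""
    selected, best_err = [], 1.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        err, f = min((cv_error(X, y, selected + [f], k), f) for f in remaining)
        if err >= best_err:
            break
        selected.append(f)
        remaining.remove(f)
        best_err = err
    return selected, best_err
```

The sketch assumes every training fold contains at least one sample of each class. In practice the selection step itself should sit inside an outer cross-validation loop so that the reported error is not optimistically biased.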

Application of Gene Shaving and Mixture Models to Cluster Microarray Gene Expression Data

Author: Bean R.
Do K-A.
McLachlan G.J.
Wen S.
Publication venue: Libertas Academica
Publication date: 01/01/2007

Researchers are frequently faced with the analysis of microarray data of a relatively large number of genes using a small number of tissue samples. We examine the application of two statistical methods for clustering such microarray expression data: EMMIX-GENE and GeneClust. EMMIX-GENE is a mixture-model based clustering approach, designed primarily to cluster tissue samples on the basis of the genes. GeneClust is an implementation of the gene shaving methodology, motivated by research to identify distinct sets of genes for which variation in expression could be related to a biological property of the tissue samples. We illustrate the use of these two methods in the analysis of Affymetrix oligonucleotide arrays of well-known data sets from colon tissue samples with and without tumors, and of tumor tissue samples from patients with leukemia. Although the two approaches have been developed from different perspectives, the results demonstrate a clear correspondence between gene clusters produced by GeneClust and EMMIX-GENE for the colon tissue data. It is demonstrated, for the case of ribosomal proteins and smooth muscle genes in the colon data set, that both methods can classify genes into co-regulated families. It is further demonstrated that tissue types (tumor and normal) can be separated on the basis of subtle distributed patterns of genes. Application to the leukemia tissue data produces a division of tissues corresponding closely to the external classification, acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), for both methods. In addition, we also identify genes specific for the subgroup of ALL-Tcell samples.
Overall, we find that the gene shaving method produces gene clusters at great speed; allows variable cluster sizes and can incorporate partial or full supervision; and finds clusters of genes in which the gene expression varies greatly over the tissue samples while maintaining a high level of coherence between the gene expression profiles. The intent of the EMMIX-GENE method is to cluster the tissue samples. It performs a filtering step that results in a subset of relevant genes, followed by gene clustering, and then tissue clustering, and is favorable in its accuracy of ranking the clusters produced.