Cold Spring Harbor Laboratory Institutional Repository

CERN Document Server

Finding gene clusters for a replicated time course study

Author: Li-Xuan Qin
Linda Breeden
Steven G Self
Publication venue: Springer Nature
Publication date: 01/01/2014

BACKGROUND: Finding genes that share similar expression patterns across samples is an important question that is frequently asked in high-throughput microarray studies. Traditional clustering algorithms such as K-means clustering and hierarchical clustering base gene clustering directly on the observed measurements and do not take into account the specific experimental design under which the microarray data were collected. A new model-based clustering method, the clustering of regression models method, takes into account the specific design of the microarray study and bases the clustering on how genes are related to sample covariates. It can find useful gene clusters in studies with complicated designs, such as replicated time course studies. FINDINGS: In this paper, we applied the clustering of regression models method to data from a time course study of yeast on two genotypes, wild type and YOX1 mutant, each with two technical replicates, and compared the clustering results with K-means clustering. We identified gene clusters that have similar expression patterns in wild type yeast, two of which were missed by K-means clustering. We further identified gene clusters whose expression patterns were changed in YOX1 mutant yeast compared to wild type yeast. CONCLUSIONS: The clustering of regression models method can be a valuable tool for identifying genes that are coordinately transcribed by a common mechanism.
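The idea of clustering genes by how they relate to sample covariates, rather than by raw measurements, can be illustrated with a simplified two-stage stand-in. This is not the clustering of regression models method itself (which is model-based); it is only a sketch that fits a per-gene least-squares regression on a design matrix and then groups the fitted coefficients with K-means. The function names, the linear time trend, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def cluster_by_regression(Y, X, k):
    """Y: genes x samples expression matrix; X: samples x covariates design matrix."""
    beta, *_ = np.linalg.lstsq(X, Y.T, rcond=None)  # covariates x genes
    coefs = beta.T                                   # genes x covariates
    return coefs, kmeans(coefs, k)

# Hypothetical design: 5 time points, each with 2 technical replicates.
time = np.repeat(np.arange(5.0), 2)
X = np.column_stack([np.ones_like(time), time])      # intercept + linear trend
rng = np.random.default_rng(0)
rising = 1.0 * time + rng.normal(0, 0.1, (10, time.size))
falling = 4.0 - 1.0 * time + rng.normal(0, 0.1, (10, time.size))
Y = np.vstack([rising, falling])

coefs, labels = cluster_by_regression(Y, X, k=2)
print(labels)
```

Because replicates enter through the design matrix, they inform each gene's fitted coefficients instead of being treated as unrelated columns, which is the contrast with direct K-means that the abstract draws.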

Clustering of the SOM easily reveals distinct gene expression patterns: results of a reanalysis of lymphoma study

Author: Aasheim Hans Christian
Delabie Jan
Myklebost Ola
Smeland Erlend
Wang Junbai
Publication venue: BioMed Central
Publication date: 01/01/2002

BACKGROUND: A method to evaluate and analyze the massive data generated by series of microarray experiments is of utmost importance to reveal the hidden patterns of gene expression. Because of the complexity and high dimensionality of microarray gene expression profiles, the dimensional reduction of raw expression data and the feature selection necessary for tasks such as classification of disease samples remain a challenge. To solve the problem we propose a two-level analysis. First, a self-organizing map (SOM) is used. SOM is a vector quantization method that simplifies and reduces the dimensionality of the original measurements and visualizes each individual tumor sample in a SOM component plane. Next, hierarchical clustering and K-means clustering are used to identify patterns of gene expression useful for classification of samples. RESULTS: We tested the two-level analysis on public data from diffuse large B-cell lymphomas. The analysis easily distinguished major gene expression patterns without the need for supervision: a germinal center-related, a proliferation, an inflammatory and a plasma cell differentiation-related gene expression pattern. The first three patterns matched the patterns described in the original publication using supervised clustering analysis, whereas the fourth one was novel. CONCLUSIONS: Our study shows that by using SOM as an intermediate step to analyze genome-wide gene expression data, the gene expression patterns can more easily be revealed. The "expression display" by the SOM component plane summarises the complicated data in a way that allows the clinician to evaluate the classification options rather than giving a fixed diagnosis.
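The two-level idea (quantize with a SOM first, then cluster the SOM prototypes) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the 1-D map, the learning-rate and neighbourhood schedules, the choice of K-means alone for the second level, and the synthetic data are all assumptions.

```python
import numpy as np

def train_som(data, n_units=6, n_epochs=100, seed=0):
    """Train a small 1-D SOM; returns prototypes of shape (n_units, n_features)."""
    rng = np.random.default_rng(seed)
    protos = data[rng.choice(len(data), n_units, replace=False)].astype(float)
    grid = np.arange(n_units)
    for epoch in range(n_epochs):
        frac = epoch / n_epochs
        lr = 0.5 * (1.0 - frac) + 0.01                   # decaying learning rate
        sigma = max(n_units / 2.0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
        for x in data[rng.permutation(len(data))]:
            bmu = ((protos - x) ** 2).sum(axis=1).argmin()  # best-matching unit
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
            protos += lr * h[:, None] * (x - protos)
    return protos

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def two_level_cluster(data, k, n_units=6):
    """Level 1: SOM quantization. Level 2: K-means on the SOM prototypes."""
    protos = train_som(data, n_units=n_units)
    proto_labels = kmeans(protos, k)
    bmus = ((data[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return proto_labels[bmus]  # each row inherits its best-matching unit's cluster

# Synthetic stand-in: 30 genes x 4 samples with two opposite expression patterns.
rng = np.random.default_rng(0)
group_a = np.array([0.0, 0.0, 5.0, 5.0]) + rng.normal(0, 0.3, (15, 4))
group_b = np.array([5.0, 5.0, 0.0, 0.0]) + rng.normal(0, 0.3, (15, 4))
genes = np.vstack([group_a, group_b])
gene_labels = two_level_cluster(genes, k=2)
print(gene_labels)
```

Clustering a handful of prototypes instead of every profile is what makes the second level cheap; a hierarchical method could be substituted for K-means at that step.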

NORA - Norwegian Open Research Archives

Clustering Algorithms: Their Application to Gene Expression Data

Author: Agrawal R.
Alizadeh A.A.
Bandyopadhyay S.
Bezdek J.C.
Bhargavi M.S.
Blatt M.
Bochkov Y.A.
Brunet J.P.
Bryan K.
Buitinck L.
Bunnik E.M.
Caliński T.
Chandrasekhar T.
Cheng Y.
Costa I.G.
Cover T.M.
D'haeseleer P.
Dave R.N.
Davies D.L.
De Morsier F.
Dempster A.P.
Dharmarajan A.
Dhillon I.S.
Divina F.
Do C.B.
Domany E.
Du Z.
Dunn J.C.
Edla D.R.
Eisen M.B.
Ferguson T.S.
Frey B.J.
Fu L.
Fukuyama Y.
Galluccio L.
Gath I.
Getz G.
Gordon G.J.
Gu J.
Guha S.
Handhayani T.
Handl J.
Hatamlou A.
Heard N.A.
Heyer L.J.
Hinneburg A.
Hu X.
Hubert L.J.
Jain A.K.
Jiang D.
Jiang H.
Joopudi S.
Kao Y.T.
Karmilasari S.W.
Karypis G.
Kaufman L.
Kerr G.
Kluger Y.
Kohonen T.
Krzanowski W.J.
Leone M.
Lu Y.
Ma'sum M.A.
MacQueen J.
Madeira S.C.
Mann A.K.
Masciari E.
Maulik U.
Milligan G.W.
Mitra S.
Moon T.K.
Moore W.C.
Müllner D.
Nagpal A.
Nasser S.
Neal R.M.
Ng R.T.
Pakhira M.K.
Pal N.R.
Pedregosa F.
Pirim H.
Pitman J.
Prelić A.
Qin Z.S.
Raman S.
Rasmussen C.E.
Rezaee B.
Rezaee M.R.
Ruspini E.H.
Saha S.
Sathishkumar K.
Sheikholeslami G.
Sheng Q.
Sirinukunwattana K.
Sokal R.R.
Sun J.
Talaat A.M.
Tamayo P.
Tanay A.
Tang C.
Thalamuthu A.
Tibshirani R.
Wan M.
Wang L.
Wang W.
Williams G.
Wu J.
Wu K.L.
Wu S.
Xie X.L.
Xu R.
Xu Y.
Yu H.
Zhang D.
Zhang T.
Zhang Y.
Zhang Z.Y.
Zhao L.
Zhong C.
Zitnik M.
Řehůřek R.
Publication venue: SAGE Publications
Publication date: 01/01/2016

Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a significant opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes involved increase the challenge of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to identify the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure.

Covenant University Repository

Directory of Open Access Journals

Ghent University Academic Bibliography

A genomic analysis and transcriptomic atlas of gene expression in Psoroptes ovis reveals feeding- and stage-specific patterns of allergen expression

Author: Bartley Kathryn
Burgess Stewart TG
Down Rachel E
Dunn Jackie
Marr Edward J
Nisbet Alasdair J
Nunn Francesca G
Prickett Jessica C
Rombauts Stephane
Van de Peer Yves
Van Leeuwen Thomas
Weaver Robert J
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Background: Psoroptic mange, caused by infestation with the ectoparasitic mite Psoroptes ovis, is highly contagious, results in intense pruritus, and represents a major welfare and economic concern for the livestock industry worldwide. Control relies on injectable endectocides and organophosphate dips, but concerns over residues, environmental contamination, and the development of resistance threaten the sustainability of this approach, highlighting interest in alternative control methods. However, development of vaccines and identification of chemotherapeutic targets is hampered by the lack of P. ovis transcriptomic and genomic resources. Results: Building on the recent publication of the P. ovis draft genome, here we present a genomic analysis and transcriptomic atlas of gene expression in P. ovis revealing feeding- and stage-specific patterns of gene expression, including novel multigene families and allergens. Network-based clustering revealed 14 gene clusters demonstrating either single- or multi-stage specific gene expression patterns, with 3075 female-specific, 890 male-specific and 112, 217 and 526 transcripts showing larval-, protonymph- and tritonymph-specific expression, respectively. Detailed analysis of P. ovis allergens revealed stage-specific patterns of allergen gene expression, many of which were also enriched in "fed" mites and tritonymphs, highlighting an important feeding-related allergenicity in this developmental stage. Pair-wise analysis of differential expression between life-cycle stages identified patterns of sex-biased gene expression and also identified novel P. ovis multigene families including known allergens and novel genes with high levels of stage-specific expression. Conclusions: The genomic and transcriptomic atlas described here represents a unique resource for the acarid-research community, whilst the OrcAE platform makes this freely available, facilitating further community-led curation of the draft P. ovis genome.

A semi-parametric Bayesian model for unsupervised differential co-expression analysis

Author: Freudenberg Johannes M
Medvedovic Mario
Sivaganesan Siva
Wagner Michael
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Differential co-expression analysis is an emerging strategy for characterizing disease related dysregulation of gene expression regulatory networks. Given pre-defined sets of biological samples, such analysis aims at identifying genes that are co-expressed in one, but not in the other set of samples. Results: We developed a novel probabilistic framework for jointly uncovering contexts (i.e. groups of samples) with specific co-expression patterns, and groups of genes with different co-expression patterns across such contexts. In contrast to current clustering and bi-clustering procedures, the implicit similarity measure in this model used for grouping biological samples is based on the clustering structure of genes within each sample and not on traditional measures of gene expression level similarities. Within this framework, biological samples with widely discordant expression patterns can be placed in the same context as long as the co-clustering structure of genes is concordant within these samples. To the best of our knowledge, this is the first method to date for unsupervised differential co-expression analysis in this generality. When applied to the problem of identifying molecular subtypes of breast cancer, our method identified reproducible patterns of differential co-expression across several independent expression datasets. Sample groupings induced by these patterns were highly informative of the disease outcome. Expression patterns of differentially co-expressed genes provided new insights into the complex nature of the ERα regulatory network. Conclusions: We demonstrated that the use of the co-clustering structure as the similarity measure in the unsupervised analysis of sample gene expression profiles provides valuable information about expression regulatory networks.

Directory of Open Access Journals

Comprehensive evaluation of matrix factorization methods for the analysis of DNA microarray gene expression data

Author: A Hubert
A Hyvarinen
AL Edwards
BS Everitt
D Dueck
DD Lee
DL Davies
EL Lehmann
HC Romesburg
HJ Chung
Hwa Jeong Seo
J Bezdek
J Dunn
Je-Gun Joung
JP Brunet
Ju Han Kim
KY Yeung
M Halkidi
Mi Hyeon Kim
N Jardine
P Paatero
P Pauca
PJ Rousseeuw
PO Hoyer
Q Qi
R Fisher
R Schachtner
R Sharan
R Tibshirani
RR Sokal
S Bicciato
S Jaccard
S Ma
SL Pomeroy
SZ Li
TR Golub
VR Iyer
W Xu
WM Rand
Y Gao
Y Tan
Y Wang
Y Xu
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Clustering-based methods for gene-expression analysis have been shown to be useful in biomedical applications such as cancer subtype discovery. Among them, matrix factorization (MF) is advantageous for clustering gene expression patterns from DNA microarray experiments, as it efficiently reduces the dimension of gene expression data. Although several MF methods have been proposed for clustering gene expression patterns, a systematic evaluation has not been reported yet. Results: Here we evaluated the clustering performance of orthogonal and non-orthogonal MFs using a total of nine performance measurements on four gene expression datasets and one well-known dataset for clustering. Specifically, we employed a non-orthogonal MF algorithm, BSNMF (Bi-directional Sparse Non-negative Matrix Factorization), that applies bi-directional sparseness constraints superimposed on non-negative constraints, comprising a few dominantly co-expressed genes and samples together. Non-orthogonal MFs tended to show better clustering-quality and prediction-accuracy indices than orthogonal MFs as well as a traditional method, K-means. Moreover, BSNMF showed improved performance in these measurements. Non-orthogonal MFs including BSNMF also showed good performance in the functional enrichment test using Gene Ontology terms and biological pathways. Conclusions: In conclusion, the clustering performance of orthogonal and non-orthogonal MFs was appropriately evaluated for clustering microarray data by comprehensive measurements. This study showed that non-orthogonal MFs have better performance than orthogonal MFs and K-means for clustering microarray data.
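As a rough illustration of how a non-orthogonal matrix factorization can be used for clustering, the sketch below applies plain non-negative matrix factorization with the standard multiplicative updates and assigns each sample to the factor with the largest coefficient. It is not BSNMF: no sparseness constraints are applied, and the function name, iteration count, and synthetic matrix are assumptions for illustration only.

```python
import numpy as np

def nmf_cluster(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative genes x samples matrix V ~ W @ H and cluster samples.

    Uses multiplicative updates for the squared-error objective.
    Returns W (genes x rank), H (rank x samples), and a label per sample.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each column of W sums to 1; this leaves W @ H unchanged
    # and makes the rows of H comparable before taking the argmax.
    scale = W.sum(axis=0)
    W = W / scale
    H = H * scale[:, None]
    return W, H, H.argmax(axis=0)

# Synthetic stand-in: 20 genes x 10 samples with two co-expression blocks.
rng = np.random.default_rng(1)
V = 0.05 * rng.random((20, 10))
V[:10, :5] += 1.0   # genes 0-9 are high in samples 0-4
V[10:, 5:] += 1.0   # genes 10-19 are high in samples 5-9

W, H, sample_labels = nmf_cluster(V, rank=2)
print(sample_labels)
```

The same argmax rule applied to the rows of W would group genes instead of samples.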

Directory of Open Access Journals