Brunel University Research Archive

A comparison of four clustering methods for brain expression microarray data

Author: A Prelić
A Riva
A Thalamuthu
AI Su
Alexander L Richards
BM Bolstad
BW Higgs
C Stansberg
CC Liu
D Dembélé
D Thain
DA Hosack
DB Allison
GC Tseng
J Ihmels
JA Hartigan
JD Cahoy
Lesley Jones
M Dai
M Dettling
M Kloster
MC O'Donovan
Michael C O'Donovan
Michael J Owen
MJL de Hoon
NR Garge
P Khatri
Peter Holmans
R Edgar
S Bergmann
SV Kyosseva
T Barrett
T Beißbarth
T Walsh
Z Fang
ZS Qin
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: DNA microarrays, which determine the expression levels of tens of thousands of genes from a sample, are an important research tool. However, the volume of data they produce can be an obstacle to interpretation of the results. Clustering the genes on the basis of similarity of their expression profiles can simplify the data, and potentially provides an important source of biological inference, but these methods have not been tested systematically on datasets from complex human tissues. In this paper, four clustering methods, CRC, k-means, ISA and memISA, are applied to three brain expression datasets. The results are compared on speed, gene coverage and GO enrichment. The effects of combining the clusters produced by each method are also assessed. Results: k-means outperforms the other methods, with 100% gene coverage and GO enrichments only slightly exceeded by memISA and ISA. Those two methods produce greater GO enrichments on the datasets used, but at the cost of much lower gene coverage, fewer clusters produced, and speed. The clusters they find are largely different to those produced by k-means. Combining clusters produced by k-means and memISA or ISA leads to increased GO enrichment and number of clusters produced (compared to k-means alone), without negatively impacting gene coverage. memISA can also find potentially disease-related clusters. In two independent dorsolateral prefrontal cortex datasets, it finds three overlapping clusters that are either enriched for genes associated with schizophrenia, genes differentially expressed in schizophrenia, or both. Two of these clusters are enriched for genes of the MAP kinase pathway, suggesting a possible role for this pathway in the aetiology of schizophrenia. Conclusion: Considered alone, k-means clustering is the most effective of the four methods on typical microarray brain expression datasets. However, memISA and ISA can add extra high-quality clusters to the set produced by k-means, so combining these three methods is the method of choice.
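
The abstract compares methods on GO enrichment but does not say which statistic was used. As a rough illustration only, a common way to score enrichment of a GO term in a gene cluster is the one-sided hypergeometric test; the function and parameter names below are hypothetical and are not taken from the paper.

```python
from math import comb


def enrichment_p_value(population, annotated, cluster_size, hits):
    """One-sided hypergeometric p-value (illustrative sketch, not the paper's code).

    population   -- number of genes on the array
    annotated    -- genes in the population carrying the GO term
    cluster_size -- genes in the cluster being tested
    hits         -- cluster genes carrying the GO term
    Returns P(X >= hits) when cluster_size genes are drawn without replacement.
    """
    upper = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    # Sum the probability mass for every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

A small p-value suggests the term is over-represented in the cluster; in practice the p-values would still need correction for the many GO terms tested.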

Online Research @ Cardiff

Public Library of Science (PLOS)

A Seriation Approach for Visualization-Driven Discovery of Co-Expression Patterns in Serial Analysis of Gene Expression (SAGE) Data

Author: A Bateman
A Prelić
A Thalamuthu
AB Firulli
AS Siddiqui
BA Westerman
BG Hoffman
Brad G. Hoffman
Cheryl D. Helgason
DA Hosack
DG Kendall
E Marsich
F Al-Shahrour
G Caraux
J Ernst
JF Habener
JL Dennis
K Puolamäki
L Cai
M Schena
Marco A. Marra
MB Eisen
N Robertson
OL Griffith
Olena Morozova
Oliver Hofmann
R Clarke
R Gasa
RR Sokal
S Audic
S Blackshaw
S Brenner
SK Kim
T Kitamura
TJ Hubbard
VE Velculescu
Vyacheslav Morozov
W Zhang
WMF Petrie
WS Robinson
Z Bar-Joseph
Publication venue: Public Library of Science
Publication date: 01/01/2008

Background: Serial Analysis of Gene Expression (SAGE) is a DNA sequencing-based method for large-scale gene expression profiling that provides an alternative to microarray analysis. Most analyses of SAGE data aimed at identifying co-expressed genes have been accomplished using various versions of clustering approaches that often result in a number of false positives. Principal Findings: Here we explore the use of seriation, a statistical approach for ordering sets of objects based on their similarity, for large-scale expression pattern discovery in SAGE data. For this specific task we implement a seriation heuristic we term ‘progressive construction of contigs’ that constructs local chains of related elements by sequentially rearranging margins of the correlation matrix. We apply the heuristic to the analysis of simulated and experimental SAGE data and compare our results to those obtained with a clustering algorithm developed specifically for SAGE data. We show using simulations that the performance of seriation compares favorably to that of the clustering algorithm on noisy SAGE data. Conclusions: We explore the use of a seriation approach for visualization-based pattern discovery in SAGE data. Using both simulations and experimental data, we demonstrate that seriation is able to identify groups of co-expressed genes more accurately than a clustering algorithm developed specifically for SAGE data. Our results suggest that seriation is a usefu
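
To give a feel for what seriation means in general (ordering objects so that similar ones end up adjacent), here is a minimal greedy sketch. It is not the authors' 'progressive construction of contigs' heuristic, whose details are not given in the abstract; the function name and the nearest-neighbour chaining strategy are assumptions chosen for illustration.

```python
def greedy_seriation(sim):
    """Order items so that similar ones sit next to each other (illustrative only).

    sim -- symmetric square matrix (list of lists) of pairwise similarities.
    Returns a list of item indices forming a single chain.
    """
    n = len(sim)
    if n < 2:
        return list(range(n))
    # Seed the chain with the most similar pair of items.
    first, second = max(
        ((a, b) for a in range(n) for b in range(a + 1, n)),
        key=lambda pair: sim[pair[0]][pair[1]],
    )
    order = [first, second]
    remaining = set(range(n)) - {first, second}
    while remaining:
        left, right = order[0], order[-1]
        # Best unplaced candidate for each end of the chain.
        cand_left = max(sorted(remaining), key=lambda k: sim[left][k])
        cand_right = max(sorted(remaining), key=lambda k: sim[right][k])
        if sim[left][cand_left] > sim[right][cand_right]:
            order.insert(0, cand_left)
            remaining.remove(cand_left)
        else:
            order.append(cand_right)
            remaining.remove(cand_right)
    return order
```

With a gene-by-gene correlation matrix as `sim`, the returned order could be used to rearrange the rows and columns of a heat map so that co-expressed genes form visible blocks.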

CiteSeerX

Relating gene expression data on two-component systems to functional annotations in Escherichia coli

Background: Obtaining physiological insights from microarray experiments requires computational techniques that relate gene expression data to functional information. Traditionally, this has been done in two consecutive steps. The first step identifies important genes through clustering or statistical techniques, while the second step assigns biological functions to the identified groups. Recently, techniques have been developed that identify such relationships in a single step. Results: We have developed an algorithm that relates patterns of gene expression in a set of microarray experiments to functional groups in one step. Our only assumption is that patterns co-occur frequently. The effectiveness of the algorithm is demonstrated as part of a study of regulation by two-component systems in Escherichia coli. The significance of the relationships between expression data and functional annotations is evaluated based on density histograms that are constructed using product similarity among expression vectors. We present a biological analysis of three of the resulting functional groups of proteins, develop hypotheses for further biological studies, and test one of these hypotheses experimentally. A comparison with other algorithms and a different data set is presented. Conclusion: Our new algorithm is able to find interesting and biologically meaningful relationships, not found by other algorithms, in previously analyzed data sets. Scaling of the algorithm to large data sets can be achieved based on a theoretical model.

Extracting expression modules from perturbational gene expression compendia

Author: A Joshi
A Prelić
A Tanay
AL Barabási
AW Rives
C Stark
CE Horak
CT Harbison
D Pe'er
DJ Reiss
Dk Lee
E Ragni
E Ravasz
E Segal
G Getz
G Lesage
GD Bader
GK Smyth
H Kitano
I Laloux
J Ihmels
J Supper
JA Ubersax
JDJ Han
L Lazzeroni
LA Amaral
LF Wu
LH Hartwell
M Ashburner
M Gaisne
M Halkidi
M Schmid
Martin Kuiper
MB Eisen
MG Walker
MZ Bao
N Bolshakova
N Metropolis
P D'haeseleer
Patrick Van Dijck
Q Sheng
R Albert
R Shamir
R Tanaka
S Barkow
S Bergmann
S Erdman
S Hohmann
S Kirkpatrick
S Maere
SC Madeira
SK Kim
Steven Maere
T Ideker
T Michoel
TR Hughes
W Zhang
X Cui
Y Benjamini
Y Cheng
Y Kluger
Z Bar-Joseph
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Compendia of gene expression profiles under chemical and genetic perturbations constitute an invaluable resource from a systems biology perspective. However, the perturbational nature of such data imposes specific challenges on the computational methods used to analyze them. In particular, traditional clustering algorithms have difficulties in handling one of the prominent features of perturbational compendia, namely partial coexpression relationships between genes. Biclustering methods on the other hand are specifically designed to capture such partial coexpression patterns, but they show a variety of other drawbacks. For instance, some biclustering methods are less suited to identify overlapping biclusters, while others generate highly redundant biclusters. Also, none of the existing biclustering tools takes advantage of the staple of perturbational expression data analysis: the identification of differentially expressed genes. Results: We introduce a novel method, called ENIGMA, that addresses some of these issues. ENIGMA leverages differential expression analysis results to extract expression modules from perturbational gene expression data. The core parameters of the ENIGMA clustering procedure are automatically optimized to reduce the redundancy between modules. In contrast to the biclusters produced by most other methods, ENIGMA modules may show internal substructure, i.e. subsets of genes with distinct but significantly related expression patterns. The grouping of these (often functionally) related patterns in one module greatly aids in the biological interpretation of the data. We show that ENIGMA outperforms other methods on artificial datasets, using a quality criterion that, unlike other criteria, can be used for algorithms that generate overlapping clusters and that can be modified to take redundancy between clusters into account. Finally, we apply ENIGMA to the Rosetta compendium of expression profiles for Saccharomyces cerevisiae and we analyze one pheromone response-related module in more detail, demonstrating the potential of ENIGMA to generate detailed predictions. Conclusion: It is increasingly recognized that perturbational expression compendia are essential to identify the gene networks underlying cellular function, and efforts to build these for different organisms are currently underway. We show that ENIGMA constitutes a valuable addition to the repertoire of methods to analyze such data.
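
The abstract mentions redundancy between overlapping modules without defining how it is measured. One generic way to quantify overlap between two gene sets is the Jaccard index; the sketch below uses it to drop near-duplicate modules. This is an assumption made for illustration and is not ENIGMA's optimization procedure; the function names and the 0.5 threshold are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two gene collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drop_redundant(modules, max_overlap=0.5):
    """Keep larger modules first; skip any module too similar to one already kept.

    modules     -- iterable of gene-name collections (hypothetical input format)
    max_overlap -- illustrative threshold, not a value from the paper
    """
    kept = []
    for module in sorted(modules, key=len, reverse=True):
        if all(jaccard(module, other) <= max_overlap for other in kept):
            kept.append(module)
    return kept
```

Because larger modules are visited first, a small module nested inside a larger one is the one that gets discarded; other tie-breaking rules are equally defensible.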

Ghent University Academic Bibliography

BicSPAM: flexible biclustering using sequential patterns

Author: A Ben-Dor
A Califano
A Patrikainen
A Prelić
A Serin
A Tanay
AA Alizadeh
AR Donders
C Creighton
C Ding
C Tang
D Bozdağ
D Martin
DS Hochbaum
F Zhu
G Atluri
G Bebek
G Getz
G Pandey
GF Berriz
H Choi
H Toivonen
H Wang
J Bellay
J Han
J Ihmels
J Liu
J Pei
J Wang
J Yang
JA Hartigan
K Sim
K Yip
L Lazzeroni
M Charrad
M de Souto
M Steinbach
MA Mahfouz
MJ Zaki
NR Mabroukeh
O Troyanskaya
P Carmona-Saez
P Fournier-Viger
Q Fang
Q Sheng
R Henriques
R Martinez
Rui Henriques
S Barkow
S Hochreiter
S Madeira
S Tavazoie
Sara C Madeira
SC Madeira
SS Young
T Calders
T Hellem
TR Golub
U Alon
X Yan
Y Huang
Y Okada
Publication venue: Springer Science and Business Media LLC

Covenant University Repository

Clustering Algorithms: Their Application to Gene Expression Data

Author: Agrawal R.
Alizadeh A.A.
Bandyopadhyay S.
Bezdek J.C.
Bhargavi M.S.
Blatt M.
Bochkov Y.A.
Brunet J.P.
Bryan K.
Buitinck L.
Bunnik E.M.
Caliński T.
Chandrasekhar T.
Cheng Y.
Costa I.G.
Cover T.M.
D'haeseleer P.
Dave R.N.
Davies D.L.
De Morsier F.
Dempster A.P.
Dharmarajan A.
Dhillon I.S.
Divina F.
Do C.B.
Domany E.
Du Z.
Dunn J.C.
Edla D.R.
Eisen M.B.
Ferguson T.S.
Frey B.J.
Fu L.
Fukuyama Y.
Galluccio L.
Gath I.
Getz G.
Gordon G.J.
Gu J.
Guha S.
Handhayani T.
Handl J.
Hatamlou A.
Heard N.A.
Heyer L.J.
Hinneburg A.
Hu X.
Hubert L.J.
Jain A.K.
Jiang D.
Jiang H.
Joopudi S.
Kao Y.T.
Karmilasari S.W.
Karypis G.
Kaufman L.
Kerr G.
Kluger Y.
Kohonen T.
Krzanowski W.J.
Leone M.
Lu Y.
Ma'sum M.A.
MacQueen J.
Madeira S.C.
Mann A.K.
Masciari E.
Maulik U.
Milligan G.W.
Mitra S.
Moon T.K.
Moore W.C.
Müllner D.
Nagpal A.
Nasser S.
Neal R.M.
Ng R.T.
Pakhira M.K.
Pal N.R.
Pedregosa F.
Pirim H.
Pitman J.
Prelić A.
Qin Z.S.
Raman S.
Rasmussen C.E.
Rezaee B.
Rezaee M.R.
Ruspini E.H.
Saha S.
Sathishkumar K.
Sheikholeslami G.
Sheng Q.
Sirinukunwattana K.
Sokal R.R.
Sun J.
Talaat A.M.
Tamayo P.
Tanay A.
Tang C.
Thalamuthu A.
Tibshirani R.
Wan M.
Wang L.
Wang W.
Williams G.
Wu J.
Wu K.L.
Wu S.
Xie X.L.
Xu R.
Xu Y.
Yu H.
Zhang D.
Zhang T.
Zhang Y.
Zhang Z.Y.
Zhao L.
Zhong C.
Zitnik M.
Řehůřek R.
Publication venue: SAGE Publications
Publication date: 01/01/2016

Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a major opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes involved increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has been proven to be useful in making known the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. The other benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure.

BicAT: A Biclustering Analysis Toolbox

Author: Amela Prelić
Philip Zimmermann
Simon Barkow
Stefan Bleuler
Zitzler
Publication venue: Bioinformatics

Summary: Besides classical clustering methods such as hierarchical clustering, in recent years biclustering has become a popular approach to analyze biological data sets, e.g., gene expression data. The Biclustering Analysis Toolbox (BicAT) is a software platform for clustering-based data analysis that integrates various biclustering and clustering techniques in terms of a common graphical user interface. Furthermore, BicAT provides different facilities for data preparation, inspection, and postprocessing such as discretization, filtering of biclusters according to specific criteria, or gene pair analysis for constructing gene interconnection graphs. The possibility to use different biclustering algorithms inside a single graphical tool allows the user to compare clustering results and choose the algorithm that best fits a specific biological scenario. The toolbox is described in the context of gene expression analysis, but is also applicable to other types of data, e.g., data from proteomics or synthetic lethal experiments. Availability: The BicAT toolbox is freely available a

CiteSeerX

eBi – The Algorithm for Exact Biclustering

Author: A. Ben-Dor
A. Prelić
E. Yang
H. Sayoud
J.A. Hartigan
M. Michalak
M. Sikora
S.C. Madeira
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012