Abstract

Background: The analysis of massive high-throughput data via clustering algorithms is very important for elucidating gene functions in biological systems. However, traditional clustering methods have several drawbacks. For example, they group genes using all samples at once and can therefore miss co-expression that occurs only under a subset of conditions. Biclustering overcomes these limitations by grouping genes and samples simultaneously: it discovers subsets of genes that are co-expressed in certain samples. Recent studies have shown that biclustering has great potential for detecting marker genes associated with particular tissues or diseases. Several biclustering algorithms have been proposed, but it remains challenging to find biclusters that are significant according to biological validation measures. In addition, there is a need for a biclustering algorithm capable of analyzing very large datasets in reasonable time.

Results: Here we present a fast biclustering algorithm called DeBi (Differentially Expressed BIclusters). The algorithm is based on a well-known data mining approach, frequent itemset mining. It discovers maximum-size homogeneous biclusters in which each gene is strongly associated with a subset of samples. We evaluate the performance of DeBi on a yeast dataset, on synthetic datasets, and on human datasets.

Conclusions: We demonstrate that DeBi yields functionally more coherent gene sets than standard clustering or biclustering algorithms, as assessed by biological validation measures such as Gene Ontology term enrichment and Transcription Factor Binding Site enrichment. We show that DeBi is a computationally efficient and powerful tool for analyzing large datasets. The method is also applicable to multiple gene expression datasets coming from different labs or platforms.
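The abstract describes DeBi only at a high level: biclusters come from frequent itemsets, with each gene tied to the subset of samples in which it is strongly expressed. The Python sketch below illustrates that general idea under stated assumptions. It is not the authors' implementation. It assumes a fixed up-regulation threshold for binarizing the expression matrix (the `threshold` parameter is hypothetical). It treats each gene as a transaction whose items are samples and enumerates sample sets by brute force. It then keeps only closed itemsets, so that no reported (genes, samples) pair can be extended by another sample without losing a gene. The helper names `binarize_up`, `mine_biclusters` and `closed_only` and the parameters `min_genes` and `min_samples` are illustrative only.

```python
from itertools import combinations


def binarize_up(matrix, threshold):
    """Mark gene i as 'up' in sample j (1) when its value exceeds threshold.

    Assumption: a single fixed threshold; DeBi's actual discretization may differ.
    """
    return [[1 if value > threshold else 0 for value in row] for row in matrix]


def mine_biclusters(binary, min_genes, min_samples):
    """Brute-force frequent itemset mining over samples.

    Each gene is a transaction whose items are the samples where it is up.
    A sample set supported by at least `min_genes` genes is frequent, and the
    pair (supporting genes, sample set) is a candidate bicluster.
    """
    n_samples = len(binary[0])
    transactions = [frozenset(j for j, bit in enumerate(row) if bit) for row in binary]
    found = []
    for k in range(min_samples, n_samples + 1):
        for samples in combinations(range(n_samples), k):
            wanted = set(samples)
            # Genes whose up-regulated samples include every sample in the set.
            genes = tuple(i for i, t in enumerate(transactions) if wanted <= t)
            if len(genes) >= min_genes:
                found.append((genes, samples))
    return found


def closed_only(biclusters):
    """Drop a bicluster if a strictly larger sample set has the same genes.

    What remains are closed itemsets: adding any sample would lose a gene,
    and every supporting gene is already included.
    """
    return [
        (genes, samples)
        for genes, samples in biclusters
        if not any(
            set(samples) < set(other_samples) and genes == other_genes
            for other_genes, other_samples in biclusters
        )
    ]


if __name__ == "__main__":
    # Toy 5-gene x 4-sample matrix (rows = genes, columns = samples).
    expression = [
        [2.0, 2.5, 0.1, 0.0],
        [1.8, 2.2, 0.3, 0.2],
        [2.1, 1.9, 2.4, 0.1],
        [0.0, 0.2, 2.2, 2.6],
        [1.5, 1.7, 2.0, 0.4],
    ]
    binary = binarize_up(expression, threshold=1.0)
    print(closed_only(mine_biclusters(binary, min_genes=2, min_samples=2)))
    # -> [((0, 1, 2, 4), (0, 1)), ((2, 4), (0, 1, 2))]
```

On the toy matrix the sketch reports two overlapping biclusters, and gene 2 belongs to both. A hard partitional clustering such as k-means cannot produce that result. Brute-force enumeration grows exponentially with the number of samples, so it only suits toy inputs. Practical frequent itemset miners prune the search space instead of testing every sample subset.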

Akdes Serin

Martin Vingron


English

PubMed

Directory of Open Access Journals

Algorithms for Molecular Biology

DeBi: Discovering Differentially Expressed Biclusters using a Frequent Itemset Approach

MPG.PuRe

Springer - Publisher Connector

Crossref
