BACKGROUND: The widespread use of microarray technologies has enabled the generation and accumulation of gene expression datasets that contain expression levels of thousands of genes across tens or hundreds of different experimental conditions. One of the major challenges in the analysis of such datasets is to discover local structures composed of sets of genes that show coherent expression patterns across subsets of experimental conditions. These patterns may provide clues about the main biological processes associated with different physiological states.

RESULTS: In this work we present a methodology able to cluster genes and conditions that are highly related in sub-portions of the data. Our approach is based on a new data mining technique, Non-smooth Non-Negative Matrix Factorization (nsNMF), which is able to identify localized patterns in large datasets. We assessed the potential of this methodology by analyzing several synthetic datasets as well as two large and heterogeneous sets of gene expression profiles. In all cases the method was able to identify localized features related to sets of genes that show consistent expression patterns across subsets of experimental conditions. The uncovered structures showed a clear biological meaning in terms of relationships among functional annotations of genes and the phenotypes or physiological states of the associated conditions.

CONCLUSION: The proposed approach can be a useful tool to analyze large and heterogeneous gene expression datasets. The method is able to identify complex relationships among genes and conditions that are difficult to identify by standard clustering algorithms.

Carazo, José M.

Carmona-Sáez, Pedro

Pascual-Marqui, Roberto D.

Pascual-Montano, Alberto

Tirado, Francisco

English

PubMed

18 pages, 1 table, 5 figures, 1 additional file.

This work has been supported by the Spanish grants GR/SAL/0653/2004, CICYT BFU2004-00217/BMC, GEN2003-20235-c05-05, TIN2005-5619, PR27/05-13964-BSCH and a collaborative grant between the Spanish Research Council and the National Research Council of Canada (CSIC-050402040003). The authors also thank the KEY Foundation for Brain-Mind Research in Zurich for partial financial support of this work. P.C.S. is the recipient of a fellowship from Comunidad de Madrid (CAM). A.P.M. acknowledges the support of the Spanish Ramón y Cajal program.

Peer reviewed

Digital.CSIC

BMC Bioinformatics

Biclustering of gene expression data by non-smooth non-negative matrix factorization

© 2006 Carmona-Saez et al; licensee BioMed Central Ltd.
Funding bodies: Spanish Research Council; National Research Council of Canada; KEY Foundation for Brain-Mind Research (Zurich); Comunidad de Madrid (CAM); Spanish Ramón y Cajal program.

Sección Deptal. de Arquitectura de Computadores y Automática (Físicas), Fac. de Ciencias Físicas

Additional File 1: A PDF file containing additional figures mentioned in the main manuscript. [http://www.biomedcentral.com/content/supplementary/1471-2105-7-78-S1.pdf]

http://doaj.org/search?source=%7B%22query%22%3A%7B%22bool%22%3A%7B%22must%22%3A%5B%7B%22term%22%3A%7B%22id%22%3A%228297f131ab88499ba73bee3f4d5ebc91%22%7D%7D%5D%7D%7D%7D

Available Versions

Digital.CSIC

Springer - Publisher Connector

ZORA

Directory of Open Access Journals

Docta Complutense