Modulated Modularity Clustering as an Exploratory Tool for Functional Genomic Inference

By Eric A. Stone and Julien F. Ayroles

Abstract

In recent years, the advent of high-throughput assays, coupled with their diminishing cost, has facilitated a systems approach to biology. As a consequence, massive amounts of data are currently being generated, requiring efficient methodology aimed at the reduction of scale. Whole-genome transcriptional profiling is a standard component of systems-level analyses, and genes are commonly clustered to reduce scale and improve inference. Since clustering is often the first step toward generating hypotheses, cluster quality is critical. Conversely, because the validation of cluster-driven hypotheses is indirect, it is critical that quality clusters not be obtained by subjective means. In this paper, we present a new objective-based clustering method and demonstrate that it yields high-quality results. Our method, modulated modularity clustering (MMC), seeks community structure in graphical data. MMC modulates the connection strengths of edges in a weighted graph to maximize an objective function (called modularity) that quantifies community structure. The result of this maximization is a clustering through which tightly connected groups of vertices emerge. Our application is to systems genetics, and we quantitatively compare MMC both to the most commonly employed hierarchical clustering method and to three popular spectral clustering approaches. We further validate MMC through analyses of human and Drosophila melanogaster expression data, demonstrating that the clusters we obtain are biologically meaningful. We show MMC to be effective and suitable for large-scale applications. In light of these features, we advocate MMC as a standard tool for exploration and hypothesis generation.
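
To make the notion of a modularity objective concrete, the following is a minimal sketch of the standard weighted modularity score (the Newman form) for a fixed partition of an undirected weighted graph. It is an illustration only, not the authors' MMC implementation: the abstract does not specify how edge weights are modulated or how the maximization is carried out, so neither step is shown. The function name `modularity`, the toy matrix `W`, and the two example partitions are all hypothetical.

```python
def modularity(weights, labels):
    """Standard weighted modularity Q of a partition (assumed Newman form).

    weights: symmetric square list of lists of non-negative edge weights
    labels:  one community label per vertex
    """
    n = len(weights)
    strength = [sum(row) for row in weights]  # weighted degree of each vertex
    total = sum(strength)                     # twice the total edge weight
    if total == 0:
        raise ValueError("graph has no edge weight")
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed weight minus the weight expected by chance
                q += weights[i][j] - strength[i] * strength[j] / total
    return q / total


# Toy graph (hypothetical): two triangles joined by one weak edge (2-3).
W = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
by_triangle = [0, 0, 0, 1, 1, 1]   # partition matching the two triangles
alternating = [0, 1, 0, 1, 0, 1]   # partition that cuts across both triangles

q_good = modularity(W, by_triangle)
q_bad = modularity(W, alternating)
q_single = modularity(W, [0] * 6)  # every vertex in one community
```

In this sketch the partition that follows the two triangles scores higher than the one that cuts across them, and placing every vertex in a single community scores zero. A modularity-based method searches for the partition with the highest score.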

Topics: Research Article
Publisher: Public Library of Science
OAI identifier: oai:pubmedcentral.nih.gov:2673040
Provided by: PubMed Central

