Using graph theory to analyze biological networks

By Georgios A Pavlopoulos, Maria Secrier, Charalampos N Moschopoulos, Theodoros G Soldatos, Sophia Kossida, Jan Aerts, Reinhard Schneider and Pantelis G Bagos


Understanding complex systems often requires a bottom-up analysis that builds towards a systems biology approach. A system needs to be investigated not only as a set of individual components but also as a whole. This can be done by first examining the elementary constituents individually and then examining how they are connected. The myriad components of a system and their interactions are best characterized as networks, which are mainly represented as graphs in which thousands of nodes are connected by thousands of edges. In this article we demonstrate approaches, models and methods from graph theory and discuss ways in which they can be used to reveal hidden properties and features of a network. This network profiling, combined with knowledge extraction, will help us better understand the biological significance of the system.
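
As a rough illustration of the graph representation described above, the following minimal sketch stores a small undirected network as an adjacency structure and computes one simple topological property, the degree of each node. The node labels, the edge list and the helper names `build_graph` and `degrees` are hypothetical and chosen only for this example; they are not taken from the article.

```python
from collections import defaultdict

# Hypothetical interaction list: each pair is one undirected edge
# between two components (for example, two interacting proteins).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]


def build_graph(edge_list):
    """Return an adjacency mapping {node: set of neighbouring nodes}."""
    adjacency = defaultdict(set)
    for u, v in edge_list:
        # Undirected graph: record the edge in both directions.
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def degrees(adjacency):
    """Return {node: number of neighbours} for every node."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


graph = build_graph(edges)
deg = degrees(graph)

# The most connected node in this toy network.
hub = max(deg, key=deg.get)
print(deg)
print("most connected node:", hub)
```

An adjacency mapping is used here because looking up the neighbours of a node is cheap, which is what most node-level measures need. For networks with thousands of nodes and edges, a dedicated graph library would likely be a better fit than this hand-written structure.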

Topics: Review
Publisher: BioMed Central
Provided by: PubMed Central
