thesis

A probabilistic model for the evaluation of module extraction algorithms in complex biological networks

Abstract

This thesis presents CiGRAM, a model of complex networks with known modular structure that is capable of generating realistic graph topology. Much of the recent focus on module detection has been geared towards developing new algorithms capable of detecting biologically significant clusters. However, evaluating the clusterings detected by different methods shows that there is little topological agreement, or consensus in terms of meta-data, even though most methods discover modules with significant ontology. In this thesis, an approach to modelling complex networks with ground-truth modular structure is presented. This approach is capable of generating graphs with heterogeneous degree distributions, high clustering coefficients, and assortative degree correlations, properties that are observed in real data but often ignored in existing benchmarks. Moreover, the model for modular structure concludes that non-modular random graphs are indistinguishable from modules. The model can be tuned to fit many empirical biological and non-biological datasets by fitting target graph summary statistics. The ground-truth structure allows module extraction algorithms to be evaluated in a domain-specific context. Furthermore, it was found that degree assortativity appears to negatively impact several module extraction methods, such as the popular Infomap and modularity maximisation methods. The results presented disagree with other benchmark models, highlighting the potential for future research into improving existing methods in ways that challenge assumptions about the detectability of modules.
