2,513 research outputs found

    Multi-scale Modularity in Complex Networks

    We focus on the detection of communities in multi-scale networks, namely networks made of different levels of organization and in which modules exist at different scales. It is first shown that methods based on modularity are not appropriate to uncover modules in empirical networks, mainly because modularity optimization has an intrinsic bias towards partitions having a characteristic number of modules which might not be compatible with the modular organization of the system. We argue for the use of more flexible quality functions incorporating a resolution parameter that allows us to reveal the natural scales of the system. Different types of multi-resolution quality functions are described and unified by looking at the partitioning problem from a dynamical viewpoint. Finally, significant values of the resolution parameter are selected by using complementary measures of robustness of the uncovered partitions. The methods are illustrated on a benchmark and an empirical network. Comment: 8 pages, 3 figures
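
    To make the idea of a resolution parameter concrete, here is a minimal sketch of one common multi-resolution quality function: modularity with the null-model term scaled by a factor `gamma`. This particular form, the function name `modularity`, and the toy graph are assumptions chosen for illustration; the abstract does not say which quality functions the paper uses.

```python
from collections import defaultdict

def modularity(edges, community, gamma=1.0):
    """Q(gamma) = sum over communities c of [L_c / m - gamma * (d_c / 2m)^2]
    for an undirected, unweighted graph given as a list of (u, v) edges.
    gamma is the (assumed) resolution parameter; gamma = 1 is plain modularity."""
    m = len(edges)
    internal = defaultdict(int)  # L_c: number of edges inside community c
    degree = defaultdict(int)    # d_c: total degree of the nodes in community c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - gamma * (degree[c] / (2 * m)) ** 2 for c in degree)

# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
one_module = {n: 0 for n in range(6)}
two_modules = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
singletons = {n: n for n in range(6)}

for gamma in (0.2, 1.0, 3.0):
    scores = {name: modularity(edges, part, gamma)
              for name, part in [("one", one_module), ("two", two_modules),
                                 ("singletons", singletons)]}
    print(gamma, max(scores, key=scores.get))
```

    On this toy graph, a small `gamma` favours the single coarse module, `gamma = 1` favours the two triangles, and a large `gamma` favours singletons, which is the sense in which sweeping the parameter exposes different scales.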

    Modularity and the spread of perturbations in complex dynamical systems

    We propose a method to decompose dynamical systems based on the idea that modules constrain the spread of perturbations. We find partitions of system variables that maximize 'perturbation modularity', defined as the autocovariance of coarse-grained perturbed trajectories. The measure effectively separates the fast intramodular from the slow intermodular dynamics of perturbation spreading (in this respect, it is a generalization of the 'Markov stability' method of network community detection). Our approach captures variation of modular organization across different system states, time scales, and in response to different kinds of perturbations: aspects of modularity which are all relevant to real-world dynamical systems. It offers a principled alternative to detecting communities in networks of statistical dependencies between system variables (e.g., 'relevance networks' or 'functional networks'). Using coupled logistic maps, we demonstrate that the method uncovers hierarchical modular organization planted in a system's coupling matrix. Additionally, in homogeneously-coupled map lattices, it identifies the presence of self-organized modularity that depends on the initial state, dynamical parameters, and type of perturbations. Our approach offers a powerful tool for exploring the modular organization of complex dynamical systems

    Identifying modular flows on multilayer networks reveals highly overlapping organization in social systems

    Unveiling the community structure of networks is a powerful methodology to comprehend interconnected systems across the social and natural sciences. To identify different types of functional modules in interaction data aggregated in a single network layer, researchers have developed many powerful methods. For example, flow-based methods have proven useful for identifying modular dynamics in weighted and directed networks that capture constraints on flow in the systems they represent. However, many networked systems consist of agents or components that exhibit multiple layers of interactions. Inevitably, representing this intricate network of networks as a single aggregated network leads to information loss and may obscure the actual organization. Here we propose a method based on compression of network flows that can identify modular flows in non-aggregated multilayer networks. Our numerical experiments on synthetic networks show that the method can accurately identify modules that cannot be identified in aggregated networks or by analyzing the layers separately. We capitalize on our findings and reveal the community structure of two multilayer collaboration networks: scientists affiliated to the Pierre Auger Observatory and scientists publishing works on networks on the arXiv. Compared to conventional aggregated methods, the multilayer method reveals smaller modules with more overlap that better capture the actual organization

    Encoding dynamics for multiscale community detection: Markov time sweeping for the Map equation

    The detection of community structure in networks is intimately related to finding a concise description of the network in terms of its modules. This notion has been recently exploited by the Map equation formalism (M. Rosvall and C.T. Bergstrom, PNAS, 105(4), pp. 1118-1123, 2008) through an information-theoretic description of the process of coding inter- and intra-community transitions of a random walker in the network at stationarity. However, a thorough study of the relationship between the full Markov dynamics and the coding mechanism is still lacking. We show here that the original Map coding scheme, which is both block-averaged and one-step, neglects the internal structure of the communities and introduces an upper scale, the 'field-of-view' limit, in the communities it can detect. As a consequence, Map is well tuned to detect clique-like communities but can lead to undesirable overpartitioning when communities are far from clique-like. We show that a signature of this behavior is a large compression gap: the Map description length is far from its ideal limit. To address this issue, we propose a simple dynamic approach that introduces time explicitly into the Map coding through the analysis of the weighted adjacency matrix of the time-dependent multistep transition matrix of the Markov process. The resulting Markov time sweeping induces a dynamical zooming across scales that can reveal (potentially multiscale) community structure above the field-of-view limit, with the relevant partitions indicated by a small compression gap. Comment: 10 pages, 6 figures
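
    The phrase "weighted adjacency matrix of the time-dependent multistep transition matrix" can be illustrated with a small sketch. It assumes an undirected, connected graph without isolated nodes and a discrete number of steps `t`; the helper names (`transition_matrix`, `matmul`, `multistep_flow`) and the toy graph are hypothetical, and the paper's exact construction may differ. Only the sweep over `t` is shown, not the Map equation coding step.

```python
def transition_matrix(adj):
    """Row-normalise a weighted adjacency matrix into one-step transition probabilities."""
    return [[w / sum(row) for w in row] for row in adj]

def matmul(x, y):
    """Plain-Python product of two dense matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def multistep_flow(adj, t):
    """Return F(t) with F[i][j] = pi_i * (P^t)[i][j], where P is the one-step
    transition matrix and pi the stationary distribution (degree / total degree
    on an undirected graph). F(t) can be read as a weighted graph at scale t."""
    n = len(adj)
    p = transition_matrix(adj)
    pt = [[float(i == j) for j in range(n)] for i in range(n)]  # identity = P^0
    for _ in range(t):
        pt = matmul(pt, p)
    total = sum(sum(row) for row in adj)
    pi = [sum(row) / total for row in adj]
    return [[pi[i] * pt[i][j] for j in range(n)] for i in range(n)]

# Toy graph (hypothetical): two triangles joined by the edge 2-3.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
flows = {t: multistep_flow(adj, t) for t in (1, 2, 4, 8)}
```

    Each matrix in `flows` could then be handed to a community detection routine as a weighted network; larger `t` spreads the walker's flow over longer paths and therefore probes coarser structure.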

    Probabilistic Random Walk Models for Comparative Network Analysis

    Graph-based systems and data analysis methods have become critical tools in many fields as they can provide an intuitive way of representing and analyzing interactions between variables. Due to the advances in measurement techniques, a massive amount of labeled data that can be represented as nodes on a graph (or network) have been archived in databases. Additionally, novel data without label information have been gradually generated and archived. Labeling and identifying characteristics of novel data is an important first step in utilizing the valuable data in an effective and meaningful way. Comparative network analysis is an effective computational means to identify and predict the properties of the unlabeled data by comparing the similarities and differences between well-studied and less-studied networks. Comparative network analysis aims to identify the matching nodes and conserved subnetworks across multiple networks to enable a prediction of the properties of the nodes in the less-studied networks based on the properties of the matching nodes in the well-studied networks (i.e., transferring knowledge between networks). One of the fundamental and important questions in comparative network analysis is how to accurately estimate node-to-node correspondence as it can be a critical clue in analyzing the similarities and differences between networks. Node correspondence is a comprehensive similarity that integrates various types of similarity measurements in a balanced manner. However, there are several challenges in accurately estimating the node correspondence for large-scale networks. First, the scale of the networks is a critical issue. As networks generally include a large number of nodes, we have to examine an extremely large space and it can pose a computational challenge due to the combinatorial nature of the problem. 
Furthermore, although there are matching nodes and conserved subnetworks in different networks, structural variations such as node insertions and deletions make it difficult to integrate a topological similarity. In this dissertation, novel probabilistic random walk models are proposed to accurately estimate node-to-node correspondence between networks. First, we propose a context-sensitive random walk (CSRW) model. In the CSRW model, the random walker analyzes the context of the current position of the random walker and it can switch the random movement to either a simultaneous walk on both networks or an individual walk on one of the networks. The context-sensitive nature of the random walker enables the method to effectively integrate different types of similarities by dealing with structural variations. Second, we propose the CUFID (Comparative network analysis Using the steady-state network Flow to IDentify orthologous proteins) model. In the CUFID model, we construct an integrated network by inserting pseudo edges between potential matching nodes in different networks. Then, we design the random walk protocol to transit more frequently between potential matching nodes as their node similarity increases and they have more matching neighboring nodes. We apply the proposed random walk models to comparative network analysis problems: global network alignment and network querying. Through extensive performance evaluations, we demonstrate that the proposed random walk models can accurately estimate node correspondence and these can lead to improved and reliable network comparison results
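
    As a rough illustration of the integrated-network idea, the sketch below joins two small networks with similarity-weighted pseudo edges and estimates the steady-state flow across each pseudo edge. It is not the CUFID transition rule, which the abstract does not specify: it assumes a plain weighted random walk (made lazy so that power iteration converges), and the names `integrate`, `stationary_distribution`, `edge_flow` and all node labels and weights are hypothetical.

```python
def integrate(net_a, net_b, pseudo_edges):
    """Merge two undirected weighted networks (node -> {neighbour: weight}) and
    add pseudo edges {(a, b): similarity} between candidate matching nodes."""
    merged = {u: dict(nb) for u, nb in net_a.items()}
    merged.update({u: dict(nb) for u, nb in net_b.items()})
    for (a, b), sim in pseudo_edges.items():
        merged[a][b] = sim
        merged[b][a] = sim
    return merged

def stationary_distribution(weights, iters=2000):
    """Power iteration for a lazy random walk (stay put with probability 1/2).
    The lazy walk has the same stationary distribution as the ordinary walk."""
    nodes = list(weights)
    pi = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iters):
        nxt = {u: 0.5 * pi[u] for u in nodes}
        for u in nodes:
            strength = sum(weights[u].values())
            for v, w in weights[u].items():
                nxt[v] += 0.5 * pi[u] * w / strength
        pi = nxt
    return pi

def edge_flow(weights, pi, u, v):
    """Steady-state probability flow from u to v in one step of the ordinary walk."""
    return pi[u] * weights[u][v] / sum(weights[u].values())

# Hypothetical toy input: two 2-node networks and two candidate matches.
net_a = {"a1": {"a2": 1.0}, "a2": {"a1": 1.0}}
net_b = {"b1": {"b2": 1.0}, "b2": {"b1": 1.0}}
pseudo = {("a1", "b1"): 0.9, ("a2", "b2"): 0.1}

merged = integrate(net_a, net_b, pseudo)
pi = stationary_distribution(merged)
strong = edge_flow(merged, pi, "a1", "b1")
weak = edge_flow(merged, pi, "a2", "b2")
```

    Under these assumptions the pseudo edge with the higher similarity carries more steady-state flow, which is the kind of signal a flow-based correspondence score could be built on.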

    A Study Of Computational Problems In Computational Biology And Social Networks: Cancer Informatics And Cascade Modelling

    Undoubtedly, everything in this world is related and nothing exists independently. Entities interact to form groups, resulting in many complex networks. Examples include functional regulation models of proteins in biology and communities of people within social networks. Since complex networks are ubiquitous in daily life, network learning has been gaining momentum in a variety of disciplines such as computer science, economics and biology. This calls for new techniques for exploring the structure and the interactions of networks, since they provide insight into how a network works and deepen our knowledge of the subject at hand. For example, uncovering protein modules helps us understand what causes certain diseases and how proteins co-regulate each other. Therefore, my dissertation takes on problems in computational biology and social networks: cancer informatics and cascade modelling. In cancer informatics, identifying the specific genes that cause cancer (driver genes) is crucial in cancer research. The more drivers identified, the more options there are to treat the cancer with a drug that acts on that gene. However, identifying driver genes is not easy. Cancer cells undergo rapid mutation and are compromised with regard to the body's normal DNA repair mechanisms. I employ Markov chains, Bayesian networks and graphical models to identify cancer drivers. I utilize heterogeneous sources of information to discover cancer drivers and unlock the mechanism behind cancer. Above all, I encode various pieces of biological information to form a multi-graph, trigger various Markov chains in it, and rank the genes afterwards. We also leverage a probabilistic mixed graphical model to learn the complex and uncertain relationships among various biomedical data. On the other hand, the diffusion of information over networks has drawn great interest in the research community. 
For example, epidemiologists observe that a person becomes ill, but they can determine neither who infected the patient nor the infection rate of each individual. Therefore, it is critical to decipher the mechanism underlying the process, since it validates efforts to prevent virus infections. We propose a new model for cascade data in three different scenarios

    Assessment of network module identification across complex diseases

    Many bioinformatics methods have been proposed for reducing the complexity of large gene or protein networks into relevant subnetworks or modules. Yet, how such methods compare to each other in terms of their ability to identify disease-relevant modules in different types of network remains poorly understood. We launched the 'Disease Module Identification DREAM Challenge', an open competition to comprehensively assess module identification methods across diverse protein-protein interaction, signaling, gene co-expression, homology and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies. Our robust assessment of 75 module identification methods reveals top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets. This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology

    Module Identification for Biological Networks

    Advances in high-throughput techniques have enabled researchers to produce large-scale data on molecular interactions. Systematic analysis of these large-scale interactome datasets based on their graph representations has the potential to yield a better understanding of the functional organization of the corresponding biological systems. One way to chart out the underlying cellular functional organization is to identify functional modules in these biological networks. However, there are several challenges of module identification for biological networks. First, different from social and computer networks, molecules work together with different interaction patterns; groups of molecules working together may have different sizes. Second, the degrees of nodes in biological networks obey the power-law distribution, which indicates that there exist many nodes with very low degrees and few nodes with high degrees. Third, molecular interaction data contain a large number of false positives and false negatives. In this dissertation, we propose computational algorithms to overcome those challenges. To identify functional modules based on interaction patterns, we develop efficient algorithms based on the concept of block modeling. We propose a subgradient Frank-Wolfe algorithm with path generation method to identify functional modules and recognize the functional organization of biological networks. Additionally, inspired by random walk on networks, we propose a novel two-hop random walk strategy to detect fine-size functional modules based on interaction patterns. To overcome the degree heterogeneity problem, we propose an algorithm to identify functional modules with the topological structure that is well separated from the rest of the network as well as densely connected. 
In order to minimize the impact of noisy interactions in biological networks, we propose methods to detect conserved functional modules across multiple biological networks by integrating topological and orthology information. We compare every algorithm we developed with state-of-the-art algorithms on several biological networks. Comparisons against known gold-standard biological function annotations show that our methods can enhance the accuracy of predicting protein complexes and protein functions