14,118 research outputs found

    Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems

    Get PDF
    To comprehend the hierarchical organization of large integrated systems, we introduce the hierarchical map equation, which reveals multilevel structures in networks. In this information-theoretic approach, we exploit the duality between compression and pattern detection; by compressing a description of a random walker as a proxy for real flow on a network, we find regularities in the network that induce this system-wide flow. Finding the shortest multilevel description of the random walker therefore gives us the best hierarchical clustering of the network, the optimal number of levels and modular partition at each level, with respect to the dynamics on the network. With a novel search algorithm, we extract and illustrate the rich multilevel organization of several large social and biological networks. For example, from the global air traffic network we uncover countries and continents, and from the pattern of scientific communication we reveal more than 100 scientific fields organized in four major disciplines: life sciences, physical sciences, ecology and earth sciences, and social sciences. In general, we find shallow hierarchical structures in globally interconnected systems, such as neural networks, and rich multilevel organizations in systems with highly separated regions, such as road networks.Comment: 11 pages, 5 figures. For associated code, see http://www.tp.umu.se/~rosvall/code.htm

    Identifying modular flows on multilayer networks reveals highly overlapping organization in social systems

    Full text link
    Unveiling the community structure of networks is a powerful methodology to comprehend interconnected systems across the social and natural sciences. To identify different types of functional modules in interaction data aggregated in a single network layer, researchers have developed many powerful methods. For example, flow-based methods have proven useful for identifying modular dynamics in weighted and directed networks that capture constraints on flow in the systems they represent. However, many networked systems consist of agents or components that exhibit multiple layers of interactions. Inevitably, representing this intricate network of networks as a single aggregated network leads to information loss and may obscure the actual organization. Here we propose a method based on compression of network flows that can identify modular flows in non-aggregated multilayer networks. Our numerical experiments on synthetic networks show that the method can accurately identify modules that cannot be identified in aggregated networks or by analyzing the layers separately. We capitalize on our findings and reveal the community structure of two multilayer collaboration networks: scientists affiliated to the Pierre Auger Observatory and scientists publishing works on networks on the arXiv. Compared to conventional aggregated methods, the multilayer method reveals smaller modules with more overlap that better capture the actual organization

    Model selection and hypothesis testing for large-scale network models with overlapping groups

    Full text link
    The effort to understand network systems in increasing detail has resulted in a diversity of methods designed to extract their large-scale structure from data. Unfortunately, many of these methods yield diverging descriptions of the same network, making both the comparison and understanding of their results a difficult challenge. A possible solution to this outstanding issue is to shift the focus away from ad hoc methods and move towards more principled approaches based on statistical inference of generative models. As a result, we face instead the more well-defined task of selecting between competing generative processes, which can be done under a unified probabilistic framework. Here, we consider the comparison between a variety of generative models including features such as degree correction, where nodes with arbitrary degrees can belong to the same group, and community overlap, where nodes are allowed to belong to more than one group. Because such model variants possess an increasing number of parameters, they become prone to overfitting. In this work, we present a method of model selection based on the minimum description length criterion and posterior odds ratios that is capable of fully accounting for the increased degrees of freedom of the larger models, and selects the best one according to the statistical evidence available in the data. In applying this method to many empirical unweighted networks from different fields, we observe that community overlap is very often not supported by statistical evidence and is selected as a better model only for a minority of them. On the other hand, we find that degree correction tends to be almost universally favored by the available data, implying that intrinsic node proprieties (as opposed to group properties) are often an essential ingredient of network formation.Comment: 20 pages,7 figures, 1 tabl

    Detecting Cohesive and 2-mode Communities in Directed and Undirected Networks

    Full text link
    Networks are a general language for representing relational information among objects. An effective way to model, reason about, and summarize networks, is to discover sets of nodes with common connectivity patterns. Such sets are commonly referred to as network communities. Research on network community detection has predominantly focused on identifying communities of densely connected nodes in undirected networks. In this paper we develop a novel overlapping community detection method that scales to networks of millions of nodes and edges and advances research along two dimensions: the connectivity structure of communities, and the use of edge directedness for community detection. First, we extend traditional definitions of network communities by building on the observation that nodes can be densely interlinked in two different ways: In cohesive communities nodes link to each other, while in 2-mode communities nodes link in a bipartite fashion, where links predominate between the two partitions rather than inside them. Our method successfully detects both 2-mode as well as cohesive communities, that may also overlap or be hierarchically nested. Second, while most existing community detection methods treat directed edges as though they were undirected, our method accounts for edge directions and is able to identify novel and meaningful community structures in both directed and undirected networks, using data from social, biological, and ecological domains.Comment: Published in the proceedings of WSDM '1
    • …
    corecore