A framework for the construction of generative models for mesoscale structure in multilayer networks
Multilayer networks allow one to represent diverse and coupled connectivity patterns (such as time-dependence, multiple subsystems, or both) that arise in many applications and that are difficult or awkward to incorporate into standard network representations. In the study of multilayer networks, it is important to investigate mesoscale (i.e., intermediate-scale) structures, such as dense sets of nodes known as communities, to discover network features that are not apparent at the microscale or the macroscale. The ill-defined nature of mesoscale structure and its ubiquity in empirical networks make it crucial to develop generative models that can produce the features that one encounters in empirical networks. Key purposes of such models include generating synthetic networks with empirical properties of interest, benchmarking mesoscale-detection methods and algorithms, and inferring structure in empirical multilayer networks. In this paper, we introduce a framework for the construction of generative models for mesoscale structures in multilayer networks. Our framework provides a standardized set of generative models, together with an associated set of principles from which they are derived, for studies of mesoscale structures in multilayer networks. It unifies and generalizes many existing models for mesoscale structures in fully ordered (e.g., temporal) and unordered (e.g., multiplex) multilayer networks. One can also use it to construct generative models for mesoscale structures in partially ordered multilayer networks (e.g., networks that are both temporal and multiplex). Our framework has the ability to produce many features of empirical multilayer networks, and it explicitly incorporates a user-specified dependency structure between layers. We discuss the parameters and properties of our framework, and we illustrate examples of its use with benchmark models for community-detection methods and algorithms in multilayer networks.
Local2Global: a distributed approach for scaling representation learning on graphs
We propose a decentralised “local2global” approach to graph representation learning that one can, a priori, use to scale any embedding technique. Our local2global approach proceeds by first dividing the input graph into overlapping subgraphs (or “patches”) and training local representations for each patch independently. In a second step, we combine the local representations into a globally consistent representation by estimating the set of rigid motions that best align the local representations using information from the patch overlaps, via group synchronization. A key distinguishing feature of local2global relative to existing work is that patches are trained independently without the need for the often costly parameter synchronization during distributed training. This allows local2global to scale to large-scale industrial applications, where the input graph may not even fit into memory and may be stored in a distributed manner. We apply local2global on data sets of different sizes and show that our approach achieves a good trade-off between scale and accuracy on edge reconstruction and semi-supervised classification. We also consider the downstream task of anomaly detection and show how one can use local2global to highlight anomalies in cybersecurity networks.
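The alignment step described in this abstract can be illustrated with a toy case. The sketch below is not the authors' implementation: it assumes two-dimensional embeddings, only two patches, and a proper rotation plus translation (no reflection), and it replaces group synchronization over many patches with a closed-form least-squares fit on the shared overlap nodes. The function names `align_2d` and `apply_2d` are hypothetical.

```python
import math

def align_2d(source, target):
    """Estimate the rotation angle and translation mapping source onto target.

    source, target: lists of (x, y) coordinates of the same overlap nodes,
    as embedded in two different patches (illustrative assumption: 2D).
    """
    n = len(source)
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    tx = sum(p[0] for p in target) / n
    ty = sum(p[1] for p in target) / n
    dot = 0.0
    cross = 0.0
    for (ax, ay), (bx, by) in zip(source, target):
        # Centre both point sets so that only the rotation remains to be found.
        ax, ay, bx, by = ax - sx, ay - sy, bx - tx, by - ty
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    # Angle that maximises the overlap between rotated source and target.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    shift = (tx - (c * sx - s * sy), ty - (s * sx + c * sy))
    return theta, shift

def apply_2d(points, theta, shift):
    """Rotate points by theta and then translate them by shift."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in points]
```

With more than two patches, pairwise estimates of this kind would have to be made mutually consistent, which is the role the abstract assigns to group synchronization.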
Think locally, act locally: Detection of small, medium-sized, and large communities in large networks
It is common in the study of networks to investigate intermediate-sized (or “meso-scale”) features to try to gain an understanding of network structure and function. For example, numerous algorithms have been developed to try to identify “communities,” which are typically construed as sets of nodes with denser connections internally than with the remainder of a network. In this paper, we adopt a complementary perspective that “communities” are associated with bottlenecks of locally-biased dynamical processes that begin at seed sets of nodes, and we employ several different community-identification procedures (using diffusion-based and geodesic-based dynamics) to investigate community quality as a function of community size. Using several empirical and synthetic networks, we identify several distinct scenarios for “size-resolved community structure” that can arise in real (and realistic) networks: (i) the best small groups of nodes can be better than the best large groups (for a given formulation of the idea of a good community); (ii) the best small groups can have a quality that is comparable to the best medium-sized and large groups; and (iii) the best small groups of nodes can be worse than the best large groups. As we discuss in detail, which of these three cases holds for a given network can make an enormous difference when investigating and making claims about network community structure, and it is important to take this into account to obtain reliable downstream conclusions. Depending on which scenario holds, one may or may not be able to successfully identify “good” communities in a given network (and good communities might not even exist for a given community quality measure), the manner in which different small communities fit together to form meso-scale network structures can be very different, and processes such as viral propagation and information diffusion can exhibit very different dynamics. 
In addition, our results suggest that, for many large realistic networks, the output of locally-biased methods that focus on communities that are centered around a given seed node might have better conceptual grounding and greater practical utility than the output of global community-detection methods. They also illustrate subtler structural properties that are important to consider in the development of better benchmark networks to test methods for community detection.
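The abstract above does not name the quality measure used to score a candidate community. One common way to quantify how strongly a set of nodes acts as a bottleneck is its conductance, and the sketch below assumes that measure for an unweighted, undirected graph stored as an adjacency dictionary. The function name `conductance` and the data layout are illustrative assumptions.

```python
def conductance(adjacency, node_set):
    """Conductance of node_set in an undirected, unweighted graph.

    adjacency: dict mapping each node to a collection of its neighbours.
    Lower values indicate a stronger bottleneck around the set.
    """
    node_set = set(node_set)
    cut = 0           # edges with exactly one endpoint inside the set
    volume_in = 0     # sum of degrees of nodes inside the set
    volume_total = 0  # sum of degrees of all nodes
    for node, neighbours in adjacency.items():
        degree = len(neighbours)
        volume_total += degree
        if node in node_set:
            volume_in += degree
            cut += sum(1 for nb in neighbours if nb not in node_set)
    denominator = min(volume_in, volume_total - volume_in)
    if denominator == 0:
        # Empty set or whole graph: conductance is undefined, report infinity.
        return float("inf")
    return cut / denominator
```

Evaluating this score on the best sets found at each size gives one possible way to compare small, medium-sized, and large communities in the sense the abstract describes.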
Networks, communities, and consumer behaviour
Networks are an abstract representation of connections (the "edges") between entities (the "nodes"). One can represent many different types of data in this way, including many social, biological, technological and physical systems. Examples we discuss in this thesis include networks of friendship ties between individuals on Facebook, coauthorship networks between scientists, and similarities in voting patterns between members of the US Congress. Analysing intermediate-sized (or "meso-scale") features often reveals insights about a network's structure and function. One particular type of meso-scale feature is a "community", which one typically thinks of as a set of nodes that is particularly "well-connected" internally but has "few" connections to other nodes in a network. A complementary interpretation of a community is as a set of nodes that "traps" a diffusion-like dynamical process for a "long" time. Based on this dynamical interpretation, we investigate "size-resolved community structure" in networks by identifying bottlenecks of locally-biased dynamical processes that start at seed sets of nodes. By sampling many different local communities for different seeds and different strengths of the locality bias of the dynamical process, we obtain a picture of the way communities at different size scales compare in a network. This "size-resolved community structure" provides a signature of community structure in a network, and its qualitative features are related to the way local communities combine to form the larger-scale structure of a network. For many data sets, ordinary networks are not sufficient to represent the detailed connectivity patterns. For example, connections often evolve over time, and one may have different types of connections between the same entities. Multilayer networks provide a framework to represent these different types of situations.
The perspective of communities as bottlenecks to dynamical processes extends in a natural way to multilayer networks. We use it to illustrate that two types of random walk on a multilayer network, both of which have been used as the basis for identifying multilayer communities, correspond to very different notions of what it means for a set of nodes to be a good multilayer community. This exemplifies the need for multilayer benchmark networks with known community structure to compare the ability of different methods to identify intuitive community structure. We propose a method for generating benchmark networks with general multilayer structure and use it as the basis for a preliminary comparison of different multilayer community detection methods. Finally, we use multilayer community detection to analyse survey data about people's perception of their hair. One key advantage of this type of data compared to most traditional network data sets is that we have a large number of potential explanatory variables that we can use to interpret the results of identifying communities, which allows us to identify some potentially interesting hypotheses.
Data for numerical examples in "Generative Benchmark Models for Mesoscale Structure in Multilayer Networks"
This package includes the raw data used to generate the plots in Figures 8-10 of "Generative benchmark models for mesoscale structure in multilayer networks", by M. Bazzi, L. G. S. Jeub, A. Arenas, S. D. Howison, and M. A. Porter, arXiv:1608.06196. It was created using MATLAB and all variables are stored in .mat format. Please see the "Readme.txt" file for additional information on the data set.
Multiresolution Consensus Clustering in Networks
Networks often exhibit structure at disparate scales. We propose a method for identifying community structure at different scales based on multiresolution modularity and consensus clustering. Our contribution consists of two parts. First, we propose a strategy for sampling the entire range of possible resolutions for the multiresolution modularity quality function. Our approach is directly based on the properties of modularity. In particular, it provides a natural way of avoiding the need to increase the resolution parameter by several orders of magnitude to break up a few remaining small communities, a need that, with standard sampling approaches, necessitates the introduction of ad hoc limits to the resolution range. Second, we propose a hierarchical consensus clustering procedure, based on a modified modularity, that allows one to construct a hierarchical consensus structure given a set of input partitions. While here we are interested in its application to partitions sampled using multiresolution modularity, this consensus clustering procedure can be applied to the output of any clustering algorithm. As such, we see many potential applications of the individual parts of our multiresolution consensus clustering procedure in addition to using the procedure itself to identify hierarchical structure in networks.
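To make the role of the resolution parameter concrete, the sketch below evaluates modularity with a resolution parameter `gamma` for a fixed partition. It assumes the widely used form Q(gamma) = (1/2m) * sum_ij (A_ij - gamma * k_i * k_j / 2m) * delta(c_i, c_j) on an unweighted, undirected graph. The abstract does not state which form the authors use, and the sketch covers neither their sampling strategy nor the consensus step. The function name `modularity` and the data layout are illustrative.

```python
def modularity(adjacency, community, gamma=1.0):
    """Modularity of a partition at resolution gamma (assumed standard form).

    adjacency: dict mapping each node to a collection of its neighbours.
    community: dict mapping each node to its community label.
    """
    two_m = sum(len(nbs) for nbs in adjacency.values())  # twice the edge count
    internal = 0      # sum of A_ij over pairs in the same community
    degree_sum = {}   # total degree of each community
    for node, nbs in adjacency.items():
        label = community[node]
        degree_sum[label] = degree_sum.get(label, 0) + len(nbs)
        internal += sum(1 for nb in nbs if community[nb] == label)
    # Null-model term: sum over communities of (total degree)^2 / 2m.
    expected = sum(d * d for d in degree_sum.values()) / two_m
    return (internal - gamma * expected) / two_m
```

Larger values of `gamma` penalise large communities more heavily, so sweeping `gamma` and re-optimising the partition at each value is what produces community structure at different scales.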
Weight thresholding on complex networks
Weight thresholding is a simple technique that aims at reducing the number of edges in weighted networks that are otherwise too dense for the application of standard graph-theoretical methods. We show that the group structure of real weighted networks is very robust under weight thresholding, as it is maintained even when most of the edges are removed. This appears to be related to the correlation between topology and weight that characterizes real networks. On the other hand, the behavior of other properties is generally system-dependent.
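As a minimal illustration of the technique named in this abstract, the sketch below keeps only the edges whose weight is at least a given threshold. It assumes edges stored as `(u, v, weight)` tuples. The function names and the choice of a non-strict inequality are illustrative, not taken from the paper.

```python
def threshold_edges(weighted_edges, threshold):
    """Keep only the edges whose weight is at least the threshold."""
    return [(u, v, w) for u, v, w in weighted_edges if w >= threshold]

def fraction_removed(weighted_edges, threshold):
    """Fraction of edges discarded by thresholding at the given value."""
    if not weighted_edges:
        return 0.0
    kept = len(threshold_edges(weighted_edges, threshold))
    return 1.0 - kept / len(weighted_edges)
```

Comparing a detected group structure before and after `threshold_edges`, at increasing thresholds, is one way to probe the robustness that the abstract reports.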