22 research outputs found

    Significance-based community detection in weighted networks

    Community detection is the process of grouping strongly connected nodes in a network. Many community detection methods for unweighted networks have a theoretical basis in a null model. Communities discovered by these methods therefore have interpretations in terms of statistical significance. In this paper, we introduce a null for weighted networks called the continuous configuration model. We use the model both as a tool for community detection and for simulating weighted networks with null nodes. First, we propose a community extraction algorithm for weighted networks which incorporates iterative hypothesis testing under the null. We prove a central limit theorem for edge-weight sums and asymptotic consistency of the algorithm under a weighted stochastic block model. We then incorporate the algorithm in a community detection method called CCME. To benchmark the method, we provide a simulation framework incorporating the null to plant "background" nodes in weighted networks with communities. We show that the empirical performance of CCME on these simulations is competitive with existing methods, particularly when overlapping communities and background nodes are present. To further validate the method, we present two real-world networks with potential background nodes and analyze them with CCME, yielding results that reveal macro-features of the corresponding systems. Comment: Code and supplemental info available at http://stats.johnpalowitch.com/ccme. V3 changes: based on lengthy referee revision process, new theoretical sections added, plus major organizational changes. V2 changes: grant info added, 1 reference added, bibliography section moved to end, condensed bib line spacing, corrected typo.

    Testing-Based Community Detection Methods for Complex Networks

    Community detection is an exploratory method of grouping strongly connected nodes in a network, in most cases using only the network edge structure as a guide. Using discovered communities for downstream analyses can be crucial for real-world decision-making and inference. Recent approaches to community detection include testing-based community extraction, a process in which communities are refined one-by-one via analysis of graph statistics. However, to date, testing-based extraction methods are tied to the configuration model as a null, which applies only to single-layer, binary graphs. In this thesis, testing-based extraction is generalized to arbitrary network types with a framework called Node-Set Testing (NST). The NST framework defines the broader statistical elements of an approach that uses hypothesis testing to detect communities in complex networks. The NST framework is applied to (i) weighted networks and (ii) bipartite correlation networks, resulting in novel community detection algorithms. In particular, new null models and test statistics are specified to apply iterative hypothesis-testing algorithms on these types of networks. Detailed analyses of the empirical and theoretical properties of the proposed methods are provided. Other chapters in this thesis, while not explicitly involving testing-based algorithms, support the discussion of community detection in heterogeneous networks. One chapter provides a consistency analysis of a significance-based score for community extraction in multilayer networks. In another chapter, preceding the discussion of the NST method for bipartite correlation networks, an application area called eQTL analysis is discussed. In particular, a new model for estimating the effect size and regression correlation of the links in an eQTL network is introduced and studied. Doctor of Philosophy

    Graph Clustering with Graph Neural Networks

    Graph Neural Networks (GNNs) have achieved state-of-the-art results on many graph analysis tasks such as node classification and link prediction. However, important unsupervised problems on graphs, such as graph clustering, have proved more resistant to advances in GNNs. In this paper, we study unsupervised training of GNN pooling methods in terms of their clustering capabilities. We start by drawing a connection between graph clustering and graph pooling: intuitively, a good graph clustering is what one would expect from a GNN pooling layer. Counterintuitively, we show that this is not true for state-of-the-art pooling methods, such as MinCut pooling. To address these deficiencies, we introduce Deep Modularity Networks (DMoN), an unsupervised pooling method inspired by the modularity measure of clustering quality, and show how it tackles recovery of the challenging clustering structure of real-world graphs. In order to clarify the regimes where existing methods fail, we carefully design a set of experiments on synthetic data which show that DMoN is able to jointly leverage the signal from the graph structure and node attributes. Similarly, on real-world data, we show that DMoN produces high quality clusters which correlate strongly with ground truth labels, achieving state-of-the-art results.
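
    The modularity measure mentioned in this abstract can be illustrated with a small sketch. The code below computes the standard Newman-Girvan modularity of a hard partition on an undirected graph; it is not DMoN itself, which the abstract describes only as "inspired by" modularity. The function name `modularity` and the toy graph (two triangles joined by one edge) are assumptions made for illustration.

```python
def modularity(adj, labels):
    """Newman-Girvan modularity of a hard partition.

    adj    : symmetric adjacency matrix as a list of lists (illustrative input format)
    labels : labels[i] is the cluster id of node i
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # twice the number of edges
    total = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed edge minus the expectation under the configuration model
                total += adj[i][j] - degrees[i] * degrees[j] / two_m
    return total / two_m


# Toy graph (hypothetical): two triangles connected by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])  # one cluster per triangle
q_single = modularity(adj, [0] * 6)            # everything in one cluster
```

    Splitting the toy graph along its two triangles scores higher than placing every node in a single cluster, which always yields a modularity of zero.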

    Examining the Effects of Degree Distribution and Homophily in Graph Learning Models

    Despite a surge in interest in GNN development, homogeneity in benchmarking datasets still presents a fundamental issue to GNN research. GraphWorld is a recent solution which uses the Stochastic Block Model (SBM) to generate diverse populations of synthetic graphs for benchmarking any GNN task. Despite its success, the SBM imposed fundamental limitations on the kinds of graph structure GraphWorld could create. In this work we examine how two additional synthetic graph generators can improve GraphWorld's evaluation: LFR, a well-established model in the graph clustering literature, and CABAM, a recent adaptation of the Barabasi-Albert model tailored for GNN benchmarking. By integrating these generators, we significantly expand the coverage of graph space within the GraphWorld framework while preserving key graph properties observed in real-world networks. To demonstrate their effectiveness, we generate 300,000 graphs to benchmark 11 GNN models on a node classification task. We find GNN performance variations in response to homophily, degree distribution and feature signal. Based on these findings, we classify models by their sensitivity to the new generators under these properties. Additionally, we release the extensions made to GraphWorld on the GitHub repository, offering further evaluation of GNN performance on new graphs. Comment: Accepted to Workshop on Graph Learning Benchmarks at KDD 202
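
    For readers unfamiliar with the Stochastic Block Model named in this abstract, a minimal sampler might look like the following. This is a plain two-parameter SBM written for illustration, not GraphWorld's generator; the function name `sample_sbm` and the parameters `p_in` and `p_out` are assumptions.

```python
import random


def sample_sbm(block_sizes, p_in, p_out, seed=None):
    """Sample an undirected graph from a simple stochastic block model.

    block_sizes : number of nodes in each block
    p_in        : edge probability for node pairs in the same block
    p_out       : edge probability for node pairs in different blocks
    Returns (labels, edges), where edges is a list of (u, v) pairs with u < v.
    """
    rng = random.Random(seed)
    labels = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return labels, edges


# Illustrative usage: two blocks of 5 nodes with strong within-block connectivity.
labels, edges = sample_sbm([5, 5], p_in=0.8, p_out=0.1, seed=1)
```

    Raising `p_out` toward `p_in` weakens the planted block structure, which is one simple way to vary how hard the resulting graph is to cluster.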

    Community Extraction in Multilayer Networks with Heterogeneous Community Structure.

    Multilayer networks are a useful way to capture and model multiple, binary or weighted relationships among a fixed group of objects. While community detection has proven to be a useful exploratory technique for the analysis of single-layer networks, the development of community detection methods for multilayer networks is still in its infancy. We propose and investigate a procedure, called Multilayer Extraction, that identifies densely connected vertex-layer sets in multilayer networks. Multilayer Extraction makes use of a significance-based score that quantifies the connectivity of an observed vertex-layer set through comparison with a fixed-degree random graph model. Multilayer Extraction directly handles networks with heterogeneous layers where community structure may be different from layer to layer. The procedure can capture overlapping communities, as well as background vertex-layer pairs that do not belong to any community. We establish consistency of the vertex-layer set optimizer of our proposed multilayer score under the multilayer stochastic block model. We investigate the performance of Multilayer Extraction on three applications and a test bed of simulations. Our theoretical and numerical evaluations suggest that Multilayer Extraction is an effective exploratory tool for analyzing complex multilayer networks. Code is publicly available at https://github.com/jdwilson4/MultilayerExtraction

    Graph Generative Model for Benchmarking Graph Neural Networks

    As the field of Graph Neural Networks (GNNs) continues to grow, it experiences a corresponding increase in the need for large, real-world datasets to train and test new GNN models on challenging, realistic problems. Unfortunately, such graph datasets are often generated from online, highly privacy-restricted ecosystems, which makes research and development on these datasets hard, if not impossible. This greatly reduces the number of benchmark graphs available to researchers, causing the field to rely only on a handful of publicly available datasets. To address this problem, we introduce a novel graph generative model, Computation Graph Transformer (CGT), that learns and reproduces the distribution of real-world graphs in a privacy-controlled way. More specifically, CGT (1) generates effective benchmark graphs on which GNNs show similar task performance as on the source graphs, (2) scales to process large-scale graphs, and (3) incorporates off-the-shelf privacy modules to guarantee end-user privacy of the generated graph. Extensive experiments across a vast body of graph generative models show that only our model can successfully generate privacy-controlled, synthetic substitutes of large-scale real-world graphs that can be effectively used to benchmark GNN models.