
    Unsupervised Structural Embedding Methods for Efficient Collective Network Mining

    How can we align accounts of the same user across social networks? Can we identify the professional role of an email user from their patterns of communication? Can we predict the medical effects of chemical compounds from their atomic network structure? Many problems in graph data mining, including all of the above, are defined on multiple networks. The central element to all of these problems is cross-network comparison, whether at the level of individual nodes or entities in the network or at the level of entire networks themselves. To perform this comparison meaningfully, we must describe the entities in each network expressively in terms of patterns that generalize across the networks. Moreover, because the networks in question are often very large, our techniques must be computationally efficient. In this thesis, we propose scalable unsupervised methods that embed nodes in vector space by mapping nodes with similar structural roles in their respective networks, even if they come from different networks, to similar parts of the embedding space. We perform network alignment by matching nodes across two or more networks based on the similarity of their embeddings, and refine this process by reinforcing the consistency of each node’s alignment with those of its neighbors. By characterizing the distribution of node embeddings in a graph, we develop graph-level feature vectors that are highly effective for graph classification. With principled sparsification and randomized approximation techniques, we make all our methods computationally efficient and able to scale to graphs with millions of nodes or edges. We demonstrate the effectiveness of structural node embeddings on industry-scale applications, and propose an extensive set of embedding evaluation techniques that lay the groundwork for further methodological development and application.
    PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/162895/1/mheimann_1.pd
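The thesis describes deriving graph-level feature vectors from the distribution of node embeddings. A minimal sketch of that idea, using a plain log-binned degree descriptor in place of learned embeddings (the function name, binning scheme, and bin count are illustrative assumptions, not taken from the thesis):

```python
import math

# Hypothetical sketch: summarize a graph by the distribution of a per-node
# structural descriptor (here, log-binned degree), yielding a fixed-length
# feature vector that can be compared across graphs of different sizes.

def degree_feature_vector(adj, bins=4):
    """adj: dict mapping each node to a list of its neighbors."""
    hist = [0] * bins
    for v, nbrs in adj.items():
        # log-scale binning keeps heavy-tailed degree distributions compact
        b = min(int(math.log2(len(nbrs) + 1)), bins - 1)
        hist[b] += 1
    n = len(adj) or 1
    return [h / n for h in hist]  # normalize so graph size drops out
```

Two graphs can then be compared by any vector distance between their feature vectors, independently of node identities or graph sizes.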

    REGAL: Representation Learning-based Graph Alignment

    Problems involving multiple networks are prevalent in many scientific and other domains. In particular, network alignment, or the task of identifying corresponding nodes in different networks, has applications across the social and natural sciences. Motivated by recent advancements in node representation learning for single-graph tasks, we propose REGAL (REpresentation learning-based Graph ALignment), a framework that leverages the power of automatically-learned node representations to match nodes across different graphs. Within REGAL we devise xNetMF, an elegant and principled node embedding formulation that uniquely generalizes to multi-network problems. Our results demonstrate the utility and promise of unsupervised representation learning-based network alignment in terms of both speed and accuracy. REGAL runs up to 30x faster in the representation learning stage than comparable methods, outperforms existing network alignment methods by 20 to 30% accuracy on average, and scales to networks with millions of nodes each.
    Comment: In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM), 2018
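REGAL's core step, matching nodes across graphs by the similarity of structure-derived embeddings, can be sketched roughly as follows. The two-number "embedding" of degree and mean neighbor degree is a toy stand-in for xNetMF, not the paper's actual formulation, and the greedy matching omits REGAL's efficiency machinery:

```python
def _deg(adj, v):
    return len(adj[v])

def embed(adj, v):
    # Toy structural "embedding": (degree, mean neighbor degree).
    # xNetMF uses much richer neighborhood degree distributions.
    d = _deg(adj, v)
    return (d, sum(_deg(adj, u) for u in adj[v]) / d if d else 0.0)

def align(adj_a, adj_b):
    # Greedy alignment: match each node in A to the node in B whose
    # embedding is closest in squared Euclidean distance.
    emb_b = {u: embed(adj_b, u) for u in adj_b}
    out = {}
    for v in adj_a:
        ev = embed(adj_a, v)
        out[v] = min(emb_b, key=lambda u: sum((a - b) ** 2
                                              for a, b in zip(ev, emb_b[u])))
    return out
```

Because the embeddings depend only on structure, not node identity, the matching works even when the two graphs share no node labels.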

    City improvement districts in South Africa: An exploratory overview

    This article discusses the City Improvement District as a recent phenomenon in urban South Africa. The reason for doing so stems from a lack of basic information on the concept in the local literature. The findings, based on a study conducted in the second half of 2004 and including all the Improvement Districts in operation in the country at that time, are presented in three key areas of the concept – land use profiles, financial aspects and services rendered. This is done within the context of the international situation with regard to the concept, and the findings are compared with an international study conducted in 2003 as a basis.

    Accurate and Scalable Estimation of Epistemic Uncertainty for Graph Neural Networks

    Safe deployment of graph neural networks (GNNs) under distribution shift requires models to provide accurate confidence indicators (CI). However, while it is well-known in computer vision that CI quality diminishes under distribution shift, this behavior remains understudied for GNNs. Hence, we begin with a case study on CI calibration under controlled structural and feature distribution shifts and demonstrate that increased expressivity or model size do not always lead to improved CI performance. Consequently, we instead advocate for the use of epistemic uncertainty quantification (UQ) methods to modulate CIs. To this end, we propose G-ΔUQ, a new single model UQ method that extends the recently proposed stochastic centering framework to support structured data and partial stochasticity. Evaluated across covariate, concept, and graph size shifts, G-ΔUQ not only outperforms several popular UQ methods in obtaining calibrated CIs, but also outperforms alternatives when CIs are used for generalization gap prediction or OOD detection. Overall, our work not only introduces a new, flexible GNN UQ method, but also provides novel insights into GNN CIs on safety-critical tasks.
    Comment: 22 pages, 11 figures
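Stochastic centering, which G-ΔUQ extends to graphs, trains a model on anchor-shifted inputs and reads epistemic uncertainty from the spread of predictions across anchors at inference time. A toy sketch of the inference side only; the `model(residual, anchor)` interface is an illustrative assumption, not the paper's API:

```python
import statistics

def anchored_predict(model, x, anchors):
    # Evaluate the model on several anchor-centered views of the same input;
    # the mean is the prediction and the spread across anchors serves as a
    # crude proxy for epistemic uncertainty.
    preds = [model(x - a, a) for a in anchors]
    return statistics.mean(preds), statistics.pstdev(preds)
```

A model that perfectly reconstructs its input from any anchoring reports zero spread; disagreement across anchors signals inputs the model is unsure about.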

    Analyzing Data-Centric Properties for Graph Contrastive Learning

    Recent analyses of self-supervised learning (SSL) find the following data-centric properties to be critical for learning good representations: invariance to task-irrelevant semantics, separability of classes in some latent space, and recoverability of labels from augmented samples. However, given their discrete, non-Euclidean nature, graph datasets and graph SSL methods are unlikely to satisfy these properties. This raises the question: how do graph SSL methods, such as contrastive learning (CL), work well? To systematically probe this question, we perform a generalization analysis for CL when using generic graph augmentations (GGAs), with a focus on data-centric properties. Our analysis yields formal insights into the limitations of GGAs and the necessity of task-relevant augmentations. As we empirically show, GGAs do not induce task-relevant invariances on common benchmark datasets, leading to only marginal gains over naive, untrained baselines. Our theory motivates a synthetic data generation process that enables control over task-relevant information and boasts pre-defined optimal augmentations. This flexible benchmark helps us identify yet unrecognized limitations in advanced augmentation techniques (e.g., automated methods). Overall, our work rigorously contextualizes, both empirically and theoretically, the effects of data-centric properties on augmentation strategies and learning paradigms for graph SSL.
    Comment: Accepted to NeurIPS 2022
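A "generic graph augmentation" of the kind analyzed here is label-agnostic by construction: it perturbs structure without any knowledge of the downstream task, which is exactly why it may destroy task-relevant information. A minimal sketch of one common GGA, independent edge dropping (not any specific method from the paper):

```python
import random

def drop_edges(edges, p, rng):
    # Drop each edge independently with probability p.
    # Note the augmentation is blind to node labels and the downstream task,
    # so task-relevant edges are removed just as readily as irrelevant ones.
    return [e for e in edges if rng.random() >= p]
```

Passing an explicit seeded `random.Random` keeps augmented views reproducible across runs.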

    On Performance Discrepancies Across Local Homophily Levels in Graph Neural Networks

    Research on GNNs has highlighted a relationship between high homophily (i.e., the tendency for nodes of a similar class to connect) and strong predictive performance in node classification. However, recent research has found the relationship to be more nuanced, demonstrating that even simple GNNs can learn in certain heterophilous settings. To bridge the gap between these findings, we revisit the assumptions made in previous works and identify that datasets are often treated as having a constant homophily level across nodes. To align closer to real-world datasets, we theoretically and empirically study the performance of GNNs when the local homophily level of a node deviates at test-time from the global homophily level of its graph. To aid our theoretical analysis, we introduce a new parameter to the preferential attachment model commonly used in homophily analysis to enable the control of local homophily levels in generated graphs, enabling a systematic empirical study on how local homophily can impact performance. We additionally perform a granular analysis on a number of real-world datasets with varying global homophily levels. Across our theoretical and empirical results, we find that (a) GNNs can fail to generalize to test nodes that deviate from the global homophily of a graph, (b) high local homophily does not necessarily confer high performance for a node, and (c) GNN models designed to handle heterophily are able to perform better across varying heterophily ranges irrespective of the dataset's global homophily. These findings point towards a GNN's over-reliance on the global homophily used for training and motivate the need to design GNNs that can better generalize across large local homophily ranges.
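The local homophily level studied here is a per-node quantity: the fraction of a node's neighbors that share its label, as opposed to one global ratio for the whole graph. A minimal sketch (function name and the 0.0 convention for isolated nodes are illustrative choices):

```python
def local_homophily(adj, labels, v):
    # Fraction of v's neighbors carrying the same label as v.
    # Isolated nodes get 0.0 by convention here.
    nbrs = adj[v]
    if not nbrs:
        return 0.0
    return sum(labels[u] == labels[v] for u in nbrs) / len(nbrs)
```

Two nodes in the same graph can have very different local homophily even when the graph's global homophily is moderate, which is the mismatch the paper examines at test time.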

    G-CREWE: Graph CompREssion With Embedding for Network Alignment

    Network alignment is useful for multiple applications that require increasingly large graphs to be processed. Existing research approaches this as an optimization problem or computes the similarity based on node representations. However, the process of aligning every pair of nodes between relatively large networks is time-consuming and resource-intensive. In this paper, we propose a framework, called G-CREWE (Graph CompREssion With Embedding), to solve the network alignment problem. G-CREWE uses node embeddings to align the networks on two levels of resolution, a fine resolution given by the original network and a coarse resolution given by a compressed version, to achieve an efficient and effective network alignment. The framework first extracts node features and learns the node embedding via a Graph Convolutional Network (GCN). Then, node embedding helps to guide the process of graph compression and finally improve the alignment performance. As part of G-CREWE, we also propose a new compression mechanism called MERGE (Minimum dEgRee neiGhbors comprEssion) to reduce the size of the input networks while preserving the consistency in their topological structure. Experiments on all real networks show that our method is more than twice as fast as the most competitive existing methods while maintaining high accuracy.
    Comment: 10 pages, accepted at the 29th ACM International Conference on Information and Knowledge Management (CIKM 2020)
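The coarse resolution comes from compressing low-degree structure into supernodes while keeping a map back to the original graph. The sketch below is only loosely inspired by the minimum-degree idea behind MERGE (folding degree-1 nodes into their higher-degree neighbor); the actual mechanism in the paper differs:

```python
def fold_leaves(adj):
    # Fold each degree-1 node into its sole higher-degree neighbor, recording
    # the merge so coarse-level alignments can be expanded back afterwards.
    out = {v: set(nbrs) for v, nbrs in adj.items()}
    merged_into = {}
    for v in list(out):
        if len(out.get(v, ())) == 1:
            (u,) = out[v]
            if len(out[u]) > 1:  # leave isolated edges intact
                merged_into[v] = u
                out[u].discard(v)
                del out[v]
    return out, merged_into
```

Aligning the compressed graphs is cheaper, and `merged_into` lets node matches on supernodes be propagated back to the folded leaves.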

    Observation and integrated Earth-system science: a roadmap for 2016–2025

    This report is the response to a request by the Committee on Space Research of the International Council for Science to prepare a roadmap on observation and integrated Earth-system science for the coming ten years. Its focus is on the combined use of observations and modelling to address the functioning, predictability and projected evolution of interacting components of the Earth system on timescales out to a century or so. It discusses how observations support integrated Earth-system science and its applications, and identifies planned enhancements to the contributing observing systems and other requirements for observations and their processing. All types of observation are considered, but emphasis is placed on those made from space. The origins and development of the integrated view of the Earth system are outlined, noting the interactions between the main components that lead to requirements for integrated science and modelling, and for the observations that guide and support them. What constitutes an Earth-system model is discussed. Summaries are given of key cycles within the Earth system. The nature of Earth observation and the arrangements for international coordination essential for effective operation of global observing systems are introduced. Instances are given of present types of observation, what is already on the roadmap for 2016–2025 and some of the issues to be faced. Observations that are organised on a systematic basis and observations that are made for process understanding and model development, or other research or demonstration purposes, are covered. Specific accounts are given for many of the variables of the Earth system. The current status and prospects for Earth-system modelling are summarized. The evolution towards applying Earth-system models for environmental monitoring and prediction as well as for climate simulation and projection is outlined. 
General aspects of the improvement of models, whether through refining the representations of processes that are already incorporated or through adding new processes or components, are discussed. Some important elements of Earth-system models are considered more fully. Data assimilation is discussed not only because it uses observations and models to generate datasets for monitoring the Earth system and for initiating and evaluating predictions, in particular through reanalysis, but also because of the feedback it provides on the quality of both the observations and the models employed. Inverse methods for surface-flux or model-parameter estimation are also covered. Reviews are given of the way observations and the processed datasets based on them are used for evaluating models, and of the combined use of observations and models for monitoring and interpreting the behaviour of the Earth system and for predicting and projecting its future. A set of concluding discussions covers general developmental needs, requirements for continuity of space-based observing systems, further long-term requirements for observations and other data, technological advances and data challenges, and the importance of enhanced international co-operation.