60 research outputs found

    Pruning based Distance Sketches with Provable Guarantees on Random Graphs

    Measuring the distances between vertices on graphs is one of the most fundamental components in network analysis. Since finding shortest paths requires traversing the graph, it is challenging to obtain distance information on large graphs very quickly. In this work, we present a preprocessing algorithm that is able to create landmark based distance sketches efficiently, with strong theoretical guarantees. When evaluated on a diverse set of social and information networks, our algorithm significantly improves over existing approaches by reducing the number of landmarks stored, preprocessing time, or stretch of the estimated distances. On Erdős-Rényi graphs and random power law graphs with degree distribution exponent $2 < \beta < 3$, our algorithm outputs an exact distance data structure with space between $\Theta(n^{5/4})$ and $\Theta(n^{3/2})$ depending on the value of $\beta$, where $n$ is the number of vertices. We complement the algorithm with tight lower bounds for Erdős-Rényi graphs and the case when $\beta$ is close to two. Comment: Full version for the conference paper to appear in The Web Conference'1
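
To make the idea of a landmark based distance sketch concrete, here is a minimal Python sketch of the generic technique: run a breadth-first search from each landmark, store the resulting distances per vertex, and answer a query with the best detour through a shared landmark. This is not the pruning algorithm from the paper. The function names, the toy path graph, and the choice of landmark are all assumptions made for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_sketch(adj, landmarks):
    """For each vertex, store its distance to every landmark that reaches it."""
    sketch = {v: {} for v in adj}
    for landmark in landmarks:
        for v, d in bfs_distances(adj, landmark).items():
            sketch[v][landmark] = d
    return sketch


def estimate_distance(sketch, u, v):
    """Upper bound on d(u, v) via the best shared landmark, or None if none is shared."""
    common = sketch[u].keys() & sketch[v].keys()
    if not common:
        return None
    return min(sketch[u][l] + sketch[v][l] for l in common)


# Hypothetical toy input: the path graph 0-1-2-3-4 with one landmark in the middle.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
sketch = build_sketch(path, landmarks=[2])
```

On this toy graph the estimate for the pair (0, 4) is exact because the landmark lies on their shortest path, while the estimate for (0, 1) is 3 although the true distance is 1. That gap between estimate and true distance is the stretch mentioned in the abstract.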

    Least Squares Ranking on Graphs

    Given a set of alternatives to be ranked, and some pairwise comparison data, ranking is a least squares computation on a graph. The vertices are the alternatives, and the edge values comprise the comparison data. The basic idea is very simple and old: come up with values on vertices such that their differences match the given edge data. Since an exact match will usually be impossible, one settles for matching in a least squares sense. This formulation was first described by Leake in 1976 for ranking football teams and appears as an example in Professor Gilbert Strang's classic linear algebra textbook. If one is willing to look into the residual a little further, then the problem really comes alive, as shown effectively by the remarkable recent paper of Jiang et al. With or without this twist, the humble least squares problem on graphs has far-reaching connections with many current areas of research. These connections are to theoretical computer science (spectral graph theory, and multilevel methods for graph Laplacian systems); numerical analysis (algebraic multigrid, and finite element exterior calculus); other mathematics (Hodge decomposition, and random clique complexes); and applications (arbitrage, and ranking of sports teams). Not all of these connections are explored in this paper, but many are. The underlying ideas are easy to explain, requiring only the four fundamental subspaces from elementary linear algebra. One of our aims is to explain these basic ideas and connections, to get researchers in many fields interested in this topic. Another aim is to use our numerical experiments for guidance on selecting methods and exposing the need for further development. Comment: Added missing references, comparison of linear solvers overhauled, conclusion section added, some new figures added
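
The formulation in this abstract (vertex values whose differences match the edge data in a least squares sense) can be sketched in a few lines of Python. The version below forms the normal equations, whose matrix is the graph Laplacian, pins the first score to zero to remove the constant shift, and solves the reduced system by Gaussian elimination. It assumes a connected comparison graph. The function name, the edge convention `(i, j, y)` meaning "j beats i by y", and the three-team data are illustrative assumptions, not taken from the paper, and the solver is a plain dense one rather than any of the methods the paper compares.

```python
def least_squares_ranking(n, comparisons):
    """Scores x minimising sum((x[j] - x[i] - y) ** 2) over (i, j, y), with x[0] = 0.

    Assumes the comparison graph on vertices 0..n-1 is connected.
    """
    # Normal equations L x = b, where L is the graph Laplacian.
    lap = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for i, j, y in comparisons:
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
        rhs[j] += y
        rhs[i] -= y

    # Pin x[0] = 0 by dropping row and column 0; build the augmented matrix.
    m = n - 1
    aug = [lap[r + 1][1:] + [rhs[r + 1]] for r in range(m)]

    # Gaussian elimination with partial pivoting.
    for c in range(m):
        pivot = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, m):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, m + 1):
                aug[r][k] -= factor * aug[c][k]

    # Back substitution.
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, m))
        x[r] = (aug[r][m] - tail) / aug[r][r]
    return [0.0] + x


# Hypothetical, deliberately inconsistent data: 1 beats 0 by 1, 2 beats 1 by 1,
# but 2 beats 0 by 3, so no exact assignment of scores exists.
scores = least_squares_ranking(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 3.0)])
```

For this data the minimiser is x = (0, 4/3, 8/3), leaving residuals of 1/3, 1/3 and -1/3 on the three edges. That leftover inconsistency around the cycle is the residual the abstract suggests examining further.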

    Privacy and Anonymization of Neighborhoods in Multiplex Networks

    Since the beginning of the digital age, the amount of available data on human behaviour has dramatically increased, along with the risk for the privacy of the represented subjects. Since the analysis of those data can bring advances to science, it is important to share them while preserving the subjects' anonymity. A significant portion of the available information can be modelled as networks, introducing an additional privacy risk related to the structure of the data themselves. For instance, in a social network, people can be uniquely identifiable because of the structure of their neighborhood, formed by the number of their friends and the connections between them. The neighborhood's structure is the target of an identity disclosure attack on released social network data, called a neighborhood attack. To mitigate this threat, algorithms to anonymize networks have been proposed. However, this problem has not been deeply studied on multiplex networks, which combine different social network data into a single representation. The multiplex network representation makes the neighborhood attack setting more complicated, and adds information that an attacker can use to re-identify subjects. This thesis aims to understand how multiplex networks behave in terms of anonymization difficulty and neighborhood attack. We present two definitions of multiplex neighborhoods, and discuss how the fraction of nodes with unique neighborhoods can be affected. Through analysis of network models, we study the variation of the uniqueness of neighborhoods in networks with different structure and characteristics. We show that the uniqueness of neighborhoods has a linear trend depending on the network size and average degree. If the network has a more random structure, the uniqueness decreases significantly when the network size increases. On the other hand, if the local structure is more pronounced, the uniqueness is not strongly influenced by the number of nodes.
    We also conduct a motif analysis to study the recurring patterns that can make social networks' neighborhoods less unique. Lastly, we propose an algorithm to anonymize a pair of multiplex neighborhoods. This algorithm is the core building block that can be used in a method to prevent neighborhood attacks on multiplex networks.

    Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks

    In this paper, we study the out-of-distribution (OOD) generalization of neural algorithmic reasoning tasks, where the goal is to learn an algorithm (e.g., sorting, breadth-first search, and depth-first search) from input-output pairs using deep neural networks. First, we argue that OOD generalization in this setting is significantly different from common OOD settings. For example, some phenomena in OOD generalization of image classification, such as "accuracy on the line", are not observed here, and techniques such as data augmentation do not help, as assumptions underlying many augmentation techniques are often violated. Second, we analyze the main challenges (e.g., input distribution shift, non-representative data generation, and uninformative validation metrics) of the current leading benchmark, i.e., CLRS [deepmind2021clrs], which contains 30 algorithmic reasoning tasks. We propose several solutions, including a simple-yet-effective fix to the input distribution shift and improved data generation. Finally, we propose an attention-based 2WL-graph neural network (GNN) processor which complements message-passing GNNs so their combination outperforms the state-of-the-art model by a 3% margin averaged over all algorithms. Our code is available at: https://github.com/smahdavi4/clrs

    Neural function approximation on graphs: shape modelling, graph discrimination & compression

    Graphs serve as a versatile mathematical abstraction of real-world phenomena in numerous scientific disciplines. This thesis is part of the Geometric Deep Learning subject area, a family of learning paradigms that capitalise on the increasing volume of non-Euclidean data so as to solve real-world tasks in a data-driven manner. In particular, we focus on the topic of graph function approximation using neural networks, which lies at the heart of many relevant methods. In the first part of the thesis, we contribute to the understanding and design of Graph Neural Networks (GNNs). Initially, we investigate the problem of learning on signals supported on a fixed graph. We show that treating graph signals as general graph spaces is restrictive and conventional GNNs have limited expressivity. Instead, we expose a more enlightening perspective by drawing parallels between graph signals and signals on Euclidean grids, such as images and audio. Accordingly, we propose a permutation-sensitive GNN based on an operator analogous to shifts in grids and instantiate it on 3D meshes for shape modelling (Spiral Convolutions). Next, we focus on learning on general graph spaces and in particular on functions that are invariant to graph isomorphism. We identify a fundamental trade-off between invariance, expressivity and computational complexity, which we address with a symmetry-breaking mechanism based on substructure encodings (Graph Substructure Networks). Substructures are shown to be a powerful tool that provably improves expressivity while controlling computational complexity, and a useful inductive bias in network science and chemistry. In the second part of the thesis, we discuss the problem of graph compression, where we analyse the information-theoretic principles and the connections with graph generative models. We show that another inevitable trade-off surfaces, now between computational complexity and compression quality, due to graph isomorphism.
    We propose a substructure-based dictionary coder, Partition and Code (PnC), with theoretical guarantees, which can be adapted to different graph distributions by estimating its parameters from observations. Additionally, contrary to the majority of neural compressors, PnC is parameter and sample efficient and is therefore of wide practical relevance. Finally, within this framework, substructures are further illustrated as a decisive archetype for learning problems on graph spaces. Open Access

    A generative model for latent position graphs

    Recently, there has been an explosion of research into machine learning methods applied to graph data. Most work is focused on performing either node classification or graph classification; however, there is much to be gained by learning instead a generative model for the underlying random graph distribution. We present a novel neural network-based approach to learning generative models for random graphs. The features used for training are graphlets, subgraph counts of small order, and the loss function is based on a moment estimator for these features. Random graphs are realized by feeding random noise into the network and applying a kernel to the output; in this way, our model is a generalization of the ubiquitous Random Dot Product Graph. Networks produced this way are shown to imitate data from chemistry, medicine, and social networks. The created graphs are similar enough to the target data to fool discriminator neural networks otherwise capable of separating classes of random graphs. This method is inexpensive, accurate, and readily applied to data-poor problems.
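
For readers unfamiliar with the Random Dot Product Graph that this abstract generalizes, the following Python sketch samples from the classical model: each vertex has a latent position, and each edge appears independently with probability equal to the dot product of its endpoints' positions. It does not reproduce the neural generator or the graphlet-based loss described above. The function name, the clipping of probabilities to [0, 1], and the example positions are assumptions made for illustration.

```python
import random


def sample_rdpg(positions, rng):
    """Sample an undirected simple graph as a symmetric 0/1 adjacency matrix.

    Edge {i, j} is included with probability <positions[i], positions[j]>,
    clipped to [0, 1] in case the latent positions are not well scaled.
    """
    n = len(positions)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = sum(a * b for a, b in zip(positions[i], positions[j]))
            p = min(1.0, max(0.0, p))
            if rng.random() < p:
                adj[i][j] = 1
                adj[j][i] = 1
    return adj


# Hypothetical latent positions forming two orthogonal groups: vertices in the
# same group connect with probability 1, vertices in different groups with 0.
positions = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]
graph = sample_rdpg(positions, random.Random(0))
```

With these extreme positions the sample is deterministic and contains exactly the edges 0-1 and 2-3. Positions with dot products strictly between 0 and 1 give genuinely random graphs.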

    Private Graph Data Release: A Survey

    The application of graph analytics to various domains has yielded tremendous societal and economic benefits in recent years. However, the increasingly widespread adoption of graph analytics comes with a commensurate increase in the need to protect private information in graph databases, especially in light of the many privacy breaches in real-world graph data that was supposed to preserve sensitive information. This paper provides a comprehensive survey of private graph data release algorithms that seek to achieve the fine balance between privacy and utility, with a specific focus on provably private mechanisms. Many of these mechanisms fall under natural extensions of the Differential Privacy framework to graph data, but we also investigate more general privacy formulations like Pufferfish Privacy that can deal with the limitations of Differential Privacy. A wide-ranging survey of the applications of private graph data release mechanisms to social networks, finance, supply chain, health and energy is also provided. This survey paper and the taxonomy it provides should benefit practitioners and researchers alike in the increasingly important area of private graph data release and analysis.