Pruning based Distance Sketches with Provable Guarantees on Random Graphs
Measuring the distances between vertices on graphs is one of the most
fundamental components in network analysis. Since finding shortest paths
requires traversing the graph, it is challenging to obtain distance information
on large graphs very quickly. In this work, we present a preprocessing
algorithm that is able to create landmark based distance sketches efficiently,
with strong theoretical guarantees. When evaluated on a diverse set of social
and information networks, our algorithm significantly improves over existing
approaches by reducing the number of landmarks stored, preprocessing time, or
stretch of the estimated distances.
On Erdős-Rényi graphs and random power law graphs with degree
distribution exponent , our algorithm outputs an exact distance
data structure with space between and
depending on the value of the exponent, where is the number of vertices. We
complement the algorithm with tight lower bounds for Erdős-Rényi graphs and the
case when the exponent is close to two.
Comment: Full version for the conference paper to appear in The Web
Conference'1
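As a rough illustration of what a landmark based distance sketch is, the following minimal Python sketch precomputes breadth-first-search distances from a few landmark vertices and answers queries by routing through the best landmark. It is a generic landmark scheme for unweighted graphs, not the pruning algorithm of the paper; the function names and the adjacency-dict input format are assumptions made for this example.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def build_sketch(adj, landmarks):
    """Preprocessing: one BFS per landmark (hypothetical helper)."""
    return {l: bfs_distances(adj, l) for l in landmarks}


def estimate_distance(sketch, u, v):
    """Upper bound on d(u, v): the shortest detour through any landmark."""
    best = float("inf")
    for dist in sketch.values():
        if u in dist and v in dist:
            best = min(best, dist[u] + dist[v])
    return best


# Path graph 0 - 1 - 2 - 3 - 4 with a single landmark in the middle.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
sketch = build_sketch(adj, landmarks=[2])
print(estimate_distance(sketch, 0, 4))  # 4, exact: the landmark lies on the path
print(estimate_distance(sketch, 0, 1))  # 3, although the true distance is 1
```

The second query shows the stretch that a poorly placed landmark introduces, which is the quantity the abstract reports reducing.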
Least Squares Ranking on Graphs
Given a set of alternatives to be ranked, and some pairwise comparison data,
ranking is a least squares computation on a graph. The vertices are the
alternatives, and the edge values comprise the comparison data. The basic idea
is very simple and old: come up with values on vertices such that their
differences match the given edge data. Since an exact match will usually be
impossible, one settles for matching in a least squares sense. This formulation
was first described by Leake in 1976 for ranking football teams and appears as
an example in Professor Gilbert Strang's classic linear algebra textbook. If
one is willing to look into the residual a little further, then the problem
really comes alive, as shown effectively by the remarkable recent paper of
Jiang et al. With or without this twist, the humble least squares problem on
graphs has far-reaching connections with many current areas of research. These
connections are to theoretical computer science (spectral graph theory, and
multilevel methods for graph Laplacian systems); numerical analysis (algebraic
multigrid, and finite element exterior calculus); other mathematics (Hodge
decomposition, and random clique complexes); and applications (arbitrage, and
ranking of sports teams). Not all of these connections are explored in this
paper, but many are. The underlying ideas are easy to explain, requiring only
the four fundamental subspaces from elementary linear algebra. One of our aims
is to explain these basic ideas and connections, to get researchers in many
fields interested in this topic. Another aim is to use our numerical
experiments for guidance on selecting methods and exposing the need for further
development.
Comment: Added missing references, comparison of linear solvers overhauled,
conclusion section added, some new figures added
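The basic idea described in this abstract, choosing vertex values whose differences match the edge data in a least squares sense, can be sketched on a tiny example. The code below is an illustrative assumption rather than the paper's method: the function name, the `(i, j, y)` comparison format, and the use of plain coordinate descent (Gauss-Seidel sweeps on the normal equations) are choices made to keep the example dependency-free.

```python
def least_squares_ranking(n, comparisons, sweeps=500):
    """Find r minimising sum over (i, j, y) of (r[j] - r[i] - y)**2.

    `comparisons` is a list of (i, j, y): alternative j is preferred
    to alternative i by margin y. Assumes the comparison graph is connected.
    """
    # For each vertex v, store (neighbour u, target value of r[v] - r[u]).
    nbrs = [[] for _ in range(n)]
    for i, j, y in comparisons:
        nbrs[j].append((i, y))
        nbrs[i].append((j, -y))

    r = [0.0] * n
    for _ in range(sweeps):
        for v in range(n):
            if nbrs[v]:
                # Setting the derivative with respect to r[v] to zero gives
                # r[v] = average of (r[u] + target difference) over neighbours.
                r[v] = sum(r[u] + d for u, d in nbrs[v]) / len(nbrs[v])

    # Scores are only defined up to an additive constant; centre them.
    mean = sum(r) / n
    return [x - mean for x in r]


# Three alternatives with inconsistent data: each "win" is by margin 1,
# but 0 -> 1 -> 2 would imply a margin of 2 between 0 and 2.
scores = least_squares_ranking(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)])
print(scores)  # approximately [-0.667, 0.0, 0.667]
```

No exact match exists here, so the residual is nonzero on every edge; that residual is the part of the data the abstract suggests looking into further.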
Privacy and Anonymization of Neighborhoods in Multiplex Networks
Since the beginning of the digital age, the amount of available data on human behaviour has dramatically increased, along with the risk to the privacy of the represented subjects. Since the analysis of those data can bring advances to science, it is important to share them while preserving the subjects' anonymity. A significant portion of the available information can be modelled as networks, introducing an additional privacy risk related to the structure of the data themselves. For instance, in a social network, people can be uniquely identifiable because of the structure of their neighborhood, formed by the number of their friends and the connections between them. The neighborhood's structure is the target of an identity disclosure attack on released social network data, called a neighborhood attack. To mitigate this threat, algorithms to anonymize networks have been proposed. However, this problem has not been deeply studied on multiplex networks, which combine different social network data into a single representation. The multiplex network representation makes the neighborhood attack setting more complicated, and adds information that an attacker can use to re-identify subjects.
This thesis aims to understand how multiplex networks behave in terms of anonymization difficulty and neighborhood attack. We present two definitions of multiplex neighborhoods, and discuss how the fraction of nodes with unique neighborhoods can be affected.
Through analysis of network models, we study the variation of the uniqueness of neighborhoods in networks with different structure and characteristics. We show that the uniqueness of neighborhoods has a linear trend depending on the network size and average degree. If the network has a more random structure, the uniqueness decreases significantly when the network size increases. On the other hand, if the local structure is more pronounced, the uniqueness is not strongly influenced by the number of nodes. We also conduct a motif analysis to study the recurring patterns that can make social networks' neighborhoods less unique.
Lastly, we propose an algorithm to anonymize a pair of multiplex neighborhoods. This algorithm is the core building block that can be used in a method to prevent neighborhood attacks on multiplex networks.
Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks
In this paper, we study the OOD generalization of neural algorithmic
reasoning tasks, where the goal is to learn an algorithm (e.g., sorting,
breadth-first search, and depth-first search) from input-output pairs using
deep neural networks. First, we argue that OOD generalization in this setting
is significantly different from common OOD settings. For example, some
phenomena in OOD generalization of image classification, such as "accuracy
on the line", are not observed here, and techniques such as data augmentation
methods do not help as assumptions underlying many augmentation techniques are
often violated. Second, we analyze the main challenges (e.g., input
distribution shift, non-representative data generation, and uninformative
validation metrics) of the current leading benchmark, i.e., CLRS
\citep{deepmind2021clrs}, which contains 30 algorithmic reasoning tasks. We
propose several solutions, including a simple-yet-effective fix to the input
distribution shift and improved data generation. Finally, we propose an
attention-based 2WL-graph neural network (GNN) processor which complements
message-passing GNNs so their combination outperforms the state-of-the-art
model by a 3% margin averaged over all algorithms. Our code is available at:
https://github.com/smahdavi4/clrs
Neural function approximation on graphs: shape modelling, graph discrimination & compression
Graphs serve as a versatile mathematical abstraction of real-world phenomena in numerous scientific disciplines. This thesis is part of the Geometric Deep Learning subject area, a family of learning paradigms that capitalise on the increasing volume of non-Euclidean data so as to solve real-world tasks in a data-driven manner. In particular, we focus on the topic of graph function approximation using neural networks, which lies at the heart of many relevant methods. In the first part of the thesis, we contribute to the understanding and design of Graph Neural Networks (GNNs). Initially, we investigate the problem of learning on signals supported on a fixed graph. We show that treating graph signals as general graph spaces is restrictive and conventional GNNs have limited expressivity. Instead, we expose a more enlightening perspective by drawing parallels between graph signals and signals on Euclidean grids, such as images and audio. Accordingly, we propose a permutation-sensitive GNN based on an operator analogous to shifts in grids and instantiate it on 3D meshes for shape modelling (Spiral Convolutions). Next, we focus on learning on general graph spaces and in particular on functions that are invariant to graph isomorphism. We identify a fundamental trade-off between invariance, expressivity and computational complexity, which we address with a symmetry-breaking mechanism based on substructure encodings (Graph Substructure Networks). Substructures are shown to be a powerful tool that provably improves expressivity while controlling computational complexity, and a useful inductive bias in network science and chemistry. In the second part of the thesis, we discuss the problem of graph compression, where we analyse the information-theoretic principles and the connections with graph generative models. We show that another inevitable trade-off surfaces, now between computational complexity and compression quality, due to graph isomorphism.
We propose a substructure-based dictionary coder - Partition and Code (PnC) - with theoretical guarantees that can be adapted to different graph distributions by estimating its parameters from observations. Additionally, contrary to the majority of neural compressors, PnC is parameter and sample efficient and is therefore of wide practical relevance. Finally, within this framework, substructures are further illustrated as a decisive archetype for learning problems on graph spaces.
Scalable Community Detection in Massive Networks using Aggregated Relational Data
The analysis of networks is used in many fields of study including statistics, social science, computer science, physics, and biology. The interest in networks is diverse as it usually depends on the field of study. For instance, social scientists are interested in interpreting how edges arise, while biologists seek to understand underlying biological processes. Among the problems being explored in network analysis, community detection stands out as being one of the most important. Community detection seeks to find groups of nodes with a large concentration of links within but few between. Inferring groups is important in many applications as they are used for further downstream analysis. For example, identifying clusters of consumers with similar purchasing behavior in a customer and product network can be used to create better recommendation systems. Finding a node with a high concentration of its edges to other nodes in the community may give insight into how the community formed.
Many statistical models for networks implicitly define the notion of a community. Statistical inference aims to fit a model that posits how vertices are connected to each other. One of the most common models for community detection is the stochastic block model (SBM) [Holland et al., 1983]. Although simple, it is a highly expressive family of random graphs. However, it does have its drawbacks. First, it does not capture the degree distribution of real-world networks. Second, it allows nodes to only belong to one community. In many applications, it is useful to consider overlapping communities. The Mixed Membership Stochastic Blockmodel (MMSB) is a Bayesian extension of the SBM that allows nodes to belong to multiple communities.
Fitting large Bayesian network models quickly becomes computationally infeasible when the number of nodes grows into the hundreds of thousands and millions. In particular, the number of parameters in the MMSB grows as the number of nodes squared. This thesis introduces an efficient method for fitting a Bayesian model to massive networks through use of aggregated relational data. Our inference method converges faster than existing methods by leveraging nodal information that often accompanies real world networks. Conditioning on this extra information leads to a model that admits a parallel variational inference algorithm. We apply our method to a citation network with over three million nodes and 25 million edges. Our method converges faster than existing posterior inference algorithms for the MMSB and recovers parameters better on simulated networks generated according to the MMSB.
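To make the stochastic block model mentioned in this abstract concrete, here is a minimal sampling sketch. It assumes the simplest two-parameter variant, with one edge probability inside communities and another between them; the general SBM uses a full matrix of block-to-block probabilities, and the MMSB and the inference method of the thesis are not shown. The function name and parameters are hypothetical.

```python
import random


def sample_sbm(block_sizes, p_in, p_out, seed=0):
    """Sample an undirected graph from a two-parameter SBM.

    Each node belongs to exactly one block. A pair of nodes is linked
    with probability p_in if they share a block and p_out otherwise.
    """
    rng = random.Random(seed)
    labels = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return labels, edges


# Two communities of 50 nodes: dense within, sparse between.
labels, edges = sample_sbm([50, 50], p_in=0.3, p_out=0.02)
within = sum(1 for i, j in edges if labels[i] == labels[j])
print(len(edges), within)
```

The double loop visits every pair of nodes, so the cost grows with the number of nodes squared, which hints at why fitting such models to networks with millions of nodes calls for the kind of scalable inference the abstract describes.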
A generative model for latent position graphs
Recently, there has been an explosion of research into machine learning methods applied to graph data. Most work is focused on performing either node classification or graph classification; however, there is much to be gained by learning instead a generative model for the underlying random graph distribution. We present a novel neural network-based approach to learning generative models for random graphs. The features used for training are graphlets, subgraph counts of small order, and the loss function is based on a moment estimator for these features. Random graphs are realized by feeding random noise into the network and applying a kernel to the output; in this way, our model is a generalization of the ubiquitous Random Dot Product Graph. Networks produced this way are demonstrated to be able to imitate data from chemistry, medicine, and social networks. The created graphs are similar enough to the target data to be able to fool discriminator neural networks otherwise capable of separating classes of random graphs. This method is inexpensive, accurate, and is readily applied to data-poor problems.
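For readers unfamiliar with the Random Dot Product Graph that this model generalizes, the sketch below samples one from fixed latent positions: each pair of nodes is linked with probability equal to the dot product of their position vectors. This shows only the classical RDPG with a plain dot-product kernel; the neural network, the graphlet features, and the moment-based loss from the abstract are not reproduced. The function name is hypothetical, and clipping probabilities to [0, 1] is a safeguard added for this example.

```python
import random


def sample_rdpg(positions, seed=0):
    """Sample an undirected graph from latent positions.

    Nodes i and j are linked with probability <x_i, x_j>,
    clipped to [0, 1] in case the positions are not well scaled.
    """
    rng = random.Random(seed)
    n = len(positions)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = sum(a * b for a, b in zip(positions[i], positions[j]))
            p = min(1.0, max(0.0, p))
            if rng.random() < p:
                edges.append((i, j))
    return edges


# Nodes 0 and 1 share a latent direction; node 2 is orthogonal to both.
print(sample_rdpg([(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]))  # [(0, 1)]
```

In the abstract's setting the positions would instead come from a network fed with random noise, and the dot product would be replaced by a more general kernel.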
Private Graph Data Release: A Survey
The application of graph analytics to various domains has yielded tremendous
societal and economic benefits in recent years. However, the increasingly
widespread adoption of graph analytics comes with a commensurate increase in
the need to protect private information in graph databases, especially in light
of the many privacy breaches in real-world graph data that was supposed to
preserve sensitive information. This paper provides a comprehensive survey of
private graph data release algorithms that seek to achieve the fine balance
between privacy and utility, with a specific focus on provably private
mechanisms. Many of these mechanisms fall under natural extensions of the
Differential Privacy framework to graph data, but we also investigate more
general privacy formulations like Pufferfish Privacy that can deal with the
limitations of Differential Privacy. A wide-ranging survey of the applications
of private graph data release mechanisms to social networks, finance, supply
chain, health and energy is also provided. This survey paper and the taxonomy
it provides should benefit practitioners and researchers alike in the
increasingly important area of private graph data release and analysis.