Neural Embeddings of Graphs in Hyperbolic Space
Neural embeddings have been used with great success in Natural Language
Processing (NLP). They provide compact representations that encapsulate word
similarity and attain state-of-the-art performance in a range of linguistic
tasks. The success of neural embeddings has prompted significant amounts of
research into applications in domains other than language. One such domain is
graph-structured data, where embeddings of vertices can be learned that
encapsulate vertex similarity and improve performance on tasks including edge
prediction and vertex labelling. For both NLP and graph-based tasks, embeddings
have been learned in high-dimensional Euclidean spaces. However, recent work
has shown that the appropriate isometric space for embedding complex networks
is not the flat Euclidean space, but negatively curved, hyperbolic space. We
present a new concept that exploits these recent insights and propose learning
neural embeddings of graphs in hyperbolic space. We provide experimental
evidence that embedding graphs in their natural geometry significantly improves
performance on downstream tasks for several real-world public datasets.
Comment: 7 pages, 5 figures
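The central object in such embeddings is the hyperbolic distance function, which replaces the Euclidean distance when learning in negatively curved space. A minimal sketch of the distance in the Poincaré ball model (a common model for hyperbolic embeddings; the points below are illustrative, not from the paper) might look like:

```python
import numpy as np

def poincare_distance(u, v):
    """Hyperbolic distance between two points inside the unit Poincare ball."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq_diff / denom)

# Points near the boundary are far from the origin even when Euclidean-close:
# volume grows exponentially with radius, matching the tree-like structure
# of complex networks.
origin = [0.0, 0.0]
near_boundary = [0.99, 0.0]
print(poincare_distance(origin, near_boundary))  # much larger than 0.99
```

Distances expand rapidly toward the boundary, which is why hierarchies that need many Euclidean dimensions can be embedded in just two hyperbolic ones.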
Probabilistic Inference of Twitter Users' Age based on What They Follow
Twitter provides an open and rich source of data for studying human behaviour
at scale and is widely used in social and network sciences. However, a major
criticism of Twitter data is that demographic information is largely absent.
Enhancing Twitter data with user ages would advance our ability to study social
network structures, information flows and the spread of contagions. Approaches
toward age detection of Twitter users typically focus on specific properties of
tweets, e.g., linguistic features, which are language dependent. In this paper,
we devise a language-independent methodology for determining the age of Twitter
users from data that is native to the Twitter ecosystem. The key idea is to use
a Bayesian framework to generalise ground-truth age information from a few
Twitter users to the entire network based on what/whom they follow. Our
approach scales to inferring the age of 700 million Twitter accounts with high
accuracy.
Comment: 9 pages, 9 figures
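The Bayesian generalisation step can be illustrated with a naive-Bayes update: a prior over age groups is multiplied by per-account follow likelihoods estimated from the ground-truth users. The age groups, accounts, and probabilities below are hypothetical, purely to show the mechanics:

```python
import numpy as np

# Hypothetical quantities (not from the paper): a prior over age groups and
# per-account follow likelihoods P(follows account | age group), as would be
# estimated from a small set of users with known ages.
age_groups = ["<=18", "19-29", "30-39", "40+"]
prior = np.array([0.15, 0.35, 0.30, 0.20])
likelihood = {
    "pop_star": np.array([0.60, 0.40, 0.15, 0.05]),
    "news_org": np.array([0.05, 0.20, 0.35, 0.50]),
    "gaming":   np.array([0.50, 0.45, 0.20, 0.10]),
}

def posterior(followed):
    """Naive-Bayes posterior over age groups given the accounts a user follows."""
    p = prior.copy()
    for acct in followed:
        p *= likelihood[acct]
    return p / p.sum()

post = posterior(["pop_star", "gaming"])
print(dict(zip(age_groups, post.round(3))))
```

Because the signal is who a user follows rather than what they write, the same update applies unchanged across languages, which is the key to scaling to hundreds of millions of accounts.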
On the Unreasonable Effectiveness of Feature propagation in Learning on Graphs with Missing Node Features
While Graph Neural Networks (GNNs) have recently become the de facto standard
for modeling relational data, they impose a strong assumption on the
availability of the node or edge features of the graph. In many real-world
applications, however, features are only partially available; for example, in
social networks, age and gender are available only for a small subset of users.
We present a general approach for handling missing features in graph machine
learning applications that is based on minimization of the Dirichlet energy and
leads to a diffusion-type differential equation on the graph. The
discretization of this equation produces a simple, fast and scalable algorithm
which we call Feature Propagation. We experimentally show that the proposed
approach outperforms previous methods on seven common node-classification
benchmarks and can withstand surprisingly high rates of missing features: on
average we observe only around 4% relative accuracy drop when 99% of the
features are missing. Moreover, it takes only 10 seconds to run on a graph with
2.5M nodes and 123M edges on a single GPU.
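The discretised diffusion reduces to a strikingly simple iteration: missing features are repeatedly replaced by an average over neighbours while observed features are clamped to their known values. A toy sketch on a dense adjacency matrix (the paper's method uses the symmetrically normalised adjacency on sparse graphs; row normalisation is used here for brevity):

```python
import numpy as np

def feature_propagation(adj, x, known_mask, num_iters=40):
    """Minimal Feature Propagation sketch: diffuse features along edges,
    clamping observed entries after every step.
    adj: (n, n) adjacency matrix; x: (n, d) features;
    known_mask: boolean (n,) marking nodes with observed features."""
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0
    prop = adj / deg                      # row-normalised transition matrix
    out = np.where(known_mask[:, None], x, 0.0)  # init missing features to 0
    for _ in range(num_iters):
        out = prop @ out                  # average over neighbours
        out[known_mask] = x[known_mask]   # reset observed features (boundary condition)
    return out

# Path graph 0-1-2: features observed at the endpoints, missing in the middle.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = np.array([[0.0], [0.0], [2.0]])       # node 1's true value is unobserved
known = np.array([True, False, True])
print(feature_propagation(adj, x, known))  # node 1 converges to the mean, 1.0
```

Each iteration is a single sparse matrix-vector product in practice, which is why the method scales to graphs with hundreds of millions of edges.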
Graph Neural Networks for Link Prediction with Subgraph Sketching
Many Graph Neural Networks (GNNs) perform poorly compared to simple
heuristics on Link Prediction (LP) tasks. This is due to limitations in
expressive power such as the inability to count triangles (the backbone of most
LP heuristics) and because they cannot distinguish automorphic nodes (those
having identical structural roles). Both expressiveness issues can be
alleviated by learning link (rather than node) representations and
incorporating structural features such as triangle counts. Since explicit link
representations are often prohibitively expensive, recent works resorted to
subgraph-based methods, which have achieved state-of-the-art performance for
LP, but suffer from poor efficiency due to high levels of redundancy between
subgraphs. We analyze the components of subgraph GNN (SGNN) methods for link
prediction. Based on our analysis, we propose a novel full-graph GNN called
ELPH (Efficient Link Prediction with Hashing) that passes subgraph sketches as
messages to approximate the key components of SGNNs without explicit subgraph
construction. ELPH is provably more expressive than Message Passing GNNs
(MPNNs). It outperforms existing SGNN models on many standard LP benchmarks
while being orders of magnitude faster. However, it shares the common GNN
limitation that it is only efficient when the dataset fits in GPU memory.
Accordingly, we develop a highly scalable model, called BUDDY, which uses
feature precomputation to circumvent this limitation without sacrificing
predictive performance. Our experiments show that BUDDY also outperforms SGNNs
on standard LP benchmarks while being highly scalable and faster than ELPH.
Comment: 29 pages, 19 figures, 6 appendices
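The sketch-as-message idea rests on estimating pairwise set statistics, such as common-neighbour counts, from compact per-node summaries instead of materialising subgraphs. ELPH combines MinHash and HyperLogLog sketches; a minimal MinHash-only illustration (toy neighbourhoods, Python's built-in hash as the hash family, all names hypothetical) might look like:

```python
import random

def minhash_signature(items, hash_seeds):
    """MinHash sketch of a set: for each seed, keep the minimum hash value."""
    return [min(hash((seed, it)) for it in items) for seed in hash_seeds]

def jaccard_estimate(sig_a, sig_b):
    """Estimated Jaccard similarity: fraction of positions where minima agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

random.seed(0)
seeds = [random.random() for _ in range(256)]

# Two overlapping toy neighbourhoods; true common-neighbour count is 40.
A = set(range(0, 80))
B = set(range(40, 120))
sig_a = minhash_signature(A, seeds)
sig_b = minhash_signature(B, seeds)

# From J = |A ∩ B| / |A ∪ B| and known set sizes: |A ∩ B| = J(|A|+|B|)/(1+J)
j = jaccard_estimate(sig_a, sig_b)
est_common = j * (len(A) + len(B)) / (1 + j)
print(round(est_common, 1))  # close to the true value of 40
```

Because signatures are fixed-size regardless of neighbourhood size and can be merged like ordinary node features, they can be passed as messages in a full-graph GNN, avoiding the per-edge subgraph construction that makes SGNNs slow.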