A Scalable Null Model for Directed Graphs Matching All Degree Distributions: In, Out, and Reciprocal
Degree distributions are arguably the most important property of real-world
networks. The classic edge configuration model or Chung-Lu model can generate
an undirected graph with any desired degree distribution. This serves as a good
null model to compare algorithms or perform experimental studies. Furthermore,
there are scalable algorithms that implement these models and they are
invaluable in the study of graphs. However, real-world networks are
often directed and have a significant proportion of reciprocal edges. A
stronger relation exists between two nodes when they each point to one another
(reciprocal edge) as compared to when only one points to the other (one-way
edge). Despite their importance, reciprocal edges have been disregarded by most
directed graph models.
We propose a null model for directed graphs inspired by the Chung-Lu model
that matches the in-, out-, and reciprocal-degree distributions of the real
graphs. Our algorithm is scalable and requires $O(m)$ random numbers to
generate a graph with $m$ edges. We perform a series of experiments on real
datasets and compare with existing graph models.
Comment: Camera ready version for IEEE Workshop on Network Science; fixed some
typos in tabl
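The abstract above describes a Chung-Lu-style generator that matches in-, out-, and reciprocal-degree distributions. The following is only a minimal sketch of that idea under stated assumptions, not the authors' algorithm: the function name `sample_reciprocal_directed`, the split of each node's degree into one-way out-degree, one-way in-degree, and reciprocal degree, and the handling of self-loops and duplicates (simply dropped) are all hypothetical choices made for illustration. Degrees are matched only approximately and in expectation.

```python
import random


def sample_reciprocal_directed(d_out, d_in, d_rec, seed=0):
    """Chung-Lu-style sketch (illustrative assumption, not the paper's code).

    d_out[i], d_in[i]: desired one-way out-/in-degree of node i.
    d_rec[i]: desired number of reciprocal partners of node i.
    Returns a set of directed edges (u, v).
    """
    assert sum(d_out) == sum(d_in), "one-way out- and in-degrees must balance"
    rng = random.Random(seed)
    nodes = list(range(len(d_out)))
    edges = set()

    # Reciprocal edges: both endpoints drawn proportionally to d_rec,
    # and the pair is inserted in both directions.
    n_rec_pairs = sum(d_rec) // 2
    if n_rec_pairs > 0:
        us = rng.choices(nodes, weights=d_rec, k=n_rec_pairs)
        vs = rng.choices(nodes, weights=d_rec, k=n_rec_pairs)
        for u, v in zip(us, vs):
            if u != v:  # drop self-loops
                edges.add((u, v))
                edges.add((v, u))

    # One-way edges: source drawn proportionally to d_out,
    # target drawn proportionally to d_in.
    n_one_way = sum(d_out)
    if n_one_way > 0:
        srcs = rng.choices(nodes, weights=d_out, k=n_one_way)
        dsts = rng.choices(nodes, weights=d_in, k=n_one_way)
        for u, v in zip(srcs, dsts):
            if u != v:  # drop self-loops; duplicates collapse in the set
                edges.add((u, v))
    return edges


# Toy degree sequences for four nodes (made up for this example).
edges = sample_reciprocal_directed(
    d_out=[2, 1, 0, 1], d_in=[1, 1, 2, 0], d_rec=[1, 1, 2, 0], seed=1
)
```

The sketch draws two random endpoints per requested edge or reciprocal pair, so the number of random draws grows linearly with the number of edges. A one-way draw can occasionally complete an unintended reciprocal pair; a careful implementation would need to account for that.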
Gravity-Inspired Graph Autoencoders for Directed Link Prediction
Graph autoencoders (AE) and variational autoencoders (VAE) recently emerged
as powerful node embedding methods. In particular, graph AE and VAE were
successfully leveraged to tackle the challenging link prediction problem,
aiming at figuring out whether some pairs of nodes from a graph are connected
by unobserved edges. However, these models focus on undirected graphs and
therefore ignore the potential direction of the link, which is limiting for
numerous real-life applications. In this paper, we extend the graph AE and VAE
frameworks to address link prediction in directed graphs. We present a new
gravity-inspired decoder scheme that can effectively reconstruct directed
graphs from a node embedding. We empirically evaluate our method on three
different directed link prediction tasks, for which standard graph AE and VAE
perform poorly. We achieve competitive results on three real-world graphs,
outperforming several popular baselines.
Comment: ACM International Conference on Information and Knowledge Management
(CIKM 2019)
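The abstract above does not spell out the decoder, so the following is a hedged sketch of one gravity-style form rather than the paper's definitive formulation. It assumes each node has an embedding vector and a scalar "mass", and scores a directed link from i to j as a sigmoid of the target's mass minus the log squared distance between the embeddings. The function name `gravity_score`, the `eps` smoothing term, and the toy numbers are assumptions made for illustration.

```python
import math


def gravity_score(z_i, z_j, mass_j, eps=1e-9):
    """Score for a directed link i -> j (illustrative assumption).

    z_i, z_j: embedding vectors of the source and target nodes.
    mass_j: scalar "mass" of the target node.
    eps: small constant to avoid log(0) when the embeddings coincide.
    """
    dist_sq = sum((a - b) ** 2 for a, b in zip(z_i, z_j))
    logit = mass_j - math.log(dist_sq + eps)
    return 1.0 / (1.0 + math.exp(-logit))


# Two toy nodes: same pairwise distance, different masses.
z = [(0.0, 0.0), (1.0, 1.0)]
mass = [0.5, 2.0]

p_0_to_1 = gravity_score(z[0], z[1], mass[1])  # pulled towards the heavier node
p_1_to_0 = gravity_score(z[1], z[0], mass[0])
```

Because the distance term is symmetric and only the target's mass enters the score, the two directions of the same node pair receive different probabilities, which is what lets such a decoder represent edge direction.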
Detecting Cohesive and 2-mode Communities in Directed and Undirected Networks
Networks are a general language for representing relational information among
objects. An effective way to model, reason about, and summarize networks is to
discover sets of nodes with common connectivity patterns. Such sets are
commonly referred to as network communities. Research on network community
detection has predominantly focused on identifying communities of densely
connected nodes in undirected networks.
In this paper we develop a novel overlapping community detection method that
scales to networks of millions of nodes and edges and advances research along
two dimensions: the connectivity structure of communities, and the use of edge
directedness for community detection. First, we extend traditional definitions
of network communities by building on the observation that nodes can be densely
interlinked in two different ways: In cohesive communities nodes link to each
other, while in 2-mode communities nodes link in a bipartite fashion, where
links predominate between the two partitions rather than inside them. Our
method successfully detects both 2-mode and cohesive communities, which
may also overlap or be hierarchically nested. Second, while most existing
community detection methods treat directed edges as though they were
undirected, our method accounts for edge directions and is able to identify
novel and meaningful community structures in both directed and undirected
networks, using data from social, biological, and ecological domains.
Comment: Published in the proceedings of WSDM '1
Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks
Heterogeneous information networks (HINs) are ubiquitous in real-world
applications. In the meantime, network embedding has emerged as a convenient
tool to mine and learn from networked data. As a result, it is of interest to
develop HIN embedding methods. However, the heterogeneity in HINs introduces
not only rich information but also potentially incompatible semantics, which
poses special challenges to embedding learning in HINs. With the intention to
preserve the rich yet potentially incompatible information in HIN embedding, we
propose to study the problem of comprehensive transcription of heterogeneous
information networks. The comprehensive transcription of HINs also provides an
easy-to-use approach to unleash the power of HINs, since it requires no
additional supervision, expertise, or feature engineering. To cope with the
challenges in the comprehensive transcription of HINs, we propose the HEER
algorithm, which embeds HINs via edge representations that are further coupled
with properly-learned heterogeneous metrics. To corroborate the efficacy of
HEER, we conducted experiments on two large-scale real-world datasets with an
edge reconstruction task and multiple case studies. Experimental results
demonstrate the effectiveness of the proposed HEER model and the utility of
edge representations and heterogeneous metrics. The code and data are available
at https://github.com/GentleZhu/HEER.
Comment: 10 pages. In Proceedings of the 24th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, London, United Kingdom,
ACM, 201
On Large-Scale Graph Generation with Validation of Diverse Triangle Statistics at Edges and Vertices
Researchers developing implementations of distributed graph analytic
algorithms require graph generators that yield graphs sharing the challenging
characteristics of real-world graphs (small-world, scale-free, heavy-tailed
degree distribution) with efficiently calculable ground-truth solutions to the
desired output. Reproducibility of current generators used in benchmarking is
somewhat lacking in this respect due to their randomness: the output of a
desired graph analytic can only be compared to expected values and not exact
ground truth. Nonstochastic Kronecker product graphs meet these design criteria
for several graph analytics. Here we show that many flavors of triangle
participation can be cheaply calculated while generating a Kronecker product
graph. Given two medium-sized scale-free graphs with adjacency matrices $A$ and
$B$, their Kronecker product graph has adjacency matrix $C = A \otimes B$. Such
graphs are highly compressible: $|\mathcal{E}|$ edges are represented in
$\mathcal{O}(|\mathcal{E}|^{1/2})$ memory and can be built in a distributed
setting from small data structures, making them easy to share in compressed
form. Many interesting graph calculations have worst-case complexity bounds
$\mathcal{O}(|\mathcal{E}|^p)$ and often these are reduced to
$\mathcal{O}(|\mathcal{E}|^{p/2})$ for Kronecker product graphs, when a
Kronecker formula can be derived yielding the sought calculation on $C$ in
terms of related calculations on $A$ and $B$. We focus on deriving formulas for
triangle participation at vertices, $\mathbf{t}_C$, a vector storing the number
of triangles that every vertex is involved in, and triangle participation at
edges, $\Delta_C$, a sparse matrix storing the number of triangles at every
edge.
Comment: 10 pages, 7 figures, IEEE IPDPS Graph Algorithms Building Block
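To illustrate what a "Kronecker formula" for triangle participation can look like, here is a small sketch for the simplest case only: two undirected graphs with no self-loops. There, the number of triangles at a vertex is half the corresponding diagonal entry of the cubed adjacency matrix, and since the cube of a Kronecker product is the Kronecker product of the cubes, the vertex triangle counts of the product graph are twice the Kronecker product of the factors' counts. The helper names and the choice of complete graphs as inputs are assumptions for illustration; the paper's own formulas cover more general settings and are not reproduced here.

```python
def matmul(X, Y):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def kron(A, B):
    """Kronecker product of two square list-of-lists matrices."""
    nb = len(B)
    n = len(A) * nb
    return [[A[i // nb][j // nb] * B[i % nb][j % nb] for j in range(n)]
            for i in range(n)]


def triangles_at_vertices(A):
    """Triangles per vertex of a simple undirected graph: diag(A^3) / 2."""
    A3 = matmul(matmul(A, A), A)
    return [A3[i][i] // 2 for i in range(len(A))]


def complete_graph(n):
    """Adjacency matrix of the complete graph on n vertices (no self-loops)."""
    return [[int(i != j) for j in range(n)] for i in range(n)]


A = complete_graph(3)
B = complete_graph(4)
C = kron(A, B)

# Direct computation on the (larger) product graph.
t_direct = triangles_at_vertices(C)

# Same result from the small factors only: t_C = 2 * (t_A kron t_B).
t_A = triangles_at_vertices(A)
t_B = triangles_at_vertices(B)
t_formula = [2 * a * b for a in t_A for b in t_B]
```

The formula side touches only the two small factor graphs, which is the kind of saving the abstract refers to; the direct side is included only to check the result on a toy instance.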
Online Reciprocal Recommendation with Theoretical Performance Guarantees
A reciprocal recommendation problem is one where the goal of learning is not
just to predict a user's preference towards a passive item (e.g., a book), but
to recommend to the targeted user on one side another user from the other side
such that a mutual interest between the two exists. The problem thus is sharply
different from the more traditional items-to-users recommendation, since a good
match requires meeting the preferences of both users. We initiate a rigorous
theoretical investigation of the reciprocal recommendation task in a specific
framework of sequential learning. We point out general limitations, formulate
reasonable assumptions enabling effective learning and, under these
assumptions, we design and analyze a computationally efficient algorithm that
uncovers mutual likes at a pace comparable to that achieved by a clairvoyant
algorithm knowing all user preferences in advance. Finally, we validate our
algorithm against synthetic and real-world datasets, showing improved empirical
performance over simple baselines.