4 research outputs found
Using Bayesian Network Representations for Effective Sampling from Generative Network Models
Bayesian networks (BNs) are used for inference and sampling by exploiting
conditional independence among random variables. Context specific independence
(CSI) is a property of graphical models where additional independence relations
arise in the context of particular values of random variables (RVs).
Identifying and exploiting CSI properties can simplify inference. Some
generative network models (models that generate social/information network
samples from a network distribution P(G)), with complex interactions among a
set of RVs, can be represented with probabilistic graphical models, in
particular with BNs. In the present work we show one such case. We discuss
how a mixed Kronecker Product Graph Model can be represented as a BN, and study
its BN properties that can be used for efficient sampling. Specifically, we
show that instead of exhibiting CSI properties, the model has deterministic
context-specific dependence (DCSD). Exploiting this property focuses the
sampling method on a subset of the sampling space, which improves efficiency.
The HyperKron Graph Model for higher-order features
Graph models have long been used in lieu of real data which can be expensive
and hard to come by. A common class of models constructs a matrix of
probabilities, and samples an adjacency matrix by flipping a weighted coin for
each entry. Examples include the Erdős-Rényi model, Chung-Lu model, and
the Kronecker model. Here we present the HyperKron Graph model: an extension of
the Kronecker Model, but with a distribution over hyperedges. We prove that we
can efficiently generate graphs from this model in time proportional to the
number of edges times a small log-factor, and find that in practice the runtime
is linear with respect to the number of edges. We illustrate a number of useful
features of the HyperKron model including non-trivial clustering and highly
skewed degree distributions. Finally, we fit the HyperKron model to real-world
networks, and demonstrate the model's flexibility with a complex application of
the HyperKron model to networks with coherent feed-forward loops. Comment: 17 pages, 9 figures
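The coin-flipping construction described in this abstract (build a matrix of probabilities, then flip a weighted coin for each entry) can be sketched as follows for an ordinary Kronecker model. This is a minimal illustration, not the HyperKron generator itself: the helper names `kron`, `kronecker_probabilities`, and `coin_flip_sample`, as well as the 2-by-2 initiator values, are assumptions made for the example.

```python
import random

def kron(A, B):
    """Kronecker product of two square matrices given as lists of lists."""
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def kronecker_probabilities(initiator, k):
    """k-fold Kronecker power of a small initiator matrix of probabilities."""
    P = initiator
    for _ in range(k - 1):
        P = kron(P, initiator)
    return P

def coin_flip_sample(P, rng):
    """Flip one weighted coin per entry: A[i][j] = 1 with probability P[i][j]."""
    return [[1 if rng.random() < p else 0 for p in row] for row in P]

# Hypothetical initiator matrix; any entries in [0, 1] would do.
initiator = [[0.9, 0.5], [0.5, 0.2]]
P = kronecker_probabilities(initiator, k=3)   # 8-by-8 matrix of probabilities
A = coin_flip_sample(P, random.Random(0))     # 8-by-8 adjacency matrix of 0s and 1s
```

The sampler touches every entry of the matrix, so its cost grows with the number of possible edges rather than the number of edges actually produced.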
Coin-flipping, ball-dropping, and grass-hopping for generating random graphs from matrices of edge probabilities
Common models for random graphs, such as Erdős-Rényi and Kronecker
graphs, correspond to generating random adjacency matrices where each entry is
non-zero based on a large matrix of probabilities. Generating an instance of a
random graph based on these models is easy, although inefficient, by flipping
biased coins (i.e. sampling binomial random variables) for each possible edge.
This process is inefficient because most large graph models correspond to
sparse graphs where the vast majority of coin flips will result in no edges. We
describe some not-entirely-well-known, but not-entirely-unknown, techniques
that will enable us to sample a graph by finding only the coin flips that will
produce edges. Our analogies for these procedures are ball-dropping, which is
easier to implement, but may need extra work due to duplicate edges, and
grass-hopping, which results in no duplicated work or extra edges.
Grass-hopping does this using geometric random variables. In order to use
this idea on complex probability matrices such as those in Kronecker graphs, we
decompose the problem into three steps, each of which is an independently useful
computational primitive: (i) enumerating non-decreasing sequences, (ii)
unranking multiset permutations, and (iii) decoding and encoding z-curve and
Morton codes and permutations. The third step is the result of a new connection
between repeated Kronecker product operations and Morton codes. Throughout, we
draw connections to ideas underlying applied math and computer science
including coupon collector problems. Comment: 43 pages, 16 problems
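The grass-hopping idea from this abstract, jumping directly between the coin flips that produce edges by drawing geometric random variables, can be sketched for the simplest case where every entry has the same probability p (an Erdős-Rényi-style matrix). This is an illustrative sketch under that assumption, not the paper's code; the function name `grass_hop_er` and the row-major indexing of entries are choices made for the example, and the Kronecker case additionally needs the three primitives listed in the abstract.

```python
import math
import random

def grass_hop_er(n, p, rng):
    """Return the (i, j) entries that are edges in an n-by-n matrix where
    each entry is an edge independently with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 0.0:
        return []
    if p == 1.0:
        return [(i, j) for i in range(n) for j in range(n)]
    log_q = math.log1p(-p)          # log(1 - p), strictly negative
    total = n * n                   # number of possible entries
    edges = []
    idx = -1                        # linear index of the last edge found
    while True:
        u = rng.random()            # uniform on [0, 1)
        # Number of failed flips before the next success (geometric).
        gap = int(math.log1p(-u) / log_q)
        idx += gap + 1
        if idx >= total:
            break
        edges.append(divmod(idx, n))  # linear index -> (row, column)
    return edges

edges = grass_hop_er(200, 0.01, random.Random(0))
```

Each loop iteration yields exactly one edge (or ends the walk), so the work is proportional to the number of edges generated plus one, and no entry is visited twice.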
Recent Advances in Scalable Network Generation
Random graph models are frequently used as a controllable and versatile data
source for experimental campaigns in various research fields. Generating such
data-sets at scale is a non-trivial task as it requires design decisions
typically spanning multiple areas of expertise. Challenges begin with the
identification of relevant domain-specific network features, continue with the
question of how to compile such features into a tractable model, and culminate
in algorithmic details arising while implementing the pertaining model.
In the present survey, we explore crucial aspects of random graph models with
known scalable generators. We begin by briefly introducing network features
considered by such models, and then discuss random graphs alongside
generation algorithms. Our focus lies on modelling techniques and algorithmic
primitives that have proven successful in obtaining massive graphs. We consider
concepts and graph models for various domains (such as social networks,
infrastructure, ecology, and numerical simulations), and discuss generators for
different models of computation (including shared-memory parallelism,
massively parallel GPUs, and distributed systems).