2.5K-Graphs: from Sampling to Generation
Understanding network structure and having access to realistic graphs plays a
central role in computer and social networks research. In this paper, we
propose a complete and practical methodology for generating graphs that
resemble a real graph of interest. The metrics of the original topology that we
aim to match are the joint degree distribution (JDD) and the
degree-dependent average clustering coefficient. We start by
developing efficient estimators for these two metrics based on a node sample
collected via either independence sampling or random walks. Then, we process
the output of the estimators to ensure that the target properties are
realizable. Finally, we propose an efficient algorithm for generating
topologies that have the exact target JDD and a degree-dependent average
clustering coefficient close to the target. Extensive simulations using
real-life graphs show that the graphs generated by our methodology are similar
to the original graph with respect to not only the two target metrics, but also
a wide range of other topological metrics; furthermore, our generator is orders
of magnitude faster than state-of-the-art techniques.
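To make the second target metric concrete, here is a minimal sketch that computes a degree-dependent average clustering coefficient on a fully known toy graph. It does not reproduce the paper's sample-based estimators or its generator. The function names, the toy edge list, and the convention that nodes of degree below 2 have clustering 0 are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def local_clustering(adj, node):
    """Fraction of neighbor pairs of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than 2 neighbors
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def clustering_by_degree(edges):
    """Average the local clustering coefficient over all nodes of each degree."""
    adj = build_adjacency(edges)
    per_degree = defaultdict(list)
    for node, nbrs in adj.items():
        per_degree[len(nbrs)].append(local_clustering(adj, node))
    return {k: sum(vals) / len(vals) for k, vals in per_degree.items()}


# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant node 3 on node 2.
TOY_EDGES = [(0, 1), (1, 2), (0, 2), (2, 3)]
CLUSTERING = clustering_by_degree(TOY_EDGES)
```

In the toy graph, the two degree-2 nodes sit in a closed triangle, while the degree-3 node has only one of its three neighbor pairs linked, so the averages differ by degree.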
Multifractal Network Generator
We introduce a new approach to constructing networks with realistic features.
Our method, in spite of its conceptual simplicity (it has only two parameters)
is capable of generating a wide variety of network types with prescribed
statistical properties, e.g., with degree- or clustering coefficient
distributions of various, very different forms. In turn, these graphs can be
used to test hypotheses or as models of actual data. The method is based on a
mapping between suitably chosen singular measures defined on the unit square
and sparse infinite networks. Such a mapping has the great potential of
allowing for graph theoretical results for a variety of network topologies. The
main idea of our approach is to go to the infinite limit of the singular
measure and the size of the corresponding graph simultaneously. A unique
feature of this construction is that the complexity of the generated network
increases with its size. We present analytic expressions, derived from the
parameters of the initial generating measure (which is then iterated), for such
major characteristics of graphs as their degree, clustering coefficient, and
assortativity coefficient distributions. The optimal parameters of the
generating measure are determined from a simple simulated annealing process.
Thus, the present work provides a tool for researchers from a variety of fields
(such as biology, computer science, or complex systems), enabling them
to create a versatile model of their network data.
Comment: Preprint. Final version appeared in PNAS
Spectral Graph Forge: Graph Generation Targeting Modularity
Community structure is an important property that captures inhomogeneities
common in large networks, and modularity is one of the most widely used metrics
for such community structure. In this paper, we introduce a principled
methodology, the Spectral Graph Forge, for generating random graphs that
preserve community structure from a real network of interest, in terms of
modularity. Our approach leverages the fact that the spectral structure of
matrix representations of a graph encodes global information about community
structure. The Spectral Graph Forge uses a low-rank approximation of the
modularity matrix to generate synthetic graphs that match a target modularity
within a user-selectable degree of accuracy, while allowing other aspects of
structure to vary. We show that the Spectral Graph Forge outperforms
state-of-the-art techniques in terms of accuracy in targeting the modularity
and randomness of the realizations, while also preserving other local
structural properties and node attributes. We discuss extensions of the
Spectral Graph Forge to target other properties beyond modularity, and its
applications to anonymization.
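As background for the quantity being targeted, the sketch below builds a modularity matrix and evaluates the modularity of a given partition, assuming the conventional definition B_ij = A_ij - k_i k_j / (2m) for a simple undirected graph. The low-rank approximation and graph generation steps of the Spectral Graph Forge are not shown. The function names and the toy graph are illustrative assumptions.

```python
def modularity_matrix(edges, n):
    """Return (B, 2m) with B[i][j] = A[i][j] - k_i * k_j / (2m).

    Assumes a simple undirected graph on nodes 0..n-1 with at least one edge.
    """
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = 1
        adj[v][u] = 1
    deg = [sum(row) for row in adj]
    two_m = sum(deg)
    B = [[adj[i][j] - deg[i] * deg[j] / two_m for j in range(n)]
         for i in range(n)]
    return B, two_m


def modularity(edges, n, community):
    """Modularity Q of a partition; community[i] is the label of node i."""
    B, two_m = modularity_matrix(edges, n)
    total = sum(B[i][j]
                for i in range(n)
                for j in range(n)
                if community[i] == community[j])
    return total / two_m


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
TWO_TRIANGLES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
LABELS = [0, 0, 0, 1, 1, 1]
Q = modularity(TWO_TRIANGLES, 6, LABELS)
```

Each row of B sums to zero by construction, which is a quick sanity check on any implementation of this definition.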
Systematic Topology Analysis and Generation Using Degree Correlations
We present a new, systematic approach for analyzing network topologies. We
first introduce the dK-series of probability distributions specifying all
degree correlations within d-sized subgraphs of a given graph G. Increasing
values of d capture progressively more properties of G at the cost of more
complex representation of the probability distribution. Using this series, we
can quantitatively measure the distance between two graphs and construct random
graphs that accurately reproduce virtually all metrics proposed in the
literature. The nature of the dK-series implies that it will also capture any
future metrics that may be proposed. Using our approach, we construct graphs
for d=0,1,2,3 and demonstrate that these graphs reproduce, with increasing
accuracy, important properties of measured and modeled Internet topologies. We
find that the d=2 case is sufficient for most practical purposes, while d=3
essentially reconstructs the Internet AS- and router-level topologies exactly.
We hope that a systematic method to analyze and synthesize topologies offers a
significant improvement to the set of tools available to network topology and
protocol researchers.
Comment: Final versio
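The idea that larger d captures more structure can be illustrated for d = 0, 1, 2, taking these to be the average degree, the degree distribution, and the joint degree distribution of connected node pairs. The sketch below is not the paper's implementation: the function names and toy graphs are assumptions, counts are left unnormalized, and isolated nodes are ignored.

```python
from collections import Counter


def degrees(edges):
    """Degree of every node that appears in at least one edge."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def dk0(edges):
    """0K: average degree (isolated nodes are not counted)."""
    return 2 * len(edges) / len(degrees(edges))


def dk1(edges):
    """1K: number of nodes having each degree."""
    return Counter(degrees(edges).values())


def dk2(edges):
    """2K: number of edges joining each unordered pair of degrees."""
    deg = degrees(edges)
    return Counter(tuple(sorted((deg[u], deg[v]))) for u, v in edges)


# Two hypothetical 6-node graphs with identical degree sequences:
# a path, and a 4-cycle plus one disjoint edge.
PATH = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
CYCLE_PLUS_EDGE = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5)]

SAME_0K = dk0(PATH) == dk0(CYCLE_PLUS_EDGE)
SAME_1K = dk1(PATH) == dk1(CYCLE_PLUS_EDGE)
SAME_2K = dk2(PATH) == dk2(CYCLE_PLUS_EDGE)
```

The two toy graphs agree on the d = 0 and d = 1 statistics but differ at d = 2, so the joint degree counts distinguish graphs that the degree distribution alone cannot.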
Graph realizations constrained by skeleton graphs
In 2008 Amanatidis, Green and Mihail introduced the Joint Degree Matrix (JDM)
model to capture the fundamental difference in assortativity of networks in
nature studied by the physical and life sciences and social networks studied in
the social sciences. In 2014 Czabarka proposed a direct generalization of the
JDM model, the Partition Adjacency Matrix (PAM) model. In the PAM model the
vertices have specified degrees, and the vertex set itself is partitioned into
classes. For each pair of vertex classes the number of edges between the
classes in a graph realization is prescribed. In this paper we apply the new
{\em skeleton graph} model to describe the same information as the PAM model.
Our model is more convenient for handling problems with a low number of partition
classes or with special topological restrictions among the classes. We
investigate two particular cases in detail: (i) when there are only two vertex
classes and (ii) when the skeleton graph contains at most one cycle.
Comment: 19 page
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed.
Inference via low-dimensional couplings
We investigate the low-dimensional structure of deterministic transformations
between random variables, i.e., transport maps between probability measures. In
the context of statistics and machine learning, these transformations can be
used to couple a tractable "reference" measure (e.g., a standard Gaussian) with
a target measure of interest. Direct simulation from the desired measure can
then be achieved by pushing forward reference samples through the map. Yet
characterizing such a map---e.g., representing and evaluating it---grows
challenging in high dimensions. The central contribution of this paper is to
establish a link between the Markov properties of the target measure and the
existence of low-dimensional couplings, induced by transport maps that are
sparse and/or decomposable. Our analysis not only facilitates the construction
of transformations in high-dimensional settings, but also suggests new
inference methodologies for continuous non-Gaussian graphical models. For
instance, in the context of nonlinear state-space models, we describe new
variational algorithms for filtering, smoothing, and sequential parameter
inference. These algorithms can be understood as the natural
generalization---to the non-Gaussian case---of the square-root
Rauch-Tung-Striebel Gaussian smoother.
Comment: 78 pages, 25 figure
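The step of simulating from a target measure by pushing reference samples through a map can be sketched in a few lines. The map below is a hypothetical lower-triangular example chosen only for illustration; it is not taken from the paper, and the function names and sample size are assumptions. Standard Gaussian reference samples are transformed into samples from a curved, non-Gaussian two-dimensional target.

```python
import random


def transport_map(z1, z2):
    """Hypothetical lower-triangular map: x1 depends on z1 only,
    x2 depends on both z1 and z2."""
    x1 = z1
    x2 = z2 + z1 * z1
    return x1, x2


def sample_target(n, seed=0):
    """Draw n reference samples from a standard Gaussian and push them forward."""
    rng = random.Random(seed)
    return [transport_map(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
            for _ in range(n)]


SAMPLES = sample_target(20000)
MEAN_X1 = sum(x1 for x1, _ in SAMPLES) / len(SAMPLES)
MEAN_X2 = sum(x2 for _, x2 in SAMPLES) / len(SAMPLES)
```

Under this map the first coordinate stays standard Gaussian, while the second has mean E[z1^2] = 1, so the sample means should land near 0 and 1 respectively.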