Metrics for Graph Comparison: A Practitioner's Guide
Comparison of graph structure is a ubiquitous task in data analysis and
machine learning, with diverse applications in fields such as neuroscience,
cyber security, social network analysis, and bioinformatics, among others.
Discovery and comparison of structures such as modular communities, rich clubs,
hubs, and trees in data in these fields yields insight into the generative
mechanisms and functional properties of the graph.
Often, two graphs are compared via a pairwise distance measure, with a small
distance indicating structural similarity and vice versa. Common choices
include spectral distances (also known as λ distances) and distances
based on node affinities. However, there has as yet been no comparative study
of the efficacy of these distance measures in discerning between common graph
topologies and different structural scales.
In this work, we compare commonly used graph metrics and distance measures,
and demonstrate their ability to discern between common topological features
found in both random graph models and empirical datasets. We put forward a
multi-scale picture of graph structure, in which the effect of global and local
structure upon the distance measures is considered. We make recommendations on
the applicability of different distance measures to empirical graph data
problems based on this multi-scale view. Finally, we introduce the Python
library NetComp, which implements the graph distances used in this work.
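The spectral distance mentioned in this abstract can be illustrated with a minimal sketch. It assumes the combinatorial Laplacian L = D - A and a Euclidean norm between sorted spectra, which is one common choice among several. The function names `laplacian_spectrum` and `spectral_distance` are hypothetical and are not taken from NetComp.

```python
import numpy as np

def laplacian_spectrum(adj):
    """Sorted eigenvalues of the combinatorial Laplacian L = D - A (assumed choice)."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.sort(np.linalg.eigvalsh(lap))

def spectral_distance(adj_a, adj_b, k=None):
    """Euclidean distance between the Laplacian spectra of two equal-size graphs.

    If k is given, only the k smallest eigenvalues are compared.
    """
    spec_a = laplacian_spectrum(adj_a)
    spec_b = laplacian_spectrum(adj_b)
    if spec_a.shape != spec_b.shape:
        raise ValueError("this sketch only compares graphs with the same number of nodes")
    if k is not None:
        spec_a, spec_b = spec_a[:k], spec_b[:k]
    return float(np.linalg.norm(spec_a - spec_b))

# Path graph on 4 nodes versus the 4-cycle obtained by closing the path.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
cycle = path.copy()
cycle[0, 3] = cycle[3, 0] = 1

d_same = spectral_distance(path, path)
d_diff = spectral_distance(path, cycle)
```

Because the spectrum is invariant under node relabeling, this quantity needs no node correspondence, but cospectral non-isomorphic graphs get distance zero, so it is a pseudometric rather than a true metric.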
Kronecker Graphs: An Approach to Modeling Networks
How can we model networks with a mathematically tractable model that allows
for rigorous analysis of network properties? Networks exhibit a long list of
surprising properties: heavy tails for the degree distribution; small
diameters; and densification and shrinking diameters over time. Most present
network models either fail to match several of the above properties, are
complicated to analyze mathematically, or both. In this paper we propose a
generative model for networks that is both mathematically tractable and can
generate networks that have the above mentioned properties. Our main idea is to
use the Kronecker product to generate graphs that we refer to as "Kronecker
graphs".
First, we prove that Kronecker graphs naturally obey common network
properties. We also provide empirical evidence showing that Kronecker graphs
can effectively model the structure of real networks.
We then present KronFit, a fast and scalable algorithm for fitting the
Kronecker graph generation model to large real networks. A naive approach to
fitting would take super-exponential time. In contrast, KronFit takes linear
time, by exploiting the structure of Kronecker matrix multiplication and by
using statistical simulation techniques.
Experiments on large real and synthetic networks show that KronFit finds
accurate parameters that closely mimic the properties of target
networks. Once fitted, the model parameters can be used to gain insights about
the network structure, and the resulting synthetic graphs can be used for
null-models, anonymization, extrapolations, and graph summarization.
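As a rough illustration of the Kronecker-product idea in this abstract, the sketch below raises a small initiator matrix to its k-th Kronecker power and, assuming the entries are read as independent edge probabilities, samples one graph from it. The initiator values and the function names are hypothetical. This naive version materializes the full N x N matrix, so it is not the KronFit algorithm and does not scale the way the abstract describes.

```python
import numpy as np

def kronecker_power(initiator, k):
    """k-th Kronecker power of a square initiator matrix."""
    initiator = np.asarray(initiator, dtype=float)
    result = initiator.copy()
    for _ in range(k - 1):
        result = np.kron(result, initiator)
    return result

def sample_stochastic_kronecker(initiator, k, seed=0):
    """Sample a directed 0/1 adjacency matrix (self-loops allowed).

    Assumption: each entry of the Kronecker power is an independent
    edge probability, so initiator entries must lie in [0, 1].
    """
    rng = np.random.default_rng(seed)
    probs = kronecker_power(initiator, k)
    return (rng.random(probs.shape) < probs).astype(int)

# Hypothetical 2x2 initiator; values chosen only for illustration.
theta = np.array([[0.9, 0.5],
                  [0.5, 0.1]])
probs = kronecker_power(theta, 3)              # 8 x 8 probability matrix
graph = sample_stochastic_kronecker(theta, 3)  # one sampled graph on 8 nodes
```

The expected number of edges equals the sum of the initiator entries raised to the power k, which is one reason a handful of parameters can control the density of a very large graph.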
Topological characteristics of IP networks
Topological analysis of the Internet is needed for developments on network planning, optimal routing
algorithms, failure detection measures, and understanding business models. Accurate measurement, inference and modelling techniques are fundamental to Internet topology research. A requirement towards
achieving such goals is the measurement of network topologies at different levels of granularity. In this
work, I start by studying techniques for inferring, modelling, and generating Internet topologies at both
the router and administrative levels. I also compare the mathematical models that are used to characterise
various topologies and the generation tools based on them.
Many topological models have been proposed to generate Internet Autonomous System (AS) topologies. I use an extensive set of measures and innovative methodologies to compare AS topology generation models with several observed AS topologies. This analysis shows that the existing AS topology
generation models fail to capture important characteristics, such as the complexity of the local interconnection structure between ASes. Furthermore, I use routing data from multiple vantage points to show
that using additional measurement points significantly affects our observations about local structural properties, such as clustering and node centrality. Degree-based properties, however, are not notably affected
by additional measurement locations. The shortcomings of AS topology generation models stem from
an underestimation of the complexity of the connectivity in the Internet and biases of measurement techniques.
An increasing number of synthetic topology generators are available, each claiming to produce
representative Internet topologies. Every generator has its own parameters, allowing the user to generate
topologies with different characteristics. However, there exist no clear guidelines on tuning the value of
these parameters in order to obtain a topology with specific characteristics. I propose a method which
allows optimal parameters of a model to be estimated for a given target topology. The optimisation
is performed using the weighted spectral distribution metric, which simultaneously takes into account
many of the properties of a graph.
In order to understand the dynamics of the Internet, I study the evolution of the AS topology over a
period of seven years. To understand the structural changes in the topology, I use the weighted spectral
distribution as this metric reveals differences in the hierarchical structure of two graphs. The results indicate that the Internet is changing from a strongly customer-provider oriented, disassortative network, to
a soft-hierarchical, peering-oriented, assortative network. This change is indicative of evolving business
relationships amongst organisations.
Randomness and Complexity in Networks
I start by reviewing some basic properties of random graphs. I then consider
the role of random walks in complex networks and show how they may be used to
explain why so many long tailed distributions are found in real data sets. The
key idea is that in many cases the process involves copying of properties of
near neighbours in the network and this is a type of short random walk which in
turn produces a natural preferential attachment mechanism. Applying this to
networks of fixed size I show that copying and innovation are processes with
special mathematical properties which include the ability to solve a simple
model exactly for any parameter values and at any time. I finish by looking at
variations of this basic model.
Comment: Survey paper based on talk given at the workshop on "Stochastic
Networks and Internet Technology", Centro di Ricerca Matematica Ennio De
Giorgi, Matematica nelle Scienze Naturali e Sociali, Pisa, 17th - 21st
September 2007. To appear in proceedings.
Spectral Estimation of Conditional Random Graph Models for Large-Scale Network Data
Generative models for graphs have been typically committed to strong prior
assumptions concerning the form of the modeled distributions. Moreover, the
vast majority of currently available models are either only suitable for
characterizing some particular network properties (such as degree distribution
or clustering coefficient), or they are aimed at estimating joint probability
distributions, which is often intractable in large-scale networks. In this
paper, we first propose a novel network statistic, based on the Laplacian
spectrum of graphs, which allows us to dispense with any parametric assumption
concerning the modeled network properties. Second, we use the defined statistic
to develop the Fiedler random graph model, switching the focus from the
estimation of joint probability distributions to a more tractable conditional
estimation setting. After analyzing the dependence structure characterizing
Fiedler random graphs, we evaluate them experimentally in edge prediction over
several real-world networks, showing that they reach a much higher
prediction accuracy than various alternative statistical models.
Comment: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty
in Artificial Intelligence (UAI2012)
Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling
Statistical models for networks have been typically committed to strong prior assumptions concerning the form of the modeled distributions. Moreover, the vast majority of currently available models are explicitly designed for capturing some specific graph properties (such as power-law degree distributions), which makes them unsuitable for application to domains where the behavior of the target quantities is not known a priori. The key contribution of this paper is twofold. First, we introduce the Fiedler delta statistic, based on the Laplacian spectrum of graphs, which allows us to dispense with any parametric assumption concerning the modeled network properties. Second, we use the defined statistic to develop the Fiedler random field model, which allows for efficient estimation of edge distributions over large-scale random networks. After analyzing the dependence structure involved in Fiedler random fields, we estimate them over several real-world networks, showing that they achieve a much higher modeling accuracy than other well-known statistical approaches.
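The abstract does not define the Fiedler delta statistic, so the following is only a sketch of the underlying quantity under a stated assumption: that the statistic is built on the change in the Fiedler value (the second-smallest Laplacian eigenvalue, also called algebraic connectivity) when a single edge is added or removed. The function names are hypothetical and the paper's exact definition may differ.

```python
import numpy as np

def fiedler_value(adj):
    """Second-smallest eigenvalue of the combinatorial Laplacian L = D - A."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    return float(np.sort(np.linalg.eigvalsh(lap))[1])

def fiedler_change_on_toggle(adj, u, v):
    """Change in the Fiedler value when the undirected edge {u, v} is toggled.

    Assumption: this difference is the kind of quantity a Fiedler-based
    edge statistic builds on; it is not the paper's stated formula.
    """
    adj = np.asarray(adj, dtype=float)
    toggled = adj.copy()
    toggled[u, v] = toggled[v, u] = 1.0 - toggled[u, v]
    return fiedler_value(toggled) - fiedler_value(adj)

# Path graph on 4 nodes; adding edge {0, 3} closes it into a 4-cycle.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
delta_add = fiedler_change_on_toggle(path, 0, 3)
delta_remove = fiedler_change_on_toggle(path, 1, 2)
```

Adding an edge never lowers the Fiedler value and removing one never raises it, so the sign of the change already says whether the edge was inserted or deleted, and its magnitude reflects how much that edge matters for connectivity.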
Spectral analysis for stochastic models of large-scale complex dynamical networks
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 179-196).
Research on large-scale complex networks has important applications in diverse systems of current interest, including the Internet, the World-Wide Web, social, biological, and chemical networks. The growing availability of massive databases, computing facilities, and reliable data analysis tools has provided a powerful framework to explore structural properties of such real-world networks. However, one cannot efficiently retrieve and store the exact or full topology for many large-scale networks. As an alternative, several stochastic network models have been proposed that attempt to capture essential characteristics of such complex topologies. Network researchers then use these stochastic models to generate topologies similar to the complex network of interest and use these topologies to test, for example, the behavior of dynamical processes in the network. In general, the topological properties of a network are not directly evident in the behavior of dynamical processes running on it. On the other hand, the eigenvalue spectra of certain matricial representations of the network topology do relate quite directly to the behavior of many dynamical processes of interest, such as random walks, Markov processes, virus/rumor spreading, or synchronization of oscillators in a network. This thesis studies spectral properties of popular stochastic network models proposed in recent years. In particular, we develop several methods to determine or estimate the spectral moments of these models. We also present a variety of techniques to extract relevant spectral information from a finite sequence of spectral moments. A range of numerical examples throughout the thesis confirms the efficacy of our approach.
Our ultimate objective is to use such results to understand and predict the behavior of dynamical processes taking place in large-scale networks.
by Víctor Manuel Preciado. Ph.D.
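The spectral moments referred to in this abstract can be illustrated with a small sketch. It assumes the adjacency matrix as the matrix representation and defines the k-th moment as the mean of the k-th powers of its eigenvalues, which equals trace(A^k) / n, the number of closed walks of length k per node. The function names are hypothetical and the thesis may use other matrices or normalizations.

```python
import numpy as np

def spectral_moments_from_eigs(adj, max_order):
    """Moments m_k = mean(lambda_i ** k) for k = 1..max_order."""
    eigs = np.linalg.eigvalsh(np.asarray(adj, dtype=float))
    return [float(np.mean(eigs ** k)) for k in range(1, max_order + 1)]

def spectral_moments_from_traces(adj, max_order):
    """Same moments via trace(A ** k) / n, with no eigendecomposition."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    power = np.eye(n)
    moments = []
    for _ in range(max_order):
        power = power @ adj  # power now holds A ** k
        moments.append(float(np.trace(power)) / n)
    return moments

# Triangle graph K3: adjacency eigenvalues are 2, -1, -1.
triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]])
by_eigs = spectral_moments_from_eigs(triangle, 3)
by_traces = spectral_moments_from_traces(triangle, 3)
```

For an unweighted simple graph the second moment is the average degree and the third is six times the number of triangles divided by n, which is why low-order moments can be estimated from local structure alone.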