
    Metrics for Graph Comparison: A Practitioner's Guide

    Comparison of graph structure is a ubiquitous task in data analysis and machine learning, with diverse applications in fields such as neuroscience, cyber security, social network analysis, and bioinformatics, among others. Discovery and comparison of structures such as modular communities, rich clubs, hubs, and trees in data from these fields yield insight into the generative mechanisms and functional properties of the graph. Often, two graphs are compared via a pairwise distance measure, with a small distance indicating structural similarity and vice versa. Common choices include spectral distances (also known as λ distances) and distances based on node affinities. However, there has as yet been no comparative study of the efficacy of these distance measures in discerning between common graph topologies and different structural scales. In this work, we compare commonly used graph metrics and distance measures, and demonstrate their ability to discern between common topological features found in both random graph models and empirical datasets. We put forward a multi-scale picture of graph structure, in which the effect of global and local structure upon the distance measures is considered. We make recommendations on the applicability of different distance measures to empirical graph data problems based on this multi-scale view. Finally, we introduce the Python library NetComp, which implements the graph distances used in this work.
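
    As an illustration of the spectral (λ) distances mentioned above, the sketch below compares two undirected graphs of equal size by the Euclidean distance between their k largest adjacency eigenvalues. It is a minimal NumPy sketch, not NetComp's implementation; the function names `adjacency_spectrum` and `lambda_distance` are hypothetical, and variants based on Laplacian spectra are omitted.

```python
import numpy as np

def adjacency_spectrum(A):
    """Eigenvalues of a symmetric adjacency matrix, largest first."""
    return np.sort(np.linalg.eigvalsh(A))[::-1]

def lambda_distance(A1, A2, k=None):
    """Euclidean distance between the k largest adjacency eigenvalues.

    Assumes both graphs are undirected and have the same number of nodes;
    k=None compares the full spectra.
    """
    l1, l2 = adjacency_spectrum(A1), adjacency_spectrum(A2)
    if k is not None:
        l1, l2 = l1[:k], l2[:k]
    return float(np.linalg.norm(l1 - l2))

# A 4-cycle versus a 4-node path
C4 = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
P4 = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
print(lambda_distance(C4, C4))                  # 0.0
print(round(lambda_distance(C4, P4, k=2), 4))   # 0.7265
```

    Restricting to the top k eigenvalues emphasizes large-scale (global) structure; comparing more of the spectrum brings in finer, local detail.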

    Kronecker Graphs: An Approach to Modeling Networks

    How can we model networks with a mathematically tractable model that allows for rigorous analysis of network properties? Networks exhibit a long list of surprising properties: heavy tails for the degree distribution; small diameters; and densification and shrinking diameters over time. Most present network models either fail to match several of the above properties, are complicated to analyze mathematically, or both. In this paper we propose a generative model for networks that is both mathematically tractable and can generate networks that have the above-mentioned properties. Our main idea is to use the Kronecker product to generate graphs that we refer to as "Kronecker graphs". First, we prove that Kronecker graphs naturally obey common network properties. We also provide empirical evidence showing that Kronecker graphs can effectively model the structure of real networks. We then present KronFit, a fast and scalable algorithm for fitting the Kronecker graph generation model to large real networks. A naive approach to fitting would take super-exponential time. In contrast, KronFit takes linear time, by exploiting the structure of Kronecker matrix multiplication and by using statistical simulation techniques. Experiments on large real and synthetic networks show that KronFit finds accurate parameters that closely mimic the properties of target networks. Once fitted, the model parameters can be used to gain insights about the network structure, and the resulting synthetic graphs can be used for null models, anonymization, extrapolation, and graph summarization.
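
    The core construction can be sketched with NumPy's `np.kron`: repeatedly taking the Kronecker product of a small initiator matrix yields the adjacency (or edge-probability) matrix of a Kronecker graph. This is a minimal sketch; the initiator values are hypothetical, and the naive sampler below costs O(N²) for N nodes, unlike the fast generation and KronFit fitting described in the abstract.

```python
import numpy as np

def kronecker_power(initiator, k):
    """k-th Kronecker power of a square initiator matrix (k >= 1)."""
    initiator = np.asarray(initiator)
    result = initiator
    for _ in range(k - 1):
        result = np.kron(result, initiator)
    return result

def sample_stochastic_kronecker(theta, k, seed=0):
    """Naive O(N^2) sampler: include directed edge (i, j) independently with
    probability P[i, j], where P is the k-th Kronecker power of theta."""
    rng = np.random.default_rng(seed)
    P = kronecker_power(np.asarray(theta, dtype=float), k)
    return (rng.random(P.shape) < P).astype(int)

# Deterministic Kronecker graph from a hypothetical 2x2 initiator
K1 = np.array([[1, 1], [1, 0]])
K3 = kronecker_power(K1, 3)
print(K3.shape, K3.sum())  # (8, 8) 27

# Stochastic Kronecker graph from a hypothetical probability initiator
theta = [[0.9, 0.5], [0.5, 0.1]]
A = sample_stochastic_kronecker(theta, k=10)
print(A.shape)  # (1024, 1024)
```

    Each Kronecker power multiplies the node count by the initiator size, so k = 10 with a 2×2 initiator gives 1,024 nodes, and the expected edge count is (sum of initiator entries)^k.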

    Topological characteristics of IP networks

    Topological analysis of the Internet is needed for developments in network planning, optimal routing algorithms, failure detection measures, and understanding business models. Accurate measurement, inference and modelling techniques are fundamental to Internet topology research. A prerequisite for achieving these goals is measuring network topologies at different levels of granularity. In this work, I start by studying techniques for inferring, modelling, and generating Internet topologies at both the router and administrative levels. I also compare the mathematical models that are used to characterise various topologies and the generation tools based on them. Many topological models have been proposed to generate Internet Autonomous System (AS) topologies. I use an extensive set of measures and innovative methodologies to compare AS topology generation models with several observed AS topologies. This analysis shows that the existing AS topology generation models fail to capture important characteristics, such as the complexity of the local interconnection structure between ASes. Furthermore, I use routing data from multiple vantage points to show that using additional measurement points significantly affects our observations about local structural properties, such as clustering and node centrality. Degree-based properties, however, are not notably affected by additional measurement locations. The shortcomings of AS topology generation models stem from an underestimation of the complexity of the connectivity in the Internet and from biases in measurement techniques. An increasing number of synthetic topology generators are available, each claiming to produce representative Internet topologies. Every generator has its own parameters, allowing the user to generate topologies with different characteristics. However, there exist no clear guidelines on tuning the values of these parameters in order to obtain a topology with specific characteristics.
    I propose a method which allows optimal parameters of a model to be estimated for a given target topology. The optimisation is performed using the weighted spectral distribution metric, which simultaneously takes into account many of the properties of a graph. In order to understand the dynamics of the Internet, I study the evolution of the AS topology over a period of seven years. To understand the structural changes in the topology, I use the weighted spectral distribution, as this metric reveals differences in the hierarchical structure of two graphs. The results indicate that the Internet is changing from a strongly customer-provider oriented, disassortative network to a soft-hierarchical, peering-oriented, assortative network. This change is indicative of evolving business relationships amongst organisations.
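
    A rough sketch of the weighted spectral distribution follows, assuming the commonly cited form ω(G, N) = Σ_k (1 − k)^N f(λ = k), where f is the binned distribution of eigenvalues of the normalized Laplacian. The bin count, the choice N = 4, and the squared L2 comparison in the hypothetical `wsd_distance` helper are illustrative assumptions, not necessarily the settings used in the thesis.

```python
import numpy as np

def normalized_laplacian(A):
    """I - D^{-1/2} A D^{-1/2}; assumes every node has degree > 0."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def weighted_spectral_distribution(A, N=4, bins=50):
    """Binned normalized-Laplacian spectrum, each bin k weighted by (1 - k)^N."""
    # Eigenvalues lie in [0, 2]; clipping removes tiny floating-point overshoot.
    lam = np.clip(np.linalg.eigvalsh(normalized_laplacian(A)), 0.0, 2.0)
    counts, edges = np.histogram(lam, bins=bins, range=(0.0, 2.0))
    centres = 0.5 * (edges[:-1] + edges[1:])
    f = counts / counts.sum()          # empirical eigenvalue distribution f(lambda = k)
    return centres, (1.0 - centres) ** N * f

def wsd_distance(A1, A2, N=4, bins=50):
    """Squared L2 difference between two weighted spectral distributions."""
    _, w1 = weighted_spectral_distribution(A1, N, bins)
    _, w2 = weighted_spectral_distribution(A2, N, bins)
    return float(np.sum((w1 - w2) ** 2))

# A 10-node ring versus a 10-node star (every node has at least one edge)
ring = np.roll(np.eye(10), 1, axis=1) + np.roll(np.eye(10), -1, axis=1)
star = np.zeros((10, 10))
star[0, 1:] = star[1:, 0] = 1
print(wsd_distance(ring, ring))  # 0.0
print(wsd_distance(ring, star))
```

    The (1 − k)^N weighting suppresses eigenvalues near 1 and emphasizes those near 0 and 2, which is one way a single curve can summarize several structural properties at once.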

    Randomness and Complexity in Networks

    I start by reviewing some basic properties of random graphs. I then consider the role of random walks in complex networks and show how they may be used to explain why so many long-tailed distributions are found in real data sets. The key idea is that in many cases the process involves copying properties of near neighbours in the network; this is a type of short random walk, which in turn produces a natural preferential attachment mechanism. Applying this to networks of fixed size, I show that copying and innovation are processes with special mathematical properties, which include the ability to solve a simple model exactly for any parameter values and at any time. I finish by looking at variations of this basic model. Comment: Survey paper based on a talk given at the workshop on "Stochastic Networks and Internet Technology", Centro di Ricerca Matematica Ennio De Giorgi, Matematica nelle Scienze Naturali e Sociali, Pisa, 17th-21st September 2007. To appear in proceedings.
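
    The copying mechanism can be illustrated with a small simulation. This is a simplified, assumed variant rather than the exact model solved in the paper: a fixed population of individuals each holds one artifact, and at each step one individual either innovates (picks an artifact uniformly at random) or copies the artifact of another randomly chosen individual. Because copying selects an artifact with probability proportional to its current popularity, it acts as a one-step random walk that yields preferential attachment. The function name `copy_innovate` and all parameter values are hypothetical.

```python
import random
from collections import Counter

def copy_innovate(num_individuals=1000, num_artifacts=1000, p_innovate=0.1,
                  steps=100_000, seed=1):
    """Fixed-size copying/innovation model (simplified, assumed variant)."""
    rng = random.Random(seed)
    # Start with every individual holding a uniformly random artifact.
    choice = [rng.randrange(num_artifacts) for _ in range(num_individuals)]
    for _ in range(steps):
        i = rng.randrange(num_individuals)
        if rng.random() < p_innovate:
            # Innovation: pick any artifact uniformly at random.
            choice[i] = rng.randrange(num_artifacts)
        else:
            # Copying: adopt a random individual's artifact, which selects
            # artifacts in proportion to their popularity.
            choice[i] = choice[rng.randrange(num_individuals)]
    return Counter(choice)  # artifact -> popularity (its degree)

popularity = copy_innovate()
degree_counts = Counter(popularity.values())
print(sorted(degree_counts.items())[:10])  # (popularity, number of artifacts)
```

    With a small innovation probability, the popularity counts typically develop a long tail, with a few artifacts held by many individuals; raising `p_innovate` pushes the distribution back towards the uniform starting state.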

    Spectral Estimation of Conditional Random Graph Models for Large-Scale Network Data

    Generative models for graphs have typically been committed to strong prior assumptions concerning the form of the modeled distributions. Moreover, the vast majority of currently available models are either only suitable for characterizing some particular network properties (such as degree distribution or clustering coefficient), or they are aimed at estimating joint probability distributions, which is often intractable in large-scale networks. In this paper, we first propose a novel network statistic, based on the Laplacian spectrum of graphs, which makes it possible to dispense with any parametric assumption concerning the modeled network properties. Second, we use the defined statistic to develop the Fiedler random graph model, switching the focus from the estimation of joint probability distributions to a more tractable conditional estimation setting. After analyzing the dependence structure characterizing Fiedler random graphs, we evaluate them experimentally in edge prediction over several real-world networks, showing that they achieve much higher prediction accuracy than various alternative statistical models. Comment: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012).
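
    The Laplacian spectrum on which the proposed statistic is based can be computed directly. The sketch below is an illustration under stated assumptions rather than the paper's code: it forms the combinatorial Laplacian L = D − A of an undirected graph and returns its second-smallest eigenvalue, the Fiedler value (algebraic connectivity), which is zero exactly when the graph is disconnected. The helper names are hypothetical.

```python
import numpy as np

def laplacian(A):
    """Combinatorial graph Laplacian L = D - A for a symmetric adjacency matrix."""
    return np.diag(A.sum(axis=1)) - A

def laplacian_spectrum(A):
    """Laplacian eigenvalues in ascending order (the smallest is always 0)."""
    return np.linalg.eigvalsh(laplacian(A))

def fiedler_value(A):
    """Second-smallest Laplacian eigenvalue, i.e. the algebraic connectivity."""
    return float(laplacian_spectrum(A)[1])

path = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)   # 4-node path
complete = np.ones((4, 4)) - np.eye(4)                     # complete graph K4
split = np.zeros((4, 4))                                   # two disjoint edges
split[0, 1] = split[1, 0] = 1
split[2, 3] = split[3, 2] = 1

print(round(fiedler_value(path), 4))        # 0.5858  (= 2 - sqrt(2))
print(round(fiedler_value(complete), 4))    # 4.0
print(np.isclose(fiedler_value(split), 0))  # True (disconnected)
```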

    Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling

    Statistical models for networks have typically been committed to strong prior assumptions concerning the form of the modeled distributions. Moreover, the vast majority of currently available models are explicitly designed for capturing some specific graph properties (such as power-law degree distributions), which makes them unsuitable for application to domains where the behavior of the target quantities is not known a priori. The key contribution of this paper is twofold. First, we introduce the Fiedler delta statistic, based on the Laplacian spectrum of graphs, which makes it possible to dispense with any parametric assumption concerning the modeled network properties. Second, we use the defined statistic to develop the Fiedler random field model, which allows for efficient estimation of edge distributions over large-scale random networks. After analyzing the dependence structure involved in Fiedler random fields, we estimate them over several real-world networks, showing that they achieve much higher modeling accuracy than other well-known statistical approaches.
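
    The Fiedler delta statistic can be sketched as the change in algebraic connectivity caused by a single edge. We assume here that it is λ₂ of the graph with edge (u, v) minus λ₂ of the graph without it; the paper's precise definition may differ. Scoring all non-adjacent pairs this way gives a simple, hypothetical edge-ranking heuristic, not the full Fiedler random field model.

```python
import itertools
import numpy as np

def fiedler_value(A):
    """Second-smallest eigenvalue of the Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    return float(np.linalg.eigvalsh(L)[1])

def fiedler_delta(A, u, v):
    """Assumed form: lambda_2 with edge (u, v) minus lambda_2 without it."""
    with_edge, without_edge = A.copy(), A.copy()
    with_edge[u, v] = with_edge[v, u] = 1
    without_edge[u, v] = without_edge[v, u] = 0
    return fiedler_value(with_edge) - fiedler_value(without_edge)

# 5-node path graph 0-1-2-3-4
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1

# Score every non-adjacent pair as a candidate edge
scores = {(u, v): fiedler_delta(A, u, v)
          for u, v in itertools.combinations(range(5), 2) if A[u, v] == 0}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

    Since adding an edge can never decrease λ₂, every score is non-negative; on the 5-node path, closing it into a 5-cycle via edge (0, 4) raises λ₂ from about 0.382 to about 1.382, a delta of 1.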

    Spectral analysis for stochastic models of large-scale complex dynamical networks

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 179-196). Research on large-scale complex networks has important applications in diverse systems of current interest, including the Internet, the World Wide Web, and social, biological, and chemical networks. The growing availability of massive databases, computing facilities, and reliable data analysis tools has provided a powerful framework to explore structural properties of such real-world networks. However, one cannot efficiently retrieve and store the exact or full topology for many large-scale networks. As an alternative, several stochastic network models have been proposed that attempt to capture essential characteristics of such complex topologies. Network researchers then use these stochastic models to generate topologies similar to the complex network of interest and use these topologies to test, for example, the behavior of dynamical processes in the network. In general, the topological properties of a network are not directly evident in the behavior of dynamical processes running on it. On the other hand, the eigenvalue spectra of certain matrix representations of the network topology do relate quite directly to the behavior of many dynamical processes of interest, such as random walks, Markov processes, virus/rumor spreading, or synchronization of oscillators in a network. This thesis studies spectral properties of popular stochastic network models proposed in recent years. In particular, we develop several methods to determine or estimate the spectral moments of these models. We also present a variety of techniques to extract relevant spectral information from a finite sequence of spectral moments. A range of numerical examples throughout the thesis confirms the efficacy of our approach.
    Our ultimate objective is to use such results to understand and predict the behavior of dynamical processes taking place in large-scale networks. By Víctor Manuel Preciado. Ph.D.
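
    The spectral moments mentioned above have a convenient combinatorial form: for an undirected graph with adjacency matrix A and eigenvalues λ_i, the k-th moment is m_k = (1/n) Σ_i λ_i^k = (1/n) tr(A^k), which counts closed walks of length k per node. The sketch below computes these moments for a single graph and, as a stand-in for a stochastic model, averages them over Erdős–Rényi G(n, p) samples. It is an empirical illustration with hypothetical helper names, not the analytical methods developed in the thesis.

```python
import numpy as np

def spectral_moments(A, k_max):
    """Return [m_1, ..., m_k_max] with m_k = trace(A^k) / n."""
    n = A.shape[0]
    power = np.eye(n)
    moments = []
    for _ in range(k_max):
        power = power @ A                 # power now holds A^k
        moments.append(np.trace(power) / n)
    return np.array(moments)

def sample_gnp(n, p, rng):
    """Symmetric 0/1 adjacency matrix of an Erdős–Rényi G(n, p) graph."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

# Complete graph K4: eigenvalues are 3, -1, -1, -1
K4 = np.ones((4, 4)) - np.eye(4)
print(spectral_moments(K4, 3))            # [0. 3. 6.]
eig = np.linalg.eigvalsh(K4)
print(np.allclose(spectral_moments(K4, 3), [np.mean(eig ** k) for k in (1, 2, 3)]))  # True

# Monte Carlo estimate of the first three moments of G(n, p)
rng = np.random.default_rng(0)
samples = [spectral_moments(sample_gnp(200, 0.05, rng), 3) for _ in range(20)]
print(np.mean(samples, axis=0))           # m_2 should be close to (n - 1) p = 9.95
```

    Here m_2 equals the average degree and m_3 equals six times the number of triangles divided by n, which is why low-order moments connect spectra to familiar structural statistics.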