    A survey of statistical network models

    Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, and a host of more specialized professional network communities has intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then we introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics. Comment: 96 pages, 14 figures, 333 references

    Econometric analysis of heterogeneous treatment and network models

    My dissertation investigates commonly used testing and estimation procedures and extends them to account for more heterogeneity. In chapter 2, my co-author Andreas Dzemski and I provide a new overidentification test that allows for essential heterogeneity. In chapter 3, I prove weak consistency, up to a measure-preserving transformation, for maximum-likelihood estimation of unobserved latent positions in a Euclidean space, based only on observable information about the agents' linking behavior. In chapter 4, I propose a new measure of centrality that exploits the latent space structure and identifies agents who connect clusters.

    Nonparametric Bayesian inference of the microcanonical stochastic block model

    A principled approach to characterize the hidden structure of networks is to formulate generative models, and then infer their parameters from data. When the desired structure is composed of modules or "communities", a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints, i.e. the generated networks are not allowed to violate the patterns imposed by the model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: 1. Deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, that not only remove limitations that seriously degrade the inference on large networks, but also reveal structures at multiple scales; 2. A very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges, but also with an unlimited number of modules. We show also how this approach can be used to sample modular hierarchies from the posterior distribution, as well as to perform model selection. We discuss and analyze the differences between sampling from the posterior and simply finding the single parameter estimate that maximizes it. Furthermore, we expose a direct equivalence between our microcanonical approach and alternative derivations based on the canonical SBM. Comment: 24 pages, 9 figures, 1 table. Code is freely available as part of graph-tool at https://graph-tool.skewed.de . See also the HOWTO at https://graph-tool.skewed.de/static/doc/demos/inference/inference.html . Minor typos fixed in most recent version
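
    As a rough illustration of the generative idea in the abstract above (nodes divided into groups, with edge placement conditioned on group memberships), the sketch below samples a graph from a plain Bernoulli SBM. It is not the microcanonical variant the paper studies and it is not graph-tool's API; the function name `sample_sbm` and the parameters `group_of` and `edge_prob` are hypothetical names chosen for this example.

```python
import random


def sample_sbm(group_of, edge_prob, seed=None):
    """Sample an undirected simple graph from a Bernoulli (canonical) SBM.

    Illustrative sketch only; all names here are hypothetical.

    group_of  : list mapping node index -> group label in 0..B-1
    edge_prob : B x B symmetric matrix; edge_prob[r][s] is the probability
                of an edge between a node in group r and a node in group s
    """
    rng = random.Random(seed)
    n = len(group_of)
    edges = []
    # Visit each unordered node pair once; the edge probability depends
    # only on the groups the two endpoints belong to.
    for i in range(n):
        for j in range(i + 1, n):
            p = edge_prob[group_of[i]][group_of[j]]
            if rng.random() < p:
                edges.append((i, j))
    return edges


# Two groups of three nodes: dense within groups, sparse between them.
group_of = [0, 0, 0, 1, 1, 1]
edge_prob = [[0.9, 0.1],
             [0.1, 0.9]]
edges = sample_sbm(group_of, edge_prob, seed=0)
```

    Inference runs in the opposite direction: given only the edge list, recover `group_of` and the number of groups. That is the task the paper's nonparametric Bayesian method addresses.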

    Asymptotics and Statistical Inference in High-Dimensional Low-Rank Matrix Models

    High-dimensional matrix and tensor data is ubiquitous in machine learning and statistics and often exhibits low-dimensional structure. With the rise of these types of data comes the need to develop statistical inference procedures that adequately address the low-dimensional structure in a principled manner. In this dissertation we study asymptotic theory and statistical inference in structured low-rank matrix models in high-dimensional regimes where the column and row dimensions of the matrix are allowed to grow, and we consider a variety of settings in which structured low-rank matrix models arise. Chapter 1 establishes the general framework for statistical analysis in high-dimensional low-rank matrix models, including introducing entrywise perturbation bounds, asymptotic theory, distributional theory, and statistical inference, illustrated throughout via the matrix denoising model. In Chapter 2, Chapter 3, and Chapter 4 we study the entrywise estimation of singular vectors and eigenvectors in different structured settings, with Chapter 2 considering heteroskedastic and dependent noise, Chapter 3 sparsity, and Chapter 4 additional tensor structure. In Chapter 5 we apply previous asymptotic theory to study a two-sample test for equality of distribution in network analysis, and in Chapter 6 we study a model for shared community memberships across multiple networks, and we propose and analyze a joint spectral clustering algorithm that leverages newly developed asymptotic theory for this setting. Throughout this dissertation we emphasize tools and techniques that are data-driven, nonparametric, and adaptive to signal strength, and, where applicable, noise distribution. The contents of Chapters 2-6 are based on the papers Agterberg et al. (2022b); Agterberg and Sulam (2022); Agterberg and Zhang (2022); Agterberg et al. (2020a) and Agterberg et al. (2022a) respectively, and Chapter 1 contains several novel results.

    Random Walk Models, Preferential Attachment, and Sequential Monte Carlo Methods for Analysis of Network Data

    Networks arise in nearly every branch of science, from biology and physics to sociology and economics. A signature of many network datasets is strong local dependence, which gives rise to phenomena such as sparsity, power law degree distributions, clustering, and structural heterogeneity. Statistical models of networks require a careful balance of flexibility, to faithfully capture that dependence, and simplicity, to make analysis and inference tractable. In this dissertation, we introduce a class of models that insert one network edge at a time via a random walk, permitting the location of new edges to depend explicitly on the structure of the existing network, while remaining probabilistically and computationally tractable. Connections to graph kernels are made through the probability generating function of the random walk length distribution. The limiting degree distribution is shown to exhibit power law behavior, and the properties of the limiting degree sequence are studied analytically with martingale methods. In the second part of the dissertation, we develop a class of particle Markov chain Monte Carlo algorithms to perform inference for a large class of sequential random graph models, even when the observation consists only of a single graph. Using these methods, we derive a particle Gibbs sampler for random walk models. Fit to synthetic data, the sampler accurately recovers the model parameters; fit to real data, the model offers insight into the typical length scale of dependence in the network, and provides a new measure of vertex centrality. The arrival times of new vertices are the key to obtaining results for both theory and inference. In the third part, we undertake a careful study of the relationship between the arrival times, sparsity, and heavy-tailed degree distributions in preferential attachment-type models of partitions and graphs. A number of constructive representations of the limiting degrees are obtained, and connections are made to exchangeable Gibbs partitions as well as to recent results on the limiting degrees of preferential attachment graphs.