A survey of statistical network models
Networks are ubiquitous in science and have become a focal point for
discussion in everyday life. Formal statistical models for the analysis of
network data have emerged as a major topic of interest in diverse areas of
study, and most of these involve a form of graphical representation.
Probability models on graphs date back to 1959. Along with empirical studies in
social psychology and sociology from the 1960s, these early works generated an
active network community and a substantial literature in the 1970s. This effort
moved into the statistical literature in the late 1970s and 1980s, and the past
decade has seen a burgeoning network literature in statistical physics and
computer science. The growth of the World Wide Web and the emergence of online
networking communities such as Facebook, MySpace, and LinkedIn, along with a host of
more specialized professional network communities, have intensified interest in
the study of networks and network data. Our goal in this review is to provide
the reader with an entry point to this burgeoning literature. We begin with an
overview of the historical development of statistical network modeling and then
introduce a number of examples that have been studied in the network
literature. Our subsequent discussion focuses on a number of prominent static
and dynamic network models and their interconnections. We emphasize formal
model descriptions and pay special attention to the interpretation of
parameters and their estimation. We end with a description of some open
problems and challenges for machine learning and statistics.
Comment: 96 pages, 14 figures, 333 references
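As a minimal illustration of a formal network model with an interpretable parameter, the sketch below uses the classic Erdős–Rényi G(n, p) model, in which each of the n(n-1)/2 vertex pairs is linked independently with probability p. Because every pair is an independent Bernoulli(p) trial, the maximum-likelihood estimate of p is simply the observed edge count divided by the number of pairs. The choice of model, the function names, and the parameter values are illustrative assumptions, not taken from the survey.

```python
import random
from itertools import combinations


def sample_gnp(n, p, rng):
    """Sample an undirected Erdős–Rényi G(n, p) graph as a set of edges (i, j) with i < j."""
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}


def mle_p(n, edges):
    """MLE of p: each of the n(n-1)/2 pairs is an independent Bernoulli(p) trial."""
    num_pairs = n * (n - 1) // 2
    return len(edges) / num_pairs


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 200, 0.05
    edges = sample_gnp(n, p, rng)
    print(f"true p = {p}, estimated p = {mle_p(n, edges):.4f}")
```

With n = 200 there are 19,900 pairs, so the estimate is typically within a fraction of a percentage point of the true p.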
Econometric analysis of heterogeneous treatment and network models
My dissertation investigates commonly used testing and estimation procedures and extends
them to account for more heterogeneity. In chapter 2, my co-author Andreas
Dzemski and I provide a new overidentification test that allows for essential heterogeneity. In chapter
3, I prove weak consistency, up to a measure-preserving transformation, of the maximum-likelihood
estimator of unobserved latent positions in a Euclidean space, based only on observed information
about the agents' linking behavior. In chapter 4, I propose a new measure of centrality
that exploits the latent space structure and identifies agents who connect clusters.
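Why latent positions can only be recovered "up to" a transformation can be seen in a simple latent-distance model. Assume, as in the latent space model of Hoff, Raftery and Handcock (2002), that agents i and j link with probability logistic(alpha - ||z_i - z_j||); the dissertation's exact specification may differ. Since the likelihood depends on the positions only through pairwise distances, any rigid motion (rotation plus translation) of all positions leaves it unchanged. The function names, positions, and `alpha` below are hypothetical.

```python
import math


def link_prob(zi, zj, alpha):
    """Assumed latent-distance link probability: logistic(alpha - ||z_i - z_j||)."""
    dist = math.dist(zi, zj)
    return 1.0 / (1.0 + math.exp(-(alpha - dist)))


def log_likelihood(adj, z, alpha):
    """Bernoulli log-likelihood of an undirected 0/1 adjacency matrix given positions z."""
    n = len(z)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p = link_prob(z[i], z[j], alpha)
            total += math.log(p) if adj[i][j] else math.log(1.0 - p)
    return total


def rotate_and_shift(z, angle, shift):
    """Rotate 2-D positions by `angle` radians, then translate them by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in z]


if __name__ == "__main__":
    z = [(0.0, 0.0), (1.0, 0.5), (-0.5, 1.5), (2.0, -1.0)]
    adj = [[0, 1, 1, 0],
           [1, 0, 0, 1],
           [1, 0, 0, 0],
           [0, 1, 0, 0]]
    z_moved = rotate_and_shift(z, angle=0.7, shift=(3.0, -2.0))
    # Distances are preserved, so both log-likelihoods agree up to floating-point error.
    print(log_likelihood(adj, z, 1.0), log_likelihood(adj, z_moved, 1.0))
```

A maximizer of this likelihood is therefore never unique; consistency statements must be made for an equivalence class of configurations rather than for the positions themselves.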
Nonparametric Bayesian inference of the microcanonical stochastic block model
A principled approach to characterize the hidden structure of networks is to
formulate generative models, and then infer their parameters from data. When
the desired structure is composed of modules or "communities", a suitable
choice for this task is the stochastic block model (SBM), where nodes are
divided into groups, and the placement of edges is conditioned on the group
memberships. Here, we present a nonparametric Bayesian method to infer the
modular structure of empirical networks, including the number of modules and
their hierarchical organization. We focus on a microcanonical variant of the
SBM, where the structure is imposed via hard constraints, i.e. the generated
networks are not allowed to violate the patterns imposed by the model. We show
how this simple model variation allows simultaneously for two important
improvements over more traditional inference approaches: 1. Deeper Bayesian
hierarchies, with noninformative priors replaced by sequences of priors and
hyperpriors, that not only remove limitations that seriously degrade the
inference on large networks, but also reveal structures at multiple scales; 2.
A very efficient inference algorithm that scales well not only for networks
with a large number of nodes and edges, but also with an unlimited number of
modules. We also show how this approach can be used to sample modular
hierarchies from the posterior distribution, as well as to perform model
selection. We discuss and analyze the differences between sampling from the
posterior and simply finding the single parameter estimate that maximizes it.
Furthermore, we expose a direct equivalence between our microcanonical approach
and alternative derivations based on the canonical SBM.
Comment: 24 pages, 9 figures, 1 table. Code is freely available as part of
graph-tool at https://graph-tool.skewed.de . See also the HOWTO at
https://graph-tool.skewed.de/static/doc/demos/inference/inference.html .
Minor typos fixed in most recent version
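To make the hard-constraint idea concrete, the sketch below generates a network from a simplified, non-degree-corrected microcanonical SBM: given fixed group memberships and a prescribed number of edges between every pair of groups, it places exactly that many edges, drawing each endpoint uniformly from the relevant group (so the result is a multigraph that may contain self-loops). In a canonical SBM the between-group edge counts would fluctuate around their expected values; here they match the prescription in every sample. The function names and the `edge_counts` convention (plain edge counts, keyed by group pairs with r <= s) are assumptions for illustration; the authors' own implementation is the one in graph-tool.

```python
import random
from collections import Counter


def sample_microcanonical_sbm(groups, edge_counts, rng):
    """Sample a multigraph whose between-group edge counts are fixed exactly.

    groups: list mapping node -> group label.
    edge_counts: dict {(r, s): m} with r <= s, the exact number of edges between
        groups r and s (a hypothetical convention for this sketch).
    Each endpoint is drawn uniformly within its group, so multi-edges and
    self-loops can occur.
    """
    members = {}
    for node, g in enumerate(groups):
        members.setdefault(g, []).append(node)
    edges = []
    for (r, s), m in edge_counts.items():
        for _ in range(m):
            u = rng.choice(members[r])
            v = rng.choice(members[s])
            edges.append((u, v))
    return edges


def group_edge_counts(groups, edges):
    """Count edges between each unordered pair of groups, keyed as (min, max)."""
    return Counter(tuple(sorted((groups[u], groups[v]))) for u, v in edges)


if __name__ == "__main__":
    rng = random.Random(42)
    groups = [0] * 5 + [1] * 5
    edge_counts = {(0, 0): 8, (0, 1): 2, (1, 1): 6}
    edges = sample_microcanonical_sbm(groups, edge_counts, rng)
    # The hard constraint: group-level counts are reproduced exactly in every sample.
    print(dict(group_edge_counts(groups, edges)))
```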
Asymptotics and Statistical Inference in High-Dimensional Low-Rank Matrix Models
High-dimensional matrix and tensor data is ubiquitous in machine learning and statistics
and often exhibits low-dimensional structure. With the rise of these types of data comes the need to develop statistical inference procedures that adequately address the low-dimensional structure in a principled manner. In this dissertation we study asymptotic theory and statistical inference in structured low-rank matrix models in high-dimensional regimes where the column and row dimensions of the matrix are allowed to grow, and we consider a variety of settings in which structured low-rank matrix models arise.
Chapter 1 establishes the general framework for statistical analysis in high-dimensional low-rank matrix models, including entrywise perturbation bounds, asymptotic theory, distributional theory, and statistical inference, illustrated throughout via the matrix denoising model. In Chapters 2, 3, and 4 we study the entrywise estimation of singular vectors and eigenvectors in different structured settings, with Chapter 2 considering heteroskedastic and dependent noise, Chapter 3 sparsity, and Chapter 4 additional tensor structure. In Chapter 5 we apply the preceding asymptotic theory to study a two-sample test for equality of distribution in network analysis, and in Chapter 6 we study a model for shared community memberships across multiple networks and propose and analyze a joint spectral clustering algorithm that leverages newly developed asymptotic theory for this setting.
Throughout this dissertation we emphasize tools and techniques that are data-driven, nonparametric, and adaptive to signal strength and, where applicable, to the noise distribution. The contents of Chapters 2-6 are based on the papers Agterberg et al. (2022b); Agterberg and Sulam (2022); Agterberg and Zhang (2022); Agterberg et al. (2020a) and Agterberg et al. (2022a), respectively, and Chapter 1 contains several novel results.
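The matrix denoising model can be illustrated in its simplest form, a symmetric rank-one signal plus noise, Y = λ u uᵀ + N. The sketch below estimates u by power iteration on Y and reports the entrywise (ℓ∞) error after resolving the sign ambiguity, which is the kind of quantity entrywise perturbation bounds control. The rank-one symmetric setting, standard Gaussian noise, the ±1/√n signal vector, and all parameter values are assumptions chosen for brevity; the dissertation treats more general structured settings.

```python
import math
import random


def spiked_matrix(n, signal, rng):
    """Return Y = signal * u u^T + N, with u_i = ±1/sqrt(n) and symmetric N(0, 1) noise."""
    u = [rng.choice((-1.0, 1.0)) / math.sqrt(n) for _ in range(n)]
    y = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            noise = rng.gauss(0.0, 1.0)
            y[i][j] = y[j][i] = signal * u[i] * u[j] + noise
    return y, u


def leading_eigenvector(y, iters, rng):
    """Power iteration: converges to the eigenvector of the largest-magnitude eigenvalue."""
    n = len(y)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(n)) for row in y]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def entrywise_error(v, u):
    """max_i |v_i - u_i| after flipping v's sign to best align with u."""
    sign = 1.0 if sum(a * b for a, b in zip(v, u)) >= 0 else -1.0
    return max(abs(sign * a - b) for a, b in zip(v, u))


if __name__ == "__main__":
    rng = random.Random(0)
    y, u = spiked_matrix(n=100, signal=100.0, rng=rng)
    v = leading_eigenvector(y, iters=50, rng=rng)
    print(f"entrywise error {entrywise_error(v, u):.4f}; entry size {1 / math.sqrt(100):.4f}")
```

Here the signal strength is well above the noise level (the noise spectrum has edge near 2√n = 20), so the entrywise error is small relative to the entries of u and every sign of u is recovered.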
Random Walk Models, Preferential Attachment, and Sequential Monte Carlo Methods for Analysis of Network Data
Networks arise in nearly every branch of science, from biology and physics to sociology and economics. A signature of many network datasets is strong local dependence, which gives rise to phenomena such as sparsity, power-law degree distributions, clustering, and structural heterogeneity. Statistical models of networks require a careful balance between flexibility, to faithfully capture that dependence, and simplicity, to keep analysis and inference tractable. In this dissertation, we introduce a class of models that insert one network edge at a time via a random walk, permitting the location of new edges to depend explicitly on the structure of the existing network, while remaining probabilistically and computationally tractable. Connections to graph kernels are made through the probability generating function of the random walk length distribution. The limiting degree distribution is shown to exhibit power-law behavior, and the properties of the limiting degree sequence are studied analytically with martingale methods. In the second part of the dissertation, we develop a class of particle Markov chain Monte Carlo algorithms to perform inference for a large class of sequential random graph models, even when the observation consists of only a single graph. Using these methods, we derive a particle Gibbs sampler for random walk models. Fit to synthetic data, the sampler accurately recovers the model parameters; fit to real data, the model offers insight into the typical length scale of dependence in the network, and provides a new measure of vertex centrality.
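The one-edge-at-a-time mechanism can be sketched as follows. This is a simplified, hypothetical version, not the dissertation's exact specification: at each step a walk starts at a vertex chosen with probability proportional to degree and runs for a geometrically distributed number of steps; with probability `beta` a new vertex arrives and attaches to the walk's endpoint, and otherwise an edge is added between the walk's start and end vertices. The parameter names `beta` and `p_stop`, the geometric length distribution, and the degree-biased start are all assumptions.

```python
import random


def degree_biased_vertex(edges, rng):
    """Pick a vertex with probability proportional to degree: a uniform endpoint of a uniform edge."""
    u, v = rng.choice(edges)
    return u if rng.random() < 0.5 else v


def random_walk(adjacency, start, length, rng):
    """Simple random walk of `length` steps on a multigraph adjacency list."""
    current = start
    for _ in range(length):
        current = rng.choice(adjacency[current])
    return current


def geometric_length(p_stop, rng):
    """Walk length K with P(K = k) = (1 - p_stop)^k * p_stop for k >= 0."""
    k = 0
    while rng.random() >= p_stop:
        k += 1
    return k


def grow_random_walk_graph(steps, beta, p_stop, rng):
    """Grow a multigraph one edge per step, locating each new edge via a random walk."""
    edges = [(0, 1)]                      # seed graph: a single edge
    adjacency = {0: [1], 1: [0]}
    for _ in range(steps):
        start = degree_biased_vertex(edges, rng)
        end = random_walk(adjacency, start, geometric_length(p_stop, rng), rng)
        if rng.random() < beta:           # a new vertex attaches to the walk's endpoint
            u, v = len(adjacency), end
            adjacency[u] = []
        else:                             # an edge joins the existing start and end vertices
            u, v = start, end
        edges.append((u, v))
        adjacency[u].append(v)
        adjacency[v].append(u)            # a self-loop (u == v) adds 2 to the degree
    return edges, adjacency


if __name__ == "__main__":
    rng = random.Random(1)
    edges, adjacency = grow_random_walk_graph(steps=1000, beta=0.3, p_stop=0.5, rng=rng)
    degrees = sorted((len(nbrs) for nbrs in adjacency.values()), reverse=True)
    print(len(adjacency), "vertices,", len(edges), "edges; largest degrees:", degrees[:5])
```

Short walks keep the start and end vertices close, so new edges tend to close local structures; long walks spread them across the graph, which is how the walk-length distribution sets the length scale of dependence in this sketch.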
The arrival times of new vertices are key to obtaining results for both theory and inference. In the third part, we undertake a careful study of the relationship between the arrival times, sparsity, and heavy-tailed degree distributions in preferential attachment-type models of partitions and graphs. A number of constructive representations of the limiting degrees are obtained, and connections are made to exchangeable Gibbs partitions as well as to recent results on the limiting degrees of preferential attachment graphs.
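The link between arrival times and heavy-tailed block sizes can be illustrated with the two-parameter (Pitman–Yor) Chinese restaurant process, a standard preferential-attachment-type partition model that is also an exchangeable Gibbs partition. After m elements have been placed into K blocks, the next element joins an existing block of size n_k with probability (n_k - alpha) / (m + theta) or opens a new block with probability (theta + K * alpha) / (m + theta); a block's arrival time is the step at which it opens. For 0 < alpha < 1 the number of blocks grows roughly like m^alpha, so arrivals become increasingly rare while early blocks grow large. This textbook example is chosen for illustration only and is not the dissertation's specific construction; the function name and parameter values are assumptions.

```python
import random


def two_parameter_crp(n, alpha, theta, rng):
    """Sample a partition of n elements from the two-parameter Chinese restaurant process.

    Requires 0 <= alpha < 1 and theta > -alpha. Returns the block sizes and the
    arrival time (1-based element index) at which each block was opened.
    """
    sizes, arrivals = [1], [1]                 # element 1 always opens the first block
    for m in range(1, n):                      # m elements have been placed so far
        k = len(sizes)
        # Weights: existing block j -> sizes[j] - alpha; new block -> theta + k * alpha.
        # They sum to m + theta, so draw u uniformly on [0, m + theta).
        u = rng.random() * (m + theta)
        acc = 0.0
        for j, size in enumerate(sizes):
            acc += size - alpha
            if u < acc:
                sizes[j] += 1
                break
        else:                                  # u fell in the new-block region
            sizes.append(1)
            arrivals.append(m + 1)
    return sizes, arrivals


if __name__ == "__main__":
    rng = random.Random(0)
    sizes, arrivals = two_parameter_crp(n=2000, alpha=0.5, theta=1.0, rng=rng)
    print(len(sizes), "blocks; largest:", sorted(sizes, reverse=True)[:5])
    print("first arrival times:", arrivals[:8])
```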