
    Smoothing graphons for modelling exchangeable relational data

    Exchangeable relational data can be modelled appropriately within graphon theory. Most Bayesian methods for modelling exchangeable relational data can be placed in this framework, as they exploit different forms of graphons. However, the graphons adopted by existing Bayesian methods are either piecewise-constant functions, which are insufficiently flexible to model relational data accurately, or complicated continuous functions, which incur heavy computational costs for inference. In this work, we overcome both shortcomings by smoothing piecewise-constant graphons, which permits continuous intensity values for describing relations without impractically increasing computational costs. In particular, we focus on the Bayesian Stochastic Block Model (SBM) and demonstrate how to adapt its piecewise-constant graphon to a smoothed version. We first propose the Integrated Smoothing Graphon (ISG), which introduces one smoothing parameter to the SBM graphon to generate continuous relational intensity values. We then develop the Latent Feature Smoothing Graphon (LFSG), which improves on the ISG by introducing auxiliary hidden labels that decompose the calculation of the ISG intensity and enable efficient inference. Experimental results on real-world data sets validate the advantages of applying smoothing strategies to the Stochastic Block Model, demonstrating that smoothed graphons can greatly improve AUC and precision for link prediction without increasing computational complexity.
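    A piecewise-constant SBM graphon assigns every pair of latent coordinates the interaction probability of their two blocks, so the intensity surface has hard jumps at block boundaries. The sketch below contrasts that hard graphon with a toy softened version in which block membership becomes a soft weighting controlled by a single smoothing parameter tau; this is only an illustration of the smoothing idea, not the authors' ISG or LFSG construction, and all names are hypothetical.

    ```python
    import numpy as np

    def sbm_graphon(x, y, breaks, B):
        """Piecewise-constant SBM graphon: look up the block of each coordinate."""
        i = np.searchsorted(breaks, x, side="right")
        j = np.searchsorted(breaks, y, side="right")
        return B[i, j]

    def smoothed_graphon(x, y, centers, B, tau=0.1):
        """Toy smoothed graphon: soft block weights decay with distance from
        each block's centre, yielding a continuous intensity surface."""
        wx = np.exp(-np.abs(x - centers) / tau)
        wy = np.exp(-np.abs(y - centers) / tau)
        wx, wy = wx / wx.sum(), wy / wy.sum()
        return float(wx @ B @ wy)  # blend of block intensities

    # Two blocks on [0, 1] split at 0.5, with intra/inter connection probabilities.
    B = np.array([[0.8, 0.1],
                  [0.1, 0.6]])
    print(sbm_graphon(0.49, 0.51, breaks=np.array([0.5]), B=B))      # hard jump: 0.1
    print(smoothed_graphon(0.49, 0.51, centers=np.array([0.25, 0.75]), B=B))
    ```

    Near the block boundary the hard graphon jumps between 0.8 and 0.1, while the softened version returns an intermediate intensity that varies continuously with the coordinates.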

    The Graph Pencil Method: Mapping Subgraph Densities to Stochastic Block Models

    In this work, we describe a method that determines an exact map from a finite set of subgraph densities to the parameters of a stochastic block model (SBM) matching these densities. Given a number K of blocks, the subgraph densities of a finite number of stars and bistars uniquely determine a single element of the class of all degree-separated stochastic block models with K blocks. Our method makes it possible to translate estimates of these subgraph densities into model parameters, and hence to use subgraph densities directly for inference. The computational overhead is negligible: computing the translation map is polynomial in K but independent of the graph size once the subgraph densities are given.
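    For intuition, the forward direction of this map has a simple closed form. By standard graphon theory, the homomorphism density of a k-star in an SBM with block proportions pi and connectivity matrix B is t(S_k) = sum_i pi_i * d_i**k, where d_i = sum_j B_ij * pi_j is the degree function. The sketch below computes this parameters-to-densities map (the direction the graph pencil method inverts); function names are illustrative, not the authors' code.

    ```python
    import numpy as np

    def star_densities(pi, B, k_max):
        """Homomorphism densities t(S_1), ..., t(S_k_max) of k-stars for an SBM
        with block proportions pi and connectivity matrix B, using the standard
        identity t(S_k) = sum_i pi_i * d_i**k with degree function d = B @ pi."""
        d = B @ pi                                  # expected degree of each block
        return [float(pi @ d**k) for k in range(1, k_max + 1)]

    pi = np.array([0.5, 0.5])
    B = np.array([[0.8, 0.1],
                  [0.1, 0.6]])
    print(star_densities(pi, B, k_max=3))           # t(S_1) is the edge density, 0.4
    ```

    Loosely, matching star densities is then a moment problem in the degrees d_i with weights pi_i, which is the sense in which a matrix-pencil (generalised eigenvalue) computation can recover the parameters, with bistar densities supplying the extra information needed to separate B from the degree function.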

    Statistical Network Analysis: Beyond Block Models

    Network data represent connections between units of analysis and lead to many interesting research questions with diverse applications. In this thesis, we focus on inferring the structure underlying an observed network, which can be thought of as a noisy random realization of the unobserved true structure. Different applications focus on different types of underlying structure; one question of broad interest is finding a community structure, with communities typically defined as groups of nodes that share similar connectivity patterns. One common and widely used model for describing a community structure in a network is the stochastic block model. This model has attracted a lot of attention because of its tractable theoretical properties, but it is also well known to oversimplify the structure observed in real-world networks and often does not fit the data well. Thus there has been a recent push to expand the stochastic block model in various ways to bring it closer to what we observe in the real world, and this thesis makes several contributions to this effort.
    PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133476/1/yzhanghf_1.pd
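    The stochastic block model referenced throughout assigns each node to a community and connects each pair of nodes independently with a probability determined only by their community pair. A minimal sampler, with illustrative parameter names, shows the generative process:

    ```python
    import numpy as np

    def sample_sbm(n, pi, B, seed=None):
        """Draw an undirected, loop-free network from a stochastic block model:
        assign nodes to communities with proportions pi, then connect each pair
        independently with probability B[community_i, community_j]."""
        rng = np.random.default_rng(seed)
        z = rng.choice(len(pi), size=n, p=pi)        # community labels
        P = B[np.ix_(z, z)]                          # pairwise edge probabilities
        A = np.triu((rng.random((n, n)) < P).astype(int), 1)
        return A + A.T, z

    B = np.array([[0.30, 0.05],
                  [0.05, 0.25]])
    A, z = sample_sbm(200, pi=[0.6, 0.4], B=B, seed=1)
    ```

    The oversimplification the thesis addresses is visible here: every node in a community is statistically identical, which real networks rarely satisfy.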

    Differential Privacy, Property Testing, and Perturbations

    Controlling the dissemination of information about ourselves has become a minefield in the modern age. We release data about ourselves every day and don't always fully understand what information it contains. Seemingly innocuous pieces of data can often be combined to reveal more sensitive information about ourselves than we intended. Differential privacy has developed as a technique to prevent this type of privacy leakage. It borrows ideas from information theory to inject enough uncertainty into the data that sensitive information is provably absent from the privatised data. Current research in differential privacy walks the fine line between removing sensitive information while allowing non-sensitive information to be released. At its heart, this thesis is about the study of information. Many of the results can be formulated as asking one or both of the following questions: does the data you have contain enough information to learn what you would like to learn, and how can I alter the data to ensure you cannot discern sensitive information? We often approach the former question from both directions: information-theoretic lower bounds on recovery and algorithmic upper bounds. We begin with an information-theoretic lower bound for graphon estimation. This explores the fundamental limits of how much information about the underlying population is contained in a finite sample of data. We then move on to exploring the connection between information-theoretic results and privacy in the context of linear inverse problems. We find that there is a discrepancy between how the inverse problems community and the privacy community view good recovery of information. Next, we explore black-box testing for privacy. We argue that the amount of information required to verify the privacy guarantee of an algorithm, without access to the internals of the algorithm, is lower bounded by the amount of information required to break the privacy guarantee. Finally, we explore a setting where imposing privacy is a help rather than a hindrance: online linear optimisation. We argue that private algorithms have the right kind of stability guarantee to ensure low regret for online linear optimisation.
    PhD, Mathematics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/143940/1/amcm_1.pd
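    The noise-injection idea described above is the standard route to differential privacy: perturb a query answer with noise calibrated to the query's sensitivity so that no single record's influence is discernible. A minimal sketch of the classical Laplace mechanism (a textbook construction, not one of the thesis's own results):

    ```python
    import numpy as np

    def laplace_mechanism(true_answer, sensitivity, epsilon, seed=None):
        """Release true_answer plus Laplace(sensitivity / epsilon) noise.
        This satisfies epsilon-differential privacy when `sensitivity` bounds
        how much any single individual's record can change the answer."""
        rng = np.random.default_rng(seed)
        return true_answer + rng.laplace(scale=sensitivity / epsilon)

    # A counting query changes by at most 1 when one record changes: sensitivity 1.
    private_count = laplace_mechanism(true_answer=1234, sensitivity=1.0, epsilon=0.5)
    print(private_count)
    ```

    Smaller epsilon means more noise and stronger privacy, which is exactly the fine line between removing sensitive information and preserving useful information that the thesis studies.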

    Statistical and computational rates in high rank tensor estimation

    Higher-order tensor datasets arise commonly in recommendation systems, neuroimaging, and social networks. Here we develop provable methods for estimating a possibly high rank signal tensor from noisy observations. We consider a generative latent variable tensor model that incorporates both high rank and low rank models, including, but not limited to, simple hypergraphon models, single index models, low-rank CP models, and low-rank Tucker models. Comprehensive results are developed on both the statistical and computational limits of signal tensor estimation. We find that high-dimensional latent variable tensors are of log-rank, a fact that explains the pervasiveness of low-rank tensors in applications. Furthermore, we propose a polynomial-time spectral algorithm that achieves the computationally optimal rate. We show that the statistical-computational gap emerges only for latent variable tensors of order 3 or higher. Numerical experiments and two real data applications are presented to demonstrate the practical merits of our methods.
    Comment: 38 pages, 8 figures
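    Spectral estimators of this kind typically unfold the noisy tensor into a matrix, truncate its SVD, and fold back. The sketch below illustrates that generic recipe only; it is not the paper's specific algorithm, and the rank choice is illustrative.

    ```python
    import numpy as np

    def spectral_denoise(Y, rank):
        """Denoise an order-3 tensor via truncated SVD of its mode-1 unfolding;
        the log-rank phenomenon suggests a small `rank` can already capture
        most of a latent variable tensor's signal."""
        d1, d2, d3 = Y.shape
        M = Y.reshape(d1, d2 * d3)                    # mode-1 unfolding
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        M_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # keep top singular directions
        return M_hat.reshape(d1, d2, d3)

    rng = np.random.default_rng(0)
    signal = np.einsum('i,j,k->ijk', rng.random(20), rng.random(20), rng.random(20))
    Y = signal + 0.1 * rng.standard_normal(signal.shape)
    X_hat = spectral_denoise(Y, rank=1)
    ```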

    Advancements in latent space network modelling

    The ubiquity of relational data has motivated an extensive literature on network analysis, and over the last two decades the latent space approach has become a popular network modelling framework. In this approach, the nodes of a network are represented in a low-dimensional latent space and the probability of an interaction occurring is modelled as a function of the associated latent coordinates. This thesis focuses on computational and modelling aspects of the latent space approach, and we present two main contributions. First, we consider estimation of temporally evolving latent space networks in which interactions among a fixed population are observed through time. The latent coordinates of each node evolve over time, and this presents a natural setting for the application of sequential Monte Carlo (SMC) methods. This facilitates online inference, which allows estimation for dynamic networks in which the number of observations in time is large. Since the performance of SMC methods degrades as the dimension of the latent state space increases, we draw on the high-dimensional SMC literature to allow estimation of networks with a larger number of nodes. Second, we develop a latent space model for network data in which interactions occur between sets of nodes; as a motivating example, we consider a coauthorship network in which it is typical for more than two authors to contribute to an article. This type of data can be represented as a hypergraph, and we extend the latent space framework to this setting. Modelling the nodes in a latent space provides a convenient visualisation of the data and allows properties to be imposed on the hypergraph relationships. We develop a parsimonious model with a computationally convenient likelihood. Furthermore, we theoretically characterise the degree distribution of our model and further explore its properties via simulation.
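    The core of any such model is the link function from latent coordinates to tie probabilities. A minimal sketch in the spirit of distance-style latent space models (the exact form used in the thesis is not specified here, and alpha is an illustrative intercept):

    ```python
    import numpy as np

    def link_probability(u, v, alpha=1.0):
        """Distance-style latent space model: the log-odds of an edge fall off
        with the Euclidean distance between the nodes' latent coordinates,
        logit P(edge) = alpha - ||u - v||."""
        eta = alpha - np.linalg.norm(np.asarray(u) - np.asarray(v))
        return 1.0 / (1.0 + np.exp(-eta))

    print(link_probability([0.1, 0.2], [0.5, -0.3]))   # nearby pairs score higher
    ```

    In the dynamic setting described above, the coordinates u and v become time-indexed states, which is what makes sequential Monte Carlo a natural inference tool.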