32 research outputs found

    A survey of statistical network models

    Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, and a host of more specialized professional network communities has intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then we introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics. Comment: 96 pages, 14 figures, 333 references

    Community Detection using Locality Statistics

    The goal of community detection is to identify clusters and groups of vertices that share common properties or play similar roles in a graph, using only the information encoded in the graph. Our work analyzes two methods of identifying an anomalous community in temporal graphs and another method of identifying active communities in a static massive graph. All methods are based on locality statistics. In [50], an anomalous community is detected that shows growing connectivities in a time series of graphs. We formulate the task as a hypothesis-testing problem in stochastic block model time series. We derive the limiting properties and power characteristics of two competing test statistics built on distinct underlying locality statistics. In addition, we provide applicable implementations of two competing test statistics and detailed experimental results for a neural imaging application in [36]. In [51], active communities are detected in a static massive graph on which many community detection algorithms scale poorly. We propose a novel framework for detecting active communities that consist of the most active vertices. Our framework utilizes a parallelizable trimming algorithm based on a locality statistic to filter out inactive vertices, and then clusters the remaining active vertices via spectral decomposition of their similarity matrix. The framework is applicable to graphs consisting of billions of vertices and hundreds of billions of edges. In summary, this work provides developments in community detection, in both temporal graphs and static massive graphs, by employing locality statistics.

    Computational Methods for Learning and Inference on Dynamic Networks.

    Networks are ubiquitous in science, serving as a natural representation for many complex physical, biological, and social phenomena. Significant efforts have been dedicated to analyzing such network representations to reveal their structure and provide some insight towards the phenomena of interest. Computational methods for analyzing networks have typically been designed for static networks, which cannot capture the time-varying nature of many complex phenomena. In this dissertation, I propose new computational methods for machine learning and statistical inference on dynamic networks with time-evolving structures. Specifically, I develop methods for visualization, tracking, clustering, and prediction of dynamic networks. The proposed methods take advantage of the dynamic nature of the network by intelligently combining observations at multiple time steps. This involves the development of novel statistical models and state-space representations of dynamic networks. Using the methods proposed in this dissertation, I identify long-term trends and structural changes in a variety of dynamic network data sets including a social network of spammers and a network of physical proximity among employees and students at a university campus. PhD, Electrical Engineering-Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/94022/1/xukevin_1.pd

    Followers Are Not Enough: A Question-Oriented Approach to Community Detection in Online Social Networks

    Community detection in online social networks is typically based on the analysis of the explicit connections between users, such as "friends" on Facebook and "followers" on Twitter. But online users often have hundreds or even thousands of such connections, and many of these connections do not correspond to real friendships or more generally to accounts that users interact with. We claim that community detection in online social networks should be question-oriented and rely on additional information beyond the simple structure of the network. The concept of 'community' is very general, and different questions such as "whom do we interact with?" and "with whom do we share similar interests?" can lead to the discovery of different social groups. In this paper we focus on three types of communities beyond structural communities: activity-based, topic-based, and interaction-based. We analyze a Twitter dataset using three different weightings of the structural network meant to highlight these three community types, and then infer the communities associated with these weightings. We show that the communities obtained in the three weighted cases are highly different from each other, and from the communities obtained by considering only the unweighted structural network. Our results confirm that asking a precise question is an unavoidable first step in community detection in online social networks, and that different questions can lead to different insights about the network under study. Comment: 22 pages, 4 figures, 1 table

    Empirical Bayes estimation for random dot product graph representation of the stochastic blockmodel

    Network models are increasingly used to model datasets that involve interacting units, particularly random graph models where the vertices represent individual entities and the edges represent the presence or absence of a specified interaction between these entities. Finding inherent communities in networks (i.e. partitioning vertices with a more similar interaction pattern into groups) is considered to be a fundamental task in network analysis, which aids in understanding the structural properties of real-world networks. Despite a large amount of research on this task since the emergence of graphical representation of relational data, this still remains a challenge. In particular, within the statistical community, the use of the stochastic blockmodel for this task is currently of immense interest. Recent theoretical developments have shown that adjacency spectral embedding of graphs yields tractable distributional results. Specifically, a random dot product graph formulation of the stochastic blockmodel provides a mixture of multivariate Gaussians for the asymptotic distribution of the latent positions estimated by adjacency spectral embedding. The first part of this thesis seeks to employ this new theory to provide an empirical Bayes model for estimating block memberships of vertices in a stochastic blockmodel graph. Posterior inference is conducted using a Metropolis-within-Gibbs algorithm. Performance of the model is illustrated through Monte Carlo simulation studies and experimental results on a Wikipedia dataset. Results show performance gains over other alternative models that are considered. Instead of a complete classification of vertices via community detection, one may wish to discover whether vertices possess an attribute of interest. Given that this attribute is observed for a few vertices, the goal is to find other vertices that possess that same attribute. 
    As an example, if a few employees in a company are known to have committed fraud, how can we identify others who may be complicit? This is a special case of community detection, known as vertex nomination, which has recently grown rapidly as a research topic. The second part of this thesis extends the empirical Bayes model for vertex nomination based on information contained in the graph structure. This yields promising simulation results as well as real-data results from an Enron email dataset. Recent studies have shown that information pertinent to vertex nomination exists not only in the graph structure but also in the edge attributes (Coppersmith and Priebe, 2012; Suwan et al., 2015). This motivates the third part of this thesis by further extending the model to exploit both graph structure and edge attributes for vertex nomination. Simulation studies confirm the benefit of doing so. However, the same benefit is not observed when the model is applied to the Enron email dataset; further investigations suggest that this is due to the data violating one of the model assumptions.