A survey of statistical network models
Networks are ubiquitous in science and have become a focal point for
discussion in everyday life. Formal statistical models for the analysis of
network data have emerged as a major topic of interest in diverse areas of
study, and most of these involve a form of graphical representation.
Probability models on graphs date back to 1959. Along with empirical studies in
social psychology and sociology from the 1960s, these early works generated an
active network community and a substantial literature in the 1970s. This effort
moved into the statistical literature in the late 1970s and 1980s, and the past
decade has seen a burgeoning network literature in statistical physics and
computer science. The growth of the World Wide Web and the emergence of online
networking communities such as Facebook, MySpace, and LinkedIn, along with a host of
more specialized professional network communities, have intensified interest in
the study of networks and network data. Our goal in this review is to provide
the reader with an entry point to this burgeoning literature. We begin with an
overview of the historical development of statistical network modeling and then
we introduce a number of examples that have been studied in the network
literature. Our subsequent discussion focuses on a number of prominent static
and dynamic network models and their interconnections. We emphasize formal
model descriptions, and pay special attention to the interpretation of
parameters and their estimation. We end with a description of some open
problems and challenges for machine learning and statistics.
Comment: 96 pages, 14 figures, 333 references
Community Detection using Locality Statistics
The goal of community detection is to identify clusters and groups of vertices that share common properties or play similar roles in a graph, using only the information encoded in the graph. Our work analyzes two methods of identifying an anomalous community in temporal graphs and another method of identifying active communities in a static massive graph. All methods are based on locality statistics.
In [50], we detect an anomalous community that shows growing connectivity in a time series of graphs. We formulate the task as a hypothesis-testing problem for a time series of stochastic block model graphs. We derive the limiting properties and power characteristics of two competing test statistics built on distinct underlying locality statistics. In addition, we provide practical implementations of the two test statistics and detailed experimental results for a neural imaging application in [36].
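As a rough illustration of what a locality statistic can look like, the sketch below counts the edges in the subgraph induced by the k-hop neighborhood of each vertex and takes the maximum over vertices as a scan statistic for each graph in a series. The abstract does not define its statistics, so this definition, the function names, and the toy graphs are assumptions made for illustration and are not the method of [50].

```python
from collections import deque

def k_hop_neighborhood(adj, v, k):
    """Return the set of vertices within k hops of v, including v itself."""
    seen = {v}
    frontier = deque([(v, 0)])
    while frontier:
        u, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                frontier.append((w, dist + 1))
    return seen

def locality_statistic(adj, v, k=1):
    """Number of edges in the subgraph induced by the k-hop neighborhood of v."""
    nbhd = k_hop_neighborhood(adj, v, k)
    # Every undirected edge is counted from both endpoints, hence the halving.
    return sum(1 for u in nbhd for w in adj[u] if w in nbhd) // 2

def scan_statistic(adj, k=1):
    """Maximum locality statistic over all vertices of one graph."""
    return max(locality_statistic(adj, v, k) for v in adj)

# Hypothetical toy series: a path at time 0, then a triangle forms at time 1.
g_t0 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
g_t1 = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
series = [scan_statistic(g) for g in (g_t0, g_t1)]
```

A jump in the scan statistic from one time step to the next is the kind of signal a test for growing local connectivity would look at; how the statistic is normalized over time is left out of this sketch.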
In [51], active communities are detected in a static massive graph on which many community detection algorithms scale poorly. We propose a novel framework for detecting active communities that consist of the most active vertices. Our framework utilizes a parallelizable trimming algorithm based on a locality statistic to filter out inactive vertices, and then clusters the remaining active vertices via spectral decomposition of their similarity matrix. The framework is applicable to graphs consisting of billions of vertices and hundreds of billions of edges.
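The trimming step can be pictured with a very simple stand-in: repeatedly drop vertices whose degree among the surviving vertices falls below a threshold (k-core-style peeling, with degree used as a 1-hop locality statistic). The abstract does not specify the trimming rule, so the rule, the threshold, and the name `trim_inactive` are assumptions. The sketch is sequential rather than parallel and omits the subsequent spectral clustering step.

```python
def trim_inactive(adj, min_degree):
    """Iteratively drop vertices whose degree among survivors is below min_degree."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            degree = sum(1 for w in adj[v] if w in alive)
            if degree < min_degree:
                alive.discard(v)
                changed = True  # removing v may push its neighbors below the threshold
    return alive

# Hypothetical toy graph: a dense group {a, b, c, d} with a sparse tail d-e-f.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"},
    "e": {"d", "f"},
    "f": {"e"},
}
active = trim_inactive(graph, min_degree=3)
```

Only the vertices that survive trimming would be passed on to the clustering stage, which is what keeps the spectral decomposition tractable on a very large graph.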
In summary, this work advances community detection in both temporal graphs and static massive graphs by employing locality statistics.
Computational Methods for Learning and Inference on Dynamic Networks.
Networks are ubiquitous in science, serving as a natural representation for many complex physical, biological, and social phenomena. Significant efforts have been dedicated to analyzing such network representations to reveal their structure and provide some insight towards the phenomena of interest. Computational methods for analyzing networks have typically been designed for static networks, which cannot capture the time-varying nature of many complex phenomena.
In this dissertation, I propose new computational methods for machine learning and statistical inference on dynamic networks with time-evolving structures. Specifically, I develop methods for visualization, tracking, clustering, and prediction of dynamic networks. The proposed methods take advantage of the dynamic nature of the network by intelligently combining observations at multiple time steps. This involves the development of novel statistical models and state-space representations of dynamic networks. Using the methods proposed in this dissertation, I identify long-term trends and structural changes in a variety of dynamic network data sets including a social network of spammers and a network of physical proximity among employees and students at a university campus.
PhD, Electrical Engineering-Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/94022/1/xukevin_1.pd
Followers Are Not Enough: A Question-Oriented Approach to Community Detection in Online Social Networks
Community detection in online social networks is typically based on the
analysis of the explicit connections between users, such as "friends" on
Facebook and "followers" on Twitter. But online users often have hundreds or
even thousands of such connections, and many of these connections do not
correspond to real friendships or more generally to accounts that users
interact with. We claim that community detection in online social networks
should be question-oriented and rely on additional information beyond the
simple structure of the network. The concept of 'community' is very general,
and different questions such as "whom do we interact with?" and "with whom do
we share similar interests?" can lead to the discovery of different social
groups. In this paper we focus on three types of communities beyond structural
communities: activity-based, topic-based, and interaction-based. We analyze a
Twitter dataset using three different weightings of the structural network
meant to highlight these three community types, and then infer the communities
associated with these weightings. We show that the communities obtained in the
three weighted cases are highly different from each other, and from the
communities obtained by considering only the unweighted structural network. Our
results confirm that asking a precise question is an unavoidable first step in
community detection in online social networks, and that different questions can
lead to different insights about the network under study.
Comment: 22 pages, 4 figures, 1 table
Empirical Bayes estimation for random dot product graph representation of the stochastic blockmodel
Network models are increasingly used to model datasets that involve interacting units, particularly
random graph models where the vertices represent individual entities and the edges represent
the presence or absence of a specified interaction between these entities. Finding inherent
communities in networks (i.e. partitioning vertices with a more similar interaction pattern into
groups) is considered to be a fundamental task in network analysis, which aids in understanding
the structural properties of real-world networks. Despite a large amount of research on this task
since the emergence of graphical representation of relational data, this still remains a challenge.
In particular, within the statistical community, the use of the stochastic blockmodel for this task
is currently of immense interest.
Recent theoretical developments have shown that adjacency spectral embedding of graphs yields
tractable distributional results. Specifically, a random dot product graph formulation of the
stochastic blockmodel provides a mixture of multivariate Gaussians for the asymptotic distribution
of the latent positions estimated by adjacency spectral embedding. The first part of this
thesis seeks to employ this new theory to provide an empirical Bayes model for estimating block
memberships of vertices in a stochastic blockmodel graph. Posterior inference is conducted using
a Metropolis-within-Gibbs algorithm. Performance of the model is illustrated through Monte
Carlo simulation studies and experimental results on a Wikipedia dataset. Results show performance
gains over the alternative models considered.
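To make the embedding step concrete, the following sketch samples a two-block stochastic blockmodel graph and computes its adjacency spectral embedding with NumPy. The block probability matrix, block sizes, seed, and function names are assumptions chosen for illustration. The empirical Bayes model and the Metropolis-within-Gibbs sampler described above are not implemented here.

```python
import numpy as np

def sample_sbm(block_sizes, B, rng):
    """Sample a symmetric adjacency matrix with zero diagonal from a stochastic blockmodel."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    P = B[labels][:, labels]                     # edge probability for every vertex pair
    upper = np.triu(rng.random(P.shape) < P, k=1)  # sample each pair once, above the diagonal
    A = (upper | upper.T).astype(float)          # mirror to obtain an undirected graph
    return A, labels

def adjacency_spectral_embedding(A, d):
    """Embed vertices using the d largest-magnitude eigenpairs of A."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[::-1][:d]
    # Scale eigenvectors by the square roots of the eigenvalue magnitudes.
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

rng = np.random.default_rng(0)
B = np.array([[0.5, 0.1],
              [0.1, 0.4]])                       # hypothetical block probabilities
A, labels = sample_sbm([150, 150], B, rng)
X_hat = adjacency_spectral_embedding(A, d=2)     # one 2-dimensional point per vertex
```

Under the theory cited above, the rows of `X_hat` for each block should scatter around a block-specific center, which is what motivates fitting a Gaussian mixture to the embedded points.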
Instead of a complete classification of vertices via community detection, one may wish to discover
whether vertices possess an attribute of interest. Given that this attribute is observed for a few
vertices, the goal is to find other vertices that possess that same attribute. As an example, if a
few employees in a company are known to have committed fraud, how can we identify others who
may be complicit? This is a special case of community detection, known as vertex nomination,
which has recently grown rapidly as a research topic. The second part of this thesis extends
the empirical Bayes model for vertex nomination based on information contained in the graph
structure. This yields promising simulation results as well as real-data results from an Enron
email dataset.
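The vertex nomination task itself can be shown with a deliberately simple heuristic: rank each unlabeled vertex by the share of its neighbors that are known to have the attribute. This is not the empirical Bayes model of the thesis. The function `nominate` and the small made-up contact graph are hypothetical and are not drawn from the Enron data.

```python
def nominate(adj, seeds):
    """Rank non-seed vertices by the share of their neighbors that are known seeds."""
    scores = {}
    for v, nbrs in adj.items():
        if v in seeds or not nbrs:
            continue  # skip known seeds and isolated vertices
        scores[v] = sum(1 for w in nbrs if w in seeds) / len(nbrs)
    # Highest score first; ties are broken by vertex name for a deterministic list.
    return sorted(scores, key=lambda v: (-scores[v], v))

# Hypothetical toy contact graph; "ann" and "bob" are the vertices with the known attribute.
contacts = {
    "ann": {"bob", "cy", "dee"},
    "bob": {"ann", "cy"},
    "cy": {"ann", "bob", "eve"},
    "dee": {"ann", "eve"},
    "eve": {"cy", "dee", "fay"},
    "fay": {"eve"},
}
ranking = nominate(contacts, seeds={"ann", "bob"})
```

The output is a ranked nomination list rather than a full partition of the vertices, which is the sense in which vertex nomination differs from complete community detection.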
Recent studies have shown that information pertinent to vertex nomination exists not only in
the graph structure but also in the edge attributes (Coppersmith and Priebe, 2012; Suwan et al.,
2015). This motivates the third part of this thesis by further extending the model to exploit
both graph structure and edge attributes for vertex nomination. Simulation studies confirm the
benefit of doing so. However, the same benefit is not observed when the model is applied to the
Enron email dataset; further investigations suggest that this is due to the data violating one of
the model assumptions.