Community detection in network analysis: a survey
Community structure is common in networks across domains such as sociology, biology, and business. Its defining characteristic is that nodes within the same community are highly similar, whereas nodes in different communities show low similarity.
In academia, there is a surge in research efforts on community detection in network analysis, especially in developing statistically sound methodologies for exploring, modeling, and interpreting these kinds of structures and relationships.
This survey paper provides a brief comparative review of currently applicable
statistical methodologies and approaches, along with metrics for evaluating
graph clustering results and an application using R. At the end, we outline
promising future research directions.
Statistic
A survey of statistical network models
Networks are ubiquitous in science and have become a focal point for
discussion in everyday life. Formal statistical models for the analysis of
network data have emerged as a major topic of interest in diverse areas of
study, and most of these involve a form of graphical representation.
Probability models on graphs date back to 1959. Along with empirical studies in
social psychology and sociology from the 1960s, these early works generated an
active network community and a substantial literature in the 1970s. This effort
moved into the statistical literature in the late 1970s and 1980s, and the past
decade has seen a burgeoning network literature in statistical physics and
computer science. The growth of the World Wide Web and the emergence of online
networking communities such as Facebook, MySpace, and LinkedIn, and a host of
more specialized professional network communities have intensified interest in
the study of networks and network data. Our goal in this review is to provide
the reader with an entry point to this burgeoning literature. We begin with an
overview of the historical development of statistical network modeling and then
we introduce a number of examples that have been studied in the network
literature. Our subsequent discussion focuses on a number of prominent static
and dynamic network models and their interconnections. We emphasize formal
model descriptions, and pay special attention to the interpretation of
parameters and their estimation. We end with a description of some open
problems and challenges for machine learning and statistics.
Comment: 96 pages, 14 figures, 333 reference
Estimation in a Binomial Stochastic Blockmodel for a Weighted Graph by a Variational Expectation Maximization Algorithm
Stochastic blockmodels have been widely proposed as probabilistic random graph models for the analysis of network data as well as for detecting community structure in these networks. In a number of real-world networks, not all ties among nodes have the same weight. Ties among network nodes are often associated with weights that differentiate them in terms of their strength, intensity, or capacity. In this paper, we provide an inference method based on a variational expectation maximization algorithm to estimate the parameters of binomial stochastic blockmodels for weighted networks. To demonstrate the validity of the method and to highlight its main features, we apply the proposed approach to simulated data and then to real data sets. Stochastic blockmodels belong to the family of latent class models, in which the classes define a clustering of the nodes. We compare the clustering found through binomial stochastic blockmodels with the ones found by fitting a stochastic blockmodel with Poisson-distributed edges. The inferred Poisson and binomial stochastic blockmodels mostly differ. Moreover, in our examples, the statistical error is lower for binomial stochastic blockmodels.
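To make the generative side of such a model concrete, here is a minimal sketch of sampling a weighted graph from a binomial stochastic blockmodel. It is an illustration only, not the paper's code: the function name `sample_binomial_sbm`, the parameter names, and the assumption of a single trial count shared by all node pairs are hypothetical choices, and the variational EM inference step is not shown.

```python
import numpy as np


def sample_binomial_sbm(n_nodes, class_probs, n_trials, success_probs, seed=0):
    """Sample an undirected weighted graph from a binomial SBM (illustrative).

    Assumptions (not taken from the paper): every node pair shares the same
    number of trials `n_trials`, and the graph has no self-loops.
    """
    rng = np.random.default_rng(seed)
    success_probs = np.asarray(success_probs, dtype=float)

    # Latent class of each node, drawn from the class proportions.
    z = rng.choice(len(class_probs), size=n_nodes, p=class_probs)

    # Success probability for each pair, looked up from the pair's classes.
    pair_probs = success_probs[z[:, None], z[None, :]]

    # Edge weight for each pair: Binomial(n_trials, pair probability).
    draws = rng.binomial(n_trials, pair_probs)

    # Keep the strict upper triangle and mirror it to get a symmetric matrix
    # with a zero diagonal.
    upper = np.triu(draws, k=1)
    weights = upper + upper.T
    return z, weights


if __name__ == "__main__":
    z, w = sample_binomial_sbm(
        n_nodes=8,
        class_probs=[0.5, 0.5],
        n_trials=10,
        success_probs=[[0.8, 0.1], [0.1, 0.8]],
    )
    print(z)
    print(w)
```

With high within-class and low between-class success probabilities, as in the usage above, heavier ties tend to concentrate inside the latent classes, which is the structure an inference method would try to recover.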
Modeling heterogeneity in random graphs through latent space models: a selective review
We present a selective review on probabilistic modeling of heterogeneity in
random graphs. We focus on latent space models and more particularly on
stochastic block models and their extensions that have undergone major
developments in the last five years.
Bayesian stochastic blockmodeling
This chapter provides a self-contained introduction to the use of Bayesian
inference to extract large-scale modular structures from network data, based on
the stochastic blockmodel (SBM), as well as its degree-corrected and
overlapping generalizations. We focus on nonparametric formulations that allow
their inference in a manner that prevents overfitting, and enables model
selection. We discuss aspects of the choice of priors, in particular how to
avoid underfitting via increased Bayesian hierarchies, and we contrast the task
of sampling network partitions from the posterior distribution with finding the
single point estimate that maximizes it, while describing efficient algorithms
to perform either one. We also show how inferring the SBM can be used to
predict missing and spurious links, and shed light on the fundamental
limitations of the detectability of modular structures in networks.
Comment: 44 pages, 16 figures. Code is freely available as part of graph-tool
at https://graph-tool.skewed.de . See also the HOWTO at
https://graph-tool.skewed.de/static/doc/demos/inference/inference.htm
Fragmentation Coagulation Based Mixed Membership Stochastic Blockmodel
The Mixed-Membership Stochastic Blockmodel (MMSB) is proposed as one of the
state-of-the-art Bayesian relational methods suitable for learning the complex
hidden structure underlying the network data. However, the current formulation
of MMSB suffers from the following two issues: (1) the prior information (e.g.,
entities' community structural information) cannot be well embedded in the
modelling; (2) community evolution cannot be well described in the
literature. Therefore, we propose a non-parametric fragmentation coagulation
based Mixed Membership Stochastic Blockmodel (fcMMSB). Our model performs
entity-based clustering to capture the community information for entities and
linkage-based clustering to derive the group information for links
simultaneously. Besides, the proposed model infers the network structure and
models community evolution, manifested by appearances and disappearances of
communities, using the discrete fragmentation coagulation process (DFCP). By
integrating the community structure with the group compatibility matrix we
derive a generalized version of MMSB. An efficient Gibbs sampling scheme with
the Pólya-Gamma (PG) approach is implemented for posterior inference. We validate
our model on synthetic and real-world data.
Comment: AAAI 202
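For orientation, the sketch below samples a directed graph from the standard MMSB generative process that this abstract builds on. It does not implement fcMMSB, the fragmentation coagulation process, or the Pólya-Gamma Gibbs sampler. The function name `sample_mmsb`, the symmetric Dirichlet concentration `alpha`, and the matrix name `compat` are hypothetical names chosen for illustration.

```python
import numpy as np


def sample_mmsb(n_nodes, alpha, compat, seed=0):
    """Sample a directed graph from the standard MMSB (illustrative sketch).

    Assumptions for illustration: a symmetric Dirichlet prior with
    concentration `alpha`, and no self-loops.
    """
    rng = np.random.default_rng(seed)
    compat = np.asarray(compat, dtype=float)
    n_groups = compat.shape[0]

    # Each node gets a mixed-membership vector over the groups.
    memberships = rng.dirichlet(np.full(n_groups, alpha), size=n_nodes)

    adj = np.zeros((n_nodes, n_nodes), dtype=int)
    for i in range(n_nodes):
        for j in range(n_nodes):
            if i == j:
                continue
            # For this ordered pair, sender and receiver each pick a group.
            sender_group = rng.choice(n_groups, p=memberships[i])
            receiver_group = rng.choice(n_groups, p=memberships[j])
            # The link is Bernoulli with the groups' compatibility.
            adj[i, j] = rng.binomial(1, compat[sender_group, receiver_group])
    return memberships, adj


if __name__ == "__main__":
    memberships, adj = sample_mmsb(
        n_nodes=6, alpha=0.5, compat=[[0.9, 0.05], [0.05, 0.9]]
    )
    print(memberships.round(2))
    print(adj)
```

Because group indicators are redrawn for every ordered pair, a node can act as a member of different groups in different interactions, which is what distinguishes mixed membership from a single hard class per node.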
Metrics for Graph Comparison: A Practitioner's Guide
Comparison of graph structure is a ubiquitous task in data analysis and
machine learning, with diverse applications in fields such as neuroscience,
cyber security, social network analysis, and bioinformatics, among others.
Discovery and comparison of structures such as modular communities, rich clubs,
hubs, and trees in data in these fields yield insight into the generative
mechanisms and functional properties of the graph.
Often, two graphs are compared via a pairwise distance measure, with a small
distance indicating structural similarity and vice versa. Common choices
include spectral distances (also known as λ distances) and distances
based on node affinities. However, there has as yet been no comparative study
of the efficacy of these distance measures in discerning between common graph
topologies and different structural scales.
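As a small illustration of a spectral distance, the sketch below compares two graphs of equal size through the eigenvalues of their combinatorial Laplacians. It uses plain NumPy and is not the NetComp API. The function names, the choice of the Laplacian rather than the adjacency matrix, and the Euclidean norm are assumptions made for this example.

```python
import numpy as np


def laplacian_spectrum(adj):
    """Return the sorted eigenvalues of the combinatorial Laplacian D - A."""
    adj = np.asarray(adj, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj
    # eigvalsh assumes a symmetric matrix and returns ascending eigenvalues.
    return np.linalg.eigvalsh(laplacian)


def spectral_distance(adj_a, adj_b):
    """Euclidean distance between the Laplacian spectra of two graphs.

    Assumes both graphs are undirected and have the same number of nodes.
    """
    spec_a = laplacian_spectrum(adj_a)
    spec_b = laplacian_spectrum(adj_b)
    if spec_a.shape != spec_b.shape:
        raise ValueError("graphs must have the same number of nodes")
    return float(np.linalg.norm(spec_a - spec_b))


if __name__ == "__main__":
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]      # path on 3 nodes
    triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # complete graph on 3 nodes
    print(spectral_distance(path, triangle))
    print(spectral_distance(path, path))
```

A distance of this kind depends only on the spectra, so it is unchanged by relabeling the nodes, but two non-isomorphic graphs with the same spectrum would be assigned distance zero.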
In this work, we compare commonly used graph metrics and distance measures,
and demonstrate their ability to discern between common topological features
found in both random graph models and empirical datasets. We put forward a
multi-scale picture of graph structure, in which the effect of global and local
structure upon the distance measures is considered. We make recommendations on
the applicability of different distance measures to empirical graph data
problems based on this multi-scale view. Finally, we introduce the Python
library NetComp, which implements the graph distances used in this work.