Mixed membership stochastic blockmodels
Observations consisting of measurements on relationships for pairs of objects
arise in many settings, such as protein interaction and gene regulatory
networks, collections of author-recipient email, and social networks. Analyzing
such data with probabilistic models can be delicate because the simple
exchangeability assumptions underlying many boilerplate models no longer hold.
In this paper, we describe a latent variable model of such data called the
mixed membership stochastic blockmodel. This model extends blockmodels for
relational data to ones which capture mixed membership latent relational
structure, thus providing an object-specific low-dimensional representation. We
develop a general variational inference algorithm for fast approximate
posterior inference. We explore applications to social and protein interaction
networks.
Comment: 46 pages, 14 figures, 3 tables
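The abstract above does not spell out the generative process, so the following is only a minimal sketch under stated assumptions. It assumes that each node draws a mixed-membership vector from a Dirichlet prior, that for every ordered pair of nodes the sender and the receiver each draw a group indicator from their own membership vector, and that the edge is a Bernoulli draw with probability taken from a block matrix. The names `sample_mmsb`, `alpha`, and `block_probs` are hypothetical and chosen for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index k with probability probs[k]."""
    u = rng.random()
    cumulative = 0.0
    for k, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return k
    return len(probs) - 1  # guard against floating-point round-off


def sample_mmsb(n_nodes, alpha, block_probs, seed=0):
    """Sample a directed graph from an assumed mixed-membership blockmodel.

    alpha       -- Dirichlet concentration, one entry per group (assumption)
    block_probs -- block_probs[g][h] is the edge probability from group g to h
    """
    rng = random.Random(seed)
    # One mixed-membership vector per node.
    memberships = [sample_dirichlet(alpha, rng) for _ in range(n_nodes)]
    adjacency = [[0] * n_nodes for _ in range(n_nodes)]
    for p in range(n_nodes):
        for q in range(n_nodes):
            if p == q:
                continue  # no self-loops in this sketch
            g = sample_categorical(memberships[p], rng)  # role of the sender
            h = sample_categorical(memberships[q], rng)  # role of the receiver
            adjacency[p][q] = 1 if rng.random() < block_probs[g][h] else 0
    return memberships, adjacency
```

For example, `sample_mmsb(5, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])` returns five membership vectors and a 5-by-5 binary adjacency matrix. The sketch covers simulation only; the variational inference algorithm mentioned in the abstract is not reproduced here.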
Confidence sets for network structure
Latent variable models are frequently used to identify structure in
dichotomous network data, in part because they give rise to a Bernoulli product
likelihood that is both well understood and consistent with the notion of
exchangeable random graphs. In this article we propose conservative confidence
sets that hold with respect to these underlying Bernoulli parameters as a
function of any given partition of network nodes, enabling us to assess
estimates of 'residual' network structure, that is, structure that cannot be
explained by known covariates and thus cannot be easily verified by manual
inspection. We demonstrate the proposed methodology by analyzing student
friendship networks from the National Longitudinal Survey of Adolescent Health
that include race, gender, and school year as covariates. We employ a
stochastic expectation-maximization algorithm to fit a logistic regression
model that includes these explanatory variables as well as a latent stochastic
blockmodel component and additional node-specific effects. Although
maximum-likelihood estimates do not appear consistent in this context, we are
able to evaluate confidence sets as a function of different blockmodel
partitions, which enables us to qualitatively assess the significance of
estimated residual network structure relative to a baseline, which models
covariates but lacks block structure.
Comment: 17 pages, 3 figures, 3 tables
How Many Communities Are There?
Stochastic blockmodels and variants thereof are among the most widely used
approaches to community detection for social networks and relational data. A
stochastic blockmodel partitions the nodes of a network into disjoint sets,
called communities. The approach is inherently related to clustering with
mixture models, and it raises a similar model selection problem for the number of
communities. The Bayesian information criterion (BIC) is a popular solution;
however, for stochastic blockmodels, the assumption that different edges are
conditionally independent given the communities of their endpoints is usually
violated in practice. In this regard, we propose composite likelihood BIC
(CL-BIC) to select the number of communities, and we show it is robust against
possible misspecifications in the underlying stochastic blockmodel assumptions.
We derive the requisite methodology and illustrate the approach using both
simulated and real data. Supplementary materials containing the relevant
computer code are available online.
Comment: 26 pages, 3 figures
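To make the model selection problem concrete, the sketch below scores a given partition of an undirected network with the ordinary BIC, not the CL-BIC proposed in the abstract, whose form is not given here. It assumes a Bernoulli stochastic blockmodel with one edge probability per unordered pair of communities, plugs in the maximum-likelihood estimate for each block, and penalises by the number of block probabilities times the log of the number of node pairs. The function name `sbm_bic` and this particular choice of penalty are assumptions made for illustration.

```python
import math


def sbm_bic(adjacency, labels):
    """Ordinary BIC of an undirected Bernoulli blockmodel for a fixed partition.

    adjacency -- symmetric 0/1 matrix as a list of lists
    labels    -- community index of each node, using 0..K-1 without gaps
    Lower values indicate a better trade-off between fit and complexity.
    """
    n = len(adjacency)
    k = max(labels) + 1
    edges = [[0] * k for _ in range(k)]  # observed edges per block pair
    pairs = [[0] * k for _ in range(k)]  # possible edges per block pair
    for i in range(n):
        for j in range(i + 1, n):
            a, b = sorted((labels[i], labels[j]))
            pairs[a][b] += 1
            edges[a][b] += adjacency[i][j]

    log_lik = 0.0
    for a in range(k):
        for b in range(a, k):
            m, e = pairs[a][b], edges[a][b]
            if m == 0 or e == 0 or e == m:
                continue  # estimated probability is 0 or 1: contributes zero
            p = e / m  # maximum-likelihood block probability
            log_lik += e * math.log(p) + (m - e) * math.log(1.0 - p)

    n_params = k * (k + 1) // 2  # one probability per unordered block pair
    n_dyads = n * (n - 1) // 2   # number of node pairs observed
    return -2.0 * log_lik + n_params * math.log(n_dyads)
```

Comparing `sbm_bic` across candidate partitions with different numbers of communities gives the baseline criterion that the abstract argues can be unreliable when edges are not conditionally independent.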
Community detection in network analysis: a survey
Community structure is common in networks across domains such as sociology, biology, and business. Its defining characteristic is that nodes within the same community are highly similar, whereas nodes in different communities show low similarity.
In academia, there is a surge in research efforts on community detection in network analysis, especially in developing statistically sound methodologies for exploring, modeling, and interpreting these kinds of structures and relationships.
This survey paper aims to provide a brief, comparative review of currently applicable
statistical methodologies and approaches, along with metrics for evaluating graph clustering results and an application using R. At the
end, we provide promising future research directions.