Directed mixed membership stochastic blockmodel
The mixed membership problem for undirected networks has been well studied in
network analysis in recent years. However, the more general case of mixed
membership for directed networks remains a challenge. Here, we propose an
interpretable and identifiable model, the directed mixed membership stochastic
blockmodel (DiMMSB for short), for directed mixed membership networks. DiMMSB
allows the row nodes and column nodes of the adjacency matrix to be different,
and these nodes may have distinct community structures in a directed network. We
also develop an efficient spectral algorithm called DiSP, based on the
simplex structures inherent in the left and right singular vectors of the
population adjacency matrix, to estimate the mixed memberships of both row
nodes and column nodes in a directed network. We show that DiSP is
asymptotically consistent under mild conditions by providing error bounds for
the inferred membership vectors of each row node and each column node using
delicate spectral analysis. We demonstrate the advantages of DiSP with
applications to simulated directed mixed membership networks, the directed
Political blogs network and the Papers Citation network.
Comment: 35 pages, 6 figures, 2 table
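The simplex structure mentioned in the abstract can be illustrated on a small noiseless example. The sketch below is not the DiSP algorithm itself. It assumes a population adjacency matrix of the form Omega = Pi_r P Pi_c^T, where the sizes, the connectivity matrix `P`, the Dirichlet memberships, and the placement of pure nodes in the first K rows are all choices made for this illustration. It then checks that every row of the top-K left (right) singular vectors is a convex combination of the rows belonging to the pure row (column) nodes.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_r, n_c = 3, 60, 40  # illustrative sizes, not taken from the paper


def memberships(n, K, rng):
    """Mixed membership matrix whose first K rows are pure nodes (assumption)."""
    Pi = rng.dirichlet(np.ones(K), size=n)  # each row is a probability vector
    Pi[:K] = np.eye(K)                      # one pure node per community
    return Pi


Pi_r = memberships(n_r, K, rng)  # row-node memberships
Pi_c = memberships(n_c, K, rng)  # column-node memberships

# Asymmetric, full-rank connectivity matrix (hypothetical values).
P = np.array([[0.9, 0.2, 0.1],
              [0.3, 0.8, 0.2],
              [0.1, 0.4, 0.7]])

Omega = Pi_r @ P @ Pi_c.T  # assumed population adjacency matrix, rank K

U, s, Vt = np.linalg.svd(Omega, full_matrices=False)
U, V = U[:, :K], Vt[:K].T  # top-K left and right singular vectors

# Simplex structure: U = Pi_r @ U[pure rows] and V = Pi_c @ V[pure rows].
simplex_gap_rows = np.abs(U - Pi_r @ U[:K]).max()
simplex_gap_cols = np.abs(V - Pi_c @ V[:K]).max()
print(simplex_gap_rows, simplex_gap_cols)
```

Both gaps are at the level of floating-point error here because the example uses the population matrix directly. With a sampled adjacency matrix the rows would only lie near the simplex, and the pure nodes would have to be estimated rather than read off the first K rows.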
Statistical Tools for Directed and Bipartite Networks
Directed networks and bipartite networks, which exhibit unique asymmetric connectivity structures, are commonly observed in a variety of scientific and engineering fields. Despite their abundance and utility, most network analysis methods only consider symmetric networks. In this thesis, we develop statistical methods and theory for directed and bipartite networks.
The first chapter focuses on matched community detection in a bipartite network. The detection of matched communities, i.e. communities that consist of nodes of two types that are closely connected with one another, is a fundamental and challenging problem. Most widely used approaches for matched community detection are either computationally inefficient or prone to non-ideal performance. We propose a new two-stage algorithm that uses fast spectral methods to recover matched communities. We show that, for bipartite networks, it is critical to adjust for the community size in matched community detection, which had not been considered before. We also provide theoretical error bounds for the proposed algorithm on the number of mis-clustered nodes under a variant of the stochastic block model. Numerical studies indicate that the proposed method outperforms existing spectral algorithms, especially when the sizes of the matched communities are proportionally different between the two types.
The second chapter of the thesis introduces a new preference-based block model for community detection in a directed network. Unlike existing models, the proposed model allows different sender nodes to have different preferences for communities in the network. We argue that the right singular vectors of a graph Laplacian matrix encode the community structure under the model. Further, we propose a spectral clustering algorithm to detect communities and estimate the parameters of the model. Theoretical results show how the heterogeneity of preferences and out-degrees contributes to an upper bound on the number of mis-clustered nodes. Numerical studies support the theoretical results and illustrate the outstanding performance of the proposed method. The model can also be naturally extended to bipartite networks.
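The claim about right singular vectors can be made concrete with a toy expected adjacency matrix. The sketch below is a simplified illustration, not the model or Laplacian defined in the thesis. It assumes that the expected edge weight from sender i to receiver j is `theta[i] * prefs[i, z[j]]`, where `theta`, `prefs`, and `z` are hypothetical names for sender activity, sender preferences over communities, and receiver community labels, and it uses a degree-normalized matrix D_out^{-1/2} E D_in^{-1/2} as a stand-in for the graph Laplacian.

```python
import numpy as np

rng = np.random.default_rng(1)
n_send, n_recv, K = 50, 30, 3  # illustrative sizes

z = np.arange(n_recv) % K                       # receiver community labels
prefs = rng.dirichlet(np.ones(K), size=n_send)  # each sender's preference over communities
theta = rng.uniform(0.2, 1.0, size=n_send)      # sender activity (out-degree heterogeneity)

# Expected adjacency matrix under the assumed toy parametrization.
E = theta[:, None] * prefs[:, z]                # shape (n_send, n_recv)

# Degree-normalized matrix, used here as a stand-in for a graph Laplacian.
d_out = E.sum(axis=1)
d_in = E.sum(axis=0)
L = E / np.sqrt(np.outer(d_out, d_in))

# Top-K right singular vectors: one row per receiver node.
_, _, Vt = np.linalg.svd(L, full_matrices=False)
V = Vt[:K].T

# Receivers in the same community share the same row of V.
within = max(np.abs(V[z == k] - V[z == k][0]).max() for k in range(K))

# Rows belonging to different communities are well separated.
centers = np.array([V[z == k][0] for k in range(K)])
between = min(np.linalg.norm(centers[a] - centers[b])
              for a in range(K) for b in range(a + 1, K))
print(within, between)
```

In this noiseless setting the rows of `V` collapse to K distinct points, one per receiver community, regardless of how heterogeneous the senders are. On a sampled network the rows would scatter around those points, which is why a clustering step such as k-means is applied to them in practice.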
In the third chapter, we propose a dyadic latent space model which accommodates the reciprocity between a pair of nodes in directed networks. Nodes in a pair in directed networks often exhibit strong dependencies with each other, though most widely used approaches usually account for this phenomenon with limited flexibility. We propose a new latent space model for directed networks that incorporates the reciprocity in a flexible way, allowing for important characteristics such as homophily and heterogeneity of the nodes. A fast and scalable algorithm based on projected gradient descent has been developed to fit the model by maximizing the likelihood. Both simulation studies and real-world data examples illustrate that the proposed model is effective in various network analysis tasks including link prediction and community detection.
PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/163156/1/yoohs_1.pd
Matched bipartite block model with covariates
Community detection or clustering is a fundamental task in the analysis of network data. Many real networks have a bipartite structure, which makes community detection challenging. In this paper, we consider a model which allows for matched communities in the bipartite setting, in addition to node covariates with information about the matching. We derive a simple, fast algorithm for fitting the model based on variational inference ideas and show its effectiveness on both simulated and real data. A variation of the model that allows for degree correction is also considered, in addition to a novel approach to fitting such degree-corrected models.