5,361 research outputs found
Semi-Supervised Overlapping Community Finding based on Label Propagation with Pairwise Constraints
Algorithms for detecting communities in complex networks are generally
unsupervised, relying solely on the structure of the network. However, these
methods can often fail to uncover meaningful groupings that reflect the
underlying communities in the data, particularly when those structures are
highly overlapping. One way to improve the usefulness of these algorithms is by
incorporating additional background information, which can be used as a source
of constraints to direct the community detection process. In this work, we
explore the potential of semi-supervised strategies to improve algorithms for
finding overlapping communities in networks. Specifically, we propose a new
method, based on label propagation, for finding communities using a limited
number of pairwise constraints. Evaluations on synthetic and real-world
datasets demonstrate the potential of this approach for uncovering meaningful
community structures in cases where each node can potentially belong to more
than one community.Comment: Fix table
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.Comment: 13 figures, 35 reference
Exploring the roles of cannot-link constraint in community detection via multi-variance mixed Gaussian generative model
Due to the demand for performance improvement and the existence of prior information, semi-supervised community detection with pairwise constraints becomes a hot topic. Most existing methods have been successfully encoding the must-link constraints, but neglect the opposite ones, i.e., the cannot-link constraints, which can force the exclusion between nodes. In this paper, we are interested in understanding the role of cannot-link constraints and effectively encoding pairwise constraints. Towards these goals, we define an integral generative process jointly considering the network topology, must-link and cannot-link constraints. We propose to characterize this process as a Multi-variance Mixed Gaussian Generative (MMGG) Model to address diverse degrees of confidences that exist in network topology and pairwise constraints and formulate it as a weighted nonnegative matrix factorization problem. The experiments on artificial and real-world networks not only illustrate the superiority of our proposed MMGG, but also, most importantly, reveal the roles of pairwise constraints. That is, though the must-link is more important than cannot-link when either of them is available, both must-link and cannot-link are equally important when both of them are available. To the best of our knowledge, this is the first work on discovering and exploring the importance of cannot-link constraints in semi-supervised community detection
- …