Community Detection with and without Prior Information
We study the problem of graph partitioning, or clustering, in sparse networks
with prior information about the clusters. Specifically, we assume that for a
fraction of the nodes their true cluster assignments are known in
advance. This can be understood as a semi-supervised version of clustering, in
contrast to unsupervised clustering, where the only available information is the
graph structure. In the unsupervised case, it is known that there is a
threshold of the inter-cluster connectivity beyond which clusters cannot be
detected. Here we study the impact of the prior information on the detection
threshold, and show that even minute [but generic] values of this fraction shift the
threshold downwards to its lowest possible value. For weighted graphs we show
that a small amount of semi-supervision can be used to define communities
non-trivially.
Comment: 6 pages, 2 figures
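The semi-supervised setting can be explored with a small simulation. The sketch below is not the authors' analysis; it assumes a symmetric two-group planted-partition graph with edge probabilities c_in/n (same group) and c_out/n (different groups), reveals a random fraction rho of the true labels, and spreads them with the standard harmonic label-propagation rule, a swapped-in technique chosen for simplicity. The threshold check is the known unsupervised two-group (Kesten-Stigum) condition in this particular parameterization. All function names and parameter values are hypothetical.

```python
import random


def planted_partition(n, c_in, c_out, seed=0):
    """Sparse two-group planted partition: each pair is linked with
    probability c_in/n if both nodes share a group, c_out/n otherwise."""
    rng = random.Random(seed)
    groups = [i % 2 for i in range(n)]
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = (c_in if groups[i] == groups[j] else c_out) / n
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return groups, adj


def above_ks_threshold(c_in, c_out):
    """Unsupervised two-group detectability condition (Kesten-Stigum bound)
    in this parameterization: (c_in - c_out)^2 > 2 (c_in + c_out)."""
    return (c_in - c_out) ** 2 > 2 * (c_in + c_out)


def label_propagation(adj, known, iters=200):
    """Harmonic label propagation: revealed nodes stay clamped at +1/-1,
    every other node repeatedly takes the mean score of its neighbours."""
    score = [known.get(i, 0.0) for i in range(len(adj))]
    for _ in range(iters):
        new = score[:]
        for i, nbrs in enumerate(adj):
            if i not in known and nbrs:
                new[i] = sum(score[j] for j in nbrs) / len(nbrs)
        score = new
    return score


def accuracy(groups, score):
    """Group 0 is predicted by a positive score, group 1 by a negative one;
    a zero score is a coin flip and counts as 1/2."""
    total = 0.0
    for g, s in zip(groups, score):
        if s == 0:
            total += 0.5
        elif (s > 0) == (g == 0):
            total += 1.0
    return total / len(groups)


n, c_in, c_out, rho = 400, 5.0, 3.0, 0.1  # hypothetical parameters
groups, adj = planted_partition(n, c_in, c_out, seed=1)
rng = random.Random(2)
known = {i: (1.0 if groups[i] == 0 else -1.0)
         for i in range(n) if rng.random() < rho}
score = label_propagation(adj, known)
print(above_ks_threshold(c_in, c_out), accuracy(groups, score))
```

With these example values the graph sits below the unsupervised threshold, so any accuracy above 1/2 comes from the revealed labels; sweeping rho and c_out is a simple way to probe the effect the abstract describes, but the printed numbers are only a finite-size illustration.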
Phase Transitions in Community Detection: A Solvable Toy Model
Recently, it was shown that there is a phase transition in the community
detection problem. This transition was first computed using the cavity method,
and has been proved rigorously in the case of […] groups. However, analytic
calculations using the cavity method are challenging since they require us to
understand probability distributions of messages. We study analogous
transitions in the so-called "zero-temperature inference" model, where this
distribution is supported only on the most-likely messages. Furthermore,
whenever several messages are equally likely, we break the tie by choosing
among them with equal probability. While the resulting analysis does not give
the correct values of the thresholds, it does reproduce some of the qualitative
features of the system. It predicts a first-order detectability transition
whenever […], while the finite-temperature cavity method shows that this is
the case only when […]. It also has a regime analogous to the "hard but
detectable" phase, where the community structure can be partially recovered,
but only when the initial messages are sufficiently accurate. Finally, we study
a semi-supervised setting where we are given the correct labels for a fraction
of the nodes. For […], we find a regime where the accuracy jumps
discontinuously at a critical value of this fraction.
Comment: 6 pages, 6 figures
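The combination of zero-temperature updates, uniform random tie-breaking, and a fraction of revealed labels can be tried out directly. The sketch below is a simplified node-level analogue, not the paper's cavity analysis: the paper tracks messages i→j that exclude the receiving neighbour and studies their distribution analytically, whereas here every free node simply adopts the plurality label of its neighbours. Under the stated assumptions (an assortative planted partition with c_in > c_out, sparse limit, hard neighbour labels) the plurality label is the most likely one. The graph model, the synchronous schedule, and every name and parameter value are hypothetical.

```python
import random
from collections import Counter


def planted_partition(n, q, c_in, c_out, rng):
    """Sparse q-group planted partition: pair probability c_in/n inside a
    group and c_out/n across groups."""
    groups = [i % q for i in range(n)]
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = (c_in if groups[i] == groups[j] else c_out) / n
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return groups, adj


def zero_temperature_sweep(adj, labels, clamped, rng):
    """One synchronous update: each free node adopts the most frequent label
    among its neighbours; ties are broken uniformly at random."""
    new = labels[:]
    for i, nbrs in enumerate(adj):
        if i in clamped or not nbrs:
            continue
        counts = Counter(labels[j] for j in nbrs)
        best = max(counts.values())
        new[i] = rng.choice(sorted(lab for lab, c in counts.items() if c == best))
    return new


def run(n=600, q=3, c_in=12.0, c_out=1.0, rho=0.1, sweeps=30, seed=0):
    """Reveal a fraction rho of the true labels, start the rest at random,
    iterate the sweep, and return (accuracy, clamped set, labels, groups)."""
    rng = random.Random(seed)
    groups, adj = planted_partition(n, q, c_in, c_out, rng)
    clamped = {i for i in range(n) if rng.random() < rho}
    labels = [groups[i] if i in clamped else rng.randrange(q) for i in range(n)]
    for _ in range(sweeps):
        labels = zero_temperature_sweep(adj, labels, clamped, rng)
    acc = sum(lab == g for lab, g in zip(labels, groups)) / n
    return acc, clamped, labels, groups


for rho in (0.05, 0.1, 0.2, 0.4):
    print(rho, run(rho=rho)[0])
```

Because the revealed labels fix which label means which group, plain accuracy is meaningful here; with no revealed labels it would have to be measured up to a relabeling of the groups.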
Analysis of Three-Dimensional Protein Images
A fundamental goal of research in molecular biology is to understand protein
structure. Protein crystallography is currently the most successful method for
determining the three-dimensional (3D) conformation of a protein, yet it
remains labor-intensive and relies on an expert's ability to derive and
evaluate a protein scene model. In this paper, the problem of protein structure
determination is formulated as an exercise in scene analysis. A computational
methodology is presented in which a 3D image of a protein is segmented into a
graph of critical points. Bayesian and certainty factor approaches are
described and used to analyze critical point graphs and identify meaningful
substructures, such as alpha-helices and beta-sheets. Results of applying the
methodologies to protein images at low and medium resolution are reported. The
research is related to approaches to representation, segmentation and
classification in vision, as well as to top-down approaches to protein
structure prediction.
Comment: See http://www.jair.org/ for any accompanying files
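The abstract contrasts Bayesian and certainty-factor scoring of critical-point-graph substructures but does not give the formulas used. As a hedged illustration only, the sketch below assumes the classic MYCIN rule for combining certainty factors and a naive-Bayes odds update that treats features as conditionally independent; neither is claimed to be the paper's method, and the evidence values are hypothetical.

```python
from functools import reduce


def combine_cf(a, b):
    """MYCIN-style combination of two certainty factors in [-1, 1]."""
    if a >= 0 and b >= 0:
        return a + b * (1 - a)
    if a < 0 and b < 0:
        return a + b * (1 + a)
    denom = 1 - min(abs(a), abs(b))
    if denom == 0:
        raise ValueError("fully conflicting evidence (+1 and -1) cannot be combined")
    return (a + b) / denom


def combine_bayes(prior, likelihood_ratios):
    """Naive-Bayes update: posterior odds = prior odds times the product of the
    likelihood ratios, assuming the features are conditionally independent."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)


# Hypothetical evidence from three features that one substructure is an alpha-helix.
cfs = [0.6, 0.5, -0.2]
print(reduce(combine_cf, cfs))               # certainty-factor combination
print(combine_bayes(0.5, [3.0, 2.0, 0.8]))   # posterior probability
```

The design difference is visible in the code: certainty factors are combined pairwise by an ad hoc rule, while the Bayesian update needs a prior and calibrated likelihood ratios but has a probabilistic interpretation.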
Why Do Cascade Sizes Follow a Power-Law?
We introduce a random directed acyclic graph and use it to model the
information diffusion network. Subsequently, we analyze the cascade generation
model (CGM) introduced by Leskovec et al. [19]. Until now, this model had been
studied only empirically. In this paper, we present the first
theoretical proof that the sizes of cascades generated by the CGM follow a
power-law distribution, which is consistent with multiple empirical analyses of
large social networks. We compare the assumptions of our model with the
Twitter social network and test the goodness of the approximation.
Comment: 8 pages, 7 figures, accepted to WWW 201
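Cascade sizes on a random DAG can also be inspected empirically; a power law shows up as a roughly straight line when the size histogram is plotted on log-log axes. The sketch below is a hedged illustration: it assumes a simple random DAG (edge i→j with i < j drawn independently with probability p) and a generic independent-cascade spreading rule with transmission probability beta. Neither is claimed to be the paper's DAG model or the exact CGM of Leskovec et al., and all names and parameter values are hypothetical.

```python
import random
from collections import Counter, deque


def random_dag(n, p, rng):
    """Edge i -> j (i < j) present independently with probability p;
    acyclic by construction because edges only point to larger indices."""
    out = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                out[i].append(j)
    return out


def cascade_size(out, start, beta, rng):
    """Independent cascade: each newly reached node passes the cascade to each
    not-yet-reached out-neighbour once, with probability beta."""
    reached = {start}
    frontier = deque([start])
    while frontier:
        u = frontier.popleft()
        for v in out[u]:
            if v not in reached and rng.random() < beta:
                reached.add(v)
                frontier.append(v)
    return len(reached)


rng = random.Random(0)
n = 500
dag = random_dag(n, 0.01, rng)
sizes = Counter(cascade_size(dag, rng.randrange(n), 0.3, rng) for _ in range(2000))
for size in sorted(sizes)[:10]:
    print(size, sizes[size])
```

A sample of this size only hints at the tail; fitting an exponent reliably needs far more cascades and a proper estimator rather than a straight-line fit by eye.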
Statistical Mechanics of Semi-Supervised Clustering in Sparse Graphs
We theoretically study semi-supervised clustering in sparse graphs in the
presence of pairwise constraints on the cluster assignments of nodes. We focus
on bi-cluster graphs, and study the impact of semi-supervision for varying
constraint density and overlap between the clusters. Recent results for
unsupervised clustering in sparse graphs indicate that there is a critical
ratio of within-cluster and between-cluster connectivities below which clusters
cannot be recovered with better than random accuracy. The goal of this paper is
to examine the impact of pairwise constraints on the clustering accuracy. Our
results suggest that the addition of constraints does not provide automatic
improvement over the unsupervised case. When the density of the constraints is
sufficiently small, their only impact is to shift the detection threshold while
preserving the criticality. Conversely, if the density of (hard) constraints is
above the percolation threshold, the criticality is suppressed and the
detection threshold disappears.
Comment: 8 pages, 4 figures
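The percolation threshold mentioned above can be made concrete under a simplifying assumption: that the (hard) constraints are placed between uniformly random pairs of nodes, so the constraint graph is an Erdős–Rényi random graph. A standard result is that such a graph develops a giant connected cluster once the mean number of constraints per node exceeds 1. The union-find sketch below measures the largest cluster; it is an illustration of that textbook fact, not the paper's model, and the function name and sizes are hypothetical.

```python
import random


def largest_component_fraction(n, alpha, seed=0):
    """Place round(alpha * n / 2) random constraint edges (mean constraint
    degree alpha) and return the fraction of nodes in the largest cluster."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _ in range(round(alpha * n / 2)):
        a, b = find(rng.randrange(n)), find(rng.randrange(n))
        if a != b:
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a          # union by size
            size[a] += size[b]
    return max(size[find(i)] for i in range(n)) / n


for alpha in (0.5, 1.0, 1.5, 2.0):
    print(alpha, largest_component_fraction(20000, alpha))
```

Below alpha = 1 the largest cluster is a vanishing fraction of the nodes; above it, a finite fraction of nodes is tied together by constraints, which is the regime in which the abstract reports that the detection threshold disappears.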
Neural network definitions of highly predictable protein secondary structure classes
We use two co-evolving neural networks to determine new classes of protein secondary structure which are significantly more predictable from local amino acid sequence than the conventional secondary structure classification. Accurate prediction of the conventional secondary structure classes (alpha helix, beta strand, and coil) from primary sequence has long been an important problem in computational molecular biology. Neural networks have been a popular method for predicting these conventional secondary structure classes, but accuracy has been disappointingly low. The algorithm presented here uses neural networks to simultaneously examine both sequence and structure data, and to evolve new classes of secondary structure that can be predicted from sequence with significantly higher accuracy than the conventional classes. These new classes have both similarities to, and differences from, the conventional alpha helix, beta strand, and coil classes.
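Prediction "from local amino acid sequence" is conventionally set up by feeding a network a fixed window of residues centred on each position. The sketch below shows only that input encoding, assuming a 13-residue window and a one-hot code with an extra "off the chain" symbol for window slots that fall past either end; these are common choices, not details taken from the abstract, and the co-evolving networks and learned classes are not shown. It also assumes the sequence uses only the 20 standard upper-case residue letters.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def encode_windows(sequence, half_width=6):
    """One-hot encode a window of 2*half_width+1 residues centred on each
    position. Each window slot has 21 entries: 20 residues plus one flag
    marking a slot that lies outside the chain."""
    alphabet = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
    width = 2 * half_width + 1
    slot = len(AMINO_ACIDS) + 1
    rows = []
    for centre in range(len(sequence)):
        row = [0] * (width * slot)
        for offset in range(-half_width, half_width + 1):
            pos = centre + offset
            base = (offset + half_width) * slot
            if 0 <= pos < len(sequence):
                row[base + alphabet[sequence[pos]]] = 1
            else:
                row[base + len(AMINO_ACIDS)] = 1
        rows.append(row)
    return rows


windows = encode_windows("MKTAYIAKQR", half_width=6)
print(len(windows), len(windows[0]))  # 10 rows of 13 * 21 = 273 inputs
```

Each row would be paired with a target class for the centre residue (conventional or learned), which is what makes the prediction depend only on local sequence.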