1,256 research outputs found

Community Detection with and without Prior Information

Author: Allahverdyan Armen E.
Galstyan Aram
Steeg Greg Ver
Publication venue: 'IOP Publishing'
Publication date: 27/04/2010

We study the problem of graph partitioning, or clustering, in sparse networks with prior information about the clusters. Specifically, we assume that for a fraction $\rho$ of the nodes their true cluster assignments are known in advance. This can be understood as a semi-supervised version of clustering, in contrast to unsupervised clustering where the only available information is the graph structure. In the unsupervised case, it is known that there is a threshold of the inter-cluster connectivity beyond which clusters cannot be detected. Here we study the impact of the prior information on the detection threshold, and show that even minute [but generic] values of $\rho>0$ shift the threshold downwards to its lowest possible value. For weighted graphs we show that a small amount of semi-supervision can be used for a non-trivial definition of communities.

Comment: 6 pages, 2 figures
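
The semi-supervised setting in the abstract above can be illustrated with a minimal sketch. It assumes a generic sparse two-cluster planted-partition graph, which is not necessarily the parameterization used in the paper; the function name `planted_partition` and the parameters `c_in`, `c_out` and `seed` are hypothetical, and only `rho` (the fraction of nodes with known labels) comes from the abstract.

```python
import random

def planted_partition(n, c_in, c_out, rho, seed=0):
    """Sample a sparse two-cluster graph and reveal a fraction rho of labels.

    Assumed model (illustrative only): two equal-size clusters, edge
    probability c_in/n inside a cluster and c_out/n between clusters.
    """
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]  # ground-truth cluster of each node
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = (c_in if labels[i] == labels[j] else c_out) / n
            if rng.random() < p:
                edges.append((i, j))
    # Prior information: each node's true label is revealed with probability rho.
    known = {i: labels[i] for i in range(n) if rng.random() < rho}
    return labels, edges, known

labels, edges, known = planted_partition(n=200, c_in=6.0, c_out=1.0, rho=0.1)
```

Setting `rho=0` recovers the unsupervised case, where a clustering algorithm sees only `edges`.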

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

Phase Transitions in Community Detection: A Solvable Toy Model

Author: Allahverdyan Armen E.
Galstyan Aram
Moore Cristopher
Steeg Greg Ver
Publication venue: 'IOP Publishing'
Publication date: 02/12/2013

Recently, it was shown that there is a phase transition in the community detection problem. This transition was first computed using the cavity method, and has been proved rigorously in the case of $q=2$ groups. However, analytic calculations using the cavity method are challenging since they require us to understand probability distributions of messages. We study analogous transitions in a so-called "zero-temperature inference" model, where this distribution is supported only on the most-likely messages. Furthermore, whenever several messages are equally likely, we break the tie by choosing among them with equal probability. While the resulting analysis does not give the correct values of the thresholds, it does reproduce some of the qualitative features of the system. It predicts a first-order detectability transition whenever $q > 2$, while the finite-temperature cavity method shows that this is the case only when $q > 4$. It also has a regime analogous to the "hard but detectable" phase, where the community structure can be partially recovered, but only when the initial messages are sufficiently accurate. Finally, we study a semisupervised setting where we are given the correct labels for a fraction $\rho$ of the nodes. For $q > 2$, we find a regime where the accuracy jumps discontinuously at a critical value of $\rho$.

Comment: 6 pages, 6 figures
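
As a rough illustration of the phase transition mentioned in the abstract above, the sketch below evaluates the Kesten-Stigum condition for a symmetric stochastic block model with $q$ equal-size groups. This is the standard finite-temperature condition, not the zero-temperature toy model of the paper; the function name and the parameters `c_in` and `c_out` (within-group and between-group connectivities, with edge probabilities `c_in/n` and `c_out/n`) are assumptions introduced here. For $q=2$ the condition coincides with the rigorously proved detectability threshold; for larger $q$ the abstract notes that the picture is more subtle.

```python
import math

def above_kesten_stigum(c_in, c_out, q=2):
    """Return True if |c_in - c_out| > q * sqrt(c), with c the average degree.

    Assumes q equal-size groups, so c = (c_in + (q - 1) * c_out) / q.
    """
    c = (c_in + (q - 1) * c_out) / q
    return abs(c_in - c_out) > q * math.sqrt(c)

# Two groups with average degree 3: a connectivity gap of 4 exceeds
# 2 * sqrt(3), while a gap of 2 does not.
strong = above_kesten_stigum(5.0, 1.0)
weak = above_kesten_stigum(4.0, 2.0)
```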

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Analysis of Three-Dimensional Protein Images

Author: Baxter K.
Fortier S.
Glasgow J.
Leherte L.
Steeg E.
Publication date: 01/01/1997

A fundamental goal of research in molecular biology is to understand protein structure. Protein crystallography is currently the most successful method for determining the three-dimensional (3D) conformation of a protein, yet it remains labor intensive and relies on an expert's ability to derive and evaluate a protein scene model. In this paper, the problem of protein structure determination is formulated as an exercise in scene analysis. A computational methodology is presented in which a 3D image of a protein is segmented into a graph of critical points. Bayesian and certainty factor approaches are described and used to analyze critical point graphs and identify meaningful substructures, such as alpha-helices and beta-sheets. Results of applying the methodologies to protein images at low and medium resolution are reported. The research is related to approaches to representation, segmentation and classification in vision, as well as to top-down approaches to protein structure prediction.

Comment: See http://www.jair.org/ for any accompanying file

arXiv.org e-Print Archive

CiteSeerX

Repository of the University of Namur

Why Do Cascade Sizes Follow a Power-Law?

Author: Bayley N.
Brach P.
Erdös P.
Gaba A.
Lerman K.
Rogers E. M.
Steeg G. V.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/02/2017

We introduce a random directed acyclic graph and use it to model the information diffusion network. Subsequently, we analyze the cascade generation model (CGM) introduced by Leskovec et al. [19]. Until now, only empirical studies of this model have been done. In this paper, we present the first theoretical proof that the sizes of cascades generated by the CGM follow a power-law distribution, which is consistent with multiple empirical analyses of large social networks. We compared the assumptions of our model with the Twitter social network and tested the goodness of approximation.

Comment: 8 pages, 7 figures, accepted to WWW 201

arXiv.org e-Print Archive

Crossref

Statistical Mechanics of Semi-Supervised Clustering in Sparse Graphs

Author: Allahverdyan A
Aram Galstyan
Armen E Allahverdyan
Basu S Bilenko M Mooney R J
Erdös P
Getz G Shental N Domany E
Greg Ver Steeg
Mezard M
Pan R K
Reichardt J
Publication venue: 'IOP Publishing'
Publication date: 30/10/2011

We theoretically study semi-supervised clustering in sparse graphs in the presence of pairwise constraints on the cluster assignments of nodes. We focus on bi-cluster graphs, and study the impact of semi-supervision for varying constraint density and overlap between the clusters. Recent results for unsupervised clustering in sparse graphs indicate that there is a critical ratio of within-cluster and between-cluster connectivities below which clusters cannot be recovered with better than random accuracy. The goal of this paper is to examine the impact of pairwise constraints on the clustering accuracy. Our results suggest that the addition of constraints does not provide automatic improvement over the unsupervised case. When the density of the constraints is sufficiently small, their only impact is to shift the detection threshold while preserving the criticality. Conversely, if the density of (hard) constraints is above the percolation threshold, the criticality is suppressed and the detection threshold disappears.

Comment: 8 pages, 4 figures

arXiv.org e-Print Archive

Crossref

Neural network definitions of highly predictable protein secondary structure classes

Author: Farber R.
Lapedes A.
Steeg E.
Publication venue: Los Alamos National Laboratory
Publication date: 01/02/1994

We use two co-evolving neural networks to determine new classes of protein secondary structure which are significantly more predictable from local amino sequence than the conventional secondary structure classification. Accurate prediction of the conventional secondary structure classes (alpha helix, beta strand, and coil) from primary sequence has long been an important problem in computational molecular biology. Neural networks have been a popular method for attempting to predict these conventional secondary structure classes, but accuracy has been disappointingly low. The algorithm presented here uses neural networks to simultaneously examine both sequence and structure data, and to evolve new classes of secondary structure that can be predicted from sequence with significantly higher accuracy than the conventional classes. These new classes have both similarities to, and differences with, the conventional alpha helix, beta strand and coil.

UNT Digital Library

Determination of Expression Levels of all 20 DNA Polymerases in Zebrafish (Danio rerio) throughout Development and following Treatment with MNU, Doxorubicin, and UV Radiation (abstract)

Author: Della Fera B.
Gestl Erin E.
Miller M.
Petovic J.
Sekela L.
Ver Steeg L.
Woolcock J.
Publication venue: Digital Commons @ West Chester University
Publication date: 01/12/2014

Digital Commons @ West Chester University