Multi-level algorithms for modularity clustering
Modularity is one of the most widely used quality measures for graph
clusterings. Maximizing modularity is NP-hard, and the runtime of exact
algorithms is prohibitive for large graphs. A simple and effective class of
heuristics coarsens the graph by iteratively merging clusters (starting from
singletons), and optionally refines the resulting clustering by iteratively
moving individual vertices between clusters. Several heuristics of this type
have been proposed in the literature, but little is known about their relative
performance.
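The coarsening idea described above can be illustrated with a minimal sketch. It assumes an unweighted, undirected graph given as an edge list, and it uses modularity increase as the merge criterion. The function names `modularity` and `greedy_coarsen` are hypothetical, and the brute-force search over cluster pairs is chosen for readability, not speed; this is not the implementation evaluated in the paper.

```python
from collections import defaultdict
from itertools import combinations


def modularity(edges, cluster_of):
    """Newman-Girvan modularity of an unweighted, undirected graph."""
    m = len(edges)
    intra = defaultdict(int)   # number of edges inside each cluster
    degree = defaultdict(int)  # summed vertex degree of each cluster
    for u, v in edges:
        degree[cluster_of[u]] += 1
        degree[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            intra[cluster_of[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def greedy_coarsen(edges):
    """Start from singletons; repeatedly apply the merge that raises modularity most."""
    vertices = sorted({v for e in edges for v in e})
    cluster_of = {v: v for v in vertices}
    best_q = modularity(edges, cluster_of)
    while True:
        labels = sorted(set(cluster_of.values()))
        best_merge = None
        for a, b in combinations(labels, 2):
            # Tentatively relabel cluster b as cluster a and score the result.
            trial = {v: (a if c == b else c) for v, c in cluster_of.items()}
            q = modularity(edges, trial)
            if q > best_q + 1e-12:
                best_q, best_merge = q, trial
        if best_merge is None:  # no merge improves modularity: stop
            return cluster_of, best_q
        cluster_of = best_merge


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
clusters, q = greedy_coarsen(edges)
```

On this small graph the sketch recovers the two triangles as clusters. A refinement phase, as described above, would then try moving individual vertices between clusters; it is omitted here.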
This paper experimentally compares existing and new coarsening- and
refinement-based heuristics with respect to their effectiveness (achieved
modularity) and efficiency (runtime). Concerning coarsening, it turns out that
the most widely used criterion for merging clusters (modularity increase) is
outperformed by other simple criteria, and that a recent algorithm by Schuetz
and Caflisch is no improvement over simple greedy coarsening for these
criteria. Concerning refinement, a new multi-level algorithm is shown to
produce significantly better clusterings than conventional single-level
algorithms. A comparison with published benchmark results and algorithm
implementations shows that combinations of coarsening and multi-level
refinement are competitive with the best algorithms in the literature.
Comment: 12 pages, 10 figures, see
http://www.informatik.tu-cottbus.de/~rrotta/ for downloading the graph
clustering software
Image classification by visual bag-of-words refinement and reduction
This paper presents a new framework for visual bag-of-words (BOW) refinement
and reduction to overcome the drawbacks associated with the visual BOW model
which has been widely used for image classification. Although very influential
in the literature, the traditional visual BOW model has two distinct drawbacks.
Firstly, for efficiency purposes, the visual vocabulary is commonly constructed
by directly clustering the low-level visual feature vectors extracted from
local keypoints, without considering the high-level semantics of images. That
is, the visual BOW model still suffers from the semantic gap, and thus may lead
to significant performance degradation in more challenging tasks (e.g. social
image classification). Secondly, typically thousands of visual words are
generated to obtain better performance on a relatively large image dataset.
Because the vocabulary is so large, the subsequent image classification can be
very time-consuming. To overcome the first drawback, we develop a graph-based
method for visual BOW refinement by exploiting the tags (easy to access
although noisy) of social images. More notably, for efficient image
classification, we further reduce the refined visual BOW model to a much
smaller size through semantic spectral clustering. Extensive experimental
results show the promising performance of the proposed framework for visual BOW
refinement and reduction.
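To make the visual BOW representation and its reduction concrete, here is a minimal sketch. It assumes a vocabulary of visual words is already available as a list of vectors, and that a mapping from each visual word to a smaller group has already been computed by some clustering step (the abstract uses semantic spectral clustering for this; the sketch simply takes the mapping as input). The names `bow_histogram`, `reduce_histogram`, and `word_to_group` are hypothetical.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor (squared Euclidean distance)."""
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((d - w) ** 2 for d, w in zip(descriptor, vocabulary[k])),
    )


def bow_histogram(descriptors, vocabulary):
    """Normalized count of how often each visual word is the nearest one."""
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def reduce_histogram(histogram, word_to_group, n_groups):
    """Sum the bins of visual words that fall into the same group."""
    reduced = [0.0] * n_groups
    for k, value in enumerate(histogram):
        reduced[word_to_group[k]] += value
    return reduced


# Toy data: three 2-D visual words and four local descriptors from one image.
vocabulary = [(0, 0), (10, 10), (0, 10)]
descriptors = [(1, 1), (9, 9), (0, 9), (1, 0)]
hist = bow_histogram(descriptors, vocabulary)         # one bin per visual word
small = reduce_histogram(hist, [0, 1, 1], n_groups=2)  # words 1 and 2 merged
```

The reduced histogram has one bin per group instead of one per visual word, which is what shortens the later classification step.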
Towards Reliable Automatic Protein Structure Alignment
A variety of methods have been proposed for structure similarity calculation,
which are called structure alignment or superposition. One major shortcoming in
current structure alignment algorithms is in their inherent design, which is
based on local structure similarity. In this work, we propose a method to
incorporate global information in obtaining optimal alignments and
superpositions. Our method, when applied to optimizing the TM-score and the GDT
score, produces significantly better results than current state-of-the-art
protein structure alignment tools. Specifically, if the highest TM-score found
by TMalign is lower than 0.6 and the highest TM-score found by one of the
tested methods is higher than 0.5, there is a probability of 42% that
TMalign failed to find TM-scores higher than 0.5, while the same probability
is reduced to 2% if our method is used. This could significantly improve the
accuracy of fold detection if the cutoff TM-score of 0.5 is used.
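For readers unfamiliar with the TM-score, the following minimal sketch computes it from the standard published formula, given the distances between aligned residue pairs after superposition. The function names `d0_scale` and `tm_score` are hypothetical, and the sketch rejects very short chains instead of reproducing the special handling that real tools apply to them. It scores a given alignment only; finding the alignment that maximizes the score is the hard problem the abstract addresses.

```python
def d0_scale(target_length):
    """Length-dependent distance scale d0 used by the TM-score."""
    if target_length <= 21:
        # Assumption of this sketch: short chains need special handling, not covered here.
        raise ValueError("target_length must be greater than 21 in this sketch")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score for aligned residue-pair distances (in angstroms), normalized by target length."""
    d0 = d0_scale(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# A perfect superposition of all 100 residues scores 1.0;
# aligning only half of them perfectly scores 0.5, the fold-detection cutoff.
perfect = tm_score([0.0] * 100, 100)
half = tm_score([0.0] * 50, 100)
```

Each aligned pair contributes a value between 0 and 1 that decays with distance, so a residue pair separated by exactly `d0` contributes 0.5.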
In addition, existing structure alignment algorithms focus on structure
similarity alone and simply ignore other important similarities, such as
sequence similarity. Our approach has the capacity to incorporate multiple
similarities into the scoring function. Results show that sequence similarity
aids in finding high quality protein structure alignments that are more
consistent with eye-examined alignments in HOMSTRAD. Even when structure
similarity itself fails to find alignments with any consistency with
eye-examined alignments, our method remains capable of finding alignments
highly similar to, or even identical to, eye-examined alignments.
Comment: Peer-reviewed and presented as part of the 13th Workshop on
Algorithms in Bioinformatics (WABI2013)
From Nonspecific DNA–Protein Encounter Complexes to the Prediction of DNA–Protein Interactions
©2009 Gao, Skolnick. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
doi:10.1371/journal.pcbi.1000341
DNA–protein interactions are involved in many essential biological activities. Because there is no simple mapping code between DNA base pairs and protein amino acids, the prediction of DNA–protein interactions is a challenging problem. Here, we present a novel computational approach for predicting DNA-binding protein residues and DNA–protein interaction modes without knowing the protein's specific DNA target sequence. Given the structure of a DNA-binding protein, the method first generates an ensemble of complex structures obtained by rigid-body docking with a nonspecific canonical B-DNA. Representative models are subsequently selected through clustering and ranking by their DNA–protein interfacial energy. Analysis of these encounter complex models suggests that the recognition sites for specific DNA binding are usually favorable interaction sites for the nonspecific DNA probe and that nonspecific DNA–protein interaction modes exhibit some similarity to specific DNA–protein binding modes. Although the method requires as input the knowledge that the protein binds DNA, in benchmark tests, it achieves better performance in identifying DNA-binding sites than three previously established methods, which are based on sophisticated machine-learning techniques. We further apply our method to protein structures predicted through modeling and demonstrate that our method performs satisfactorily on protein models whose root-mean-square Cα deviation from the native structure is up to 5 Å. This study provides valuable structural insights into how a specific DNA-binding protein interacts with a nonspecific DNA sequence. The similarity between the specific DNA–protein interaction mode and nonspecific interaction modes may reflect an important sampling step in a DNA-binding protein's search for its specific DNA targets.