Solving Maximum Clique Problem for Protein Structure Similarity
A basic assumption of molecular biology is that proteins sharing close
three-dimensional (3D) structures are likely to share a common function and in
most cases derive from a common ancestor. Computing the similarity between two
protein structures is therefore a crucial task and has been extensively
investigated. Evaluating the similarity of two proteins can be done by finding
an optimal one-to-one matching between their components, which is equivalent to
identifying a maximum weighted clique in a specific "alignment graph". In this
paper we present a new integer programming formulation for solving such clique
problems. The model has been implemented using the ILOG CPLEX Callable Library.
In addition, we designed a dedicated branch and bound algorithm for solving the
maximum cardinality clique problem. Both approaches have been integrated in
VAST (Vector Alignment Search Tool), a software tool for aligning protein 3D
structures that is widely used at NCBI (National Center for Biotechnology
Information). The original VAST clique solver uses the well-known Bron-Kerbosch
algorithm (BK). Our computational results on real-life protein
alignment instances show that our branch and bound algorithm is up to 116 times
faster than BK for the largest proteins.
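The branch and bound idea mentioned above can be illustrated with a textbook maximum-cardinality clique search. This is a minimal sketch under stated assumptions, not the authors' dedicated algorithm or their CPLEX model: the graph is assumed to be a symmetric adjacency dictionary, the only bound is the simple "current clique plus remaining candidates" size bound, and the function name `max_clique` is hypothetical.

```python
from typing import Dict, List, Set


def max_clique(adj: Dict[int, Set[int]]) -> List[int]:
    """Return one maximum-cardinality clique of an undirected graph.

    Hypothetical helper: `adj` maps each vertex to the set of its
    neighbours and is assumed symmetric with no self-loops.
    """
    best: List[int] = []

    def expand(clique: List[int], candidates: Set[int]) -> None:
        nonlocal best
        if not candidates:
            # Nothing left to add: record the clique if it beats the incumbent.
            if len(clique) > len(best):
                best = clique[:]
            return
        # Try high-degree candidates first so a large incumbent is found early.
        for v in sorted(candidates, key=lambda u: len(adj[u]), reverse=True):
            # Bound: even taking every remaining candidate cannot beat `best`.
            if len(clique) + len(candidates) <= len(best):
                return
            clique.append(v)
            expand(clique, candidates & adj[v])
            clique.pop()
            # Every clique containing v has now been explored.
            candidates = candidates - {v}

    expand([], set(adj))
    return best


if __name__ == "__main__":
    # A 4-clique {0, 1, 2, 3} plus a tail 3-4-5.
    adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
           3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
    print(sorted(max_clique(adj)))  # [0, 1, 2, 3]
```

Practical solvers replace the size bound with tighter colouring-based bounds; a weighted variant for alignment graphs would compare summed vertex weights instead of vertex counts.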
Maximum common subgraph isomorphism algorithms for the matching of chemical structures
The maximum common subgraph (MCS) problem has become increasingly important in those aspects of chemoinformatics that involve the matching of 2D or 3D chemical structures. This paper provides a classification and a review of the many MCS algorithms, both exact and approximate, that have been described in the literature, and makes recommendations regarding their applicability to typical chemoinformatics tasks.
RASCAL: calculation of graph similarity using maximum common edge subgraphs
A new graph similarity calculation procedure is introduced for comparing labeled graphs. Given a minimum similarity threshold, the procedure consists of an initial screening process to determine whether it is possible for the measure of similarity between the two graphs to exceed the minimum threshold, followed by a rigorous maximum common edge subgraph (MCES) detection algorithm to compute the exact degree and composition of similarity. The proposed MCES algorithm is based on a maximum clique formulation of the problem and is a significant improvement over other published algorithms. It presents new approaches to both lower and upper bounding as well as vertex selection.
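The screen-then-compute structure described above can be sketched with a deliberately simple bound. Assumptions, all hypothetical: graphs are (atom labels, labelled edge list) pairs, the similarity is an edge-only ratio |E_c|^2 / (|E_1| * |E_2|), and the upper bound on common edges comes from intersecting the multisets of edge types. RASCAL's own screening bound is tighter; this only shows where such a bound fits before the expensive exact MCES search.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation: atom labels plus (u, v, bond_label) edges.
Graph = Tuple[Dict[int, str], List[Tuple[int, int, str]]]


def edge_types(graph: Graph) -> Counter:
    """Multiset of edge types: (smaller atom label, bond label, larger atom label)."""
    atoms, edges = graph
    types: Counter = Counter()
    for u, v, bond in edges:
        a, b = sorted((atoms[u], atoms[v]))
        types[(a, bond, b)] += 1
    return types


def common_edge_upper_bound(g1: Graph, g2: Graph) -> int:
    # A label-preserving common edge subgraph can use each edge type at most
    # min(count in g1, count in g2) times; Counter `&` takes those minima.
    return sum((edge_types(g1) & edge_types(g2)).values())


def passes_screen(g1: Graph, g2: Graph, threshold: float) -> bool:
    """Cheap test run before any exact MCES search.

    Uses an assumed edge-only similarity |E_c|^2 / (|E_1| * |E_2|); if even the
    upper bound on |E_c| cannot reach `threshold`, the exact search is skipped.
    """
    m1, m2 = len(g1[1]), len(g2[1])
    if m1 == 0 or m2 == 0:
        return False
    bound = common_edge_upper_bound(g1, g2)
    return bound * bound / (m1 * m2) >= threshold


if __name__ == "__main__":
    c_c_o: Graph = ({0: "C", 1: "C", 2: "O"}, [(0, 1, "-"), (1, 2, "-")])
    c_c_c: Graph = ({0: "C", 1: "C", 2: "C"}, [(0, 1, "-"), (1, 2, "-")])
    print(common_edge_upper_bound(c_c_o, c_c_c))  # 1
    print(passes_screen(c_c_o, c_c_c, 0.5))       # False: 1/4 < 0.5
    print(passes_screen(c_c_o, c_c_o, 0.5))       # True
```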
Subgraph Matching Kernels for Attributed Graphs
We propose graph kernels based on subgraph matchings, i.e.
structure-preserving bijections between subgraphs. While recently proposed
kernels based on common subgraphs (Wale et al., 2008; Shervashidze et al.,
2009) in general cannot be applied to attributed graphs, our approach allows
mappings of subgraphs to be rated by a flexible scoring scheme comparing vertex and
edge attributes by kernels. We show that subgraph matching kernels generalize
several known kernels. To compute the kernel we propose a graph-theoretical
algorithm inspired by a classical relation between common subgraphs of two
graphs and cliques in their product graph observed by Levi (1973). Encouraging
experimental results on a classification task on real-world graphs are
presented. Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012)
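The classical relation attributed to Levi (1973) can be made concrete. In the modular product of two labelled graphs, each clique corresponds to a label-preserving isomorphism between induced subgraphs. The sketch below builds that product and counts all non-empty cliques, giving a simple unweighted similarity in the spirit of a subgraph matching kernel with exact label matching. It is an illustration under assumptions (dictionary graphs, exact label comparison, hypothetical function names), not the paper's weighted algorithm.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]  # (vertex of G1, vertex of G2)


def modular_product(labels1: Dict[int, str], adj1: Dict[int, Set[int]],
                    labels2: Dict[int, str], adj2: Dict[int, Set[int]]
                    ) -> Dict[Pair, Set[Pair]]:
    # Product vertices: pairs of vertices with identical labels.
    nodes: List[Pair] = [(a, b) for a in labels1 for b in labels2
                         if labels1[a] == labels2[b]]
    prod: Dict[Pair, Set[Pair]] = {p: set() for p in nodes}
    for (a, b), (c, d) in combinations(nodes, 2):
        if a == c or b == d:
            continue  # a bijection cannot map one vertex twice
        # Structure-preserving: adjacent in both graphs or in neither.
        if (c in adj1[a]) == (d in adj2[b]):
            prod[(a, b)].add((c, d))
            prod[(c, d)].add((a, b))
    return prod


def count_cliques(adj: Dict[Pair, Set[Pair]]) -> int:
    """Count every non-empty clique once, growing cliques in a fixed vertex order."""
    order = {v: i for i, v in enumerate(adj)}

    def extend(candidates: List[Pair]) -> int:
        total = 0
        for v in candidates:
            later = [w for w in candidates if order[w] > order[v] and w in adj[v]]
            total += 1 + extend(later)  # the clique ending at v, plus its extensions
        return total

    return extend(list(adj))


if __name__ == "__main__":
    labels = {0: "A", 1: "B"}
    edge = {0: {1}, 1: {0}}
    # Matchings: {0->0}, {1->1}, {0->0, 1->1}.
    print(count_cliques(modular_product(labels, edge, labels, edge)))  # 3
```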
Order independent structural alignment of circularly permuted proteins
Circular permutation connects the N and C termini of a protein and
concurrently cleaves elsewhere in the chain, providing an important mechanism
for generating novel protein folds and functions. However, their prevalence in
genomes is unknown because current detection methods can miss many occurrences,
mistaking random repeats for circular permutations. Here we develop a method for
detecting circularly permuted proteins from structural comparison. Sequence
order-independent alignment of protein structures can be regarded as a special
case of the maximum-weight independent set problem, which is known to be
computationally hard. We develop an efficient approximation algorithm by
repeatedly solving relaxations of an appropriate intermediate integer
programming formulation, and we show that the approximation ratio is much better
than the theoretical worst-case ratio of . Circularly permuted
proteins reported in the literature can be identified rapidly with our method,
while they escape detection by publicly available servers for structural
alignment. Comment: 5 pages, 3 figures, accepted by IEEE-EMBS 2004 Conference Proceedings
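The reduction to maximum-weight independent set can be sketched as follows. Each candidate residue pair (i, j) becomes a weighted vertex, and two pairs conflict when they reuse a residue or are geometrically incompatible. Because no sequential ordering is imposed on the chosen pairs, the resulting alignment is order independent. The solver below is a plain greedy heuristic (take the vertex with the largest weight/(degree + 1), then delete it and its neighbours), swapped in for illustration. It is not the authors' relaxation-based approximation algorithm, and the `incompatible` predicate and the scores are hypothetical placeholders.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[int, int]  # (residue index in protein A, residue index in protein B)


def conflict_graph(pairs: List[Pair],
                   incompatible: Callable[[Pair, Pair], bool]) -> Dict[Pair, Set[Pair]]:
    adj: Dict[Pair, Set[Pair]] = {p: set() for p in pairs}
    for p, q in combinations(pairs, 2):
        # One-to-one matching: a residue may appear in at most one chosen pair.
        # `incompatible` stands in for a geometric consistency test.
        if p[0] == q[0] or p[1] == q[1] or incompatible(p, q):
            adj[p].add(q)
            adj[q].add(p)
    return adj


def greedy_mwis(weights: Dict[Pair, float], adj: Dict[Pair, Set[Pair]]) -> List[Pair]:
    """Greedy heuristic: repeatedly take the vertex maximising w / (deg + 1)."""
    remaining = set(weights)
    chosen: List[Pair] = []
    while remaining:
        # Degree is measured in the graph induced by the remaining vertices;
        # ties are broken by the pair itself so the result is deterministic.
        v = max(remaining,
                key=lambda u: (weights[u] / (len(adj[u] & remaining) + 1), u))
        chosen.append(v)
        remaining -= adj[v] | {v}  # drop v and every pair that conflicts with it
    return chosen


if __name__ == "__main__":
    weights = {(0, 0): 3.0, (0, 1): 1.0, (1, 1): 2.0, (1, 0): 2.0}
    adj = conflict_graph(list(weights), lambda p, q: False)
    print(sorted(greedy_mwis(weights, adj)))  # [(0, 0), (1, 1)]
```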
Graph theoretic methods for the analysis of structural relationships in biological macromolecules
Subgraph isomorphism and maximum common subgraph isomorphism algorithms from graph theory provide an effective and efficient way of identifying structural relationships between biological macromolecules. They thus provide a natural complement to the pattern matching algorithms that are used in bioinformatics to identify sequence relationships. Examples are provided of the use of graph theory to analyze proteins for which three-dimensional crystallographic or NMR structures are available, focusing on the use of the Bron-Kerbosch clique detection algorithm to identify common folding motifs and of the Ullmann subgraph isomorphism algorithm to identify patterns of amino acid residues. Our methods are also applicable to other types of biological macromolecules, such as carbohydrate and nucleic acid structures.
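The Bron-Kerbosch algorithm cited above enumerates all maximal cliques of a graph. The minimal Python sketch below uses the pivoting refinement of Tomita et al.; that choice is an assumption, since the source does not say which variant was used, and the function name is hypothetical. In the structural applications described above, the input would be a correspondence graph built from two structures rather than the toy graph shown.

```python
from typing import Dict, List, Set


def bron_kerbosch(adj: Dict[int, Set[int]]) -> List[Set[int]]:
    """All maximal cliques of an undirected graph given as adjacency sets."""
    cliques: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        # r: current clique; p: vertices that can extend it;
        # x: vertices already tried (used to reject non-maximal cliques).
        if not p and not x:
            cliques.append(r)
            return
        # Pivot: the vertex covering most of p; its neighbours need not be branched on.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


if __name__ == "__main__":
    adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
           3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
    for c in bron_kerbosch(adj):
        print(sorted(c))
```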
Mining Maximal Cliques from an Uncertain Graph
We consider mining dense substructures (maximal cliques) from an uncertain
graph, which is a probability distribution on a set of deterministic graphs.
For parameter 0 < α < 1, we present a precise definition of an
α-maximal clique in an uncertain graph. We present matching upper and
lower bounds on the number of α-maximal cliques possible within an
uncertain graph. We present an algorithm to enumerate α-maximal cliques
in an uncertain graph whose worst-case runtime is near-optimal, and an
experimental evaluation showing the practical utility of the algorithm. Comment: ICDE 201
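Whether a vertex set is an α-maximal clique can be checked directly under one common uncertain-graph model. The model is an assumption here: edges are present independently, a vertex set's clique probability is the product of its edge probabilities, C is an α-clique if that probability is at least α, and C is α-maximal if no proper superset is an α-clique. Adding vertices only multiplies in further factors of at most 1, so any α-clique superset implies an α-clique one-vertex extension, and testing single-vertex extensions suffices. The helper names are hypothetical, and this is a membership test, not the paper's enumeration algorithm.

```python
from math import prod
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical input: probability of each undirected edge, keyed by frozenset({u, v}).
EdgeProbs = Dict[FrozenSet[int], float]


def clique_prob(vertices: Set[int], probs: EdgeProbs) -> float:
    """Probability that every pair in `vertices` is joined (independent edges)."""
    vs = sorted(vertices)
    return prod(probs.get(frozenset((u, v)), 0.0)
                for i, u in enumerate(vs) for v in vs[i + 1:])


def is_alpha_maximal(clique: Iterable[int], all_vertices: Iterable[int],
                     probs: EdgeProbs, alpha: float) -> bool:
    c = set(clique)
    if clique_prob(c, probs) < alpha:
        return False  # not even an alpha-clique
    # Maximal iff no single-vertex extension is still an alpha-clique.
    return all(clique_prob(c | {w}, probs) < alpha
               for w in all_vertices if w not in c)


if __name__ == "__main__":
    probs = {frozenset({0, 1}): 0.9, frozenset({1, 2}): 0.9, frozenset({0, 2}): 0.5}
    # The triangle has clique probability 0.9 * 0.5 * 0.9 = 0.405.
    print(is_alpha_maximal({0, 1}, range(3), probs, 0.5))     # True
    print(is_alpha_maximal({0, 1}, range(3), probs, 0.4))     # False
    print(is_alpha_maximal({0, 1, 2}, range(3), probs, 0.4))  # True
```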
Growing Graphs with Hyperedge Replacement Graph Grammars
Discovering the underlying structures present in large real world graphs is a
fundamental scientific problem. In this paper we show that a graph's clique
tree can be used to extract a hyperedge replacement grammar. If we store an
ordering from the extraction process, the extracted graph grammar is guaranteed
to generate an isomorphic copy of the original graph. Alternatively, a stochastic
application of the graph grammar rules can be used to quickly create random
graphs. In experiments on large real world networks, we show that random
graphs, generated from extracted graph grammars, exhibit a wide range of
properties that are very similar to the original graphs. In addition to graph
properties like degree or eigenvector centrality, what a graph "looks like"
ultimately depends on small details in local graph substructures that are
difficult to define at a global level. We show that our generative graph model
is able to preserve these local substructures when generating new graphs and
performs well on new and difficult tests of model robustness. Comment: 18 pages, 19 figures, accepted to CIKM 2016 in Indianapolis, I
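The grammar extraction starts from a clique tree of the input graph. One common way to obtain such a tree, assumed here because the source does not say how it is computed, is to eliminate vertices greedily in minimum-degree order, connect each eliminated vertex's remaining neighbours (fill-in), and record the vertex plus those neighbours as a bag. Linking each bag to the bag of the neighbour eliminated next yields a tree decomposition; discarding bags that are subsets of an adjacent bag gives a clique tree of the resulting chordal graph. The sketch stops there and does not extract grammar rules; the function name is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def elimination_tree_decomposition(
        adj: Dict[int, Set[int]]) -> Tuple[Dict[int, Set[int]], List[Tuple[int, int]]]:
    """Min-degree elimination; returns bags keyed by eliminated vertex and tree edges."""
    g = {v: set(ns) for v, ns in adj.items()}  # working copy that receives fill-in
    order: List[int] = []
    bags: Dict[int, Set[int]] = {}
    while g:
        v = min(g, key=lambda u: (len(g[u]), u))  # min degree, ties by label
        nbrs = g.pop(v)
        bags[v] = {v} | nbrs
        for a in nbrs:
            g[a].discard(v)
            g[a] |= nbrs - {a}  # fill-in: v's neighbours become a clique
        order.append(v)
    pos = {v: i for i, v in enumerate(order)}
    edges: List[Tuple[int, int]] = []
    for v in order:
        later = bags[v] - {v}
        if later:
            # Attach to the bag of the neighbour that was eliminated next.
            edges.append((v, min(later, key=pos.__getitem__)))
    return bags, edges


if __name__ == "__main__":
    cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    bags, edges = elimination_tree_decomposition(cycle)
    print(bags)   # {0: {0, 1, 3}, 1: {1, 2, 3}, 2: {2, 3}, 3: {3}}
    print(edges)  # [(0, 1), (1, 2), (2, 3)]
```

For the 4-cycle, the largest bag has three vertices, matching its treewidth of 2.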
Fine-grained Search Space Classification for Hard Enumeration Variants of Subset Problems
We propose a simple, powerful, and flexible machine learning framework for
(i) reducing the search space of computationally difficult enumeration variants
of subset problems and (ii) augmenting existing state-of-the-art solvers with
informative cues arising from the input distribution. We instantiate our
framework for the problem of listing all maximum cliques in a graph, a central
problem in network analysis, data mining, and computational biology. We
demonstrate the practicality of our approach on real-world networks with
millions of vertices and edges by not only retaining all optimal solutions, but
also aggressively pruning the input instance size, resulting in several-fold
speedups of state-of-the-art algorithms. Finally, we explore the limits of
scalability and robustness of our proposed framework, suggesting that
supervised learning is viable for tackling NP-hard problems in practice. Comment: AAAI 201
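The search-space reduction can be sketched as: score each vertex, drop those predicted not to lie in any maximum clique, and hand the smaller graph to an existing exact solver. In the sketch below the classifier is a pluggable callable. As a stand-in it uses the safe k-core rule (a vertex in a clique of size ω has core number at least ω - 1), given a lower bound on ω from any heuristic. A learned model, as in the paper, would replace that rule and is not guaranteed to be safe; the function names and the lower bound are hypothetical.

```python
from typing import Callable, Dict, Set

Adj = Dict[int, Set[int]]


def core_numbers(adj: Adj) -> Dict[int, int]:
    """Core number of every vertex by repeatedly peeling a minimum-degree vertex."""
    deg = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    core: Dict[int, int] = {}
    k = 0
    while remaining:
        v = min(remaining, key=lambda u: deg[u])
        k = max(k, deg[v])  # core level never decreases during peeling
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                deg[w] -= 1
    return core


def prune(adj: Adj, keep: Callable[[int], bool]) -> Adj:
    """Induced subgraph on the vertices the (learned or rule-based) filter keeps."""
    kept = {v for v in adj if keep(v)}
    return {v: adj[v] & kept for v in kept}


if __name__ == "__main__":
    adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
           3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
    core = core_numbers(adj)
    omega_lower_bound = 4  # hypothetical: e.g. from a greedy clique heuristic
    # Stand-in for a trained classifier: keep vertices that could still be in a
    # clique of size >= omega_lower_bound.
    reduced = prune(adj, lambda v: core[v] >= omega_lower_bound - 1)
    print(sorted(reduced))  # [0, 1, 2, 3]
```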