Search CORE

1,606 research outputs found

Solving Maximum Clique Problem for Protein Structure Similarity

Author: Andonov Rumen
Malod-Dognin Noël
Yanev Nicola
Publication venue
Publication date: 01/01/2009
Field of study

A basic assumption of molecular biology is that proteins sharing close three-dimensional (3D) structures are likely to share a common function and in most cases derive from a same ancestor. Computing the similarity between two protein structures is therefore a crucial task and has been extensively investigated. Evaluating the similarity of two proteins can be done by finding an optimal one-to-one matching between their components, which is equivalent to identifying a maximum weighted clique in a specific "alignment graph". In this paper we present a new integer programming formulation for solving such clique problems. The model has been implemented using the ILOG CPLEX Callable Library. In addition, we designed a dedicated branch and bound algorithm for solving the maximum cardinality clique problem. Both approaches have been integrated in VAST (Vector Alignment Search Tool) - a software for aligning protein 3D structures largely used in NCBI (National Center for Biotechnology Information). The original VAST clique solver uses the well known Bron and Kerbosh algorithm (BK). Our computational results on real life protein alignment instances show that our branch and bound algorithm is up to 116 times faster than BK for the largest proteins

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Bulgarian Digital Mathematics Library at IMI-BAS

HAL-Rennes 1

Maximum common subgraph isomorphism algorithms for the matching of chemical structures

Author: Raymond J.W.
Willett P.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2002
Field of study

The maximum common subgraph (MCS) problem has become increasingly important in those aspects of chemoinformatics that involve the matching of 2D or 3D chemical structures. This paper provides a classification and a review of the many MCS algorithms, both exact and approximate, that have been described in the literature, and makes recommendations regarding their applicability to typical chemoinformatics tasks

CiteSeerX

White Rose Research Online

RASCAL: calculation of graph similarity using maximum common edge subgraphs

Author: Gardiner E.J.
Raymond J.W.
Willett P.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2002
Field of study

A new graph similarity calculation procedure is introduced for comparing labeled graphs. Given a minimum similarity threshold, the procedure consists of an initial screening process to determine whether it is possible for the measure of similarity between the two graphs to exceed the minimum threshold, followed by a rigorous maximum common edge subgraph (MCES) detection algorithm to compute the exact degree and composition of similarity. The proposed MCES algorithm is based on a maximum clique formulation of the problem and is a significant improvement over other published algorithms. It presents new approaches to both lower and upper bounding as well as vertex selection

CiteSeerX

Crossref

White Rose Research Online

Subgraph Matching Kernels for Attributed Graphs

Author: Kriege Nils
Mutzel Petra
Publication venue
Publication date: 01/01/2012
Field of study

We propose graph kernels based on subgraph matchings, i.e. structure-preserving bijections between subgraphs. While recently proposed kernels based on common subgraphs (Wale et al., 2008; Shervashidze et al., 2009) in general can not be applied to attributed graphs, our approach allows to rate mappings of subgraphs by a flexible scoring scheme comparing vertex and edge attributes by kernels. We show that subgraph matching kernels generalize several known kernels. To compute the kernel we propose a graph-theoretical algorithm inspired by a classical relation between common subgraphs of two graphs and cliques in their product graph observed by Levi (1973). Encouraging experimental results on a classification task of real-world graphs are presented.Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012

arXiv.org e-Print Archive

CiteSeerX

Order independent structural alignment of circularly permuted proteins

Author: Binkowski T. Andrew
DasGupta Bhaskar
Liang Jie
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2004
Field of study

Circular permutation connects the N and C termini of a protein and concurrently cleaves elsewhere in the chain, providing an important mechanism for generating novel protein fold and functions. However, their in genomes is unknown because current detection methods can miss many occurances, mistaking random repeats as circular permutation. Here we develop a method for detecting circularly permuted proteins from structural comparison. Sequence order independent alignment of protein structures can be regarded as a special case of the maximum-weight independent set problem, which is known to be computationally hard. We develop an efficient approximation algorithm by repeatedly solving relaxations of an appropriate intermediate integer programming formulation, we show that the approximation ratio is much better then the theoretical worst case ratio of

r = 1/4

. Circularly permuted proteins reported in literature can be identified rapidly with our method, while they escape the detection by publicly available servers for structural alignment.Comment: 5 pages, 3 figures, Accepted by IEEE-EMBS 2004 Conference Proceeding

arXiv.org e-Print Archive

Crossref

Graph theoretic methods for the analysis of structural relationships in biological macromolecules

Author: Altschul
Artymiuk
Artymiuk
Artymiuk
Artymiuk
Artymiuk
Barnard
Baxevanis
Benning
Berman
Bernstein
Brint
Brint
Bron
Bruno
Bryant
Crandell
Dean
Diestel
Doubet
Fan
Feizi
Figueras
Flores
Gardiner
Gati
Good
Gray
Groves
Gruer
Gund
Hagadone
Harrison
Holden
Hutchinson
Jasanoff
Johnson
Kanna
Klausner
Kleywegt
Koch
Kraulis
Lengauer
Lesk
Martin
Martin
McGregor
Messmer
Mitchell
Ollis
Pickering
Ray
Raymond
Read
Salton
Samudrala
Sayle
Simon
Srere
Sussenguth
Tesmer
Tinoco
Trinajstic
Tsukada
Ullmann
van Rijsbergen
Willett
Willett
Willett
Willett
Williams
Wilson
Zhang
Publication venue: 'Wiley'
Publication date: 01/01/2005
Field of study

Subgraph isomorphism and maximum common subgraph isomorphism algorithms from graph theory provide an effective and an efficient way of identifying structural relationships between biological macromolecules. They thus provide a natural complement to the pattern matching algorithms that are used in bioinformatics to identify sequence relationships. Examples are provided of the use of graph theory to analyze proteins for which three-dimensional crystallographic or NMR structures are available, focusing on the use of the Bron-Kerbosch clique detection algorithm to identify common folding motifs and of the Ullmann subgraph isomorphism algorithm to identify patterns of amino acid residues. Our methods are also applicable to other types of biological macromolecule, such as carbohydrate and nucleic acid structures

CiteSeerX

Crossref

White Rose Research Online

Sussex Research Online

Mining Maximal Cliques from an Uncertain Graph

Author: Mukherjee Arko Provo
Tirthapura Srikanta
Xu Pan
Publication venue
Publication date: 22/10/2014
Field of study

We consider mining dense substructures (maximal cliques) from an uncertain graph, which is a probability distribution on a set of deterministic graphs. For parameter 0 < {\alpha} < 1, we present a precise definition of an {\alpha}-maximal clique in an uncertain graph. We present matching upper and lower bounds on the number of {\alpha}-maximal cliques possible within an uncertain graph. We present an algorithm to enumerate {\alpha}-maximal cliques in an uncertain graph whose worst-case runtime is near-optimal, and an experimental evaluation showing the practical utility of the algorithm.Comment: ICDE 201

arXiv.org e-Print Archive

Digital Repository @ Iowa State University (ISU)

CiteSeerX

Growing Graphs with Hyperedge Replacement Graph Grammars

Author: Aguinaga S.
Heaps H. S.
Koller D.
Kukluk J. P.
Sadava D. E.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/08/2016
Field of study

Discovering the underlying structures present in large real world graphs is a fundamental scientific problem. In this paper we show that a graph's clique tree can be used to extract a hyperedge replacement grammar. If we store an ordering from the extraction process, the extracted graph grammar is guaranteed to generate an isomorphic copy of the original graph. Or, a stochastic application of the graph grammar rules can be used to quickly create random graphs. In experiments on large real world networks, we show that random graphs, generated from extracted graph grammars, exhibit a wide range of properties that are very similar to the original graphs. In addition to graph properties like degree or eigenvector centrality, what a graph "looks like" ultimately depends on small details in local graph substructures that are difficult to define at a global level. We show that our generative graph model is able to preserve these local substructures when generating new graphs and performs well on new and difficult tests of model robustness.Comment: 18 pages, 19 figures, accepted to CIKM 2016 in Indianapolis, I

arXiv.org e-Print Archive

Crossref

Fine-grained Search Space Classification for Hard Enumeration Variants of Subset Problems

Author: Dutta Sourav
Lauri Juho
Publication venue
Publication date: 22/02/2019
Field of study

We propose a simple, powerful, and flexible machine learning framework for (i) reducing the search space of computationally difficult enumeration variants of subset problems and (ii) augmenting existing state-of-the-art solvers with informative cues arising from the input distribution. We instantiate our framework for the problem of listing all maximum cliques in a graph, a central problem in network analysis, data mining, and computational biology. We demonstrate the practicality of our approach on real-world networks with millions of vertices and edges by not only retaining all optimal solutions, but also aggressively pruning the input instance size resulting in several fold speedups of state-of-the-art algorithms. Finally, we explore the limits of scalability and robustness of our proposed framework, suggesting that supervised learning is viable for tackling NP-hard problems in practice.Comment: AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications