982 research outputs found
Solving Maximum Clique Problem for Protein Structure Similarity
A basic assumption of molecular biology is that proteins sharing close
three-dimensional (3D) structures are likely to share a common function and in
most cases derive from a same ancestor. Computing the similarity between two
protein structures is therefore a crucial task and has been extensively
investigated. Evaluating the similarity of two proteins can be done by finding
an optimal one-to-one matching between their components, which is equivalent to
identifying a maximum weighted clique in a specific "alignment graph". In this
paper we present a new integer programming formulation for solving such clique
problems. The model has been implemented using the ILOG CPLEX Callable Library.
In addition, we designed a dedicated branch and bound algorithm for solving the
maximum cardinality clique problem. Both approaches have been integrated in
VAST (Vector Alignment Search Tool) - a software for aligning protein 3D
structures largely used in NCBI (National Center for Biotechnology
Information). The original VAST clique solver uses the well known Bron and
Kerbosh algorithm (BK). Our computational results on real life protein
alignment instances show that our branch and bound algorithm is up to 116 times
faster than BK for the largest proteins
On combinatorial optimisation in analysis of protein-protein interaction and protein folding networks
Abstract: Protein-protein interaction networks and protein folding networks represent prominent research topics at the intersection of bioinformatics and network science. In this paper, we present a study of these networks from combinatorial optimisation point of view. Using a combination of classical heuristics and stochastic optimisation techniques, we were able to identify several interesting combinatorial properties of biological networks of the COSIN project. We obtained optimal or near-optimal solutions to maximum clique and chromatic number problems for these networks. We also explore patterns of both non-overlapping and overlapping cliques in these networks. Optimal or near-optimal solutions to partitioning of these networks into non-overlapping cliques and to maximum independent set problem were discovered. Maximal cliques are explored by enumerative techniques. Domination in these networks is briefly studied, too. Applications and extensions of our findings are discussed
Structure of conflict graphs in constraint alignment problems and algorithms
We consider the constrained graph alignment problem which has applications in
biological network analysis. Given two input graphs , a pair of vertex mappings induces an {\it edge conservation} if
the vertex pairs are adjacent in their respective graphs. %In general terms The
goal is to provide a one-to-one mapping between the vertices of the input
graphs in order to maximize edge conservation. However the allowed mappings are
restricted since each vertex from (resp. ) is allowed to be mapped
to at most (resp. ) specified vertices in (resp. ). Most
of results in this paper deal with the case which attracted most
attention in the related literature. We formulate the problem as a maximum
independent set problem in a related {\em conflict graph} and investigate
structural properties of this graph in terms of forbidden subgraphs. We are
interested, in particular, in excluding certain wheals, fans, cliques or claws
(all terms are defined in the paper), which corresponds in excluding certain
cycles, paths, cliques or independent sets in the neighborhood of each vertex.
Then, we investigate algorithmic consequences of some of these properties,
which illustrates the potential of this approach and raises new horizons for
further works. In particular this approach allows us to reinterpret a known
polynomial case in terms of conflict graph and to improve known approximation
and fixed-parameter tractability results through efficiently solving the
maximum independent set problem in conflict graphs. Some of our new
approximation results involve approximation ratios that are function of the
optimal value, in particular its square root; this kind of results cannot be
achieved for maximum independent set in general graphs.Comment: 22 pages, 6 figure
Algorithm engineering for optimal alignment of protein structure distance matrices
Protein structural alignment is an important problem in computational
biology. In this paper, we present first successes on provably optimal pairwise
alignment of protein inter-residue distance matrices, using the popular Dali
scoring function. We introduce the structural alignment problem formally, which
enables us to express a variety of scoring functions used in previous work as
special cases in a unified framework. Further, we propose the first
mathematical model for computing optimal structural alignments based on dense
inter-residue distance matrices. We therefore reformulate the problem as a
special graph problem and give a tight integer linear programming model. We
then present algorithm engineering techniques to handle the huge integer linear
programs of real-life distance matrix alignment problems. Applying these
techniques, we can compute provably optimal Dali alignments for the very first
time
Recommended from our members
Quantitative surface field analysis: learning causal models to predict ligand binding affinity and pose.
We introduce the QuanSA method for inducing physically meaningful field-based models of ligand binding pockets based on structure-activity data alone. The method is closely related to the QMOD approach, substituting a learned scoring field for a pocket constructed of molecular fragments. The problem of mutual ligand alignment is addressed in a general way, and optimal model parameters and ligand poses are identified through multiple-instance machine learning. We provide algorithmic details along with performance results on sixteen structure-activity data sets covering many pharmaceutically relevant targets. In particular, we show how models initially induced from small data sets can extrapolatively identify potent new ligands with novel underlying scaffolds with very high specificity. Further, we show that combining predictions from QuanSA models with those from physics-based simulation approaches is synergistic. QuanSA predictions yield binding affinities, explicit estimates of ligand strain, associated ligand pose families, and estimates of structural novelty and confidence. The method is applicable for fine-grained lead optimization as well as potent new lead identification
Shared-Memory Parallel Maximal Clique Enumeration
We present shared-memory parallel methods for Maximal Clique Enumeration
(MCE) from a graph. MCE is a fundamental and well-studied graph analytics task,
and is a widely used primitive for identifying dense structures in a graph. Due
to its computationally intensive nature, parallel methods are imperative for
dealing with large graphs. However, surprisingly, there do not yet exist
scalable and parallel methods for MCE on a shared-memory parallel machine. In
this work, we present efficient shared-memory parallel algorithms for MCE, with
the following properties: (1) the parallel algorithms are provably
work-efficient relative to a state-of-the-art sequential algorithm (2) the
algorithms have a provably small parallel depth, showing that they can scale to
a large number of processors, and (3) our implementations on a multicore
machine shows a good speedup and scaling behavior with increasing number of
cores, and are substantially faster than prior shared-memory parallel
algorithms for MCE.Comment: 10 pages, 3 figures, proceedings of the 25th IEEE International
Conference on. High Performance Computing, Data, and Analytics (HiPC), 201
- …