Maximum common subgraph isomorphism algorithms for the matching of chemical structures
The maximum common subgraph (MCS) problem has become increasingly important in those aspects of chemoinformatics that involve the matching of 2D or 3D chemical structures. This paper provides a classification and a review of the many MCS algorithms, both exact and approximate, that have been described in the literature, and makes recommendations regarding their applicability to typical chemoinformatics tasks.
Significant Subgraph Mining with Multiple Testing Correction
The problem of finding itemsets that are statistically significantly enriched
in a class of transactions is complicated by the need to correct for multiple
hypothesis testing. Pruning untestable hypotheses was recently proposed as a
strategy for this task of significant itemset mining. It was shown to lead to
greater statistical power, that is, the discovery of more truly significant itemsets,
than the standard Bonferroni correction on real-world datasets. An open
question, however, is whether this strategy of excluding untestable hypotheses
also leads to greater statistical power in subgraph mining, in which the number
of hypotheses is much larger than in itemset mining. Here we answer this
question by an empirical investigation on eight popular graph benchmark
datasets. We propose a new efficient search strategy, which always returns the
same solution as the state-of-the-art approach and is approximately two orders
of magnitude faster. Moreover, we exploit the dependence between subgraphs by
considering the effective number of tests and thereby further increase the
statistical power. Comment: 18 pages, 5 figures, accepted to the 2015 SIAM International
Conference on Data Mining (SDM15).
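The idea of pruning untestable hypotheses can be illustrated with a minimal sketch. It assumes a one-sided Fisher's exact test and the testability criterion usually attributed to Tarone; the abstract names neither, and the function names `min_p_value` and `tarone_factor` are hypothetical. It is not the search strategy proposed in the paper, only the counting step behind the correction factor.

```python
from math import comb

def min_p_value(x, n1, n):
    """Smallest p-value a pattern with support x can attain.

    Assumption: one-sided Fisher's exact test with n transactions,
    n1 of them in the positive class. The most extreme table puts as
    many of the x occurrences as possible in the positive class.
    """
    a = min(x, n1)
    return comb(n1, a) * comb(n - n1, x - a) / comb(n, x)

def tarone_factor(supports, n1, n, alpha=0.05):
    """Smallest k such that at most k patterns are testable at alpha / k."""
    psi = [min_p_value(x, n1, n) for x in supports]
    k = 1
    # A pattern is testable at level alpha / k only if its minimum
    # attainable p-value does not exceed that level.
    while sum(p <= alpha / k for p in psi) > k:
        k += 1
    return k

# Six candidate patterns, 20 transactions, 10 in the positive class.
supports = [1, 2, 3, 6, 8, 10]
k = tarone_factor(supports, n1=10, n=20)
print(k, len(supports))  # correction factor 3 instead of Bonferroni's 6
```

In this toy setting the three low-support patterns can never reach significance, so they do not count toward the correction factor, and the per-test threshold rises from 0.05/6 to 0.05/3.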
Activity recognition from videos with parallel hypergraph matching on GPUs
In this paper, we propose a method for activity recognition from videos based
on sparse local features and hypergraph matching. We benefit from special
properties of the temporal domain in the data to derive a sequential and fast
graph matching algorithm for GPUs.
Traditionally, graphs and hypergraphs are frequently used to recognize
complex and often non-rigid patterns in computer vision, either through graph
matching or point-set matching with graphs. Most formulations resort to the
minimization of a difficult discrete energy function mixing geometric or
structural terms with data-attached terms involving appearance features.
Traditional methods solve this minimization problem approximately, for instance
with spectral techniques.
In this work, instead of solving the problem approximately, the exact
solution for the optimal assignment is calculated in parallel on GPUs. The
graphical structure is simplified and regularized, which allows us to derive an
efficient recursive minimization algorithm. The algorithm distributes
subproblems over the calculation units of a GPU, which solves them in parallel,
allowing the system to run faster than real-time on medium-end GPUs.
Solving Maximum Clique Problem for Protein Structure Similarity
A basic assumption of molecular biology is that proteins sharing close
three-dimensional (3D) structures are likely to share a common function and in
most cases derive from a common ancestor. Computing the similarity between two
protein structures is therefore a crucial task and has been extensively
investigated. Evaluating the similarity of two proteins can be done by finding
an optimal one-to-one matching between their components, which is equivalent to
identifying a maximum weighted clique in a specific "alignment graph". In this
paper we present a new integer programming formulation for solving such clique
problems. The model has been implemented using the ILOG CPLEX Callable Library.
In addition, we designed a dedicated branch and bound algorithm for solving the
maximum cardinality clique problem. Both approaches have been integrated in
VAST (Vector Alignment Search Tool), a software tool for aligning protein 3D
structures that is widely used at NCBI (National Center for Biotechnology
Information). The original VAST clique solver uses the well-known Bron and
Kerbosch algorithm (BK). Our computational results on real-life protein
alignment instances show that our branch and bound algorithm is up to 116 times
faster than BK for the largest proteins.
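For orientation, the baseline mentioned above can be sketched as follows. This is a generic Bron and Kerbosch enumeration with pivoting on an unweighted toy graph, not the VAST solver and not the paper's branch and bound or integer programming formulation; the helper names `build_adjacency`, `bron_kerbosch`, and `maximum_clique` are hypothetical.

```python
def build_adjacency(edges):
    """Turn a list of undirected edges into {vertex: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bron_kerbosch(adj, r=frozenset(), p=None, x=frozenset()):
    """Yield every maximal clique of the graph as a frozenset.

    r: current clique, p: candidates that can extend it,
    x: vertices already explored (used to avoid duplicates).
    """
    if p is None:
        p = frozenset(adj)
    if not p and not x:
        yield r
        return
    # Pivot on the vertex with the most neighbours among the candidates,
    # so that fewer branches need to be explored.
    pivot = max(p | x, key=lambda v: len(adj[v] & p))
    for v in p - adj[pivot]:
        yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        p = p - {v}
        x = x | {v}

def maximum_clique(adj):
    """Return one maximum-cardinality clique by scanning all maximal cliques."""
    return max(bron_kerbosch(adj), key=len)

# Toy stand-in for an alignment graph: vertices 1..4 are mutually compatible.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
adj = build_adjacency(edges)
print(sorted(maximum_clique(adj)))  # [1, 2, 3, 4]
```

Enumerating every maximal clique and keeping the largest is simple but can be slow on large graphs, which is the motivation the abstract gives for a dedicated branch and bound approach.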