Maximum common subgraph isomorphism algorithms for the matching of chemical structures
The maximum common subgraph (MCS) problem has become increasingly important in those aspects of chemoinformatics that involve the matching of 2D or 3D chemical structures. This paper classifies and reviews the many MCS algorithms, both exact and approximate, that have been described in the literature, and makes recommendations regarding their applicability to typical chemoinformatics tasks.
RASCAL: calculation of graph similarity using maximum common edge subgraphs
A new graph similarity calculation procedure is introduced for comparing labeled graphs. Given a minimum similarity threshold, the procedure consists of an initial screening process to determine whether the similarity between the two graphs can exceed the threshold, followed by a rigorous maximum common edge subgraph (MCES) detection algorithm to compute the exact degree and composition of similarity. The proposed MCES algorithm is based on a maximum clique formulation of the problem and is a significant improvement over other published algorithms. It presents new approaches to both lower and upper bounding as well as vertex selection.
A Much Faster Algorithm for Finding a Maximum Clique
We present improvements to a branch-and-bound maximum-clique-finding algorithm MCS (WALCOM 2010, LNCS 5942, pp. 191–203) that was shown to be fast. First, we employ an efficient approximation algorithm for finding a maximum clique. Second, we make use of appropriate sorting of vertices only near the root of the search tree. Third, we employ a lightened approximate coloring mainly near the leaves of the search tree. The new algorithm obtained from MCS with the above improvements is named MCT. Extensive computational experiments show that MCT is much faster than MCS. In particular, MCT is shown to be faster than MCS for gen400_p0.9_75 and gen400_p0.9_65 by over 328,000 and 77,000 times, respectively.
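The general idea of a branch-and-bound clique search with a coloring bound can be illustrated as follows. This is a minimal sketch of the basic scheme only, not the MCS or MCT algorithm from the abstract: it omits the initial approximate solution, the root-only vertex sorting, and the lightened coloring near the leaves. The names `greedy_color_order` and `max_clique`, and the adjacency-set graph representation, are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]  # vertex -> set of neighbours (no self-loops)


def greedy_color_order(graph: Graph, candidates: Set[int]) -> List[Tuple[int, int]]:
    """Greedily colour `candidates`; return (vertex, colour) pairs sorted by colour.

    A vertex with colour k, together with everything listed before it, can be
    coloured with k colours, so no clique inside that prefix exceeds size k.
    """
    classes: List[List[int]] = []
    by_degree = sorted(candidates, key=lambda u: len(graph[u] & candidates), reverse=True)
    for v in by_degree:
        for cls in classes:
            if all(u not in graph[v] for u in cls):  # v fits this colour class
                cls.append(v)
                break
        else:
            classes.append([v])
    return [(v, k) for k, cls in enumerate(classes, start=1) for v in cls]


def max_clique(graph: Graph) -> List[int]:
    """Return one maximum clique by depth-first branch and bound."""
    best: List[int] = []

    def expand(current: List[int], candidates: Set[int]) -> None:
        nonlocal best
        # Branch on the highest colours first; bounds only shrink afterwards.
        for v, bound in reversed(greedy_color_order(graph, candidates)):
            if len(current) + bound <= len(best):
                return  # no remaining candidate can improve on `best`
            current.append(v)
            new_candidates = candidates & graph[v]
            if new_candidates:
                expand(current, new_candidates)
            elif len(current) > len(best):
                best = current.copy()
            current.pop()
            candidates = candidates - {v}

    expand([], set(graph))
    return best


# Hypothetical example: a 4-clique {0, 1, 2, 3} with a path 3-4-5-0 attached.
EXAMPLE: Graph = {
    0: {1, 2, 3, 5}, 1: {0, 2, 3}, 2: {0, 1, 3},
    3: {0, 1, 2, 4}, 4: {3, 5}, 5: {0, 4},
}
print(sorted(max_clique(EXAMPLE)))
```

The coloring serves two purposes at once: it gives an upper bound for pruning and an order in which to branch, which is the part that the refinements listed in the abstract aim to make cheaper or tighter.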
Scalable Kernelization for Maximum Independent Sets
The most efficient algorithms for finding maximum independent sets in both
theory and practice use reduction rules to obtain a much smaller problem
instance called a kernel. The kernel can then be solved quickly using exact or
heuristic algorithms---or by repeatedly kernelizing recursively in the
branch-and-reduce paradigm. It is of critical importance for these algorithms
that kernelization is fast and returns a small kernel. Current algorithms are
either slow but produce a small kernel, or fast and give a large kernel. We
attempt to accomplish both of these goals simultaneously, by giving an
efficient parallel kernelization algorithm based on graph partitioning and
parallel bipartite maximum matching. We combine our parallelization techniques
with two techniques to accelerate kernelization further: dependency checking
that prunes reductions that cannot be applied, and reduction tracking that
allows us to stop kernelization when reductions become less fruitful. Our
algorithm produces kernels that are orders of magnitude smaller than the
fastest kernelization methods, while having a similar execution time.
Furthermore, our algorithm is able to compute kernels with size comparable to
the smallest known kernels, but up to two orders of magnitude faster than
previously possible. Finally, we show that our kernelization algorithm can be
used to accelerate existing state-of-the-art heuristic algorithms, allowing us
to find larger independent sets faster on large real-world networks and
synthetic instances.
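To make the notion of a kernel concrete, the sketch below applies two elementary reduction rules exhaustively. It is an illustration under stated assumptions, not the parallel algorithm of the abstract: the abstract does not say which rules are used, so the degree-0 and degree-1 rules here are stand-ins chosen for simplicity, and the function name `kernelize` is hypothetical.

```python
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]  # vertex -> set of neighbours


def kernelize(graph: Graph) -> Tuple[Graph, Set[int]]:
    """Apply degree-0 and degree-1 reductions until neither applies.

    Returns (kernel, forced): `forced` holds vertices that belong to some
    maximum independent set, and `kernel` is the graph that remains to be solved.
    """
    g = {v: set(nb) for v, nb in graph.items()}  # work on a copy
    forced: Set[int] = set()
    queue = [v for v in g if len(g[v]) <= 1]
    while queue:
        v = queue.pop()
        if v not in g or len(g[v]) > 1:
            continue  # stale entry: already removed or no longer reducible
        # Taking an isolated or pendant vertex is always safe.
        forced.add(v)
        removed = {v} | g[v]  # v and (if present) its single neighbour
        touched: Set[int] = set()
        for u in removed:
            for w in g[u]:
                if w not in removed:
                    g[w].discard(u)
                    touched.add(w)
        for u in removed:
            del g[u]
        # Neighbours that lost an edge may have become reducible.
        queue.extend(w for w in touched if len(g[w]) <= 1)
    return g, forced


# Hypothetical example: pendant vertex 0 on a triangle, plus a separate 4-cycle.
EXAMPLE: Graph = {
    0: {1}, 1: {0, 2, 3}, 2: {1, 3}, 3: {1, 2},
    10: {11, 13}, 11: {10, 12}, 12: {11, 13}, 13: {10, 12},
}
kernel, forced = kernelize(EXAMPLE)
print(sorted(kernel), sorted(forced))
```

In this example the pendant-plus-triangle component disappears entirely, while the 4-cycle survives as the kernel because every vertex in it has degree 2; stronger rules would be needed to reduce it further.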
Efficient Algorithms for Finding Maximum and Maximal Cliques and Their Applications
The problem of finding a maximum clique or enumerating all maximal cliques is very important and has been explored in several excellent survey papers. Here, we focus our attention on the step-by-step examination of a series of branch-and-bound depth-first search algorithms: Basics, MCQ, MCR, MCS, and MCT. Subsequently, using the same depth-first search approach, we present our algorithm, CLIQUES, for enumerating all maximal cliques. Finally, we describe some of the applications of the algorithms and their variants in bioinformatics, data mining, and other fields.
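Depth-first enumeration of all maximal cliques can be sketched with the classic Bron–Kerbosch recursion with pivoting. This is a generic textbook formulation offered for illustration; it is not claimed to match the CLIQUES algorithm of the abstract in its details, and the function name `maximal_cliques` is an assumption.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # vertex -> set of neighbours (no self-loops)


def maximal_cliques(graph: Graph) -> List[List[int]]:
    """Enumerate all maximal cliques by depth-first search with pivoting."""
    cliques: List[List[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        # r: current clique, p: vertices that can extend it, x: already explored
        if not p and not x:
            cliques.append(sorted(r))  # nothing can extend r: it is maximal
            return
        # Pivot with the most neighbours in p, to minimise the branching below.
        pivot = max(p | x, key=lambda u: len(graph[u] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


# Hypothetical example: a triangle {0, 1, 2} with the extra edge 2-3.
EXAMPLE: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(sorted(maximal_cliques(EXAMPLE)))
```

Only non-neighbours of the pivot are branched on, since any maximal clique must contain either the pivot or one of its non-neighbours.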
Finding Near-Optimal Independent Sets at Scale
The independent set problem is NP-hard and particularly difficult to solve in
large sparse graphs. In this work, we develop an advanced evolutionary
algorithm, which incorporates kernelization techniques to compute large
independent sets in huge sparse networks. A recent exact algorithm has shown
that large networks can be solved exactly by employing a branch-and-reduce
technique that recursively kernelizes the graph and performs branching.
However, one major drawback of their algorithm is that, for huge graphs,
branching still can take exponential time. To avoid this problem, we
recursively choose vertices that are likely to be in a large independent set
(using an evolutionary approach), then further kernelize the graph. We show
that identifying and removing vertices likely to be in large independent sets
opens up the reduction space---which not only speeds up the computation of
large independent sets drastically, but also enables us to compute high-quality
independent sets on much larger instances than previously reported in the
literature.
Detecting High Log-Densities -- an O(n^1/4) Approximation for Densest k-Subgraph
In the Densest k-Subgraph problem, given a graph G and a parameter k, one
needs to find a subgraph of G induced on k vertices that contains the largest
number of edges. There is a significant gap between the best known upper and
lower bounds for this problem. It is NP-hard, and does not have a PTAS unless
NP has subexponential time algorithms. On the other hand, the current best
known algorithm of Feige, Kortsarz and Peleg, gives an approximation ratio of
n^(1/3-epsilon) for some specific epsilon > 0 (estimated at around 1/60).
We present an algorithm that for every epsilon > 0 approximates the Densest
k-Subgraph problem within a ratio of n^(1/4+epsilon) in time n^(O(1/epsilon)). In
particular, our algorithm achieves an approximation ratio of O(n^(1/4)) in time
n^(O(log n)). Our algorithm is inspired by studying an average-case version of
the problem where the goal is to distinguish random graphs from graphs with
planted dense subgraphs. The approximation ratio we achieve for the general
case matches the distinguishing ratio we obtain for this planted problem.
At a high level, our algorithms involve cleverly counting appropriately
defined trees of constant size in G, and using these counts to identify the
vertices of the dense subgraph. Our algorithm is based on the following
principle. We say that a graph G(V,E) has log-density alpha if its average
degree is Theta(|V|^alpha). The algorithmic core of our result is a family of
algorithms that output k-subgraphs of nontrivial density whenever the
log-density of the densest k-subgraph is larger than the log-density of the
host graph.
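The log-density definition above is asymptotic, but a finite-graph estimate helps build intuition. The sketch below assumes the estimate alpha = log(average degree) / log(|V|), which drops the constants hidden in the Theta notation; the function name `log_density` and the example sizes are hypothetical.

```python
import math


def log_density(num_vertices: int, num_edges: int) -> float:
    """Estimate alpha such that the average degree is about |V|**alpha.

    Assumes a simple undirected graph with at least 2 vertices and 1 edge.
    """
    if num_vertices < 2 or num_edges < 1:
        raise ValueError("need at least 2 vertices and 1 edge")
    avg_degree = 2 * num_edges / num_vertices
    return math.log(avg_degree) / math.log(num_vertices)


# Hypothetical host graph: 10,000 vertices with average degree 100.
host_alpha = log_density(10_000, 500_000)

# Hypothetical dense k-subgraph: k = 100 vertices with average degree 50.
sub_alpha = log_density(100, 2_500)

# The regime described in the abstract: the subgraph's log-density exceeds the host's.
print(host_alpha, sub_alpha, sub_alpha > host_alpha)
```

Here the host has average degree equal to the square root of its vertex count, so its estimated log-density is 1/2, while the smaller subgraph is relatively denser and has a higher estimate.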