Shared Memory Parallel Subgraph Enumeration
The subgraph enumeration problem asks us to find all subgraphs of a target
graph that are isomorphic to a given pattern graph. Determining whether even
one such isomorphic subgraph exists is NP-complete---and therefore finding all
such subgraphs (if they exist) is a time-consuming task. Subgraph enumeration
has applications in many fields, including biochemistry and social networks,
and interestingly the fastest algorithms for solving the problem for
biochemical inputs are sequential. Since they depend on depth-first tree
traversal, an efficient parallelization is far from trivial. Nevertheless,
since important applications produce data sets with increasing difficulty,
parallelism seems beneficial.
We thus present here a shared-memory parallelization of the state-of-the-art
subgraph enumeration algorithms RI and RI-DS (a variant of RI for dense graphs)
by Bonnici et al. [BMC Bioinformatics, 2013]. Our strategy uses work stealing
and our implementation demonstrates a significant speedup on real-world
biochemical data---despite a highly irregular data access pattern. We also
improve RI-DS by pruning the search space better; this further improves the
empirical running times compared to the already highly tuned RI-DS.
Comment: 18 pages, 12 figures. To appear at the 7th IEEE Workshop on Parallel / Distributed Computing and Optimization (PDCO 2017).
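The following minimal Python sketch makes the depth-first search described above concrete. It shows plain backtracking subgraph enumeration and assumes undirected graphs stored as adjacency sets and non-induced matching: every pattern edge must be present in the target, and extra target edges are allowed. It is not RI. The highest-degree-first vertex order and the degree filter are simple stand-ins for RI's ordering and pruning, and `enumerate_subgraphs` is a hypothetical name. Each candidate chosen for the first pattern vertex roots an independent subtree, which is the kind of unit a work-stealing scheduler could hand to another thread. The sketch itself runs sequentially.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets


def enumerate_subgraphs(pattern: Graph, target: Graph) -> List[Dict[int, int]]:
    """Return every injective pattern->target mapping that preserves pattern edges."""
    # Hypothetical static order: highest pattern degree first (not RI's ordering).
    order = sorted(pattern, key=lambda v: -len(pattern[v]))
    matches: List[Dict[int, int]] = []
    mapping: Dict[int, int] = {}
    used: Set[int] = set()

    def extend(depth: int) -> None:
        if depth == len(order):
            matches.append(dict(mapping))  # complete match found
            return
        p = order[depth]
        for t in target:
            if t in used or len(target[t]) < len(pattern[p]):
                continue  # injectivity and degree filter
            # Every already-mapped neighbour of p must map to a neighbour of t.
            if all(mapping[q] in target[t] for q in pattern[p] if q in mapping):
                mapping[p] = t
                used.add(t)
                extend(depth + 1)  # depth-first descent
                del mapping[p]     # backtrack
                used.discard(t)

    extend(0)
    return matches


tri = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(len(enumerate_subgraphs(tri, g)))  # 6: the triangle {0, 1, 2} in 3! orders
```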
Limited-memory warping LCSS for real-time low-power pattern recognition in wireless nodes
We present and evaluate a microcontroller-optimized limited-memory implementation of a Warping Longest Common Subsequence algorithm (WarpingLCSS). It can spot patterns within noisy sensor data in real time on resource-constrained sensor nodes. It allows variability in the sensed system dynamics through warping; it uses only integer operations; it can be applied to various sensor modalities; and it is suitable for embedded training to recognize new patterns. We illustrate the method on 3 applications from wearable sensing and activity recognition using 3 sensor modalities: spotting the QRS complex in ECG, recognizing gestures in everyday life, and analyzing beach volleyball. We implemented the system on a low-power 8-bit AVR wireless node and a 32-bit ARM Cortex M4 microcontroller. Up to 67 (AVR) or 140 (M4) 10-second gestures can be recognized simultaneously in real time from a 10 Hz motion sensor, using 8 mW and 10 mW, respectively. A single gesture spotter uses as little as 135 μW on the AVR. The method enables distributed in-network recognition at low data rates, and we show a 100-fold data rate reduction in a complex activity recognition scenario. The versatility and low complexity of the method make it well suited as a generic pattern recognition method that could be implemented as part of sensor front-ends.
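The Python sketch below illustrates the limited-memory idea under stated assumptions. It keeps a single integer score column of length `len(template) + 1` and updates it once per incoming sample, so memory does not grow with the stream. The recurrence is an assumed WarpingLCSS-style form written for illustration, and it may not match the paper's exact formulation. It adds a `reward` for samples within tolerance `eps` of the template, applies distance-weighted `penalty` terms otherwise, and clamps the score at zero. `wlcss_update` and all parameter names are hypothetical.

```python
from typing import List, Optional


def wlcss_update(col: List[int], template: List[int], sample: int,
                 prev_sample: Optional[int], reward: int, penalty: int,
                 eps: int) -> List[int]:
    """Consume one stream sample and return the new score column (integers only).

    col[i] scores template[:i] against the stream seen so far; col[0] stays 0.
    ASSUMED recurrence for illustration, not necessarily the paper's exact one.
    """
    new = [0] * len(col)
    d_stream = abs(sample - prev_sample) if prev_sample is not None else 0
    for i in range(1, len(col)):
        s = template[i - 1]
        if abs(s - sample) <= eps:
            new[i] = col[i - 1] + reward  # match: extend the diagonal
        else:
            d_tmpl = abs(s - template[i - 2]) if i > 1 else 0
            new[i] = max(
                col[i - 1] - penalty * abs(s - sample),  # mismatch on the diagonal
                new[i - 1] - penalty * d_tmpl,           # skip a template sample
                col[i] - penalty * d_stream,             # warp over a stream sample
                0,                                       # clamp at zero (assumption)
            )
    return new


template = [1, 2, 3]
col = [0] * (len(template) + 1)
prev = None
scores = []
for x in [5, 1, 2, 3, 5]:
    col = wlcss_update(col, template, x, prev, reward=1, penalty=1, eps=0)
    prev = x
    scores.append(col[-1])  # spotting would compare this against a threshold
print(scores)  # [0, 0, 1, 3, 1]
```

In this example the score peaks at 3 (`len(template) * reward`) right after the full template has been seen, and it decays as unrelated samples arrive.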
Dynamic load balancing for the distributed mining of molecular structures
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of
methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the
past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially
render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to
discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no
reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic
partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated
load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer
Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed
approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable
for large-scale, multi-domain, heterogeneous environments, such as computational grids.
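Receiver-initiated load balancing means that an idle worker asks a peer for work, rather than a busy worker pushing work out. The following single-process Python simulation sketches that idea on a search tree. It is not the paper's peer-to-peer protocol. Rounds, the deterministic victim choice (a stand-in for random selection), and the function name `simulate` are assumptions made for illustration.

```python
from typing import Dict, List


def simulate(tree: Dict[int, List[int]], root: int, n_workers: int) -> List[int]:
    """Round-based simulation of receiver-initiated load balancing.

    Each worker owns a stack of unexplored search-tree nodes. In every round a
    busy worker expands one node; an idle worker asks a peer and, if the peer
    has at least two pending nodes, receives half of them.
    Returns how many nodes each worker expanded.
    """
    stacks: List[List[int]] = [[] for _ in range(n_workers)]
    stacks[0].append(root)  # all work starts on one worker
    expanded = [0] * n_workers
    rnd = 0
    while any(stacks):
        for w in range(n_workers):
            if stacks[w]:
                node = stacks[w].pop()  # depth-first: expand the newest node
                expanded[w] += 1
                stacks[w].extend(tree.get(node, []))
            else:
                # The idle worker (receiver) initiates the transfer.
                victim = (w + 1 + rnd) % n_workers  # deterministic stand-in for a random peer
                donor = stacks[victim]
                if len(donor) >= 2:
                    half = len(donor) // 2
                    stacks[w] = donor[:half]  # take the oldest (shallowest) nodes
                    del donor[:half]
        rnd += 1
    return expanded


tree = {i: [2 * i + 1, 2 * i + 2] for i in range(15)}  # complete binary tree, 31 nodes
print(sum(simulate(tree, 0, 4)))  # 31: every node is expanded exactly once
```

Handing over the bottom of the donor's stack is a common heuristic. In a depth-first search those entries are the shallowest nodes, so they tend to root larger subtrees, which makes each transfer worth its cost.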
Sequence Similarity between Genetic Codes using Improved Longest Common Subsequence Algorithm
Finding the sequence similarity between two genetic codes is an important problem in computational biology. In this paper, we developed an efficient algorithm that finds the sequence similarity between genetic codes using the longest common subsequence (LCS) algorithm. The algorithm has advantages over the edit distance algorithm and improves performance. The proposed algorithm is tested on randomly generated DNA sequences for exact DNA sequence comparison. Such DNA sequence comparison can be used to discover information such as evolutionary divergence and ways to apply genetic codes from one DNA sequence to another.
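For reference, here is the textbook dynamic program for LCS length in Python, keeping only two rows so that memory is linear in the length of the second sequence. This is the standard algorithm, not the paper's improved variant. The `similarity` normalization (LCS length divided by the longer sequence's length) is a hypothetical choice made for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence; O(len(a)*len(b)) time."""
    prev = [0] * (len(b) + 1)  # row for the previous character of a
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1  # extend a common subsequence
            else:
                cur[j] = max(prev[j], cur[j - 1])  # drop a character from a or b
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalization: LCS length over the longer sequence length."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


print(lcs_length("AGGTAB", "GXTXAYB"))  # 4 ("GTAB")
print(similarity("ACGT", "AGT"))        # 0.75
```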
TemporalRI: subgraph isomorphism in temporal networks with multiple contacts
Temporal networks are graphs where each edge is associated with a timestamp denoting when two nodes interact. Temporal Subgraph Isomorphism (TSI) aims at retrieving all the subgraphs of a temporal network (called target) matching a smaller temporal network (called query), such that matched target edges appear in the same chronological order as the corresponding query edges. Few algorithms have been proposed to solve the TSI problem (or variants of it), and most of them are applicable only to small or specific queries. In this paper we present TemporalRI, a new subgraph isomorphism algorithm for temporal networks with multiple contacts between nodes, which is inspired by the RI algorithm. TemporalRI introduces the notion of temporal flows and uses them to filter the search space of candidate nodes for the matching. Our algorithm can handle queries of any size and any topology. Experiments on real networks of different sizes show that TemporalRI is very efficient compared to the state of the art, especially for large queries and targets.
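The brute-force Python sketch below illustrates the TSI definition: query edges are matched in chronological order, each matched target edge must be strictly later than the previous one, and the node mapping must stay consistent and injective. It assumes directed edges and distinct query timestamps. It does not implement TemporalRI's temporal flows or ordering, and `temporal_matches` is a hypothetical name. Multiple contacts appear as repeated `(u, v)` pairs with different timestamps.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int, int]  # (source, destination, timestamp)


def temporal_matches(query: List[Edge], target: List[Edge]) -> List[List[Edge]]:
    """Enumerate target edge sequences matching the query in chronological order.

    Each result lists the matched target edge for each query edge, with query
    edges taken in time order.
    """
    q = sorted(query, key=lambda e: e[2])  # query edges in chronological order
    results: List[List[Edge]] = []
    node_map: Dict[int, int] = {}  # query node -> target node
    used: Set[int] = set()         # target nodes already taken (injectivity)
    chosen: List[Edge] = []

    def extend(k: int, last_t: float) -> None:
        if k == len(q):
            results.append(list(chosen))
            return
        qu, qv, _ = q[k]
        for edge in target:
            tu, tv, tt = edge
            if tt <= last_t:
                continue  # must occur strictly after the previously matched edge
            added: List[int] = []
            ok = True
            for qn, tn in ((qu, tu), (qv, tv)):
                if qn in node_map:
                    ok = node_map[qn] == tn  # mapping must stay consistent
                elif tn in used:
                    ok = False               # target node already used
                else:
                    node_map[qn] = tn
                    used.add(tn)
                    added.append(qn)
                if not ok:
                    break
            if ok:
                chosen.append(edge)
                extend(k + 1, tt)
                chosen.pop()
            for qn in added:  # undo node assignments made for this edge
                used.discard(node_map.pop(qn))

    extend(0, float("-inf"))
    return results


query = [(0, 1, 1), (1, 2, 2)]                      # a -> b, then b -> c
target = [(1, 2, 5), (2, 3, 3), (2, 3, 7), (2, 3, 6)]
print(len(temporal_matches(query, target)))         # 2
```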
Constraint-based sequence mining using constraint programming
The goal of constraint-based sequence mining is to find sequences of symbols
that are included in a large number of input sequences and that satisfy some
constraints specified by the user. Many constraints have been proposed in the
literature, but a general framework is still missing. We investigate the use of
constraint programming as a general framework for this task. We first identify
four categories of constraints that are applicable to sequence mining. We then
propose two constraint programming formulations. The first formulation
introduces a new global constraint called exists-embedding. This formulation is
the most efficient but does not support one type of constraint. To support such
constraints, we develop a second formulation that is more general but incurs
more overhead. Both formulations can use the projected database technique used
in specialised algorithms. Experiments demonstrate the flexibility towards
constraint-based settings and compare the approach to existing methods.
Comment: In Integration of AI and OR Techniques in Constraint Programming (CPAIOR), 201
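The projected database technique mentioned above can be sketched in plain Python as PrefixSpan-style growth over sequences of single symbols. For each sequence, the sketch keeps only the position just after the leftmost embedding of the current prefix. This is a specialized-algorithm sketch, not the paper's constraint programming formulation. The function name `mine` and the maximum-length constraint (one hypothetical example of a user constraint) are assumptions.

```python
from typing import Dict, List, Tuple


def mine(db: List[str], min_sup: int, max_len: int) -> Dict[str, int]:
    """Return every pattern with support >= min_sup and length <= max_len.

    A sequence supports a pattern if the pattern embeds in it as a
    (not necessarily contiguous) subsequence.
    """
    results: Dict[str, int] = {}

    def grow(prefix: str, proj: List[Tuple[int, int]]) -> None:
        # proj: (sequence index, position just after the leftmost embedding of prefix)
        counts: Dict[str, int] = {}
        for sid, pos in proj:
            for sym in set(db[sid][pos:]):  # count each symbol once per sequence
                counts[sym] = counts.get(sym, 0) + 1
        for sym in sorted(counts):
            if counts[sym] < min_sup:
                continue  # infrequent: no extension can become frequent
            pattern = prefix + sym
            results[pattern] = counts[sym]
            if len(pattern) < max_len:  # the user constraint prunes deeper growth
                new_proj: List[Tuple[int, int]] = []
                for sid, pos in proj:
                    k = db[sid].find(sym, pos)  # leftmost occurrence after pos
                    if k != -1:
                        new_proj.append((sid, k + 1))
                grow(pattern, new_proj)

    grow("", [(i, 0) for i in range(len(db))])
    return results


print(mine(["abc", "ac", "bc"], min_sup=2, max_len=2))
# {'a': 2, 'ac': 2, 'b': 2, 'bc': 2, 'c': 3}
```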
Symmetric continuous subgraph matching with bidirectional dynamic programming
In many real datasets, such as social media streams and cyber data sources, graphs change over time through a graph update stream of edge insertions and deletions. Detecting critical patterns in such dynamic graphs plays an important role in various application domains such as fraud detection, cyber security, and recommendation systems for social networks. Given a dynamic data graph and a query graph, the continuous subgraph matching problem is to find all positive matches for each edge insertion and all negative matches for each edge deletion. The state-of-the-art algorithm TurboFlux uses a spanning tree of the query graph for filtering. However, using the spanning tree may yield low pruning power because it does not take all edges of the query graph into account. In this paper, we present SymBi, a symmetric and much faster algorithm that maintains an auxiliary data structure based on a directed acyclic graph instead of a spanning tree; this structure stores the intermediate results of bidirectional dynamic programming between the query graph and the dynamic graph. Extensive experiments with real and synthetic datasets show that SymBi outperforms the state-of-the-art algorithm by up to three orders of magnitude in elapsed time.
Min, S.; Park, S. G.; Park, K.; Giammarresi, D.; Italiano, G. F.; Han, W.-S.
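To make the definition of positive and negative matches concrete, here is a naive Python sketch for one hypothetical query graph, a triangle, over an undirected edge stream. It recomputes the common neighbors of the updated edge on every change. SymBi's DAG-based auxiliary structure and bidirectional dynamic programming are not shown, and `TriangleMonitor` is a hypothetical name.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

Match = Tuple[int, ...]


class TriangleMonitor:
    """Naive continuous matching of a triangle query on an undirected edge stream."""

    def __init__(self) -> None:
        self.adj: DefaultDict[int, Set[int]] = defaultdict(set)

    def insert(self, u: int, v: int) -> List[Match]:
        """Apply an insertion and return the positive matches it creates."""
        if u == v or v in self.adj[u]:
            return []  # self-loop or duplicate edge: no new match
        self.adj[u].add(v)
        self.adj[v].add(u)
        return self._triangles_through(u, v)

    def delete(self, u: int, v: int) -> List[Match]:
        """Apply a deletion and return the negative matches it destroys."""
        if u == v or v not in self.adj[u]:
            return []
        lost = self._triangles_through(u, v)  # matches that existed before removal
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        return lost

    def _triangles_through(self, u: int, v: int) -> List[Match]:
        common = (self.adj[u] & self.adj[v]) - {u, v}
        return [tuple(sorted((u, v, w))) for w in sorted(common)]


m = TriangleMonitor()
m.insert(1, 2)
m.insert(2, 3)
print(m.insert(1, 3))  # [(1, 2, 3)]  positive match
print(m.delete(1, 2))  # [(1, 2, 3)]  negative match
```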