21,179 research outputs found
Dynamic load balancing for the distributed mining of molecular structures
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of
methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the
past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially
render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to
discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no
reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic
partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiverinitiated
load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer
Institute’s HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed
approach also allows for dynamic resource aggregation in a non dedicated computational environment. These features make it suitable
for large-scale, multi-domain, heterogeneous environments, such as computational grids
Efficient mining of discriminative molecular fragments
Frequent pattern discovery in structured data is receiving
an increasing attention in many application areas of sciences. However, the computational complexity and the large amount of data to be explored often make the sequential algorithms unsuitable. In this context high performance distributed computing becomes a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimation is available for task workloads, which show a power-law distribution in a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well known National Cancer Institute’s HIV-screening dataset
Semi-Trusted Mixer Based Privacy Preserving Distributed Data Mining for Resource Constrained Devices
In this paper a homomorphic privacy preserving association rule mining
algorithm is proposed which can be deployed in resource constrained devices
(RCD). Privacy preserved exchange of counts of itemsets among distributed
mining sites is a vital part in association rule mining process. Existing
cryptography based privacy preserving solutions consume lot of computation due
to complex mathematical equations involved. Therefore less computation involved
privacy solutions are extremely necessary to deploy mining applications in RCD.
In this algorithm, a semi-trusted mixer is used to unify the counts of itemsets
encrypted by all mining sites without revealing individual values. The proposed
algorithm is built on with a well known communication efficient association
rule mining algorithm named count distribution (CD). Security proofs along with
performance analysis and comparison show the well acceptability and
effectiveness of the proposed algorithm. Efficient and straightforward privacy
model and satisfactory performance of the protocol promote itself among one of
the initiatives in deploying data mining application in RCD.Comment: IEEE Publication format, International Journal of Computer Science
and Information Security, IJCSIS, Vol. 8 No. 1, April 2010, USA. ISSN 1947
5500, http://sites.google.com/site/ijcsis
High performance subgraph mining in molecular compounds
Structured data represented in the form of graphs arises in
several fields of the science and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining
problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main
aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated, load balancing
algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close-to linear speedup in a network
of workstations
- …