Literature Review on Secure Mining of Association Rules in Horizontally Distributed Databases
Data and knowledge engineering is one of the areas within data mining, which can extract important knowledge from large databases; sometimes, however, these databases are divided among several parties. This paper addresses fast distributed mining of association rules over horizontally distributed data and reviews different methods for doing so securely. Its main focus is a protocol for secure mining of association rules in horizontally distributed databases. The current leading protocol is that of Kantarcioglu and Clifton. This protocol, like theirs, is based on the Fast Distributed Mining (FDM) algorithm of Cheung et al., which is an unsecured distributed version of the Apriori algorithm. The main components of this protocol are two novel secure multi-party algorithms: one that computes the union of private subsets held by each of the interacting players, and another that tests the inclusion of an element held by one player in a subset held by another. This protocol offers improved privacy with respect to the protocol in. In addition, it is simpler and significantly more efficient in terms of communication rounds, communication cost, and computational cost.
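As a rough illustration of what "horizontally distributed" mining involves, the sketch below shows only the plain, unsecured counting step that Apriori-style distributed miners share: each site counts candidate itemsets over its own transactions, and the counts are summed to decide global frequency. It is not the secure protocol described in the abstract and omits FDM's local pruning; the function names, the toy data, and the `min_support` ratio are assumptions made for the example.

```python
from itertools import combinations

def local_support_counts(transactions, candidates):
    """At one site, count how many local transactions contain each candidate."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}

def globally_frequent(sites, candidates, min_support):
    """Sum per-site counts; keep itemsets whose global support ratio
    is at least min_support (a fraction of all transactions)."""
    total = sum(len(site) for site in sites)
    counts = {c: 0 for c in candidates}
    for site in sites:
        for c, n in local_support_counts(site, candidates).items():
            counts[c] += n
    return {c for c, n in counts.items() if n >= min_support * total}

# Two hypothetical sites, each holding its own rows of the same schema.
sites = [
    [frozenset("ab"), frozenset("abc"), frozenset("ac")],
    [frozenset("ab"), frozenset("b"), frozenset("bc")],
]
items = sorted({i for site in sites for t in site for i in t})
pairs = [frozenset(p) for p in combinations(items, 2)]
frequent_pairs = globally_frequent(sites, pairs, min_support=0.5)
```

In this unsecured form every site reveals its local counts. The secure protocols discussed in the abstract aim to reach the same result while revealing less than this.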
Enhancing association rules algorithms for mining distributed databases. Integration of fast BitTable and multi-agent association rules mining in distributed medical databases for decision support.
Over the past few years, mining data located in heterogeneous and geographically distributed sites has been recognized as a key issue. Loading distributed data into a centralized location to mine interesting rules is not a good approach, because it compromises data privacy and imposes network overheads. The situation becomes worse when the network has limited bandwidth, which is the case in most real-time systems. This has prompted the need for intelligent data analysis to discover the hidden information in these huge distributed databases.
In this research, we present an incremental approach for building an efficient Multi-Agent based algorithm for mining real-world databases in geographically distributed sites. First, we propose the Distributed Multi-Agent Association Rules algorithm (DMAAR) to minimize all-to-all broadcasting between distributed sites. Analytical calculations show that DMAAR reduces the algorithm complexity and minimizes the message communication cost. The proposed Multi-Agent based algorithm complies with the Foundation for Intelligent Physical Agents (FIPA) specifications, which are considered the global standard for communication between agents, thus enabling the proposed algorithm's agents to cooperate with other standard agents.
Second, the BitTable Multi-Agent Association Rules algorithm (BMAAR) is proposed. BMAAR includes an efficient BitTable data structure that compresses the database so that it can easily fit into the memory of the local sites. It also includes two bitwise AND/OR operations for quick candidate itemset generation and support counting. Moreover, the algorithm includes three transaction trimming techniques to reduce the size of the mined data.
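The abstract does not give the BitTable layout, so the following is only a minimal sketch of one common way to count support with bitwise AND: each item gets an integer bitmask with bit i set when transaction i contains the item, and the support of an itemset is the number of set bits in the AND of its items' masks. The names `build_bit_table` and `support` and the toy transactions are assumptions for illustration, not BMAAR's actual structures.

```python
from functools import reduce

def build_bit_table(transactions):
    """Map each item to a bitmask; bit `tid` is set if transaction `tid` has the item."""
    table = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            table[item] = table.get(item, 0) | (1 << tid)
    return table

def support(table, itemset):
    """Support of a non-empty itemset: popcount of the AND of its item masks."""
    masks = [table.get(item, 0) for item in itemset]
    combined = reduce(lambda x, y: x & y, masks)
    return bin(combined).count("1")

transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
table = build_bit_table(transactions)
support_ab = support(table, {"a", "b"})
support_abc = support(table, {"a", "b", "c"})
```

Because each mask is a single integer, one AND replaces a scan over all transactions, which is the kind of saving the abstract attributes to bitwise operations.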
Third, we propose the Pruning Multi-Agent Association Rules algorithm (PMAAR), which includes three candidate-itemset pruning techniques that reduce the large number of generated candidate itemsets and, consequently, the total time of the mining process.
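The abstract does not say which three pruning techniques PMAAR uses. As a point of reference only, the sketch below shows the standard Apriori prune, which discards any (k+1)-candidate that has an infrequent k-subset. The function name and the example itemsets are assumptions.

```python
from itertools import combinations

def generate_candidates(frequent_k, k):
    """Join frequent k-itemsets into (k+1)-candidates and keep only those
    whose every k-subset is itself frequent (downward-closure prune)."""
    frequent_k = set(frequent_k)
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(subset) in frequent_k
                for subset in combinations(union, k)
            ):
                candidates.add(union)
    return candidates

frequent_2 = {frozenset("ab"), frozenset("ac"), frozenset("bc"), frozenset("bd")}
candidates_3 = generate_candidates(frequent_2, 2)
```

Here {a, b, d} and {b, c, d} are dropped before any support counting because {a, d} and {c, d} are not frequent, so only {a, b, c} needs to be counted.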
The proposed PMAAR algorithm has been compared with existing Association Rules algorithms on different benchmark datasets and has shown better performance and execution time. Moreover, PMAAR has been implemented on real-world distributed medical databases obtained from more than one hospital in Egypt to discover the hidden Association Rules in patients' records and further demonstrate the merits and capabilities of the proposed model. Medical data was obtained anonymously, without the patients' personal details. The analysis helped to identify the presence or absence of the disease based on a minimum number of effective examinations and tests. Thus, the proposed algorithm can help in providing accurate medical decisions based on cost-effective treatments, improving the medical service for patients, reducing the health system's response time, and improving the quality of clinical decision making.
Efficient mining of discriminative molecular fragments
Frequent pattern discovery in structured data is receiving increasing attention in many application areas of the sciences. However, the computational complexity and the large amount of data to be explored often make sequential algorithms unsuitable. In this context, high-performance distributed computing becomes a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimate is available for task workloads, which show a power-law distribution over a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset.
Dynamic load balancing for the distributed mining of molecular structures
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
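The abstract names receiver-initiated load balancing without giving its details. The sketch below illustrates only the general idea, in which an idle worker (the receiver) asks a peer for work instead of busy workers pushing work out. It is a single-process simulation, not a peer-to-peer implementation, and the `Worker` class, the "donate the older half" policy, and the task numbering are all assumptions.

```python
from collections import deque

class Worker:
    def __init__(self, name, tasks=()):
        self.name = name
        self.tasks = deque(tasks)  # pending tasks, e.g. search-tree nodes
        self.done = []

    def donate(self):
        """Hand over the older half of the pending tasks (possibly none)."""
        count = len(self.tasks) // 2
        return [self.tasks.popleft() for _ in range(count)]

    def step(self, peers):
        """If idle, request work from peers (receiver-initiated), then
        process one task if any is available."""
        if not self.tasks:
            for peer in peers:
                if peer is not self:
                    self.tasks.extend(peer.donate())
                    if self.tasks:
                        break
        if self.tasks:
            self.done.append(self.tasks.pop())

def run(workers):
    """Step all workers round-robin until no pending tasks remain."""
    while any(w.tasks for w in workers):
        for w in workers:
            w.step(workers)
    return {w.name: sorted(w.done) for w in workers}

# All work starts on one worker; the idle ones pull their share.
workers = [Worker("w0", range(8)), Worker("w1"), Worker("w2")]
result = run(workers)
```

Since work moves only when a worker runs dry, no workload prediction is needed up front, which matches the irregular search tree the abstract describes.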
- …