Parallel FIM Approach on GPU using OpenCL
In this paper, we describe the GPU-Eclat algorithm, a GPGPU (General-Purpose Graphics Processing Unit) enhanced implementation of Frequent Itemset Mining (FIM). Extracting frequent itemsets from a transactional database is an essential task in data mining because of its broad applications in mining association rules, time series, correlations, etc. Eclat is a typical generate-and-check approach that obtains frequent itemsets from a database for a given minimum support threshold. OpenCL (Open Computing Language) is a platform-independent framework for GPU computation. We tested our implementation on a Radeon Dual graphics processor and observed up to a 68x speedup compared with the sequential Eclat algorithm on a CPU. To map the Eclat algorithm onto the SIMD (Single Instruction Multiple Data) execution model, an array data structure represents the input database, and the standard dataset is converted to the vertical data layout. In our implementation, the candidate generation and support counting phases run in parallel on the GPU. Experimental results show that GPU-Eclat consistently outperforms CPU-based Eclat implementations. Our results reveal the potential of GPGPUs for speeding up data mining algorithms.
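The vertical data layout and the generate-and-check loop mentioned in this abstract can be illustrated with a small sequential sketch. This is not the GPU-Eclat implementation: it runs on the CPU, uses Python sets instead of arrays, and the names `to_vertical` and `eclat` are assumptions introduced here for illustration only.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Convert horizontal transactions to a vertical layout:
    item -> set of ids of the transactions that contain it."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def eclat(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    min_support is an absolute transaction count (an assumption of this sketch).
    """
    vertical = to_vertical(transactions)
    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs already frequent together with prefix
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                # Candidate generation and support counting in one step:
                # intersect the two transaction-id sets.
                joint = tids & other_tids
                if len(joint) >= min_support:
                    next_candidates.append((other, joint))
            if next_candidates:
                extend(itemset, next_candidates)

    initial = sorted(
        ((item, tids) for item, tids in vertical.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(set(), initial)
    return frequent


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    for itemset, count in sorted(eclat(db, 2).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), count)
```

The tidset intersections inside `extend` are independent of each other, which is the kind of work that a data-parallel device can process concurrently.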
An Efficient Load Balancing Multi-core Frequent Patterns Mining Algorithm
Mining frequent patterns from transactional databases is an important problem in data mining. Many methods have been proposed to solve this problem. However, the computation time still increases significantly as the data size grows. Parallel computing is therefore a good strategy for this problem. Researchers have proposed various parallel and distributed algorithms for cluster and grid systems. However, the construction and maintenance costs of such systems are high. In this paper, a multi-core load balancing frequent pattern mining algorithm is presented. The main goal of the proposed algorithm is to reduce the massive number of duplicated candidates generated by previous methods. To verify the performance, we implemented the proposed algorithm as well as previous methods for comparison. The experimental results showed that our method reduces the computation time dramatically as the number of threads increases. Moreover, we observed that the workload was dispatched equally to each computing unit.
Enhancing association rules algorithms for mining distributed databases. Integration of fast BitTable and multi-agent association rules mining in distributed medical databases for decision support.
Over the past few years, mining data located in heterogeneous and geographically distributed sites has been recognized as a key issue. Loading distributed data into a centralized location for mining interesting rules is not a good approach, because it raises concerns such as data privacy and imposes network overhead. The situation becomes worse when the network has limited bandwidth, which is the case in most real-time systems. This has prompted the need for intelligent data analysis to discover the hidden information in these huge distributed databases.
In this research, we present an incremental approach for building an efficient Multi-Agent based algorithm for mining real-world databases in geographically distributed sites. First, we propose the Distributed Multi-Agent Association Rules algorithm (DMAAR) to minimize the all-to-all broadcasting between distributed sites. Analytical calculations show that DMAAR reduces the algorithm complexity and minimizes the message communication cost. The proposed Multi-Agent based algorithm complies with the Foundation for Intelligent Physical Agents (FIPA) standards, which are regarded as the global standard for communication between agents, thus enabling the agents of the proposed algorithm to cooperate with other standard agents.
Second, the BitTable Multi-Agent Association Rules algorithm (BMAAR) is proposed. BMAAR includes an efficient BitTable data structure that compresses the database so that it can easily fit into the memory of the local sites. It also includes two bitwise AND/OR operations for quick candidate itemset generation and support counting. Moreover, the algorithm includes three transaction trimming techniques to reduce the size of the mined data.
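As a rough illustration of how a bit-table representation supports fast support counting, the sketch below stores one bit vector per item (as a Python integer) and counts support with a bitwise AND. The abstract does not give BMAAR's actual data layout or operations, so the function names `build_bittable` and `support` and the one-vector-per-item layout are assumptions made here.

```python
def build_bittable(transactions):
    """Map each item to an integer whose bit t is set
    when transaction t contains the item."""
    table = {}
    for tid, items in enumerate(transactions):
        for item in items:
            table[item] = table.get(item, 0) | (1 << tid)
    return table


def support(table, itemset):
    """Support count of an itemset: AND the item bit vectors together
    and count the bits that remain set."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must contain at least one item")
    bits = table.get(items[0], 0)
    for item in items[1:]:
        bits &= table.get(item, 0)
    return bin(bits).count("1")


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    table = build_bittable(db)
    print(support(table, ["a", "b"]))
```

Each item costs one bit per transaction in this layout, which is why a bit table can be much smaller than the original records.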
Third, we propose the Pruning Multi-Agent Association Rules algorithm (PMAAR), which includes three candidate itemset pruning techniques that reduce the large number of generated candidate itemsets and, consequently, the total time of the mining process.
The proposed PMAAR algorithm has been compared with existing Association Rules algorithms on different benchmark datasets and has shown better performance and execution time. Moreover, PMAAR has been applied to real-world distributed medical databases obtained from more than one hospital in Egypt to discover the hidden Association Rules in patients' records and to further demonstrate the merits and capabilities of the proposed model. Medical data was obtained anonymously, without the patients' personal details. The analysis helped to identify the presence or absence of the disease based on a minimum number of effective examinations and tests. Thus, the proposed algorithm can help in providing accurate medical decisions based on cost-effective treatments, improving the medical service for patients, reducing the real-time response of the health system, and improving the quality of clinical decision making.
Detection of illicit behaviours and mining for contrast patterns
This thesis describes a set of novel algorithms and models designed to detect illicit behaviour. This includes development of domain specific solutions, focusing on anti-money laundering and detection of opinion spam. In addition, advancements are presented for the mining and application of contrast patterns, which are a useful tool for characterising illicit behaviour. For anti-money laundering, this thesis presents a novel approach for detection based on analysis of financial networks and supervised learning. This includes the development of a network model, features extracted from this model, and evaluation of classifiers trained using real financial data. Results indicate that this approach successfully identifies suspicious groups whose collaborative behaviour is indicative of money laundering. For the detection of opinion spam, this thesis presents a model of reviewer behaviour and a method for detection based on statistical anomaly detection. This method considers review ratings, and does not rely on text-based features. Evaluation using real data shows that spammers are successfully identified. Comparison with existing methods shows a small improvement in accuracy, but significant improvements in computational efficiency. This thesis also considers the application of contrast patterns to network analysis and presents a novel algorithm for mining contrast patterns in a distributed system. Contrast patterns may be used to characterise illicit behaviour by contrasting illicit and non-illicit behaviour and uncovering significant differences. However, existing mining algorithms are limited by serial processing making them unsuitable for large data sets. This thesis advances the current state-of-the-art, describing an algorithm for mining in parallel. This algorithm is evaluated using real data and is shown to achieve a high level of scalability, allowing mining of large, high-dimensional data sets. 
In addition, this thesis explores methods for mapping network features to an item-space suitable for analysis using contrast patterns. Experiments indicate that contrast patterns may become a valuable tool for network analysis.
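To make the idea of a contrast pattern concrete, the sketch below scores an itemset by the difference between its relative support in two groups of records. Support difference is only one common scoring choice; the thesis may use a different measure. The function names and the toy items are hypothetical.

```python
def relative_support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    if not transactions:
        return 0.0
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)


def support_difference(itemset, group_a, group_b):
    """Contrast score: support in group_a minus support in group_b.
    A large positive value means the pattern characterises group_a."""
    return relative_support(itemset, group_a) - relative_support(itemset, group_b)


if __name__ == "__main__":
    # Toy data with hypothetical items; group_a stands for the behaviour of interest.
    group_a = [{"x", "y"}, {"x", "y"}, {"x"}, {"y", "z"}]
    group_b = [{"x"}, {"z"}, {"y"}, {"z"}]
    print(support_difference({"x", "y"}, group_a, group_b))
```

Scoring one candidate does not depend on any other candidate, so the work can be divided among workers.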