4 research outputs found

    Parallel FIM Approach on GPU using OpenCL

    Get PDF
    In this paper, we describe GPU-Eclat algorithm, a GPU (General Purpose Graphics Processing Unit) enhanced implementation of Frequent Item set Mining (FIM). The frequent itemsets are extracted from a transactional database as it is a essential assignment in data mining field because of its broad applications in mining association rules, time series, correlations etc. The Eclat approach is the typically generate-and-check approach to obtain frequent itemsets from a database with a given minimum support threshold value. OpenCL is a platform independent Open Computing Language for GPU computation. We tested our implementation with an Radeon Dual graphic processor and determine up to 68X speedup as compared with sequential Eclat algorithm on a CPU. In order to map the Eclat algorithm onto the SIMD (Single Instruction Multiple Data) execution model, an array data structure is used to represent the input database and standard dataset is converted to the vertical data layout. In our implementation, we perform a parallelized version of the candidate generation and support counting phases on the GPU. Experimental results show that GPU-Eclat consistently outperforms CPU-based Eclat implementations. Our results reveal the potential for GPGPUs in speeding up data mining algorithms

    An Efficient Load Balancing Multi-core Frequent Patterns Mining Algorithm

    Get PDF
    Abstract-Mining frequent pattern from transactional database is an important problem in data mining. Many methods have been proposed to solve this problem. However, the computation time still increase significantly while the data size grows. Therefore, parallel computing is a good strategy to solve this problem. Researchers have proposed various parallel and distributed algorithms on cluster system, grid system. However, the construction and maintenance cost is pretty high. In this paper, a multi-core load balancing frequent pattern mining algorithm is presented. The main goal of the proposed algorithm is to reduce the massive duplicated candidates generated in previous method. In order to verify the performance, we also implemented the proposed algorithm as well as previous methods for comparison. The experimental results showed that our method could reduce the computation time dramatically with more threads. Moreover, we could observe that the workload was equally dispatched to each computing unit

    Detection of illicit behaviours and mining for contrast patterns

    Get PDF
    This thesis describes a set of novel algorithms and models designed to detect illicit behaviour. This includes development of domain specific solutions, focusing on anti-money laundering and detection of opinion spam. In addition, advancements are presented for the mining and application of contrast patterns, which are a useful tool for characterising illicit behaviour. For anti-money laundering, this thesis presents a novel approach for detection based on analysis of financial networks and supervised learning. This includes the development of a network model, features extracted from this model, and evaluation of classifiers trained using real financial data. Results indicate that this approach successfully identifies suspicious groups whose collaborative behaviour is indicative of money laundering. For the detection of opinion spam, this thesis presents a model of reviewer behaviour and a method for detection based on statistical anomaly detection. This method considers review ratings, and does not rely on text-based features. Evaluation using real data shows that spammers are successfully identified. Comparison with existing methods shows a small improvement in accuracy, but significant improvements in computational efficiency. This thesis also considers the application of contrast patterns to network analysis and presents a novel algorithm for mining contrast patterns in a distributed system. Contrast patterns may be used to characterise illicit behaviour by contrasting illicit and non-illicit behaviour and uncovering significant differences. However, existing mining algorithms are limited by serial processing making them unsuitable for large data sets. This thesis advances the current state-of-the-art, describing an algorithm for mining in parallel. This algorithm is evaluated using real data and is shown to achieve a high level of scalability, allowing mining of large, high-dimensional data sets. In addition, this thesis explores methods for mapping network features to an item-space suitable for analysis using contrast patterns. Experiments indicate that contrast patterns may become a valuable tool for network analysis
    corecore