
    Performance study of Association Rule Mining Algorithms for Dyeing Processing System

    In data mining, association rule mining is a popular and well-researched method for discovering interesting relations between variables in large databases. In this paper, we compare the performance of association rule mining algorithms and describe the different issues of the mining process. A distinct feature of these algorithms is that they have a very limited and precisely predictable main memory cost and run very quickly in memory-based settings. Moreover, they can be scaled up to very large databases using database partitioning. When the data set becomes dense, (conditional) FP-trees can be constructed dynamically as part of the mining process. These association rule mining algorithms were implemented in Java using the Weka library. The database used in the development of the processes contains a series of transactions, or event logs, belonging to a dyeing unit. This paper analyzes the coloring process of the dyeing unit by mining frequent patterns with association rule mining algorithms. Each frequent pattern carries a confidence for different treatments of the dyeing process. These confidences help the dyeing unit's expert, called the dyer, to predict better combinations or associations of treatments. This article therefore also proposes applying association rule mining algorithms to the dyeing process of the dyeing unit, which may have a major impact on how the dyeing industry processes its colors effectively without dyeing problems such as shading or dark spots on the colored yarn. This article shows that LinkRuleMiner (LRM) has excellent performance on various kinds of data when creating frequent patterns, outperforms currently available algorithms in dyeing processing systems, and is highly scalable to mining large databases. This paper shows that HMine and LRM have excellent performance on various kinds of data, outperform currently available algorithms in different settings, and are highly scalable to mining large databases. These studies have a major impact on the future development of efficient and scalable data mining methods.
    Keywords: Performance, predictable, main memory, large databases, partitioning, Weka Library
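The confidence that the abstract above attaches to frequent patterns can be illustrated with a minimal sketch. It is not the paper's Weka/Java implementation, and it is neither LRM nor HMine; the treatment names and the four-batch log are hypothetical, and `support` and `confidence` are illustrative helper names. It uses the standard definitions: support is the fraction of transactions containing an itemset, and confidence of A => B is support(A ∪ B) / support(A).

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent => consequent."""
    antecedent = frozenset(antecedent)
    sup_a = support(antecedent, transactions)
    if sup_a == 0:
        return 0.0  # rule never fires; treat its confidence as zero
    return support(antecedent | frozenset(consequent), transactions) / sup_a


# Hypothetical dyeing-unit log: one set of treatments per batch (made-up data).
transactions = [
    frozenset({"scouring", "bleaching", "reactive_dye"}),
    frozenset({"scouring", "reactive_dye"}),
    frozenset({"scouring", "bleaching"}),
    frozenset({"bleaching", "reactive_dye"}),
]

rule_conf = confidence({"scouring"}, {"reactive_dye"}, transactions)
print(f"scouring => reactive_dye: confidence {rule_conf:.2f}")
```

In this toy log, "scouring" appears in three of four batches and together with "reactive_dye" in two, so the rule's confidence is 2/3. A dyer could rank candidate treatment combinations by such values.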

    Computationally efficient induction of classification rules with the PMCRI and J-PMCRI frameworks

    In order to gain knowledge from large databases, scalable data mining technologies are needed. Data are captured on a large scale and thus databases are increasing at a fast pace. This leads to the utilisation of parallel computing technologies in order to cope with large amounts of data. In the area of classification rule induction, parallelisation of classification rules has focused on the divide and conquer approach, also known as the Top Down Induction of Decision Trees (TDIDT). An alternative approach to classification rule induction is separate and conquer, which has only recently been in the focus of parallelisation. This work introduces and evaluates empirically a framework for the parallel induction of classification rules, generated by members of the Prism family of algorithms. All members of the Prism family of algorithms follow the separate and conquer approach.

    HybridMiner: Mining Maximal Frequent Itemsets Using Hybrid Database Representation Approach

    In this paper we present a novel hybrid database representation approach (array-based layout combined with vertical bitmap layout) for mining the complete set of Maximal Frequent Itemsets (MFI) on sparse and large datasets. Our work is novel in terms of scalability, item search order, and two projection techniques, one horizontal and one vertical. We also present a maximal algorithm using this hybrid database representation approach. Experimental results on real and sparse benchmark datasets show that our approach is better than previous state-of-the-art maximal algorithms.
    Comment: 8 pages. In the proceedings of 9th IEEE-INMIC 2005, Karachi, Pakistan, 200
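The vertical bitmap layout named in the abstract above can be sketched as follows. This shows only the general vertical-bitmap idea, not the authors' hybrid representation or the HybridMiner algorithm. The toy transactions and the helper names `build_vertical_bitmaps` and `support_count` are assumptions made for illustration. Each item gets one integer whose bit `tid` is set when transaction `tid` contains the item, so the support count of an itemset is the number of set bits in the bitwise AND of its items' bitmaps.

```python
def build_vertical_bitmaps(transactions):
    """Map each item to an int whose bit `tid` is 1 if transaction `tid` has it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support_count(itemset, bitmaps, n_transactions):
    """Count transactions containing all items by AND-ing the item bitmaps."""
    mask = (1 << n_transactions) - 1  # start with every transaction selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)  # an unknown item clears every bit
    return bin(mask).count("1")


# Toy transaction database (made-up data).
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
bitmaps = build_vertical_bitmaps(transactions)
print(support_count({"a", "b"}, bitmaps, len(transactions)))
```

Bitmaps make support counting a handful of word-level AND operations, but on sparse data most bits are zero. That sparsity is a plausible motivation for pairing bitmaps with an array-based layout, though the abstract does not spell out the details.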

    A Survey of Parallel Data Mining

    With the fast, continuous increase in the number and size of databases, parallel data mining is a natural and cost-effective approach to tackling the problem of scalability in data mining. Recently there has been considerable research on parallel data mining. However, most projects focus on the parallelization of a single kind of data mining algorithm or paradigm. This paper surveys parallel data mining from a broader perspective. More precisely, we discuss the parallelization of data mining algorithms from four knowledge discovery paradigms, namely rule induction, instance-based learning, genetic algorithms and neural networks. Using the lessons learned from this discussion, we also derive a set of heuristic principles for designing efficient parallel data mining algorithms.