Performance study of Association Rule Mining Algorithms for Dyeing Processing System

Author: M.S Saravanan
Sree .R.J Rama
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 20/10/2011

In data mining, association rule mining is a popular and well-researched approach for discovering interesting relations between variables in large databases. In this paper, we compare the performance of association rule mining algorithms and describe the different issues of the mining process. A distinct feature of these algorithms is that they have a very limited and precisely predictable main memory cost and run very quickly in memory-based settings. Moreover, they can be scaled up to very large databases using database partitioning. When the data set becomes dense, (conditional) FP-trees can be constructed dynamically as part of the mining process. The association rule mining algorithms were implemented in Java using the Weka library. The database used contains a series of transactions, or event logs, belonging to a dyeing unit. This paper analyzes the coloring process of the dyeing unit using association rule mining algorithms based on frequent patterns. Each frequent pattern has a confidence for different treatments of the dyeing process. These confidences help the dyeing unit expert, called the dyer, to predict better combinations or associations of treatments. Therefore, this article also proposes applying association rule mining algorithms to the dyeing process of the dyeing unit, which may have a major impact on the dyeing industry by allowing colors to be processed effectively without dyeing problems such as shading or dark spots on the colored yarn. This article shows that LinkRuleMiner (LRM) performs excellently in creating frequent patterns for various kinds of data, outperforms currently available algorithms in dyeing processing systems, and is highly scalable to mining large databases. It also shows that HMine and LRM perform excellently for various kinds of data, outperform currently available algorithms in different settings, and are highly scalable to mining large databases. These studies have a major impact on the future development of efficient and scalable data mining methods.
Keywords: Performance, predictable, main memory, large databases, partitioning, Weka Library
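
The abstract above relies on the support and confidence of frequent patterns. As a minimal sketch of how those two measures are usually defined, the Python below computes them over a handful of transactions. The treatment names and the transactions themselves are hypothetical and are not taken from the paper's dyeing data set; this is also not the authors' Weka/Java implementation.

```python
# Hypothetical dyeing transactions: each set lists the treatments applied
# to one batch. These names are illustrative, not from the paper.
transactions = [
    {"scouring", "bleaching", "dyeing"},
    {"scouring", "dyeing"},
    {"scouring", "bleaching", "dyeing", "soaping"},
    {"bleaching", "soaping"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never fires, so report zero confidence
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / base


# Rule "scouring -> dyeing": every batch that was scoured was also dyed.
rule_confidence = confidence({"scouring"}, {"dyeing"}, transactions)
```

A rule with confidence close to 1 suggests that the treatments in the consequent almost always accompany those in the antecedent, which is the kind of signal a dyer could use when choosing treatment combinations.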

International Institute for Science, Technology and Education (IISTE): E-Journals

Computationally efficient induction of classification rules with the PMCRI and J-PMCRI frameworks

Author: Frederic Stahl
Max Bramer
Publication venue: Elsevier BV
Publication date: 01/01/2012

In order to gain knowledge from large databases, scalable data mining technologies are needed. Data are captured on a large scale and thus databases are increasing at a fast pace. This leads to the utilisation of parallel computing technologies in order to cope with large amounts of data. In the area of classification rule induction, parallelisation of classification rules has focused on the divide and conquer approach, also known as the Top Down Induction of Decision Trees (TDIDT). An alternative approach to classification rule induction is separate and conquer, which has only recently been in the focus of parallelisation. This work introduces and evaluates empirically a framework for the parallel induction of classification rules, generated by members of the Prism family of algorithms. All members of the Prism family of algorithms follow the separate and conquer approach.
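
To illustrate the separate and conquer approach mentioned in this abstract, here is a minimal serial sketch in Python, loosely in the style of Prism. It is not the PMCRI or J-PMCRI framework and shows no parallelisation. The function name, the toy weather-style rows, and the tie-breaking rule are all assumptions made for illustration.

```python
def learn_rules_for_class(rows, target, label):
    """Learn rules (dicts of attribute == value terms) covering all rows
    whose target attribute equals label, by separate and conquer."""
    rules = []
    remaining = list(rows)
    while any(r[target] == label for r in remaining):
        rule = {}
        covered = remaining
        # Conquer: specialise the rule until it covers only the target class.
        while any(r[target] != label for r in covered):
            best, best_key = None, None
            for attr in covered[0]:
                if attr == target or attr in rule:
                    continue
                for value in sorted({r[attr] for r in covered}):
                    subset = [r for r in covered if r[attr] == value]
                    pos = sum(r[target] == label for r in subset)
                    # Prefer the highest class probability, then coverage.
                    key = (pos / len(subset), pos)
                    if pos and (best_key is None or key > best_key):
                        best, best_key = (attr, value), key
            if best is None:
                break  # no attributes left; accept an impure rule
            rule[best[0]] = best[1]
            covered = [r for r in covered if r[best[0]] == best[1]]
        rules.append(rule)
        # Separate: drop the instances that the new rule covers.
        remaining = [r for r in remaining if r not in covered]
    return rules


# Hypothetical toy data, not from the paper.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
rules_yes = learn_rules_for_class(rows, "play", "yes")
```

On this toy data a single term, `windy == "no"`, already covers only the "yes" rows, so one rule suffices. Unlike divide and conquer, the result is a flat set of rules and not a tree.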

Central Archive at the University of Reading

Crossref

Bournemouth University Research Online

HybridMiner: Mining Maximal Frequent Itemsets Using Hybrid Database Representation Approach

Author: Baig Abdul Rauf
Bashir Shariq
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 21/04/2009

In this paper we present a novel hybrid (array-based layout and vertical bitmap layout) database representation approach for mining the complete set of Maximal Frequent Itemsets (MFI) on sparse and large datasets. Our work is novel in terms of scalability, item search order, and two horizontal and vertical projection techniques. We also present a maximal algorithm using this hybrid database representation approach. Different experimental results on real and sparse benchmark datasets show that our approach is better than previous state-of-the-art maximal algorithms. Comment: 8 Pages In the proceedings of 9th IEEE-INMIC 2005, Karachi, Pakistan, 200
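
The vertical bitmap layout named in this abstract can be sketched as follows: each item keeps one bit per transaction, and the support count of an itemset is the number of set bits in the bitwise AND of its items' bitmaps. This Python sketch covers only that generic idea. It does not reproduce HybridMiner's array-based layout, projection techniques, or search order, and the function names and sample transactions are hypothetical.

```python
def build_vertical_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support_count(itemset, bitmaps, n_transactions):
    """Count transactions containing all items by AND-ing their bitmaps."""
    mask = (1 << n_transactions) - 1  # start with every transaction selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)  # unknown items clear the mask
    return bin(mask).count("1")


# Hypothetical transactions over items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
bitmaps = build_vertical_bitmaps(transactions)
ab_support = support_count({"a", "b"}, bitmaps, len(transactions))
```

Because support counting reduces to AND and popcount operations, this layout tends to suit dense data. On sparse data most bits are zero, which is one plausible motivation for combining it with a second layout.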

arXiv.org e-Print Archive

Crossref

Next challenges for adaptive learning systems

Author: Bifet A.
Gaber M.
Gabrys B.
Gama J.
Minku L.
Musial K.
Zliobaite I.
Publication date: 01/01/2012

University of Birmingham Research Portal

Portsmouth University Research Portal (Pure)

A Survey of Parallel Data Mining

Author: Freitas Alex A.

With the fast, continuous increase in the number and size of databases, parallel data mining is a natural and cost-effective approach to tackle the problem of scalability in data mining. Recently there has been considerable research on parallel data mining. However, most projects focus on the parallelization of a single kind of data mining algorithm/paradigm. This paper surveys parallel data mining with a broader perspective. More precisely, we discuss the parallelization of data mining algorithms of four knowledge discovery paradigms, namely rule induction, instance-based learning, genetic algorithms and neural networks. Using the lessons learned from this discussion, we also derive a set of heuristic principles for designing efficient parallel data mining algorithms.

Kent Academic Repository