13 research outputs found
A Hash Based Frequent Item set Mining using Rehashing
Data mining is the use of automated data analysis techniques to uncover previously undetected relationships among data items. Mining frequent item sets is one of the most important concepts in data mining, and it has been an active research field for over two decades. It plays an essential role in many data mining tasks that try to find interesting itemsets in databases, such as association rules, correlations, sequences, classifiers and clusters. In this paper, we propose a new association rule mining algorithm called Rehashing Based Frequent Item set (RBFI), in which hashing is used to store the database in vertical data format. To avoid hash collisions and the secondary clustering problem in hashing, a rehashing technique is used. The advantages of this new hashing technique are that the hash function is easy to compute, data access is fast, and the method is efficient. The algorithm also avoids unnecessary scans of the database.
Frequent Itemset Generation using Double Hashing Technique
In data mining, frequent itemsets play an important role in identifying correlations among the fields of a database. In this paper, we propose a new association rule mining algorithm called Double Hashing Based Frequent Itemsets (DHBFI), in which hashing is used to store the database in vertical data format. Double hashing is preferred mainly because it avoids the major issues of hash collision and secondary clustering in frequent itemset generation. Hence the proposed hashing technique makes the computation easier, faster and more efficient. The algorithm also eliminates redundant scans of the database and candidate itemset generation, which leads to lower space and time complexity.
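The idea of storing a transaction database in vertical format inside a double-hashed table can be sketched as follows. This is a minimal illustration, not the DHBFI algorithm itself: the class `DoubleHashTable`, the helpers `build_vertical` and `support`, the character-sum key and the table size are all assumptions made for the example. Each slot maps an item to the set of transaction ids (tids) that contain it, and the support count of an itemset is the size of the intersection of those sets.

```python
class DoubleHashTable:
    """Open-addressing table that resolves collisions by double hashing."""

    def __init__(self, size=11):
        # A prime size guarantees that every probe sequence visits all slots.
        self.size = size
        self.slots = [None] * size  # each slot is None or (item, set_of_tids)

    def _probe(self, item):
        # Deterministic toy key: sum of character codes (an assumption).
        key = sum(ord(ch) for ch in item)
        h1 = key % self.size              # first hash: starting slot
        h2 = 1 + key % (self.size - 1)    # second hash: step size, never 0
        for i in range(self.size):
            yield (h1 + i * h2) % self.size

    def add(self, item, tid):
        """Record that transaction `tid` contains `item`."""
        for idx in self._probe(item):
            slot = self.slots[idx]
            if slot is None:
                self.slots[idx] = (item, {tid})
                return
            if slot[0] == item:
                slot[1].add(tid)
                return
        raise RuntimeError("hash table is full")

    def tids(self, item):
        """Return the set of transaction ids that contain `item`."""
        for idx in self._probe(item):
            slot = self.slots[idx]
            if slot is None:
                return set()
            if slot[0] == item:
                return slot[1]
        return set()


def build_vertical(transactions, size=11):
    """Scan the database once and store it as item -> tid set."""
    table = DoubleHashTable(size)
    for tid, items in enumerate(transactions):
        for item in items:
            table.add(item, tid)
    return table


def support(table, itemset):
    """Support count of an itemset: size of the tid-set intersection."""
    tid_sets = [table.tids(item) for item in itemset]
    if not tid_sets:
        return 0
    return len(set.intersection(*tid_sets))
```

Once the table is built, later support counts need no further database scans; for example, `support(build_vertical(db), ["bread", "milk"])` only intersects two stored tid sets.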
New Spark solutions for distributed frequent itemset and association rule mining algorithms
Funding for open access publishing: Universidad de Granada/CBUA. The research reported in this paper was partially supported by the BIGDATAMED project, which has received funding from the Andalusian Government (Junta de Andalucía) under grant agreement No P18-RT-1765, by Grants PID2021-123960OB-I00 and TED2021-129402B-C21 funded by Ministerio de Ciencia e Innovación, by ERDF A way of making Europe, and by the European Union NextGenerationEU. In addition, this work has been partially supported by the Ministry of Universities through the EU-funded Margarita Salas programme NextGenerationEU.
The large amount of data generated every day makes it necessary to re-implement methods so that they can handle massive data efficiently. This is the case for association rules, an unsupervised data mining tool capable of extracting information in the form of IF-THEN patterns. Although several methods have been proposed for extracting frequent itemsets (the phase that precedes association rule mining) in very large databases, high computational cost and lack of memory remain major problems when processing large data. Therefore, the aim of this paper is threefold: (1) to review existing algorithms for frequent itemset and association rule mining, (2) to develop new efficient frequent itemset Big Data algorithms using distributed computation, as well as a new association rule mining algorithm in Spark, and (3) to compare the proposed algorithms with existing proposals while varying the number of transactions and the number of items. For this purpose, we have used the Spark platform, which has been demonstrated to outperform existing distributed algorithmic implementations.
Incremental association rule mining based on matrix compression for edge computing
A growing amount of data is being generated, communicated and processed at the edge nodes of cloud systems; this has the potential to improve response times and thus reduce communication bandwidth. We found that traditional static association rule mining cannot solve certain real-world problems with dynamically changing data. Incremental association rule mining algorithms have been studied. This paper combines the fast update pruning (FUP) algorithm with a compressed Boolean matrix and proposes a new incremental association rule mining algorithm, named the FUP algorithm based on a compression matrix (FBCM). This algorithm requires only a single scan of both the database and incremental databases, establishes two compressible Boolean matrices, and applies association rule mining to those matrices. The FBCM algorithm effectively improves the computational efficiency of incremental association rule mining and hence is suitable for knowledge discovery in the edge nodes of cloud systems
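The Boolean-matrix representation mentioned in this abstract can be illustrated with a small sketch. It is not the FBCM algorithm and omits the incremental update step; the function names `build_matrix`, `support` and `compress`, and the compression rule (dropping item columns whose count falls below a threshold), are assumptions chosen for the example. Rows stand for transactions, columns for items, and the support count of an itemset is the number of rows in which all of its columns are true.

```python
def build_matrix(transactions, items):
    """One scan of the database: row per transaction, column per item."""
    return [[item in t for item in items] for t in transactions]


def support(matrix, items, itemset):
    """Count the rows in which every column of the itemset is True."""
    cols = [items.index(item) for item in itemset]
    return sum(1 for row in matrix if all(row[c] for c in cols))


def compress(matrix, items, min_count):
    """Drop the columns of items that occur in fewer than min_count rows.

    Such items cannot appear in any frequent itemset, so removing their
    columns does not change the support of the remaining itemsets.
    """
    keep = [
        j for j in range(len(items))
        if sum(1 for row in matrix if row[j]) >= min_count
    ]
    new_items = [items[j] for j in keep]
    new_matrix = [[row[j] for j in keep] for row in matrix]
    return new_matrix, new_items
```

After `compress`, support counting runs over a narrower matrix, which is the kind of saving a compressed representation is meant to provide.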
ARM-AMO: An Efficient Association Rule Mining Algorithm Based on Animal Migration Optimization
Association rule mining (ARM) aims to find association rules that satisfy predefined minimum support and confidence in a given database. However, in many cases ARM generates an extremely large number of association rules, which are impossible for end users to comprehend or validate, thereby limiting the usefulness of data mining results. In this paper, we propose a new mining algorithm based on Animal Migration Optimization (AMO), called ARM-AMO, to reduce the number of association rules. It is based on the idea that rules which do not have high support and are unnecessary are deleted from the data. First, the Apriori algorithm is applied to generate frequent itemsets and association rules. Then, AMO is used to reduce the number of association rules with a new fitness function that incorporates frequent rules. Experiments show that, in comparison with other relevant techniques, ARM-AMO greatly reduces the computational time for frequent item set generation, the memory for association rule generation, and the number of rules generated.
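The minimum support and confidence criteria that this abstract refers to can be made concrete with a short sketch. It shows plain threshold filtering only; it does not implement Animal Migration Optimization or the fitness function of ARM-AMO, and the names `rule_metrics` and `filter_rules` are hypothetical. For a rule A -> C, support is the fraction of transactions containing both A and C, and confidence is that count divided by the number of transactions containing A.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    both = antecedent | consequent
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


def filter_rules(transactions, rules, min_support, min_confidence):
    """Keep only the rules that meet both thresholds."""
    kept = []
    for antecedent, consequent in rules:
        s, c = rule_metrics(transactions, antecedent, consequent)
        if s >= min_support and c >= min_confidence:
            kept.append((antecedent, consequent))
    return kept
```

Raising either threshold shrinks the rule set, which is the simplest way to reduce the number of rules shown to an end user.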
An efficient association rule mining algorithm for classification
In this paper, we propose a new Association Rule Mining algorithm for Classification (ARMC). Our algorithm extracts the set of rules specific to each class, using a fuzzy approach to select the items, and does not require the user to provide thresholds. ARMC is experimentally evaluated and compared to state-of-the-art classification algorithms, namely CBA, PART and RIPPER. Results of experiments on standard UCI benchmarks show that our algorithm outperforms the above-mentioned approaches in terms of mean accuracy.
A High Efficient Association Rule Mining Algorithm based on Intelligent Computation
Data mining is the use of automated data analysis techniques to uncover previously undetected relationships among data items. In data mining, association rule mining is a prevalent and well-researched method for discovering useful relations between variables in large databases. In this paper, we investigate the principles of Apriori and of direct hashing and pruning, and study their drawbacks. The first drawback is that constructing a hash table without collisions is theoretically optimal, but it consumes a lot of memory and space utilization is low. The second is that the lack of a hash tree data structure leads to long insert and search times. We therefore propose a new association rule mining algorithm based on differential evolutionary computation. The experimental results show that our proposed algorithm has better execution time and accuracy, and that it can be used in electronic commerce systems. DOI: http://dx.doi.org/10.11591/telkomnika.v12i4.481