    A Hash Based Frequent Item set Mining using Rehashing

    Data mining is the use of automated data analysis techniques to uncover previously undetected relationships among data items. Mining frequent itemsets is one of the most important concepts in data mining, and it has been an active research area for over two decades. It plays an essential role in many data mining tasks that try to find interesting itemsets in databases, such as association rules, correlations, sequences, classifiers and clusters. In this paper, we propose a new association rule mining algorithm called Rehashing Based Frequent Item set (RBFI), in which hashing is used to store the database in vertical data format. To handle hash collisions and avoid the secondary clustering problem, a rehashing technique is used. The advantages of this hashing technique are a hash function that is easy to compute, fast data access, and efficiency. The algorithm also avoids unnecessary database scans.
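
    To make the vertical data format concrete, the sketch below stores each item together with the set of transaction IDs (TIDs) that contain it, so the support of an itemset is the size of the intersection of its items' TID sets and no further database scan is needed. It is a minimal sketch under stated assumptions: Python's built-in dict stands in for the hash table, the RBFI rehashing scheme itself is not reproduced, and the names to_vertical, support and frequent_pairs are hypothetical.

```python
from itertools import combinations


def to_vertical(transactions):
    """Map each item to the set of transaction IDs (TIDs) that contain it."""
    vertical = {}  # Python's dict is itself a hash table (not RBFI's rehashing scheme)
    for tid, items in enumerate(transactions):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def support(itemset, vertical):
    """Support = size of the intersection of the items' TID sets."""
    tidsets = [vertical.get(item, set()) for item in itemset]
    return len(set.intersection(*tidsets)) if tidsets else 0


def frequent_pairs(vertical, min_support):
    """Frequent 2-itemsets, computed from TID sets without rescanning the data."""
    frequent_items = sorted(i for i, tids in vertical.items() if len(tids) >= min_support)
    result = {}
    for pair in combinations(frequent_items, 2):
        count = support(pair, vertical)
        if count >= min_support:
            result[pair] = count
    return result


transactions = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
vertical = to_vertical(transactions)
print(frequent_pairs(vertical, min_support=2))  # {('a', 'c'): 2, ('b', 'c'): 2}
```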

    Frequent Itemset Generation using Double Hashing Technique

    In data mining, frequent itemsets play an important role in identifying correlations among the fields of a database. In this paper, we propose a new association rule mining algorithm called Double Hashing Based Frequent Itemsets (DHBFI), in which hashing is used to store the database in vertical data format. Double hashing is chosen mainly to deal with two major issues in frequent itemset generation: hash collisions and secondary clustering. As a result, the proposed hashing technique makes the computation easier, faster and more efficient. The algorithm also eliminates redundant database scans and candidate itemset generation, which reduces its space and time complexity.
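
    Double hashing resolves a collision by probing the slots h1(k), h1(k) + h2(k), h1(k) + 2·h2(k), ... modulo the table size, so keys that collide at the same home slot follow different probe sequences; this is what prevents secondary clustering. The sketch below is a minimal open-addressing table built that way and used to hold item-to-TID-set entries (the vertical format). The class name DoubleHashTable, the prime capacity of 11 and both hash functions are illustrative assumptions, not DHBFI's actual design, and deletion and resizing are omitted.

```python
class DoubleHashTable:
    """Open-addressing hash table that resolves collisions by double hashing."""

    def __init__(self, capacity=11):
        # A prime capacity makes every step size coprime with it,
        # so each probe sequence visits all slots.
        self.capacity = capacity
        self.keys = [None] * capacity
        self.values = [None] * capacity

    def _probe(self, key):
        """Yield slots h1, h1 + h2, h1 + 2*h2, ... (mod capacity)."""
        h1 = hash(key) % self.capacity
        h2 = 1 + hash(key) % (self.capacity - 1)  # never 0, so probing always advances
        for i in range(self.capacity):
            yield (h1 + i * h2) % self.capacity

    def put(self, key, value):
        for slot in self._probe(key):
            if self.keys[slot] is None or self.keys[slot] == key:
                self.keys[slot] = key
                self.values[slot] = value
                return
        raise RuntimeError("hash table is full")

    def get(self, key, default=None):
        for slot in self._probe(key):
            if self.keys[slot] is None:  # empty slot: key was never inserted
                return default
            if self.keys[slot] == key:
                return self.values[slot]
        return default


# Vertical data format: item -> set of TIDs
table = DoubleHashTable()
for tid, items in enumerate([["bread", "milk"], ["bread", "eggs"], ["milk"]]):
    for item in items:
        tids = table.get(item, set())
        tids.add(tid)
        table.put(item, tids)

print(table.get("bread"), table.get("milk"))  # {0, 1} {0, 2}
```

    Because entries are never deleted, get can stop at the first empty slot; a table that supports deletion would need tombstone markers instead.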

    New Spark solutions for distributed frequent itemset and association rule mining algorithms

    The large amount of data generated every day makes it necessary to implement new methods capable of handling massive data efficiently. This is the case of Association Rules, an unsupervised data mining tool capable of extracting information in the form of IF-THEN patterns. Although several methods have been proposed for extracting frequent itemsets (the phase that precedes association rule mining) from very large databases, high computational cost and lack of memory remain major problems when processing large data. Therefore, the aim of this paper is threefold: (1) to review existing algorithms for frequent itemset and association rule mining, (2) to develop new efficient frequent itemset Big Data algorithms using distributed computation, as well as a new association rule mining algorithm in Spark, and (3) to compare the proposed algorithms with existing proposals while varying the number of transactions and the number of items. To this end, we have used the Spark platform, which has been shown to outperform existing distributed algorithmic implementations.

    Funding for open access publishing: Universidad de Granada/CBUA. The research reported in this paper was partially supported by the BIGDATAMED project, which has received funding from the Andalusian Government (Junta de Andalucía) under grant agreement No P18-RT-1765, by Grants PID2021-123960OB-I00 and TED2021-129402B-C21 funded by Ministerio de Ciencia e Innovación and by ERDF A way of making Europe and by the European Union NextGenerationEU. In addition, this work has been partially supported by the Ministry of Universities through the EU-funded Margarita Salas programme NextGenerationEU.
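
    The distributed frequent itemset computation that Spark enables can be illustrated with a count-distribution pattern: each worker counts candidate itemsets in its own partition of the transactions (map), and the partial counts are summed before the support threshold is applied (reduce). The sketch below simulates this in plain Python rather than Spark; the function names are hypothetical, and it is not one of the algorithms proposed in the paper. On a real cluster the two steps would typically map onto RDD operations such as mapPartitions and reduceByKey.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def count_partition(partition, k):
    """Map step: count every k-itemset that occurs in one partition."""
    counts = Counter()
    for transaction in partition:
        counts.update(combinations(sorted(transaction), k))
    return counts


def distributed_frequent_itemsets(partitions, k, min_support):
    """Reduce step: sum the per-partition counts, then apply the threshold."""
    local_counts = [count_partition(p, k) for p in partitions]  # would run in parallel on a cluster
    total = reduce(lambda a, b: a + b, local_counts, Counter())
    return {itemset: n for itemset, n in total.items() if n >= min_support}


partitions = [
    [{"a", "b"}, {"a", "c"}],       # partition held by worker 1
    [{"a", "b", "c"}, {"b", "d"}],  # partition held by worker 2
]
print(distributed_frequent_itemsets(partitions, k=2, min_support=2))
# {('a', 'b'): 2, ('a', 'c'): 2}
```

    Neither partition alone reaches the threshold for ('a', 'b') or ('a', 'c'); only the merged counts do, which is why the threshold is applied after the reduce step.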

    Incremental association rule mining based on matrix compression for edge computing

    A growing amount of data is being generated, communicated and processed at the edge nodes of cloud systems, which can improve response times and reduce communication bandwidth. We found that traditional static association rule mining cannot solve certain real-world problems in which the data change dynamically, and incremental association rule mining algorithms have been studied to address this. This paper combines the fast update pruning (FUP) algorithm with a compressed Boolean matrix and proposes a new incremental association rule mining algorithm, named the FUP algorithm based on a compression matrix (FBCM). The algorithm requires only a single scan of both the original database and the incremental database, builds two compressible Boolean matrices, and applies association rule mining to those matrices. FBCM effectively improves the computational efficiency of incremental association rule mining and is therefore suitable for knowledge discovery at the edge nodes of cloud systems.
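
    A Boolean transaction-by-item matrix turns support counting into bitwise operations: AND the columns of the items in an itemset and count the 1-bits. When the original and incremental databases are held in separate matrices, the support in the updated database is the sum of the supports in the two, so the original data need not be rescanned. The sketch below stores each matrix column as a Python integer used as a bitset; it does not reproduce FBCM's compression or FUP's pruning rules, and all names are hypothetical.

```python
def build_bit_columns(transactions):
    """Boolean matrix stored column-wise: bit t of columns[item] is 1
    iff transaction t contains item."""
    columns = {}
    for t, items in enumerate(transactions):
        for item in items:
            columns[item] = columns.get(item, 0) | (1 << t)
    return columns


def support(itemset, columns):
    """AND the item columns, then count the 1-bits (rows containing every item).
    The itemset must be non-empty."""
    items = list(itemset)
    bits = columns.get(items[0], 0)
    for item in items[1:]:
        bits &= columns.get(item, 0)
    return bin(bits).count("1")


def updated_support(itemset, original_cols, increment_cols):
    """Support in the updated database = support in original + support in increment."""
    return support(itemset, original_cols) + support(itemset, increment_cols)


original = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
increment = [{"b", "c"}, {"a", "b"}]            # newly arrived transactions
original_cols = build_bit_columns(original)     # built once, kept between updates
increment_cols = build_bit_columns(increment)   # built from a single scan of the increment
print(updated_support(("a", "b"), original_cols, increment_cols))  # 3
```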

    ARM-AMO: An Efficient Association Rule Mining Algorithm Based on Animal Migration Optimization

    The file attached to this record is the author's final peer-reviewed version. The publisher's final version can be found by following the DOI link.

    Association rule mining (ARM) aims to find association rules that satisfy predefined minimum support and confidence thresholds in a given database. However, ARM often generates an extremely large number of association rules, which end users cannot comprehend or validate, and this limits the usefulness of the mining results. In this paper, we propose a new mining algorithm based on Animal Migration Optimization (AMO), called ARM-AMO, to reduce the number of association rules. It is based on the idea that rules with low support and unnecessary rules are removed. First, the Apriori algorithm is applied to generate frequent itemsets and association rules. Then, AMO is used to reduce the number of association rules using a new fitness function that incorporates frequent rules. Experiments show that, compared with other relevant techniques, ARM-AMO greatly reduces the computational time for frequent itemset generation, the memory required for association rule generation, and the number of rules generated.
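
    The first stage of ARM-AMO relies on the standard Apriori algorithm. The sketch below is a minimal level-wise Apriori (join, prune by downward closure, count) followed by rule generation with confidence(X → Y) = supp(X ∪ Y) / supp(X); the AMO-based rule reduction and its fitness function are not reproduced. Supports are absolute transaction counts, and the function names are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori; returns {frozenset(itemset): support count}."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    frequent = count({frozenset([item]) for t in transactions for item in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


def generate_rules(frequent, min_confidence):
    """Rules X -> Y with confidence = supp(X ∪ Y) / supp(X)."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = supp / frequent[lhs]  # lhs is frequent by downward closure
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), confidence))
    return rules


transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
frequent = apriori(transactions, min_support=2)
print(generate_rules(frequent, min_confidence=0.7))
# [({'butter'}, {'bread'}, 1.0)]
```

    With these four transactions and a minimum support of 2, {milk, butter} occurs only once, so the candidate {bread, milk, butter} is pruned without being counted.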

    An efficient association rule mining algorithm for classification

    In this paper, we propose a new Association Rule Mining algorithm for Classification (ARMC). Our algorithm extracts the set of rules specific to each class, using a fuzzy approach to select the items, and does not require the user to provide thresholds. ARMC is experimentally evaluated and compared with state-of-the-art classification algorithms, namely CBA, PART and RIPPER. Experimental results on standard UCI benchmarks show that our algorithm outperforms these approaches in terms of mean accuracy.
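
    Classifiers built from class association rules, such as the CBA baseline named above, predict a label from the rules whose antecedent a record satisfies. The sketch below shows only that prediction step, choosing the highest-confidence matching rule and falling back to a default class; the rules are hand-written hypothetical examples, and ARMC's fuzzy item selection and threshold-free rule extraction are not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassRule:
    antecedent: frozenset  # items that must all appear in the record
    label: str             # class predicted when the rule fires
    confidence: float


def classify(record, rules, default_label):
    """Predict with the highest-confidence rule whose antecedent the record satisfies."""
    matching = [rule for rule in rules if rule.antecedent <= record]
    if not matching:
        return default_label
    return max(matching, key=lambda rule: rule.confidence).label


rules = [  # hypothetical class-specific rules
    ClassRule(frozenset({"outlook=sunny", "humidity=high"}), "no", 0.9),
    ClassRule(frozenset({"outlook=overcast"}), "yes", 1.0),
    ClassRule(frozenset({"windy=false"}), "yes", 0.7),
]
record = frozenset({"outlook=sunny", "humidity=high", "windy=false"})
print(classify(record, rules, default_label="yes"))  # no
```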

    A High Efficient Association Rule Mining Algorithm based on Intelligent Computation

    Data mining is the use of automated data analysis techniques to uncover previously undetected relationships among data items. In data mining, association rule mining is a prevalent and well-researched method for discovering useful relations between variables in large databases. In this paper, we investigate the principles of Apriori and of direct hashing and pruning, and study their drawbacks. The first drawback is that constructing a collision-free hash table is theoretically optimal but consumes a lot of memory and has low space utilization. The second is the lack of a hash tree data structure, which leads to long insertion and search times. We therefore propose a new association rule mining algorithm based on differential evolutionary computation. The experimental results show that the proposed algorithm has better execution time and accuracy, and it can be used in electronic commerce systems. DOI: http://dx.doi.org/10.11591/telkomnika.v12i4.481