237 research outputs found

    A STUDY ON EFFICIENT DATA MINING APPROACH ON COMPRESSED TRANSACTION

    Data mining can be viewed as a result of the natural evolution of information technology. The spread of computing has led to an explosion in the volume of data to be stored on hard disks and sent over the Internet. This growth has led to a need for data compression, that is, the ability to reduce the amount of storage or Internet bandwidth required to handle the data. This paper analyzes the various data mining approaches that are used to compress the original database into a smaller one and perform the data mining process on compressed transactions, such as M2TQT, PINCER-SEARCH algorithm, APRIOR

    A genetic algorithm coupled with tree-based pruning for mining closed association rules

    Due to the voluminous amount of itemsets that are generated, the association rules extracted from these itemsets contain redundancy, and designing an effective approach to address this issue is of paramount importance. Although multiple algorithms have been proposed in recent years for mining closed association rules, most of them underperform in terms of run time or memory. Another issue that remains challenging is the nature of the dataset: some of the existing algorithms perform well on dense datasets, while others perform well on sparse datasets. This paper aims to handle these drawbacks by using a genetic algorithm for mining closed association rules. Recent studies have shown that genetic algorithms perform better than conventional algorithms due to their bitwise operations of crossover and mutation. Bitwise operations are predominantly faster than conventional approaches, and bits consume less memory, thereby improving the overall performance of the algorithm. To address the redundancy in the mined association rules, a tree-based pruning algorithm has been designed here. It works on the principle of minimal antecedent and maximal consequent. Experiments have shown that the proposed approach works well on both dense and sparse datasets while surpassing existing techniques with regard to run time and memory.
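
    The argument that bit-level representations save time and memory can be illustrated with a minimal Python sketch. It is not the paper's genetic algorithm or its pruning step; it only assumes that each item is stored as an integer bitmap over transaction IDs, so that the support of an itemset is the number of set bits in a bitwise AND. The function names and the toy transactions are hypothetical.

```python
from functools import reduce


def build_item_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset, bitmaps, n_transactions):
    """Count the transactions that contain every item of `itemset`."""
    all_tids = (1 << n_transactions) - 1  # bitmap with every transaction set
    common = reduce(lambda acc, item: acc & bitmaps.get(item, 0), itemset, all_tids)
    return bin(common).count("1")


# Hypothetical toy data, used only for illustration.
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
bitmaps = build_item_bitmaps(transactions)
support_ac = support(["a", "c"], bitmaps, len(transactions))
support_ab = support(["a", "b"], bitmaps, len(transactions))
```

    Here each item costs one integer regardless of how many itemsets are tested, and each support query is a handful of AND operations rather than a scan of the transactions.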

    A Novel Nodesets-Based Frequent Itemset Mining Algorithm for Big Data using MapReduce

    Due to the rapid growth of data from different sources in organizations, data sets that traditional tools and techniques cannot handle in a scalable fashion are known as big data. Similarly, many existing frequent itemset mining algorithms have good performance but scalability problems, as they cannot exploit parallel processing power available locally or in cloud infrastructure. Since the big data and cloud ecosystem overcomes the barriers or limitations in computing resources, it is a natural choice to use distributed programming paradigms such as MapReduce. In this paper, we propose a novel algorithm known as A Nodesets-based Fast and Scalable Frequent Itemset Mining (FSFIM) to extract frequent itemsets from Big Data. Here, a Pre-Order Coding (POC) tree is used to represent data and improve processing speed. Nodeset is the underlying data structure that is efficient in discovering frequent itemsets. FSFIM is found to be faster and more scalable in mining frequent itemsets. When compared with its predecessors such as Node-lists and N-lists, the Nodesets save half of the memory as they need only either pre-order or post-order coding. Cloudera's Distribution of Hadoop (CDH), a MapReduce framework, is used for the empirical study. A prototype application is built to evaluate the performance of FSFIM. Experimental results revealed that FSFIM outperforms existing algorithms such as Mahout PFP, Mlib PFP, and Big FIM. FSFIM is more scalable and found to be an ideal candidate for real-time applications that mine frequent itemsets from Big Data.

    RESEARCH ISSUES CONCERNING ALGORITHMS USED FOR OPTIMIZING THE DATA MINING PROCESS

    In this paper, we depict some of the most widely used data mining algorithms that have an overwhelming utility and influence in the research community. A data mining algorithm can be regarded as a tool that creates a data mining model. After analyzing a set of data, an algorithm searches for specific trends and patterns, then defines the parameters of the mining model based on the results of this analysis. These parameters play a significant role in identifying and extracting actionable patterns and detailed statistics. The most important algorithms within this research refer to topics like clustering, classification, association analysis, statistical learning, and link mining. After a brief description of each algorithm, we analyze its application potential and the research issues concerning the optimization of the data mining process. After the presentation of the data mining algorithms, we depict the most important data mining algorithms included in Microsoft and Oracle software products, useful suggestions and criteria for choosing the most recommended algorithm for a given task, and the advantages offered by these software products. Keywords: data mining optimization, data mining algorithms, software solutions

    An efficient closed frequent itemset miner for the MOA stream mining system

    Mining itemsets is a central task in data mining, both in the batch and the streaming paradigms. While robust, efficient, and well-tested implementations exist for batch mining, hardly any publicly available equivalent exists for the streaming scenario. The lack of an efficient, usable tool for the task hinders its use by practitioners and makes it difficult to assess new research in the area. To alleviate this situation, we review the algorithms described in the literature, and implement and evaluate the IncMine algorithm by Cheng, Ke, and Ng (2008) for mining frequent closed itemsets from data streams. Our implementation works on top of the MOA (Massive Online Analysis) stream mining framework to ease its use and integration with other stream mining tasks. We provide a PAC-style rigorous analysis of the quality of the output of IncMine as a function of its parameters; this type of analysis is rare in pattern mining algorithms. As a by-product, the analysis shows how one of the user-provided parameters in the original description can be removed entirely while retaining the performance guarantees. Finally, we experimentally confirm both on synthetic and real data the excellent performance of the algorithm, as reported in the original paper, and its ability to handle concept drift.
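
    For readers unfamiliar with the term, a frequent itemset is closed when no proper superset has the same support. The following brute-force Python sketch shows that definition on a small batch of transactions. It is not IncMine and does not handle streams; the function names, the toy data, and the exhaustive enumeration are assumptions made purely for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Enumerate every itemset contained in at least `min_support` transactions."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            count = sum(1 for t in transactions if cset <= t)
            if count >= min_support:
                result[cset] = count
    return result


def closed_itemsets(frequent):
    """Keep itemsets for which no proper superset has the same support."""
    return {
        itemset: count
        for itemset, count in frequent.items()
        if not any(
            itemset < other and count == other_count
            for other, other_count in frequent.items()
        )
    }


# Hypothetical toy data, used only for illustration.
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
```

    On this toy input, seven itemsets are frequent but only four are closed, which is why closed itemsets serve as a compact summary from which all frequent itemsets and their supports can be recovered.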

    Discovering High Utility Itemsets using Hybrid Approach

    Mining high utility itemsets, especially from big transactional databases, is a time-consuming task. Multiple methods are available for mining high utility itemsets from large transactional datasets, but they have consequential limitations. In terms of performance, these methods need to be scrutinized on low-memory systems for mining high utility itemsets from transactional datasets, as well as on further measures. The proposed algorithm combines High Utility Pattern Mining and Incremental Frequent Pattern Mining. The two algorithms used are Apriori and the existing Parallel UP-Growth for mining high utility itemsets from transactional databases. The information about high utility itemsets is maintained in a data structure called a UP tree. These algorithms not only scan the incremental database but also collect the support counts of newly generated frequent itemsets. The approach provides fast execution because it includes new itemsets in the tree and removes rare itemsets from the utility pattern tree structure, which reduces cost and time. Based on experimental analysis and results, this hybrid approach with existing Apriori and UP-Growth is proposed with the aim of improving performance.
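
    The notion of a high utility itemset can be made concrete with a short Python sketch. It uses the usual textbook definition (utility is purchased quantity times unit profit, summed over the transactions that contain the whole itemset) and a brute-force search. It is not the hybrid Apriori and UP-Growth method described in the abstract, builds no UP tree, and all names and numbers are hypothetical.

```python
from itertools import combinations


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over transactions containing every item."""
    total = 0
    for quantities in transactions:
        if all(item in quantities for item in itemset):
            total += sum(quantities[item] * unit_profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Brute-force search for itemsets whose utility reaches `min_utility`."""
    items = sorted(unit_profit)
    found = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            utility = itemset_utility(candidate, transactions, unit_profit)
            if utility >= min_utility:
                found[frozenset(candidate)] = utility
    return found


# Hypothetical toy data: each transaction maps item -> purchased quantity.
unit_profit = {"a": 5, "b": 2, "c": 1}
transactions = [{"a": 1, "b": 2}, {"a": 2, "c": 3}, {"b": 1, "c": 1}]
high = high_utility_itemsets(transactions, unit_profit, min_utility=10)
```

    Because utility, unlike support, does not shrink monotonically as items are added, exhaustive enumeration like this scales poorly, which is the motivation for tree structures and pruning bounds in practical miners.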

    Review of Recommender Systems Algorithms Utilized in Social Networks based e-Learning Systems & Neutrosophic System

    In this paper, we present a review of different recommender system algorithms that are utilized in social networks based e-Learning systems. Future research will include our proposed e-Learning system that utilizes a Recommender System and a Social Network. Since the world is full of indeterminacy, neutrosophics have found their place in contemporary research. The fundamental concepts of the neutrosophic set were introduced by Smarandache in [21, 22, 23] and Salama et al. in [24-66]. The purpose of this paper is to utilize a neutrosophic set to analyze social network data collected through learning activities.

    A Survey on Index Support for Item Set Mining

    It is very difficult to handle the huge amount of information stored in modern databases. To manage these databases, association rule mining is currently used, which is a costly process that involves a significant amount of time and memory. Therefore, it is necessary to develop an approach to overcome these difficulties. Suitable data structures and algorithms must be developed to perform item set mining effectively. An index includes all necessary characteristics potentially needed during the mining task; the extraction can be executed with the help of the index, without accessing the database. A database index is a data structure that enhances the speed of information retrieval operations on a database table at very low cost, though with increased storage space. Using the index permits user interaction, in which the user can specify different attributes for item set extraction. The index also supports reuse: item sets can be mined from it with any support threshold. This paper surveys the index support for item set mining proposed by various authors.