50 research outputs found

    A study on incremental mining of frequent patterns

    Data generated by both offline and online sources are incremental in nature, and this incremental data continually changes the underlying database. Mining frequent patterns in a changing database is costly, since conventional methods must rescan the database from the start. Mining growing databases has therefore become a major concern, and a data mining technique called Incremental Mining has emerged to address it. Incremental Mining reuses previous mining results to obtain the desired knowledge, reducing mining costs in terms of time and space. This state-of-the-art paper surveys Incremental Mining approaches and identifies those suited to real-world problems.
    Keywords: Data Mining, Frequent Pattern, Incremental Mining, Frequent Pattern Mining, High Utility Mining, Constraint Mining
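
A minimal sketch of the incremental idea, assuming a FUP-style update (a technique swapped in here for illustration, not one prescribed by the survey): the counts of previously frequent itemsets are updated by scanning only the new transactions, and the original database is rescanned only for itemsets that are frequent within the increment, since an itemset that is infrequent in both the original data and the increment cannot be frequent in their union. The function names and the `max_size` cap on itemset length are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size):
    """Count every itemset of up to max_size items (hypothetical helper)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return counts


def mine(transactions, min_sup, max_size):
    """Full (non-incremental) mining: frequent itemsets with their counts."""
    counts = count_itemsets(transactions, max_size)
    threshold = min_sup * len(transactions)
    return {s: c for s, c in counts.items() if c >= threshold}


def incremental_update(old_db, old_frequent, increment, min_sup, max_size):
    """FUP-style update of old_frequent (mined from old_db with the same
    min_sup and max_size) after the transactions in increment arrive."""
    threshold = min_sup * (len(old_db) + len(increment))
    inc_counts = count_itemsets(increment, max_size)
    result = {}
    # Previously frequent itemsets: old count + count in the increment only.
    for s, c in old_frequent.items():
        total = c + inc_counts.get(s, 0)
        if total >= threshold:
            result[s] = total
    # Any other itemset can become frequent only if it is frequent in the
    # increment; only these candidates require a scan of the old database.
    inc_threshold = min_sup * len(increment)
    for s, c in inc_counts.items():
        if s in old_frequent or c < inc_threshold:
            continue
        total = c + sum(1 for t in old_db if s <= set(t))
        if total >= threshold:
            result[s] = total
    return result
```

Calling `incremental_update` should give the same itemsets and counts as running `mine` from scratch on the combined data, while touching the old database only for the new candidates.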

    Discovering Interesting Patterns and Associations in Data Streams

    A data stream is a sequence of items that arrive in temporal order. Unlike data in traditional static databases, data streams are continuous, unbounded, usually arrive at high speed, and have a value distribution that often changes over time (Guha, 2001). As applications such as web transactions, telephone records, and network flows generate large numbers of data streams every day, efficient knowledge discovery from data streams has become an active and growing research area in data mining with broad applications. Traditional data mining algorithms are designed to work on a complete, static dataset and therefore cannot be applied directly to data stream applications. One area of data mining research is mining association relationships in a dataset. Most association mining techniques for data streams fall into two types: those based on frequent patterns and those based on closed patterns. Because the set of frequent patterns is often huge and redundant, it contains many non-informative patterns. An alternative is to base association mining for data streams on closed patterns, which generally form a small subset of all frequent patterns but still provide complete, condensed information. In this line of research, closed pattern mining is a prerequisite for non-redundant, informative association mining. In this dissertation, a sliding window technique for dynamic mining of closed patterns in data streams is proposed, and an approach to mining non-redundant, informative associations from the discovered closed patterns is developed. Closed pattern mining and the related association mining techniques were chosen as the research area of this dissertation for three reasons.
First, the closed patterns of a given collection of data are currently the most compact representation that still provides complete support information for all data patterns. Compared with other techniques, the proposed closed pattern mining technique has the potential to greatly reduce the number of subsequent combinatorial calculations performed on the data patterns. Second, storing the closed patterns and their associations generally requires less memory than storing the corresponding frequent patterns and associations. In some data streaming applications memory usage is the bottleneck for knowledge discovery, so it is an important measure. Third, the associations generated from data streams capture the relations within the data, and these relations have wide application in many data streaming environments. Unlike closed pattern mining techniques for traditional databases, which require multiple scans of the entire database, the proposed technique determines the closed patterns in a single scan. It is an incremental mining process: as the sliding window advances, new transactions enter the window and old transactions leave it. Instead of regenerating the closed patterns from the entire window, the proposed technique updates the previous set of closed patterns whenever a new transaction arrives and/or an old transaction leaves the sliding window. This incremental feature gives the user the most recent closed patterns without rescanning the entire updated database, which saves not only computation time but, more importantly, the I/O time needed to move data between the database and memory. In addition, the proposed sliding window technique handles insertion and deletion operations independently, which allows the user to adjust the window size for different application environments.
Furthermore, the proposed framework for mining interesting patterns and associations can serve requests from different users simultaneously, each with its own support and confidence thresholds and its own input and output patterns of interest. The research includes theoretical proofs of correctness for the proposed algorithms and simulation experiments that compare the proposed techniques with existing ones on synthetic and real datasets. The proposed technique is also applied to sensor network databases from a traffic management site and an environmental monitoring site for missing data estimation.
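
The closed-pattern maintenance described above can be sketched as follows, under stated assumptions: a count-based window, and the standard property that the closed itemsets of a dataset are exactly the non-empty intersections of its transactions, so an arriving transaction t contributes t itself and every `C & t` for an existing closed set C. This is not the dissertation's algorithm; in particular, expiring a transaction here simply rebuilds the closed sets from the remaining window rather than performing an independent deletion step. `ClosedWindow`, `add_transaction`, and `rules` are hypothetical names, and `rules` only enumerates rules whose antecedent and consequent together form a closed itemset.

```python
from collections import deque
from itertools import combinations


def add_transaction(closed, t):
    """Return closed itemsets (frozenset -> support) after adding t.

    Relies on: closed itemsets = non-empty intersections of transactions.
    """
    updated = dict(closed)
    candidates = {c & t for c in closed} | {t}
    candidates.discard(frozenset())
    for x in candidates:
        # Support of x before t arrived = max support of a closed superset.
        old = max((s for c, s in closed.items() if x <= c), default=0)
        updated[x] = old + 1  # every candidate is contained in t
    return updated


class ClosedWindow:
    """Hypothetical count-based sliding window maintaining closed itemsets."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.closed = {}  # frozenset -> support within the current window

    def insert(self, transaction):
        t = frozenset(transaction)
        self.window.append(t)
        self.closed = add_transaction(self.closed, t)
        if len(self.window) > self.size:
            self.expire_oldest()

    def expire_oldest(self):
        self.window.popleft()
        # Simplification (assumption): rebuild from the remaining window
        # instead of an incremental deletion step.
        closed = {}
        for t in self.window:
            closed = add_transaction(closed, t)
        self.closed = closed

    def support(self, itemset):
        """Support of any itemset, derived from the closed itemsets alone."""
        x = frozenset(itemset)
        return max((s for c, s in self.closed.items() if x <= c), default=0)

    def rules(self, min_sup, min_conf):
        """Rules A -> C - A for each frequent closed C, at the caller's thresholds."""
        n = len(self.window)
        out = []
        for c, s in self.closed.items():
            if s < min_sup * n or len(c) < 2:
                continue
            for k in range(1, len(c)):
                for a in combinations(sorted(c), k):
                    conf = s / self.support(a)
                    if conf >= min_conf:
                        out.append((frozenset(a), c - frozenset(a), s, conf))
        return out
```

Because `rules` reads only the maintained closed sets, two users can query the same window with different support and confidence thresholds without any rescan, which loosely mirrors the multi-user requests described above.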

    Incremental algorithm for association rule mining under dynamic threshold

    © 2019 The Authors. Published by MDPI AG. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://doi.org/10.3390/app9245398 Data mining is essentially applied to discover new knowledge from a database through an iterative process, which may be time-consuming for massive datasets. Association rule mining (ARM) is a widely used knowledge discovery method, despite its shortcomings in mining large databases, and several alternative approaches for extracting knowledge have been proposed. Most of the proposed algorithms address incremental data, especially when a large amount of data is added to the database after the latest mining process. The three basic operations performed on a database are add, delete, and update, and any method designed for incremental data must support all three. Changing thresholds are a long-standing problem in data mining: because decision making is an active process, the threshold is likely to change. Accordingly, the present study proposes an algorithm that avoids rescanning a previously mined database and allows retrieval of knowledge that satisfies several thresholds without rerunning the mining process from scratch. In experiments, the proposed approach achieved high accuracy and reduced processing time by almost two-thirds of the original mining execution time. This research was funded by University Malaya through a postgraduate research grant (PPP), grant number PG106-2015B. Published onlin
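
One simple way to answer queries at changing thresholds without rescanning, sketched here as an assumption rather than the paper's algorithm, is to keep exact counts for every itemset up to a fixed size and adjust them on each add, delete, and update; any support or confidence threshold can then be applied to the stored counts. `ItemsetCountStore` and its `max_size` parameter are hypothetical, and the memory cost grows combinatorially with `max_size`.

```python
from collections import Counter
from itertools import combinations


class ItemsetCountStore:
    """Hypothetical store of exact counts for all itemsets up to max_size."""

    def __init__(self, max_size=3):
        self.max_size = max_size
        self.counts = Counter()
        self.transactions = {}  # transaction id -> frozenset of items

    def _subsets(self, items):
        ordered = sorted(items)
        for k in range(1, min(self.max_size, len(ordered)) + 1):
            for combo in combinations(ordered, k):
                yield frozenset(combo)

    def add(self, tid, items):
        if tid in self.transactions:
            raise ValueError(f"transaction {tid!r} already exists")
        t = frozenset(items)
        self.transactions[tid] = t
        for s in self._subsets(t):
            self.counts[s] += 1

    def delete(self, tid):
        t = self.transactions.pop(tid)
        for s in self._subsets(t):
            self.counts[s] -= 1
            if self.counts[s] == 0:
                del self.counts[s]  # keep only itemsets that still occur

    def update(self, tid, items):
        # An update is modelled as a delete followed by an add.
        self.delete(tid)
        self.add(tid, items)

    def frequent(self, min_sup):
        """Itemsets meeting min_sup (a fraction), computed from stored counts."""
        n = len(self.transactions)
        return {s: c for s, c in self.counts.items() if c >= min_sup * n}

    def rules(self, min_sup, min_conf):
        """Rules A -> S - A with confidence count(S) / count(A)."""
        out = []
        for s, c in self.frequent(min_sup).items():
            for k in range(1, len(s)):
                for a in combinations(sorted(s), k):
                    conf = c / self.counts[frozenset(a)]
                    if conf >= min_conf:
                        out.append((frozenset(a), s - frozenset(a), conf))
        return out
```

Lowering or raising `min_sup` and `min_conf` only changes the filter applied to `counts`; no transaction is re-read.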

    Discovery and Effective Use of Frequent Item-set Mining and Association Rules in Datasets

    The unprecedented rise in digitized data generation has led to an ever-expanding demand for sophisticated storage and analysis methods capable of handling vast amounts of complex data, much of which is stored in databases. Because such databases are large, sophisticated analysis methods such as data mining and machine learning become necessary to extract useful insights about a given system under study. Frequent itemset mining and association rule mining are two key approaches to mining knowledge stored in databases. However, handling large databases often leads to time-consuming calculations that require large amounts of memory. Developing methods that enable faster, less laborious search and pattern discovery therefore remains a central focus of data mining, since such methods could speed up processing and knowledge extraction and enable new breakthroughs in how knowledge is acquired from data and applied in real-world applications. Real-world applications, however, are often hindered by limitations inherent in currently available algorithms. For instance, many itemset mining algorithms first store a given database as a tree structure in memory, yet they provide no tight upper bound on the number of nodes generated while building the tree; consequently, there is no upper bound on the amount of memory needed to build such trees. As a result, practical use of frequent itemset mining algorithms is often restricted by memory consumption. Despite its importance to the applicability of itemset mining, memory consumption has not drawn adequate attention from the data mining community and remains a key challenge.
In addition, most algorithms widely used and studied to date require multiple database scans, which restricts their applicability to incremental mining. An algorithm capable of mining frequent patterns dynamically, on the fly, would therefore open new pathways in data mining, enabling itemset mining methods to be applied to new real-world problems and substantially improving current applications. This thesis proposes several approaches to the limitations above. First, an upper bound on the number of nodes of well-known tree structures in frequent itemset mining is presented. Second, to overcome the memory consumption constraint, a memory-efficient method of storing the data processed by frequent itemset mining is proposed, in which data is stored not in a tree but in a compact directed graph whose nodes represent items. Third, a novel SPFP-tree (single pass frequent pattern tree) algorithm is proposed to avoid costly database scans. Lastly, approaches are proposed that allow frequent itemsets and association rules to be used practically and effectively in real-world applications: the quality and effectiveness of frequent itemset mining in solving a real-world facility management problem is examined, and a hybrid approach for applying association rules in recommender systems is proposed to improve the quality of recommendations and to overcome the cold-start problem faced by new users.
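
The sketch below builds a classic FP-tree (two passes: count items, then insert each transaction's frequent items in descending-frequency order) and checks two elementary upper bounds on its node count: one plus the total number of frequent-item occurrences, since each transaction adds at most one node per item, and 2 to the power m for m frequent items, since every node's path from the root is a distinct itemset. These are loose bounds chosen for illustration, not the tight bound derived in the thesis, and the function names are hypothetical.

```python
from collections import Counter


class Node:
    """FP-tree node: an item, a count, and children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Classic two-pass FP-tree construction.

    Returns (root, node count including the root, set of frequent items).
    """
    # Pass 1: item frequencies.
    freq = Counter(item for t in transactions for item in set(t))
    frequent = {i for i, c in freq.items() if c >= min_count}
    root, node_count = Node(), 1
    # Pass 2: insert frequent items in descending frequency (ties by name).
    for t in transactions:
        path = sorted((i for i in set(t) if i in frequent),
                      key=lambda i: (-freq[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = Node(item)
                node_count += 1
            node = node.children[item]
            node.count += 1
    return root, node_count, frequent


def elementary_bounds(transactions, frequent):
    """Two loose upper bounds on the node count (not the thesis's bound)."""
    occurrences = sum(len(set(t) & frequent) for t in transactions)
    return 1 + occurrences, 2 ** len(frequent)
```

For the transactions `[["a", "b"], ["b", "c"], ["a", "b", "c"], ["d"]]` with `min_count=2`, the tree has 5 nodes while both elementary bounds evaluate to 8, which illustrates why a tighter bound is useful for estimating memory.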

    Learning lost temporal fuzzy association rules

    Fuzzy association rule mining discovers patterns in transactions, such as shopping baskets in a supermarket or Web page accesses by a visitor to a Web site. Temporal patterns can be present in fuzzy association rules because the underlying process generating the data can be dynamic. However, existing solutions may not discover all interesting patterns because of a previously unrecognised problem that is revealed in this thesis: the contextual meaning of fuzzy association rules changes with the dynamic nature of the data, so a static fuzzy representation and traditional search methods are inadequate. The Genetic Iterative Temporal Fuzzy Association Rule Mining (GITFARM) framework solves this problem by using flexible fuzzy representations from a fuzzy rule-based system (FRBS). The combined temporal, fuzzy, and itemset space is searched simultaneously with a genetic algorithm (GA), and the framework transforms the dataset into a graph so that it can be searched efficiently. The choice of fuzzy representation model offers a trade-off between an approximate model and a descriptive model. A method for verifying the solution to the hypothesised problem is presented. The proposed GA-based solution was compared with a traditional approach that uses exhaustive search, and the GA-based solution discovered rules that the traditional approach did not. This shows that simultaneously searching for rules and membership functions with a GA is a suitable solution for mining temporal fuzzy association rules. In practice, therefore, more knowledge can be discovered for making well-informed decisions, knowledge that would otherwise be lost with a traditional approach. EPSRC DT
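
To make the notion of temporal fuzzy support concrete, the sketch below assumes triangular membership functions and the common min t-norm: a transaction's membership in a fuzzy itemset is the minimum of its item memberships, and fuzzy support is the mean membership over the transactions of each time period. A GA such as the one in GITFARM would search over the membership-function parameters `(a, b, c)` together with the itemsets; that search is not shown. All names here are hypothetical.

```python
from collections import defaultdict


def triangular(a, b, c):
    """Triangular membership function: 0 outside (a, c), 1 at the peak b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


def fuzzy_support(transactions, fuzzy_itemset):
    """Mean over transactions of the min membership across the itemset.

    fuzzy_itemset is a list of (item, membership_function) pairs; a
    transaction is a dict mapping item -> quantity (missing items count as 0).
    """
    if not transactions:
        return 0.0
    total = sum(min(mu(t.get(item, 0)) for item, mu in fuzzy_itemset)
                for t in transactions)
    return total / len(transactions)


def support_by_period(records, fuzzy_itemset):
    """Fuzzy support per time period; records are (period, transaction) pairs."""
    groups = defaultdict(list)
    for period, t in records:
        groups[period].append(t)
    return {p: fuzzy_support(ts, fuzzy_itemset) for p, ts in sorted(groups.items())}
```

With fixed membership functions, the same fuzzy itemset can have quite different supports in different periods; shifting the parameters of `triangular` changes those supports, which is the kind of adjustment a search over membership functions explores.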