    Frequent Item Set Mining Using INC_MINE in Massive Online Analysis Frame Work

    Frequent pattern mining is one of the major data mining techniques and has been studied extensively over the past decade. Technological advances have led to huge volumes of data being generated and distributed at ever-increasing rates; such continuously generated data is called a ‘data stream’. Data streams can be mined only with sophisticated techniques, because stream mining poses great challenges in memory usage and computational cost. This paper carries out frequent pattern mining on data streams. The Massive Online Analysis (MOA) framework is used as the software environment, and frequent patterns are mined with the INC_MINE algorithm, which mines closed frequent itemsets. The data sets used in the analysis are the Electricity data set and the Airline data set; the authors also generated their own data set, OUR-GENERATOR, for the analysis, and the results are found to be interesting. The experiments use five samples of different instance sizes (10000, 15000, 25000, 35000 and 50000) with varying minimum support and window sizes to determine frequent closed itemsets and semi-frequent closed itemsets, respectively. The present work establishes that association rule mining can be performed on data streams with the INC_MINE algorithm by generating closed frequent itemsets, which the authors describe as the first work of its kind in the literature.
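
To make "closed frequent itemsets over a window" concrete, here is a minimal brute-force sketch in Python. It is not INC_MINE and not MOA code: it simply slides a fixed-size window over a stream of transactions, counts every itemset in the window, keeps those meeting an absolute minimum support count (an assumption for illustration), and then keeps only the closed ones, i.e. itemsets with no proper superset of equal support. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def frequent_itemsets(window, min_support):
    """Count every itemset in the window and keep those meeting min_support.

    min_support is an absolute count here (an assumption for this sketch).
    Brute force: enumerates all subsets of each transaction, so it is only
    suitable for tiny, illustrative windows.
    """
    counts = {}
    for transaction in window:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                counts[key] = counts.get(key, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_support}


def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support.

    Checking only frequent supersets is enough: a superset with equal
    support is itself frequent.
    """
    return {
        s: c
        for s, c in frequent.items()
        if not any(s < t and c == ct for t, ct in frequent.items())
    }


def mine_stream(stream, window_size, min_support):
    """Slide a fixed-size window over the stream and yield its closed frequent itemsets."""
    window = deque(maxlen=window_size)  # the oldest transaction drops out automatically
    for transaction in stream:
        window.append(transaction)
        if len(window) == window_size:
            yield closed_itemsets(frequent_itemsets(window, min_support))
```

For example, `list(mine_stream([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"a", "b"}], 3, 2))` yields one result per full window; `{b}` is frequent but not closed because `{a, b}` has the same support. Real stream miners avoid this exponential recount by updating a compact summary incrementally.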

    Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R

    In recent years, data streams have become an increasingly important area of research for the computer science, database and statistics communities. Data streams are ordered and potentially unbounded sequences of data points created by a typically non-stationary data generating process. Common data mining tasks associated with data streams include clustering, classification and frequent pattern mining. New algorithms for these types of data are proposed regularly and it is important to evaluate them thoroughly under standardized conditions. In this paper we introduce stream, a research tool that includes modeling and simulating data streams as well as an extensible framework for implementing, interfacing and experimenting with algorithms for various data stream mining tasks. The main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R. In addition to data handling, plotting and easy scripting capabilities, R also provides many existing algorithms and enables users to interface code written in many programming languages popular among data mining researchers (e.g., C/C++, Java and Python). In this paper we describe the architecture of stream and focus on its use for data stream clustering research. stream was implemented with extensibility in mind and will be extended in the future to cover additional data stream mining tasks like classification and frequent pattern mining
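
The architectural idea of separating stream sources from mining algorithms can be sketched outside R as well. The Python sketch below is not the API of the stream package; `gaussian_stream`, `LeaderClusterer` and `run` are hypothetical names. A generator plays the role of a simulated, unbounded data stream, and a simple leader-style online clusterer (a swapped-in example algorithm) consumes it one point at a time through a single `update()` method, so either side can be replaced independently.

```python
import math
import random
from typing import Iterator, List, Tuple

Point = Tuple[float, float]


def gaussian_stream(centers: List[Point], sd: float, seed: int = 0) -> Iterator[Point]:
    """Hypothetical stream source: endless points drawn around randomly chosen centers."""
    rng = random.Random(seed)
    while True:
        cx, cy = rng.choice(centers)
        yield (rng.gauss(cx, sd), rng.gauss(cy, sd))


class LeaderClusterer:
    """Hypothetical online clusterer.

    Each point joins the nearest existing cluster if it lies within `radius`
    of that cluster's center; otherwise it starts a new cluster. Centers are
    kept as running means, so no point is stored after it is processed.
    """

    def __init__(self, radius: float):
        self.radius = radius
        self.centers: List[list] = []  # each entry: [x, y, count]

    def update(self, point: Point) -> None:
        best, best_d = None, math.inf
        for c in self.centers:
            d = math.dist(point, (c[0], c[1]))
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= self.radius:
            best[2] += 1
            # Incremental mean update for the matched cluster.
            best[0] += (point[0] - best[0]) / best[2]
            best[1] += (point[1] - best[1]) / best[2]
        else:
            self.centers.append([point[0], point[1], 1])


def run(source: Iterator[Point], clusterer: LeaderClusterer, n: int) -> None:
    """Feed the first n points of any source into any object with an update() method."""
    for _ in range(n):
        clusterer.update(next(source))
```

With two well-separated centers such as `(0, 0)` and `(10, 10)`, a small standard deviation and a radius of a few units, the clusterer should end up with one cluster per center. Because the source is a plain iterator, a recorded data set could be replayed through the same `run` loop.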

    A Deviant Load Shedding System for Data Stream Mining

    Load shedding is essential in data stream processing systems because data streams are susceptible to sudden spikes in volume. The proposed system, built on an efficient mining system, addresses four major problems associated with data streams: load shedding time, anti-shedding time, the number of transactions pruned, and predicate selection. The model discovers frequent patterns in the data stream and exploits the synergy between scheduling and load shedding. The paper also proposes several load shedding strategies that reduce the workload of the system while keeping mining accuracy at an acceptable level, using parameters such as transactions, priority and data mining attributes. Most of the workload in a mining algorithm lies in counting and enumerating the very large number of itemsets. The approach is based on the frequent pattern matching principle of stream mining, which reduces the workload by maintaining smaller itemsets.
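
A priority-based load shedder can be sketched in a few lines. This is a hypothetical illustration, not the paper's system: it assumes a fixed per-batch processing capacity and an integer priority attached to each transaction. When a batch exceeds capacity, the lowest-priority transactions are shed and counted as pruned; a separate `trim` step drops items outside a given frequent-item set so that later itemset enumeration works on smaller transactions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class Transaction:
    items: FrozenSet[str]
    priority: int  # hypothetical: a higher value means more important to keep


@dataclass
class LoadShedder:
    """Hypothetical priority-based load shedder.

    capacity: assumed maximum number of transactions the miner can process per batch.
    shed_count: running total of transactions pruned so far.
    """

    capacity: int
    shed_count: int = 0

    def admit(self, batch: List[Transaction]) -> List[Transaction]:
        if len(batch) <= self.capacity:
            return batch  # no overload: keep everything
        # Overload: keep the highest-priority transactions and shed the rest.
        kept = sorted(batch, key=lambda t: t.priority, reverse=True)[: self.capacity]
        self.shed_count += len(batch) - len(kept)
        return kept

    @staticmethod
    def trim(transaction: Transaction, frequent_items: FrozenSet[str]) -> Transaction:
        """Drop items not known to be frequent, shrinking later itemset enumeration."""
        return Transaction(transaction.items & frequent_items, transaction.priority)
```

The design choice here is deterministic shedding by priority; random sampling is a common alternative when no meaningful priority is available.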

    Mining Positional Data Streams

    We study frequent pattern mining from positional data streams. Existing approaches require discretised data to identify atomic events and are not applicable in our continuous setting. We propose an efficient trajectory-based preprocessing to identify similar movements and a distributed pattern mining algorithm to identify frequent trajectories. We empirically evaluate all parts of the processing pipeline.
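
One way to compare continuous movements without discretising them is dynamic time warping (DTW). The sketch below is a swapped-in illustration, not necessarily the authors' preprocessing: it cuts a positional stream into fixed-length trajectory segments (the segment length is an assumption) and reports pairs of segments whose DTW distance falls below a threshold. Function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def dtw_distance(a: List[Point], b: List[Point]) -> float:
    """Dynamic time warping distance between two 2-D trajectories, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def segment(stream: List[Point], length: int) -> List[List[Point]]:
    """Cut a positional stream into consecutive, non-overlapping segments of equal length."""
    return [stream[i:i + length] for i in range(0, len(stream) - length + 1, length)]


def similar_pairs(segments: List[List[Point]], threshold: float) -> List[Tuple[int, int]]:
    """Index pairs of segments whose DTW distance is at most threshold."""
    return [
        (i, j)
        for i in range(len(segments))
        for j in range(i + 1, len(segments))
        if dtw_distance(segments[i], segments[j]) <= threshold
    ]
```

DTW tolerates the same movement being performed at different speeds, e.g. `[(0, 0), (1, 0), (2, 0)]` and `[(0, 0), (0, 0), (1, 0), (2, 0)]` have distance 0. Its quadratic cost is why efficient preprocessing matters at stream scale.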

    Mining Frequent Item Sets Data Streams using "Éclat Algorithm"

    Frequent pattern mining is the process of mining sets of items or other patterns from a large database. A frequent pattern is a pattern that occurs frequently in a data set, that is, one whose support meets the minimum support threshold. Association rule mining finds the association rules that satisfy predefined minimum support and minimum confidence thresholds in a given database. An itemset is frequent if its support meets the minimum support threshold; it does not have to appear in every transaction of the database. Discovering frequent itemsets plays a very important role in mining association rules, sequence rules, web logs and many other interesting patterns in complex data. A data stream is a real-time, continuous, ordered sequence of items: an uninterrupted flow of a long sequence of data. Real-world examples of data streams include sensor network data, telecommunication data, transactional data and scientific surveillance systems. Such sources produce trillions of updates every day, so it is very difficult to store all of the data, and the data must be mined as it arrives. Data mining is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data; it extracts hidden predictive information from large databases. Many algorithms have been proposed for finding frequent itemsets. Apriori is the first classical algorithm for this task, and many later algorithms are similar to it: they are based on candidate generation and pruning, which takes considerable memory and time. In this paper, we study how the Éclat algorithm can be used on data streams to find frequent itemsets. The Éclat algorithm does not require Apriori-style candidate generation.
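
The core of Éclat is its vertical data layout: each item maps to the set of IDs of the transactions (TIDs) that contain it, and the support of a combined itemset is the size of the intersection of its members' TID sets, so the database is never rescanned. The following is a minimal batch sketch of that idea in Python, assuming an absolute minimum support count; it is not the stream adaptation studied in the paper, and the function name is hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def eclat(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Frequent itemsets via Eclat-style depth-first TID-list intersection.

    min_support is an absolute transaction count (an assumption for this sketch).
    """
    # Vertical layout: item -> IDs of the transactions containing it.
    tidlists: Dict[str, Set[int]] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidlists.setdefault(item, set()).add(tid)

    result: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str], candidates: List[Tuple[str, Set[int]]]) -> None:
        # Every (item, tids) in candidates is frequent when added to prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            result[itemset] = len(tids)
            # Support of a longer itemset = size of the intersected TID sets.
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_support:
                    suffix.append((other, common))
            if suffix:
                extend(itemset, suffix)

    start = sorted(
        ((item, tids) for item, tids in tidlists.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return result
```

For the transactions `{a,b,c}, {a,b}, {a,c}, {b,c}` with a minimum support of 2, every single item and every pair is frequent, while `{a,b,c}` is not, because the intersection of its TID sets contains only transaction 0.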

    An efficient closed frequent itemset miner for the MOA stream mining system

    Mining itemsets is a central task in data mining, both in the batch and the streaming paradigms. While robust, efficient, and well-tested implementations exist for batch mining, hardly any publicly available equivalent exists for the streaming scenario. The lack of an efficient, usable tool for the task hinders its use by practitioners and makes it difficult to assess new research in the area. To alleviate this situation, we review the algorithms described in the literature, and implement and evaluate the IncMine algorithm by Cheng, Ke, and Ng (2008) for mining frequent closed itemsets from data streams. Our implementation works on top of the MOA (Massive Online Analysis) stream mining framework to ease its use and integration with other stream mining tasks. We provide a PAC-style rigorous analysis of the quality of the output of IncMine as a function of its parameters; this type of analysis is rare in pattern mining algorithms. As a by-product, the analysis shows how one of the user-provided parameters in the original description can be removed entirely while retaining the performance guarantees. Finally, we experimentally confirm both on synthetic and real data the excellent performance of the algorithm, as reported in the original paper, and its ability to handle concept drift.
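
Why a sliding window copes with concept drift can be seen in a simpler setting than IncMine. The hypothetical sketch below keeps exact counts of small itemsets (up to `max_size` items, an assumption) over the last `num_batches` batches of a stream: each new batch's counts are added and the expired batch's counts subtracted, so patterns that stop occurring after a drift fall below the support threshold. IncMine itself maintains approximate semi-frequent closed itemsets rather than exact counts; this is only the underlying windowing idea.

```python
from collections import Counter, deque
from itertools import combinations
from typing import Deque, Dict, FrozenSet, Iterable, Set


class BatchWindowCounter:
    """Hypothetical exact itemset counter over the last `num_batches` batches."""

    def __init__(self, num_batches: int, max_size: int = 2):
        self.num_batches = num_batches
        self.max_size = max_size
        self.batches: Deque[Counter] = deque()  # per-batch itemset counts
        self.sizes: Deque[int] = deque()  # number of transactions per batch
        self.total: Counter = Counter()  # counts over the whole window

    def _count(self, batch: Iterable[Set[str]]) -> Counter:
        counts: Counter = Counter()
        for transaction in batch:
            items = sorted(set(transaction))
            for k in range(1, min(self.max_size, len(items)) + 1):
                for subset in combinations(items, k):
                    counts[frozenset(subset)] += 1
        return counts

    def add_batch(self, batch: list) -> None:
        counts = self._count(batch)
        self.batches.append(counts)
        self.sizes.append(len(batch))
        self.total.update(counts)
        if len(self.batches) > self.num_batches:
            # Expire the oldest batch so old patterns stop contributing.
            self.total.subtract(self.batches.popleft())
            self.sizes.popleft()

    def frequent(self, min_support_ratio: float) -> Dict[FrozenSet[str], int]:
        n = sum(self.sizes)
        return {
            s: v
            for s, v in self.total.items()
            if v > 0 and v >= min_support_ratio * n
        }
```

If the window first sees only `{a, b}` transactions and then only `{c}` transactions, `{a, b}` is reported as frequent until its batch expires, after which only `{c}` remains.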

    Efficient Pattern Mining for Wireless Sensor Networks Data

    Wireless Sensor Networks generate large amounts of data in the form of streams. Mining association rules from sensor data provides useful information for different applications. In this paper, a Total From Partial (TFP) tree-based approach is used to generate the set of all association rules from the data. Our experimental results show that the TFP technique performs better on sparse datasets and is comparable to the SP-tree approach on dense datasets.
    Keywords: Association Rule Mining; Wireless Sensor Networks; Frequent Pattern
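
Whatever structure is used to count supports, the final step of generating association rules is the same: for each frequent itemset, every split into a non-empty antecedent and consequent is a candidate rule, kept if its confidence, support(itemset) / support(antecedent), meets the threshold. The sketch below shows that step with brute-force counting; it does not implement the TFP tree structure used in the paper, and the function name and sensor-style item labels are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]


def association_rules(
    transactions: List[Set[str]], min_support: float, min_confidence: float
) -> List[Rule]:
    """Return (antecedent, consequent, support, confidence) tuples.

    Supports are fractions of all transactions. Counting is brute force,
    so this is only suitable for small illustrative data.
    """
    n = len(transactions)
    counts: Dict[FrozenSet[str], int] = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                key = frozenset(subset)
                counts[key] = counts.get(key, 0) + 1

    frequent = {s: c for s, c in counts.items() if c / n >= min_support}
    rules: List[Rule] = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                antecedent = frozenset(lhs)
                # The antecedent is a subset of a counted itemset, so it was counted too.
                confidence = count / counts[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, count / n, confidence))
    return rules
```

With readings `{temp_high, humid_low}` twice, `{temp_high}` and `{humid_low, wind}`, a minimum support of 0.5 and a minimum confidence of 0.6 give the two rules `temp_high -> humid_low` and `humid_low -> temp_high`, each with support 0.5 and confidence 2/3.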

    Methods for frequent pattern mining in data streams within the MOA system

    IncMine is a robust, efficient, practical, usable and extensible solution for frequent itemset mining over data streams. It is implemented in the Massive Online Analysis (MOA) framework. The work includes an analysis of its performance and of its reaction to synthetic and real concept drift.