
    Analyzing frequent patterns in data streams using a dynamic compact stream pattern algorithm

    As a result of modern technology and advances in communication, large volumes of data streams are continually generated from various online applications, devices and sources. Mining frequent patterns from these data streams is now an important research topic in the field of data mining and knowledge discovery. Traditional mining approaches may not be appropriate for a data stream environment, where the data volume is large and unbounded, and they are limited in their ability to extract recent changes in knowledge adaptively from the stream. Many algorithms and models have been developed to address the challenging task of mining data from an infinite influx of data generated from various points over the internet. The objective of this thesis is to introduce the Dynamic Compact Pattern Stream tree (DCPS-tree) algorithm for mining recent data from a continuous data stream. Our DCPS-tree dynamically achieves a frequency-descending prefix tree structure with only a single pass over the data by applying tree restructuring techniques such as the Branch Sort Method (BSM). This causes low-frequency patterns to be maintained at the leaf-node level and high-frequency components at a higher level. As a result, there will be a considerable mining time reduction on the datase
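The frequency-descending layout mentioned in the abstract above can be illustrated with a minimal sketch. This is not the DCPS-tree algorithm and does not implement the Branch Sort Method: it is a simplified two-pass construction (count item frequencies first, then insert) that only shows the resulting shape, with frequent items near the root and rare items near the leaves. The names `Node`, `build_frequency_descending_tree` and `min_depths` are hypothetical and chosen for illustration only.

```python
from collections import Counter


class Node:
    """One prefix-tree node: an item, its path count, and child nodes."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_frequency_descending_tree(transactions):
    """Build a prefix tree whose paths list items from most to least frequent.

    Assumption: two passes over a finite list of transactions. A true
    stream algorithm would instead insert in one pass and restructure
    the tree afterwards.
    """
    # Pass 1: count in how many transactions each item occurs.
    freq = Counter(item for t in transactions for item in set(t))
    root = Node()
    # Pass 2: insert each transaction sorted by descending frequency
    # (ties broken by item name so the order is deterministic).
    for t in transactions:
        ordered = sorted(set(t), key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root, freq


def min_depths(root):
    """Return the shallowest depth at which each item appears."""
    depths = {}
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        for child in node.children.values():
            depths[child.item] = min(depths.get(child.item, depth + 1), depth + 1)
            stack.append((child, depth + 1))
    return depths


transactions = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["a", "c"], ["b", "c"]]
root, freq = build_frequency_descending_tree(transactions)
depths = min_depths(root)
```

In this toy input, the most frequent item `a` sits directly under the root, while the rarest item `d` only ever appears below it, which is the property that lets a miner reach high-support patterns without descending to the leaves.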

    An efficient closed frequent itemset miner for the MOA stream mining system

    Mining itemsets is a central task in data mining, both in the batch and the streaming paradigms. While robust, efficient, and well-tested implementations exist for batch mining, hardly any publicly available equivalent exists for the streaming scenario. The lack of an efficient, usable tool for the task hinders its use by practitioners and makes it difficult to assess new research in the area. To alleviate this situation, we review the algorithms described in the literature, and implement and evaluate the IncMine algorithm by Cheng, Ke, and Ng (2008) for mining frequent closed itemsets from data streams. Our implementation works on top of the MOA (Massive Online Analysis) stream mining framework to ease its use and integration with other stream mining tasks. We provide a PAC-style rigorous analysis of the quality of the output of IncMine as a function of its parameters; this type of analysis is rare in pattern mining algorithms. As a by-product, the analysis shows how one of the user-provided parameters in the original description can be removed entirely while retaining the performance guarantees. Finally, we experimentally confirm both on synthetic and real data the excellent performance of the algorithm, as reported in the original paper, and its ability to handle concept drift.

    When Things Matter: A Data-Centric View of the Internet of Things

    With the recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. While IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers on managing IoT data, which is typically produced in dynamic and volatile environments and is not only extremely large in scale and volume, but also noisy and continuous. This article surveys the main techniques and state-of-the-art research efforts in IoT from data-centric perspectives, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed.