4 research outputs found

    Skewed Evolving Data Streams Classification with Actionable Knowledge Extraction using Data Approximation and Adaptive Classification Framework

    Get PDF
    Skewed evolving data stream (SEDS) classification is a challenging research problem for online streaming data applications. The fundamental challenges in streaming data classification are class imbalance and concept drift. However, recently, either independently or together, the two topics have received enough attention; the data redundancy while performing stream data mining and classification remains unexplored. Moreover, the existing solutions for the classification of SEDSs have focused on solving concept drift and/or class imbalance problems using the sliding window mechanism, which leads to higher computational complexity and data redundancy problems. To end this, we propose a novel Adaptive Data Stream Classification (ADSC) framework for solving the concept drift, class imbalance, and data redundancy problems with higher computational and classification efficiency. Data approximation, adaptive clustering, classification, and actionable knowledge extraction are the major phases of ADSC. For the purpose of approximating unique items in the data stream with data pre-processing during the data approximation phase, we develop the Flajolet Martin (FM) algorithm. The periodically approximated tuples are grouped into distinct classes using an adaptive clustering algorithm to address the problem of concept drift and class imbalance. In the classification phase, the supervised classifiers are employed to classify the unknown incoming data streams into either of the classes discovered by the adaptive clustering algorithm. We then extract the actionable knowledge using classified skewed evolved data stream information for the end user decision-making process. The ADSC framework is empirically assessed utilizing two streaming datasets regarding classification and computing efficiency factors. The experimental results shows the better efficiency of the proposed ADSC framework as compared with existing classification methods

    Cleaning environmental sensing data streams based on individual sensor reliability

    No full text
    Environmental sensing is becoming a significant way for understanding and transforming the environment, given recent technology advances in the Internet of Things (IoT). Current environmental sensing projects typically deploy commodity sensors, which are known to be unreliable and prone to produce noisy and erroneous data. Unfortunately, the accuracy of current cleaning techniques based on mean or median prediction is unsatisfactory. In this paper, we propose a cleaning method based on incrementally adjusted individual sensor reliabilities, called influence mean cleaning (IMC). By incrementally adjusting sensor reliabilities, our approach can properly discover latent sensor reliability values in a data stream, and improve reliability-weighted prediction even in a sensor network with changing conditions. The experimental results based on both synthetic and real datasets show that our approach achieves higher accuracy than the mean and median-based approaches after some initial adjustment iterations.Yihong Zhang, Claudia Szabo and Quan Shen
    corecore