
    Mining Time-Changing Data Streams

    Streaming data have gained considerable attention in the database and data mining communities because of the emergence of a class of applications, such as financial marketing, sensor networks, internet IP monitoring, and telecommunications, that produce these data. Data streams have some unique characteristics that are not exhibited by traditional data: they are unbounded, fast-arriving, and time-changing. Traditional data mining techniques that make multiple passes over data or that ignore distribution changes are not applicable to dynamic data streams. Mining data streams has been an active research area that addresses the requirements of streaming applications. This thesis focuses on developing techniques for distribution change detection and for mining time-changing data streams. Two techniques are proposed that can detect distribution changes in generic data streams. One approach for tackling one of the most popular stream mining tasks, frequent itemset mining, is also presented in this thesis. All the proposed techniques are implemented and empirically studied. Experimental results show that the proposed techniques can achieve promising performance for detecting changes and mining dynamic data streams.

    Handling Concept Drift for Predictions in Business Process Mining

    Predictive services nowadays play an important role across all business sectors. However, deployed machine learning models are challenged by data streams that change over time, a phenomenon described as concept drift. This phenomenon can strongly influence the prediction quality of models. Therefore, concept drift is usually handled by retraining the model. However, current research lacks a recommendation on which data should be selected for retraining the machine learning model. Therefore, we systematically analyze different data selection strategies in this work. Subsequently, we instantiate our findings on a use case in process mining, which is strongly affected by concept drift. We show that concept drift handling improves accuracy from 0.5400 to 0.7010. Furthermore, we depict the effects of the different data selection strategies.

    Performance Evaluation of Anonymized Data Stream Classifiers

    A data stream is a continuous, changing sequence of data that continuously arrives at a system for storage or processing. It is vital to extract useful information from the enormous volume of data streams generated by different applications, viz. organization records, call center records, sensor data, network traffic, web searches, etc. Privacy-preserving data mining techniques allow data to be generated for mining while preserving the private information of individuals. In this paper, classification algorithms were applied to the original data set as well as the privacy-preserved data set. Results were compared to evaluate the performance of various classification algorithms on data streams that had been privacy-preserved using anonymization techniques. The paper proposes an effective approach for classification of anonymized data streams. Intensive experiments were performed using appropriate data mining and anonymization tools. Experimental results show that the proposed approach improves classification accuracy and increases utility, i.e. accuracy of classification, while minimizing the mean absolute error. The proposed work presents the anonymization technique as effective in terms of information loss and the classifiers as efficient in terms of response time and data usability.
