Mining Time-Changing Data Streams
Streaming data have gained considerable attention in the database and
data mining communities because of the emergence of a class of
applications that produce such data, including financial marketing,
sensor networks, internet IP monitoring, and telecommunications. Data
streams have characteristics that traditional data do not exhibit:
they are unbounded, fast-arriving, and time-changing.
Traditional data mining techniques that make multiple passes over
data or that ignore distribution changes are not applicable to
dynamic data streams. Mining data streams has been an active
research area to address requirements of the streaming applications.
This thesis focuses on developing techniques for distribution change
detection and mining time-changing data streams. Two techniques are
proposed that can detect distribution changes in generic data
streams. An approach to one of the most popular stream mining tasks,
frequent itemset mining, is also presented in this thesis. All the
proposed techniques are implemented and empirically studied.
Experimental results show that the proposed techniques achieve
promising performance for detecting changes and mining dynamic data
streams.
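The abstract does not say how the thesis detects distribution changes, so the following is only a generic illustration of the idea, not the proposed techniques. It is a minimal single-pass sketch that compares the mean of a reference window with the mean of a sliding current window; the function name `detect_changes` and the parameters `window` and `threshold` are assumptions made for this example.

```python
from collections import deque
from statistics import mean


def detect_changes(stream, window=50, threshold=1.0):
    """Yield the indices at which a change in the mean is flagged.

    Hypothetical two-window detector: the first `window` items form a
    reference window, later items slide through a current window, and a
    change is flagged when the two window means differ by more than
    `threshold`. Each item is read exactly once.
    """
    reference = deque(maxlen=window)
    current = deque(maxlen=window)
    for i, x in enumerate(stream):
        if len(reference) < window:
            # Still collecting the reference sample.
            reference.append(x)
            continue
        current.append(x)  # the oldest item drops out once the window is full
        if len(current) == window and abs(mean(current) - mean(reference)) > threshold:
            yield i
            # Restart after a change: rebuild the reference from new data.
            reference = deque(maxlen=window)
            current = deque(maxlen=window)


if __name__ == "__main__":
    # Synthetic stream whose mean jumps from 0 to 5 at index 200.
    stream = [0.0] * 200 + [5.0] * 200
    print(list(detect_changes(stream)))
```

Memory use is bounded by the two windows, which suits unbounded streams. Comparing only means is a simplification: a change that leaves the mean untouched (for example, a change in variance) would go unnoticed by this sketch.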
Handling Concept Drift for Predictions in Business Process Mining
Predictive services nowadays play an important role across all business
sectors. However, deployed machine learning models are challenged by data
streams that change over time, a phenomenon known as concept drift, which can
substantially degrade prediction quality. Concept drift is therefore usually
handled by retraining the model. However, current research lacks
recommendations on which data should be selected for retraining. We therefore
systematically analyze different data selection strategies in this work and
then apply our findings to a process mining use case that is strongly affected
by concept drift. We show that concept drift handling improves accuracy from
0.5400 to 0.7010, and we illustrate the effects of the different data selection
strategies.
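The abstract does not name the data selection strategies it compares, so the sketch below uses three assumed ones purely for illustration: retrain on all history, on the most recent window, or on everything since a detected drift point. The strategy names, the function `select_training_data`, and the toy majority-class model are all hypothetical and stand in for whatever the paper actually evaluates.

```python
from collections import Counter


def select_training_data(history, strategy, window=100, drift_index=None):
    """Return the part of `history` to retrain on (assumed strategies)."""
    if strategy == "all":
        return list(history)
    if strategy == "recent":
        return list(history[-window:])
    if strategy == "since_drift":
        if drift_index is None:
            raise ValueError("since_drift requires a drift_index")
        return list(history[drift_index:])
    raise ValueError(f"unknown strategy: {strategy}")


def fit_majority(labels):
    """Toy stand-in for a model: always predict the most frequent label."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(prediction, labels):
    """Fraction of labels matched by a constant prediction."""
    return sum(y == prediction for y in labels) / len(labels)


if __name__ == "__main__":
    # Synthetic labels: the dominant outcome switches from "A" to "B" at index 300.
    history = ["A"] * 300 + ["B"] * 100
    future = ["B"] * 50
    for strategy in ("all", "recent", "since_drift"):
        train = select_training_data(history, strategy, window=100, drift_index=300)
        print(strategy, accuracy(fit_majority(train), future))
```

On this synthetic data, retraining on all history keeps predicting the outdated label, while the two strategies that discard pre-drift data adapt. Real data rarely separates this cleanly, and the numbers here say nothing about the results reported in the abstract.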
Performance Evaluation of Anonymized Data Stream Classifiers
A data stream is a continuous, changing sequence of data that arrives at a system to be stored or processed. It is vital to extract useful information from the enormous volume of data streams generated by different applications, such as organization records, call center records, sensor data, network traffic, and web searches. Privacy-preserving data mining techniques allow data to be generated for mining while preserving the private information of individuals. In this paper, classification algorithms were applied to the original data set as well as to the privacy-preserved data set. Results were compared to evaluate the performance of various classification algorithms on data streams that had been privacy-preserved using anonymization techniques. The paper proposes an effective approach for classification of anonymized data streams. Intensive experiments were performed using appropriate data mining and anonymization tools. Experimental results show that the proposed approach increases utility, i.e. it improves classification accuracy while minimizing the mean absolute error. The proposed work presents an anonymization technique that is effective in terms of information loss and classifiers that are efficient in terms of response time and data usability.