Concept Drift Adaptation for Real-time Prediction
University of Technology Sydney. Faculty of Engineering and Information Technology. Concept drift refers to the phenomenon of distribution changes in a data stream. Using concept drift adaptation techniques to predict the target variable(s) of real-time data streams has attracted growing attention from researchers in recent years.
This research aims to develop a set of concept drift adaptation methods for predicting the target variable of real-time data streams. The literature review reveals two issues in the area of concept drift: i) how concept drift limits learning capability; and ii) how to adapt in more realistic scenarios, in which data streams carry uncertainties other than concept drift.
To address issue i), this research identifies three root causes of limited learning capability when concept drift occurs. When concept drift occurs in a data stream, prediction accuracy decreases because 1) the training set contains more than one pattern, so the predictor cannot be learned well; 2) a newly arrived data instance may present an old pattern while an older instance presents the new pattern; and 3) few data instances are available in the early stage after a new concept is identified. Three concept drift adaptation methods are designed to address the three situations separately. Situation 1) is addressed by the FUZZ-CARE approach, which learns how many patterns exist in the training set and the membership degree of each instance to each pattern. Situation 2) is addressed by the SEGA method, which learns the predictor from the most relevant data rather than the most recently arrived data by sequentially picking out the best segments in the training data to update the predictors. Situation 3) is addressed by the AFN, which generates samples of the new concept from previous data instances.
To address issue ii), this research examines concept drift under two more realistic scenarios. The first is concept drift in noisy data: the NoA method is designed to handle concept drift when the data stream contains signal noise. The second is concept drift in data that also has temporal dependency: a theoretical study is conducted on the regression of data streams with concept drift and temporal dependency and, based on this study, the DAR framework is established.
To conclude, this thesis not only provides a set of effective drift adaptation methods for real-time prediction, but also contributes to the development of the concept drift area.
A fuzzy kernel c-means clustering model for handling concept drift in regression
© 2017 IEEE. Concept drift, given the huge volume of high-speed data streams, requires traditional machine learning models to be self-adaptive. Techniques to handle drift are especially needed in regression cases for a wide range of applications in the real world. There is, however, a shortage of research on drift adaptation for regression cases in the literature. One of the main obstacles to further research is the resulting model complexity when regression methods and drift handling techniques are combined. This paper proposes a self-adaptive algorithm, based on a fuzzy kernel c-means clustering approach and a lazy learning algorithm, called FKLL, to handle drift in regression learning. Using FKLL, drift adaptation first updates the learning set using lazy learning, then fuzzy kernel c-means clustering is used to determine the most relevant learning set. Experiments show that the FKLL algorithm is better able to respond to drift as soon as the learning sets are updated, and is also suitable for dealing with reoccurring drift, when compared to the original lazy learning algorithm and other state-of-the-art regression methods
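The idea of assigning each instance a degree of membership to several clusters can be illustrated with the standard fuzzy c-means membership formula. The sketch below is only an illustration under stated assumptions: it uses plain Euclidean distance rather than a kernel, it takes the cluster centres as given, and the function name `fcm_memberships` and the fuzzifier default `m=2.0` are hypothetical choices. It is not the FKLL algorithm from the paper.

```python
import math


def fcm_memberships(point, centres, m=2.0):
    """Return the fuzzy c-means membership of `point` in each cluster.

    Uses u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), where d_j is the
    Euclidean distance from the point to centre j and m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centres]
    # A point lying exactly on one or more centres belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
            for d_j in dists]


# Hypothetical data: a point much closer to the first of two centres.
memberships = fcm_memberships((0.0, 0.0), [(1.0, 0.0), (3.0, 0.0)])
print(memberships)  # approximately [0.9, 0.1]
```

The memberships always sum to one, so an instance near a cluster boundary contributes to several clusters at once instead of being forced into a single one.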
SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary
The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is
considered the "de facto" standard in the framework of learning from imbalanced data. This
is due to the simplicity of its design, as well as its robustness when applied
to different types of problems. Since its publication in 2002, SMOTE has proven
successful in a variety of applications from several different domains. SMOTE has also inspired
several approaches to counter the issue of class imbalance, and has also significantly
contributed to new supervised learning paradigms, including multilabel classification, incremental
learning, semi-supervised learning, and multi-instance learning, among others. It is
a standard benchmark for learning from imbalanced data. It is also featured in a number of
different software packages, from open source to commercial. In this paper, marking the
fifteen-year anniversary of SMOTE, we reflect on the SMOTE journey, discuss the current
state of affairs with SMOTE, its applications, and also identify the next set of challenges
to extend SMOTE for Big Data problems. This work has been partially supported by the Spanish Ministry of Science and Technology
under projects TIN2014-57251-P, TIN2015-68454-R and TIN2017-89517-P; the Project
887 BigDaP-TOOLS - Ayudas Fundación BBVA a Equipos de Investigación Científica 2016;
and the National Science Foundation (NSF) Grant IIS-1447795
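The core step of SMOTE, creating a synthetic minority sample by interpolating between a minority instance and one of its nearest minority neighbours, can be sketched in a few lines. This is a minimal standard-library sketch and not a reference implementation; the function name `smote`, its parameters, and the toy data are assumptions made for illustration.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from a list of minority-class tuples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: sq_dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical minority class with four 2-D samples.
minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 0.8), (1.2, 1.9)]
print(smote(minority, n_new=3, k=2))
```

Because every new point is a convex combination of two existing minority samples, the synthetic data stays inside the region already occupied by the minority class.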
IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective
With the wide spread of sensors and smart devices in recent years, the data
generation speed of the Internet of Things (IoT) systems has increased
dramatically. In IoT systems, massive volumes of data must be processed,
transformed, and analyzed on a frequent basis to enable various IoT services
and functionalities. Machine Learning (ML) approaches have shown their capacity
for IoT data analytics. However, applying ML models to IoT data analytics tasks
still faces many difficulties and challenges, specifically, effective model
selection, design/tuning, and updating, which have brought massive demand for
experienced data scientists. Additionally, the dynamic nature of IoT data may
introduce concept drift issues, causing model performance degradation. To
reduce human efforts, Automated Machine Learning (AutoML) has become a popular
field that aims to automatically select, construct, tune, and update machine
learning models to achieve the best performance on specified tasks. In this
paper, we conduct a review of existing methods in the model selection, tuning,
and updating procedures in the area of AutoML in order to identify and
summarize the optimal solutions for every step of applying ML algorithms to IoT
data analytics. To justify our findings and help industrial users and
researchers better implement AutoML approaches, a case study of applying AutoML
to IoT anomaly detection problems is conducted in this work. Lastly, we discuss
and classify the challenges and research directions for this domain. Comment: Published in Engineering Applications of Artificial Intelligence
(Elsevier, IF:7.8); code and an AutoML tutorial are available on GitHub:
https://github.com/Western-OC2-Lab/AutoML-Implementation-for-Static-and-Dynamic-Data-Analytic
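The link between concept drift and model performance degradation can be made concrete with a simple monitor that compares the recent error rate against a reference error rate. The sketch below is an assumption-laden illustration: the class name `WindowDriftMonitor`, the window size, and the threshold are all hypothetical, and it is not taken from the paper or its repository.

```python
from collections import deque


class WindowDriftMonitor:
    """Flag drift when the recent error rate exceeds a reference error rate.

    The first `window` errors form the reference; later errors fill a
    sliding window that is compared against it.
    """

    def __init__(self, window=50, threshold=0.2):
        self.window = window
        self.threshold = threshold
        self.reference = deque(maxlen=window)
        self.recent = deque(maxlen=window)

    def update(self, error):
        """Feed 0 (correct) or 1 (wrong); return True when drift is flagged."""
        if len(self.reference) < self.window:
            self.reference.append(error)
            return False
        self.recent.append(error)
        if len(self.recent) < self.window:
            return False
        ref_rate = sum(self.reference) / self.window
        recent_rate = sum(self.recent) / self.window
        if recent_rate - ref_rate > self.threshold:
            # Reset so the caller can retrain and build a new reference.
            self.reference.clear()
            self.recent.clear()
            return True
        return False


# Hypothetical stream: the model is right 20 times, then starts failing.
monitor = WindowDriftMonitor(window=10, threshold=0.3)
flags = [monitor.update(e) for e in [0] * 20 + [1] * 10]
print(flags.index(True))  # position in the stream where drift is flagged
```

In an automated pipeline, a flagged drift would typically trigger model retraining or reselection on the most recent data.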
Data Stream Clustering: A Review
The number of connected devices is steadily increasing and these devices
continuously generate data streams. Real-time processing of data streams is
arousing interest despite many challenges. Clustering is one of the most
suitable methods for real-time data stream processing, because it can be
applied with less prior information about the data and it does not need labeled
instances. However, data stream clustering differs from traditional clustering
in many aspects and it has several challenging issues. Here, we provide
information regarding the concepts and common characteristics of data streams,
such as concept drift, data structures for data streams, time window models and
outlier detection. We comprehensively review recent data stream clustering
algorithms and analyze them in terms of the base clustering technique,
computational complexity and clustering accuracy. A comparison of these
algorithms is given along with still open problems. We indicate popular data
stream repositories and datasets, stream processing tools and platforms. Open
problems about data stream clustering are also discussed. Comment: Has been accepted for publication in Artificial Intelligence Revie
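One of the time window models used in stream clustering, the damped (fading) window, can be sketched as a running centroid in which older points lose weight exponentially. The example below is a minimal sketch under stated assumptions: the function name `damped_centroid`, the base-2 fading factor, and the `decay` default are illustrative choices, not something prescribed by the review.

```python
def damped_centroid(stream, decay=0.1):
    """Yield the fading-weighted mean of the stream after each arrival.

    Every time a new point arrives, the weight of all earlier points is
    multiplied by 2 ** (-decay), so recent points dominate the centroid.
    With decay=0 this reduces to the ordinary running mean.
    """
    fade = 2.0 ** (-decay)
    weight_sum = 0.0
    weighted_total = None
    for point in stream:
        if weighted_total is None:
            weighted_total = [0.0] * len(point)
        weight_sum = weight_sum * fade + 1.0
        weighted_total = [t * fade + x for t, x in zip(weighted_total, point)]
        yield [t / weight_sum for t in weighted_total]


# Hypothetical stream whose distribution shifts from (0, 0) to (10, 10).
stream = [(0.0, 0.0)] * 50 + [(10.0, 10.0)] * 50
faded = list(damped_centroid(stream, decay=0.5))[-1]
plain = list(damped_centroid(stream, decay=0.0))[-1]
print(faded, plain)
```

After the shift, the damped centroid follows the new concept closely, while the undamped mean stays halfway between the old and new locations.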
Data-based fault-tolerant model predictive controller: an application to a complex dearomatization process
The tightening global competition during the last few decades has been the driving force for the optimisation of industrial plant operations through the use of advanced control methods, such as model predictive control (MPC). As the occurrence of faults in the process measurements and actuators has become more common due to the increase in the complexity of the control systems, the need for fault-tolerant control (FTC) to prevent the degradation of the controller performance, and therefore the better optimisation of the plant operations, has increased. Traditionally, the most actively studied fault detection and diagnosis (FDD) components of the FTC strategies have been based on model-based approaches. In the modern process industries, however, there is a need for the data-based FDD components due to the complexity and limited availability of mechanistic models. Recently, active FTC strategies using fault accommodation and controller reconfiguration have become popular due to the increased computation capacity, easier adaptability and lower overall implementation costs of the active FTC strategies.
The main focus of this thesis is on the development of an active data-based fault-tolerant MPC (FTMPC) for an industrial dearomatization process. Three different parallel-running FTC strategies are developed that utilise the data-based FDD methods and the fault accommodation- and controller reconfiguration-based FTC methods. The performances of three data-based FDD methods are first compared within an acknowledged testing environment. Based on the preliminary performance testing, the best FDD method is selected for the final FTMPC. Next, the performance of the FTMPC is validated with the simulation model of the industrial dearomatization process and finally, the profitability of the FTMPC is evaluated based on the results of the evaluation.
According to the testing, the FTMPC performs efficiently and detects and prevents the effects of the most common faults in the analyser, flow and temperature measurements, and the controller actuators. The reliability of the MPC is increased and the profitability of the dearomatization process is enhanced due to the lower off-spec production