8,007 research outputs found
Reservoir of Diverse Adaptive Learners and Stacking Fast Hoeffding Drift Detection Methods for Evolving Data Streams
The last decade has seen a surge of interest in adaptive learning algorithms
for data stream classification, with applications ranging from predicting ozone
level peaks, learning stock market indicators, to detecting computer security
violations. In addition, a number of methods have been developed to detect
concept drifts in these streams. Consider a scenario where we have a number of
classifiers with diverse learning styles and different drift detectors.
Intuitively, the current 'best' (classifier, detector) pair is application
dependent and may change as a result of the stream evolution. Our research
builds on this observation. We introduce the \mbox{Tornado} framework that
implements a reservoir of diverse classifiers, together with a variety of drift
detection algorithms. In our framework, all (classifier, detector) pairs
proceed, in parallel, to construct models against the evolving data streams. At
any point in time, we select the pair which currently yields the best
performance. We further incorporate two novel stacking-based drift detection
methods, namely the \mbox{FHDDMS} and \mbox{FHDDMS}_{add} approaches. The
experimental evaluation confirms that the current 'best' (classifier, detector)
pair is not only heavily dependent on the characteristics of the stream, but
also that this selection evolves as the stream flows. Further, our
\mbox{FHDDMS} variants detect concept drifts accurately in a timely fashion
while outperforming the state-of-the-art.Comment: 42 pages, and 14 figure
Stream Learning in Energy IoT Systems: A Case Study in Combined Cycle Power Plants
The prediction of electrical power produced in combined cycle power plants is a key challenge in the electrical power and energy systems field. This power production can vary depending on environmental variables, such as temperature, pressure, and humidity. Thus, the business problem is how to predict the power production as a function of these environmental conditions, in order to maximize the profit. The research community has solved this problem by applying Machine Learning techniques, and has managed to reduce the computational and time costs in comparison with the traditional thermodynamical analysis. Until now, this challenge has been tackled from a batch learning perspective, in which data is assumed to be at rest, and where models do not continuously integrate new information into already constructed models. We present an approach closer to the Big Data and Internet of Things paradigms, in which data are continuously arriving and where models learn incrementally, achieving significant enhancements in terms of data processing (time, memory and computational costs), and obtaining competitive performances. This work compares and examines the hourly electrical power prediction of several streaming regressors, and discusses about the best technique in terms of time processing and predictive performance to be applied on this streaming scenario.This work has been partially supported by the EU project iDev40. This project has received funding
from the ECSEL Joint Undertaking (JU) under grant agreement No 783163. The JU receives support from the
European Union’s Horizon 2020 research and innovation programme and Austria, Germany, Belgium, Italy,
Spain, Romania. It has also been supported by the Basque Government (Spain) through the project VIRTUAL
(KK-2018/00096), and by Ministerio de Economía y Competitividad of Spain (Grant Ref. TIN2017-85887-C2-2-P)
Handling Concept Drift for Predictions in Business Process Mining
Predictive services nowadays play an important role across all business
sectors. However, deployed machine learning models are challenged by changing
data streams over time which is described as concept drift. Prediction quality
of models can be largely influenced by this phenomenon. Therefore, concept
drift is usually handled by retraining of the model. However, current research
lacks a recommendation which data should be selected for the retraining of the
machine learning model. Therefore, we systematically analyze different data
selection strategies in this work. Subsequently, we instantiate our findings on
a use case in process mining which is strongly affected by concept drift. We
can show that we can improve accuracy from 0.5400 to 0.7010 with concept drift
handling. Furthermore, we depict the effects of the different data selection
strategies
Algorithm selection on data streams
We explore the possibilities of meta-learning on data streams, in particular algorithm selection. In a first experiment we calculate the characteristics of a small sample of a data stream, and try to predict which classifier performs best on the entire stream. This yields promising results and interesting patterns. In a second experiment, we build a meta-classifier that predicts, based on measurable data characteristics in a window of the data stream, the best classifier for the next window. The results show that this meta-algorithm is very competitive with state of the art ensembles, such as OzaBag, OzaBoost and Leveraged Bagging. The results of all experiments are made publicly available in an online experiment database, for the purpose of verifiability, reproducibility and generalizability
- …