Reservoir of Diverse Adaptive Learners and Stacking Fast Hoeffding Drift Detection Methods for Evolving Data Streams
The last decade has seen a surge of interest in adaptive learning algorithms
for data stream classification, with applications ranging from predicting ozone
level peaks and learning stock market indicators to detecting computer security
violations. In addition, a number of methods have been developed to detect
concept drifts in these streams. Consider a scenario where we have a number of
classifiers with diverse learning styles and different drift detectors.
Intuitively, the current 'best' (classifier, detector) pair is application
dependent and may change as a result of the stream evolution. Our research
builds on this observation. We introduce the Tornado framework, which
implements a reservoir of diverse classifiers together with a variety of drift
detection algorithms. In our framework, all (classifier, detector) pairs
proceed, in parallel, to construct models against the evolving data streams. At
any point in time, we select the pair which currently yields the best
performance. We further incorporate two novel stacking-based drift detection
methods, namely the FHDDMS and FHDDMS_add approaches. The
experimental evaluation confirms not only that the current 'best' (classifier,
detector) pair depends heavily on the characteristics of the stream, but also
that this selection evolves as the stream flows. Further, our
FHDDMS variants detect concept drifts accurately and in a timely fashion
while outperforming the state of the art.
Comment: 42 pages and 14 figures
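The Hoeffding-bound test behind FHDDM-style detectors can be sketched as follows. This is a minimal illustration, assuming the single-window FHDDM rule: slide a window of n prediction outcomes (1 = correct, 0 = wrong) and signal a drift when the windowed accuracy falls at least eps = sqrt(ln(1/delta) / (2n)) below its running maximum. The two-window `StackedDetector`, the window sizes (100 and 25), delta = 1e-7, and the synthetic outcome stream are hypothetical choices, not the exact FHDDMS or FHDDMS_add algorithms.

```python
import math
from collections import deque


class HoeffdingWindowDetector:
    """FHDDM-style test: a sliding window of prediction outcomes
    (1 = correct, 0 = wrong) checked against a Hoeffding bound."""

    def __init__(self, size=100, delta=1e-7):
        self.size = size
        # Hoeffding bound for the mean of `size` outcomes bounded in [0, 1].
        self.eps = math.sqrt(math.log(1.0 / delta) / (2.0 * size))
        self.reset()

    def reset(self):
        self.window = deque(maxlen=self.size)
        self.ones = 0      # running count of correct predictions in the window
        self.p_max = 0.0   # highest windowed accuracy since the last reset

    def update(self, correct):
        """Add one outcome; return True if the accuracy drop reaches eps."""
        bit = 1 if correct else 0
        if len(self.window) == self.size:
            self.ones -= self.window[0]  # this element is evicted by append()
        self.window.append(bit)
        self.ones += bit
        if len(self.window) < self.size:
            return False  # only test once the window is full
        p_t = self.ones / self.size
        self.p_max = max(self.p_max, p_t)
        if self.p_max - p_t >= self.eps:
            self.reset()
            return True
        return False


class StackedDetector:
    """Hypothetical stack of a long and a short window fed the same outcomes;
    a drift is signalled as soon as either window's test fires."""

    def __init__(self, long_size=100, short_size=25, delta=1e-7):
        self.long = HoeffdingWindowDetector(long_size, delta)
        self.short = HoeffdingWindowDetector(short_size, delta)

    def update(self, correct):
        fired_long = self.long.update(correct)
        fired_short = self.short.update(correct)
        if fired_long or fired_short:
            self.long.reset()
            self.short.reset()
            return True
        return False


def outcome(i):
    """Synthetic outcomes: 90% correct before index 500, 40% correct after."""
    return i % 10 != 0 if i < 500 else i % 10 < 4


detector = StackedDetector()
drifts = [i for i in range(1000) if detector.update(outcome(i))]
print(drifts)
```

With these settings the long window needs an accuracy drop of about 0.28 and the short window about 0.57, so the short window reacts only to severe drops while the long window also catches milder ones. On this stream a drift is reported within the first 100 outcomes after the change at index 500.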
Autoencoder-based Anomaly Detection in Streaming Data with Incremental Learning and Concept Drift Adaptation
In today's digital universe, enormous amounts of data are produced in a
streaming manner in a variety of application areas. These data are often
unlabelled. In this case, identifying infrequent events, such as anomalies,
poses a great challenge. This problem becomes even more difficult in
non-stationary environments, which can cause deterioration of the predictive
performance of a model. To address the above challenges, the paper proposes an
autoencoder-based incremental learning method with drift detection
(strAEm++DD). Our proposed method, strAEm++DD, leverages the advantages of
both incremental learning and drift detection. We conduct an experimental study
using real-world and synthetic datasets with severe or extreme class imbalance,
and provide an empirical analysis of strAEm++DD. We further conduct a
comparative study, showing that the proposed method significantly outperforms
existing baseline and advanced methods.
Comment: anomaly detection, concept drift, incremental anomaly detection,
concept drift, incremental learning, autoencoders, data streams, class
imbalance, nonstationary environment
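The general idea of scoring stream samples by autoencoder reconstruction error while updating the model incrementally can be sketched as follows. This is not strAEm++DD: it is a minimal, hypothetical illustration assuming a tiny linear autoencoder (one hidden unit, trained by online SGD), a fixed anomaly threshold of 0.5, a 500-sample warm-up, and a synthetic 2-D stream; the drift-detection component is omitted. Samples flagged as anomalous are excluded from training so that they do not pull the model toward the anomaly.

```python
class OnlineLinearAutoencoder:
    """Tiny linear autoencoder: d inputs -> 1 hidden unit -> d outputs,
    trained one sample at a time by SGD on the squared reconstruction error."""

    def __init__(self, dim, lr=0.05):
        self.w = [0.1] * dim  # encoder weights (small constant init, an assumption)
        self.v = [0.1] * dim  # decoder weights
        self.lr = lr

    def _encode(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def score(self, x):
        """Anomaly score: squared reconstruction error ||x - v * (w . x)||^2."""
        h = self._encode(x)
        return sum((vi * h - xi) ** 2 for vi, xi in zip(self.v, x))

    def learn(self, x):
        """One SGD step on 0.5 * ||v * (w . x) - x||^2."""
        h = self._encode(x)
        err = [vi * h - xi for vi, xi in zip(self.v, x)]      # dLoss / dx_hat
        grad_h = sum(vi * ei for vi, ei in zip(self.v, err))  # dLoss / dh
        self.v = [vi - self.lr * ei * h for vi, ei in zip(self.v, err)]
        self.w = [wi - self.lr * grad_h * xi for wi, xi in zip(self.w, x)]


def detect_stream(model, stream, threshold=0.5, warmup=500):
    """Flag samples whose score exceeds `threshold` (after a warm-up period)
    and keep learning incrementally from the samples judged normal."""
    flags = []
    for i, x in enumerate(stream):
        is_anomaly = i >= warmup and model.score(x) > threshold
        flags.append(is_anomaly)
        if not is_anomaly:
            model.learn(x)
    return flags


# Synthetic stream: normal points lie on the line y = 2x; one off-line anomaly.
ts = [((i * 37) % 100) / 50 - 1 for i in range(1000)]
normal = [(t, 2 * t) for t in ts]
stream = normal[:700] + [(1.0, -0.5)] + normal[700:]
flags = detect_stream(OnlineLinearAutoencoder(dim=2), stream)
print([i for i, flagged in enumerate(flags) if flagged])
```

Because the normal points lie on a single line, the one-unit autoencoder learns to reconstruct them almost exactly, whereas the off-line point (1.0, -0.5) keeps a reconstruction error above 1 and is the only sample flagged.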
MORPH: Towards Automated Concept Drift Adaptation for Malware Detection
Concept drift is a significant challenge for malware detection, as the
performance of trained machine learning models degrades over time, rendering
them impractical. While prior research in malware concept drift adaptation has
primarily focused on active learning, which involves selecting representative
samples to update the model, self-training has emerged as a promising approach
to mitigate concept drift. Self-training involves retraining the model using
pseudo labels to adapt to shifting data distributions. In this research, we
propose MORPH, an effective pseudo-label-based concept drift adaptation
method specifically designed for neural networks. Through extensive
experimental analysis of Android and Windows malware datasets, we demonstrate
the efficacy of our approach in mitigating the impact of concept drift. Our
method offers the advantage of reducing annotation efforts when combined with
active learning. Furthermore, our method significantly improves over existing
works in automated concept drift adaptation for malware detection.
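The self-training loop described above (pseudo-label confident predictions on unlabelled data, then update the model) can be sketched as follows. MORPH targets neural networks; this minimal illustration swaps in a 1-D nearest-centroid classifier to stay short, and the confidence measure (1 - d_best / d_second), the threshold of 0.3, the centroid momentum of 0.5, and the drifting synthetic stream are all hypothetical choices.

```python
class NearestCentroid:
    """1-D nearest-centroid classifier, a stand-in for a neural network."""

    def __init__(self, xs, ys):
        self.centroids = {}
        for label in set(ys):
            pts = [x for x, y in zip(xs, ys) if y == label]
            self.centroids[label] = sum(pts) / len(pts)

    def predict(self, x):
        """Return (label, confidence), with confidence = 1 - d_best / d_second."""
        ranked = sorted(self.centroids, key=lambda c: abs(x - self.centroids[c]))
        d_best = abs(x - self.centroids[ranked[0]])
        d_second = abs(x - self.centroids[ranked[1]])
        conf = 1.0 - d_best / d_second if d_second > 0 else 0.0
        return ranked[0], conf


def self_train(model, batch, threshold=0.3, momentum=0.5):
    """Pseudo-label an unlabelled batch, keep only confident predictions, and
    move each centroid toward the mean of its pseudo-labelled samples."""
    by_label = {}
    for x in batch:
        label, conf = model.predict(x)
        if conf >= threshold:
            by_label.setdefault(label, []).append(x)
    for label, pts in by_label.items():
        mean = sum(pts) / len(pts)
        old = model.centroids[label]
        model.centroids[label] = (1 - momentum) * old + momentum * mean


def accuracy(model, xs, ys):
    return sum(model.predict(x)[0] == y for x, y in zip(xs, ys)) / len(ys)


OFFSETS = [-0.2, -0.1, 0.0, 0.1, 0.2]


def make_batch(shift):
    """Synthetic drifting data: class 0 near 0 + shift, class 1 near 4 + shift."""
    xs = [shift + o for o in OFFSETS] + [4 + shift + o for o in OFFSETS]
    ys = [0] * len(OFFSETS) + [1] * len(OFFSETS)
    return xs, ys


train_x, train_y = make_batch(0.0)
static = NearestCentroid(train_x, train_y)   # never updated
adapted = NearestCentroid(train_x, train_y)  # self-trained on unlabelled batches
for k in range(1, 6):
    batch_x, _ = make_batch(0.5 * k)  # true labels are discarded
    self_train(adapted, batch_x)

test_x, test_y = make_batch(3.0)
print(accuracy(static, test_x, test_y), accuracy(adapted, test_x, test_y))
```

On the final drifted batch the static model misclassifies every class-0 sample (accuracy 0.5), whereas the self-trained centroids have followed the gradual drift (accuracy 1.0) without using any new labels.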
Benchmarking Change Detector Algorithms from Different Concept Drift Perspectives
The stream mining paradigm has become increasingly popular due to the vast number of algorithms and methodologies it provides to address the current challenges of the Internet of Things (IoT) and modern machine learning systems. Change detection algorithms, which focus on identifying drifts in the data distribution during the operation of a machine learning solution, are a crucial aspect of this paradigm. However, selecting the best change detection method for different types of concept drift can be challenging. This work provides a benchmark of four drift detection algorithms (EDDM, DDM, HDDMW, and HDDMA) on abrupt, gradual, and incremental drift types. To shed light on the capabilities and possible trade-offs involved in selecting a concept drift algorithm, we compare their detection capability, detection time, and detection delay. The experiments were carried out on synthetic datasets produced by our synthetic stream generator, in which attributes such as stream size, number of drifts, and drift duration can be controlled and manipulated. Our results show that HDDMW provides the best trade-off among all performance indicators and demonstrates superior consistency in detecting abrupt drifts, but has suboptimal time consumption and a limited ability to detect incremental drifts. However, it outperforms the other algorithms in detection delay for both abrupt and gradual drifts while maintaining efficient detection performance and detection time.
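The detection-delay metric compared in the benchmark can be illustrated with a minimal DDM sketch. It assumes the classic DDM rule: track the online error rate p and s = sqrt(p(1 - p) / n), record the minimum of p + s, warn when p + s >= p_min + 2 * s_min, and signal a drift when p + s >= p_min + 3 * s_min, with a 30-instance burn-in. The synthetic stream with an abrupt drift at index 1000 and the `detection_delay` helper are hypothetical; EDDM and the HDDM variants are not shown.

```python
import math


class DDM:
    """Drift Detection Method: monitors the online error rate p and its std s."""

    def __init__(self, min_instances=30):
        self.min_instances = min_instances
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = self.s_min = self.ps_min = float("inf")

    def update(self, error):
        """Feed 1 for a misclassification, 0 otherwise.
        Return "drift", "warning", or None."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.min_instances:
            return None  # burn-in: statistics are too noisy to test yet
        if p + s < self.ps_min:
            self.p_min, self.s_min, self.ps_min = p, s, p + s
        if p + s >= self.p_min + 3 * self.s_min:
            self.reset()
            return "drift"
        if p + s >= self.p_min + 2 * self.s_min:
            return "warning"
        return None


def detection_delay(detections, true_drift):
    """Instances between the true drift and the first detection at or after it."""
    after = [d for d in detections if d >= true_drift]
    return after[0] - true_drift if after else None


TRUE_DRIFT = 1000
# Error stream: 10% errors before the abrupt drift, 50% errors after it.
errors = [int(i % 10 == 0) if i < TRUE_DRIFT else int(i % 2 == 0)
          for i in range(2000)]
ddm = DDM()
detections = [i for i, e in enumerate(errors) if ddm.update(e) == "drift"]
print(detections, detection_delay(detections, TRUE_DRIFT))
```

Here the error rate jumps from 10% to 50% at index 1000, and DDM signals the drift a few dozen instances later; that gap is the detection delay. A benchmark would repeat this over many streams and drift types and also record missed and false detections.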
- …