Advanced Adaptive Classifier Methods for Data Streams

Abstract

The exponential growth of the internet has produced an overwhelming influx of big data. Traditional batch learning models struggle to learn from these vast, constantly evolving data streams and to produce up-to-date outcomes. Stream Learning (SL) has emerged as a promising solution: it learns continuously from evolving data streams and adapts to changes in the input distribution. This thesis focuses on the classification task in SL, specifically investigating streaming gradient-boosted trees and Neural Networks (NNs). First, we introduce Streaming Gradient Boosted Trees (SGBT), a novel gradient-boosted method designed explicitly for SL classification. Next, we propose Continuously Adaptive Neural Networks for Data Streams (CAND), an architecture-agnostic NN approach for evolving data stream classification. Both SGBT and CAND outperform current state-of-the-art bagging and random forest-based SL methods on evolving data stream classification tasks. Online Continual Learning (OCL) addresses the problem in which an NN learning from an evolving data stream forgets its past knowledge when confronted with a distribution shift. Online Domain Incremental Continual Learning (ODICL) is a specific variant of OCL in which the input data distribution changes from one task to another. We propose two methods for ODICL: Online Domain Incremental Pool (ODIP) and Online Domain Incremental Networks (ODIN). The proposed methods leverage existing well-researched SL techniques described in Online Streaming Continual Learning (OSCL). ODIP and ODIN outperform current regularization methods without needing a replay buffer, and ODIN achieves competitive results compared to replay-based methods. Both methods are ideal candidates for privacy-concerned ODICL scenarios, offering alternatives to regularization-based approaches.
Overall, this thesis explores advancements in SL classification and ODICL, presenting novel techniques that surpass existing approaches in their respective domains. These contributions have significant implications for addressing the challenges posed by evolving data streams in the era of big data.
