Search CORE

4,916 research outputs found

Scalable real-time classification of data streams with concept drift

Author: Mark Tennant
Frederic Stahl
Omer Rana
João Bártolo Gomes
Publication venue: 'Elsevier BV'
Publication date: 01/10/2017

Inducing adaptive predictive models in real time from high-throughput data streams is one of the most challenging areas of Big Data Analytics. Because data streams are unbounded and may contain concept drifts (changes over time in the pattern encoded in the stream), they impose unique challenges compared with predictive data mining on batch data. Several real-time predictive data stream algorithms exist; however, most are not naturally parallel and are therefore limited in their scalability. This paper highlights the Micro-Cluster Nearest Neighbour (MC-NN) data stream classifier. MC-NN is based on statistical summaries of the data stream and a nearest neighbour approach, which makes it naturally parallel. In its serial version, MC-NN can handle data streams: the data does not need to reside in memory and is processed incrementally. MC-NN also adapts to concept drifts. This paper provides an empirical study of the serial algorithm's speed, adaptivity and accuracy. Furthermore, it discusses the new parallel implementation of MC-NN and its parallel properties, and provides an empirical scalability study.

Central Archive at the University of Reading

Crossref

Online Research @ Cardiff
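The statistical summary that makes MC-NN naturally parallel can be illustrated with a micro-cluster that keeps only a count, a per-feature linear sum and a per-feature squared sum. The following is a minimal Python sketch under that assumption, not the authors' implementation: the names `MicroCluster`, `merge` and `classify` are hypothetical, and classification is reduced to returning the label of the nearest centroid.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MicroCluster:
    """Hypothetical micro-cluster summary: count, linear sum and squared sum per feature."""
    label: str
    n: int
    ls: List[float]  # per-feature linear sum
    ss: List[float]  # per-feature sum of squares

    @classmethod
    def from_point(cls, x: Sequence[float], label: str) -> "MicroCluster":
        return cls(label, 1, list(x), [v * v for v in x])

    def add(self, x: Sequence[float]) -> None:
        # Incremental update: the raw point is not stored, only the running sums.
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def merge(self, other: "MicroCluster") -> "MicroCluster":
        # Summaries are additive, so clusters built on different partitions can be combined.
        if self.label != other.label:
            raise ValueError("can only merge micro-clusters of the same class")
        return MicroCluster(
            self.label,
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            [a + b for a, b in zip(self.ss, other.ss)],
        )


def classify(clusters: List[MicroCluster], x: Sequence[float]) -> str:
    """Predict the label of the micro-cluster whose centroid is nearest to x."""
    nearest = min(clusters, key=lambda c: math.dist(c.centroid(), x))
    return nearest.label
```

Because `merge` only adds counts and sums, summaries built on separate stream partitions can be combined in any order, which is what makes this style of summary straightforward to parallelise.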

Fast adaptive real-time classification for data streams with concept drift

Author: Gomes João Bártolo
Stahl Frederic
Tennant Mark
Publication venue: Springer International Publishing
Publication date: 01/01/2015

An important application of Big Data Analytics is the real-time analysis of streaming data. Streaming data imposes unique challenges on data mining algorithms: concept drifts, the need to analyse data on the fly because streams are unbounded, and the need for scalable algorithms because of potentially high data throughput. Fast real-time classification algorithms that adapt to concept drift exist; however, most are not naturally parallel and are thus limited in their scalability. This paper presents work on the Micro-Cluster Nearest Neighbour (MC-NN) classifier. MC-NN uses an adaptive statistical data summary built from Micro-Clusters. MC-NN is very fast and adapts to concept drift while retaining the parallel properties of the base KNN classifier. MC-NN is also competitive with existing data stream classifiers in terms of accuracy and speed.

Central Archive at the University of Reading
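One way a micro-cluster summary can adapt to drift is an error counter per micro-cluster: misclassifications raise the counter, correct predictions lower it, and a cluster whose counter exceeds a threshold is split. The Python sketch below assumes that scheme. The threshold value, the floor at zero, and the split rule (two single-point children placed one standard deviation either side of the centroid along the highest-variance feature) are illustrative assumptions, not parameters taken from the paper.

```python
import math

# Hypothetical sketch of error-counter-driven adaptation; names, threshold and
# split rule are illustrative assumptions, not taken from the paper.


class MicroCluster:
    def __init__(self, x, label):
        self.label = label
        self.n = 1
        self.ls = list(x)             # linear sum of features
        self.ss = [v * v for v in x]  # sum of squared features
        self.errors = 0               # error counter

    def add(self, x):
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def centroid(self):
        return [s / self.n for s in self.ls]

    def variances(self):
        c = self.centroid()
        # Var = E[x^2] - E[x]^2, clamped at 0 to absorb rounding error.
        return [max(self.ss[i] / self.n - c[i] ** 2, 0.0) for i in range(len(c))]


def nearest(clusters, x, label=None):
    """Nearest micro-cluster to x, optionally restricted to one class label."""
    pool = [c for c in clusters if label is None or c.label == label]
    return min(pool, key=lambda c: math.dist(c.centroid(), x), default=None)


def split(mc):
    # Assumed split rule: two children one standard deviation either side of the
    # centroid along the feature with the largest variance.
    var = mc.variances()
    d = max(range(len(var)), key=var.__getitem__)
    sd = math.sqrt(var[d])
    children = []
    for sign in (-1, 1):
        c = mc.centroid()
        c[d] += sign * sd
        children.append(MicroCluster(c, mc.label))
    return children


def train(clusters, x, label, threshold=2):
    """Absorb one labelled instance; split clusters whose error count exceeds threshold."""
    near = nearest(clusters, x)
    if near is not None and near.label == label:
        near.add(x)                          # correct prediction: reward the cluster
        near.errors = max(near.errors - 1, 0)
        return
    same = nearest(clusters, x, label)
    if same is None:
        clusters.append(MicroCluster(x, label))  # first instance of this class
    else:
        same.add(x)
        same.errors += 1
    if near is not None:
        near.errors += 1                     # the wrongly nearest cluster is penalised
    for mc in (near, same):
        if mc is not None and mc.errors > threshold:
            clusters.remove(mc)
            clusters.extend(split(mc))
```

In this sketch a split discards the parent's accumulated counts, so each child restarts from a single point; that keeps the example short at the cost of some statistical memory.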

LUNAR: Cellular automata for drifting data streams

Author: Del Ser J.
Herrera F.
Lobo J.L.
Publication venue: 'Elsevier BV'
Publication date: 06/02/2020

With the advent of fast data streams, real-time machine learning has become a challenging task that demands substantial processing resources. In addition, such streams can be affected by concept drift, which requires learning methods to detect changes in the data distribution and adapt to the evolving conditions. Several emerging paradigms, such as Smart Dust, Utility Fog and Swarm Robotics, need efficient and scalable solutions for real-time scenarios in which computing resources are usually constrained. Cellular automata, as low-bias and noise-robust pattern recognition methods with competitive classification performance, meet the requirements imposed by these paradigms, mainly owing to their simplicity and parallel nature. In this work we propose LUNAR, a streamified version of cellular automata designed to meet these requirements. LUNAR acts as a true incremental learner while adapting to drifting conditions. Furthermore, LUNAR is highly interpretable, as its cellular structure directly represents the mapping between the feature space and the labels to be predicted. Extensive simulations with synthetic and real data provide evidence of its competitive classification performance compared with long-established and successful online learning methods.

arXiv.org e-Print Archive

BCAM's Institutional Repository Data
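The abstract does not give LUNAR's update rules, so the following is only a toy cellular-automaton classifier that illustrates the general idea of a grid whose cells map regions of the feature space to labels. Assumptions: two features scaled to [0, 1), a hypothetical grid size, a newest-label-wins rule standing in for drift adaptation, and a Moore-neighbourhood majority vote that fills empty cells. The class name `CellularGrid` is likewise hypothetical.

```python
from collections import Counter


class CellularGrid:
    """Toy cellular-automaton classifier over a 2-D feature space scaled to [0, 1)."""

    def __init__(self, size=10):
        self.size = size
        self.cells = [[None] * size for _ in range(size)]

    def _cell(self, x):
        # Map each feature to a grid index, clamping values outside [0, 1).
        return tuple(min(max(int(v * self.size), 0), self.size - 1) for v in x)

    def learn(self, x, label):
        i, j = self._cell(x)
        # Assumption: the newest label overwrites the cell, so the grid follows drift.
        self.cells[i][j] = label

    def step(self):
        """One synchronous update: empty cells adopt the majority label of their Moore neighbourhood."""
        new = [row[:] for row in self.cells]
        for i in range(self.size):
            for j in range(self.size):
                if self.cells[i][j] is not None:
                    continue
                votes = Counter(
                    self.cells[a][b]
                    for a in range(max(i - 1, 0), min(i + 2, self.size))
                    for b in range(max(j - 1, 0), min(j + 2, self.size))
                    if self.cells[a][b] is not None
                )
                if votes:
                    new[i][j] = votes.most_common(1)[0][0]
        self.cells = new

    def predict(self, x):
        i, j = self._cell(x)
        return self.cells[i][j]
```

The grid itself is the model, which mirrors the interpretability argument in the abstract: reading a cell tells you directly which label that region of the feature space maps to.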

Next challenges for adaptive learning systems

Author: Bifet A.
Gaber M.
Gabrys B.
Gama J.
Minku L.
Musial K.
Zliobaite I.
Publication date: 01/01/2012

University of Birmingham Research Portal

Portsmouth University Research Portal (Pure)

Evolving Ensemble Fuzzy Classifier

Author: Lughofer Edwin
Pedrycz Witold
Pratama Mahardhika
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

The concept of ensemble learning offers a promising avenue for learning from data streams in complex environments because it addresses the bias-variance dilemma better than its single-model counterpart and features a reconfigurable structure that is well suited to this context. While various extensions of ensemble learning for mining non-stationary data streams can be found in the literature, most are built on a static base classifier and revisit preceding samples in a sliding window for a retraining step. This makes them computationally prohibitive and not flexible enough to cope with rapidly changing environments. Their complexity is often high because they involve a large collection of offline classifiers and lack both mechanisms for reducing structural complexity and an online feature selection mechanism. A novel evolving ensemble classifier, Parsimonious Ensemble (pENsemble), is proposed in this paper. pENsemble differs from existing architectures in that it is built upon an evolving classifier for data streams, termed Parsimonious Classifier (pClass). pENsemble is equipped with an ensemble pruning mechanism that estimates the localized generalization error of a base classifier. A dynamic online feature selection scenario is integrated into pENsemble, allowing input features to be selected and deselected on the fly. pENsemble adopts a dynamic ensemble structure to output a final classification decision and features a novel drift detection scenario to grow the ensemble structure. The efficacy of pENsemble has been demonstrated through rigorous numerical studies with dynamic and evolving data streams, in which it delivers the most encouraging trade-off between accuracy and complexity.
Comment: This paper has been published in IEEE Transactions on Fuzzy Systems.

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Finding and tracking multi-density clusters in an online dynamic data stream

Author: Fahy Conor
Yang Shengxiang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/05/2019

The file attached to this record is the author's final peer-reviewed version.

Change is one of the biggest challenges in dynamic stream mining. From a data-mining perspective, adapting to and tracking change is desirable in order to understand how and why it has occurred. Clustering, a form of unsupervised learning, can be used to identify the underlying patterns in a stream. Density-based clustering identifies clusters as areas of high density separated by areas of low density. This paper proposes a Multi-Density Stream Clustering (MDSC) algorithm to address two problems: the multi-density problem and the problem of discovering and tracking changes in a dynamic stream. MDSC consists of two online components: discovered, labelled clusters and an outlier buffer. Incoming points are assigned to a live cluster or passed to the outlier buffer. New clusters are discovered in the buffer using an ant-inspired swarm intelligence approach. Each newly discovered cluster is uniquely labelled and added to the set of live clusters. Processed data is subject to an ageing function and disappears when it is no longer relevant. MDSC is shown to perform favourably against state-of-the-art stream-clustering algorithms on a range of real and synthetic data streams. Experimental results suggest that MDSC can discover qualitatively useful patterns while being scalable and robust to noise.

De Montfort University Open Research Archive
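The MDSC flow described in this abstract (assign each point to a live cluster or to the outlier buffer, discover new clusters in the buffer, age out stale clusters) can be sketched as follows. This is a simplified illustration, not MDSC itself: the ant-inspired discovery step is replaced by a plain density check (at least `min_pts` buffered points within `radius` of the newest point), ageing is modelled as exponential decay of cluster weight, buffered points never expire, and all class names, parameter names and defaults are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    label: int
    centre: List[float]
    weight: float


class StreamClusterer:
    """Simplified online clusterer: live clusters, an outlier buffer and weight ageing."""

    def __init__(self, radius=1.0, min_pts=3, decay=0.9, min_weight=0.1):
        self.radius = radius          # hypothetical assignment radius
        self.min_pts = min_pts        # buffered points needed to form a cluster
        self.decay = decay            # per-insert ageing factor
        self.min_weight = min_weight  # clusters below this weight are dropped
        self.clusters: List[Cluster] = []
        self.buffer: List[List[float]] = []
        self.next_label = 0

    def _age(self) -> None:
        # Ageing: every live cluster loses weight on each insert; stale ones disappear.
        for c in self.clusters:
            c.weight *= self.decay
        self.clusters = [c for c in self.clusters if c.weight >= self.min_weight]

    def insert(self, x: List[float]) -> Optional[int]:
        """Return the label of the cluster x joins, or None if x stays in the buffer."""
        self._age()
        best = min(self.clusters, key=lambda c: math.dist(c.centre, x), default=None)
        if best is not None and math.dist(best.centre, x) <= self.radius:
            # Weighted update: shift the centre towards x.
            w = best.weight + 1.0
            best.centre = [(c * best.weight + v) / w for c, v in zip(best.centre, x)]
            best.weight = w
            return best.label
        self.buffer.append(x)
        return self._discover(x)

    def _discover(self, x: List[float]) -> Optional[int]:
        # Stand-in for MDSC's ant-inspired search: form a cluster when at least
        # min_pts buffered points lie within radius of the newest point.
        near = [p for p in self.buffer if math.dist(p, x) <= self.radius]
        if len(near) < self.min_pts:
            return None
        centre = [sum(col) / len(near) for col in zip(*near)]
        label = self.next_label
        self.clusters.append(Cluster(label, centre, float(len(near))))
        self.next_label += 1
        self.buffer = [p for p in self.buffer if math.dist(p, x) > self.radius]
        return label
```

With these defaults, a cluster that stops receiving points decays below `min_weight` after a few dozen inserts and is removed, while a new cluster forms once enough points accumulate in the buffer; both effects mirror, in simplified form, the ageing and discovery behaviour the abstract describes.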