
    Adaptive random forests for evolving data stream classification

    Random forests is currently one of the most widely used machine learning algorithms in the non-streaming (batch) setting. This preference is attributable to its high learning performance and low demands with respect to input preparation and hyper-parameter tuning. However, in the challenging context of evolving data streams, there is no random forests algorithm that can be considered state-of-the-art in comparison to bagging- and boosting-based algorithms. In this work, we present the adaptive random forest (ARF) algorithm for classification of evolving data streams. In contrast to previous attempts to replicate random forests for data stream learning, ARF includes an effective resampling method and adaptive operators that can cope with different types of concept drift without complex optimizations for different data sets. We present experiments with a parallel implementation of ARF which shows no degradation in classification performance compared to a serial implementation, since trees and adaptive operators are independent of one another. Finally, we compare ARF with state-of-the-art algorithms in a traditional test-then-train evaluation and a novel delayed-labelling evaluation, and show that ARF is accurate and uses a feasible amount of resources.
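    As a concrete illustration, the sketch below runs an Adaptive Random Forest in a test-then-train loop on a synthetic stream using scikit-multiflow (introduced further down in this list); the generator, sample budget, and hyper-parameter values are illustrative assumptions, not the paper's experimental setup.

    from skmultiflow.data import SEAGenerator
    from skmultiflow.meta import AdaptiveRandomForestClassifier

    stream = SEAGenerator(random_state=1)              # synthetic binary-class stream
    arf = AdaptiveRandomForestClassifier(n_estimators=10)

    correct, seen = 0, 0
    for _ in range(5000):                              # test-then-train: predict first, then learn
        X, y = stream.next_sample()
        if seen > 0:
            correct += int(arf.predict(X)[0] == y[0])
        arf.partial_fit(X, y, classes=[0, 1])
        seen += 1

    print("prequential accuracy:", correct / (seen - 1))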

    Scikit-Multiflow: A Multi-output Streaming Framework

    Scikit-multiflow is a multi-output/multi-label and stream data mining framework for the Python programming language. Conceived to serve as a platform to encourage the democratization of stream learning research, it provides multiple state-of-the-art methods for stream learning, stream generators, and evaluators. scikit-multiflow builds upon popular open-source frameworks including scikit-learn, MOA and MEKA. Development follows the FOSS principles, and quality is enforced by complying with PEP8 guidelines and by using continuous integration and automatic testing. The source code is publicly available at https://github.com/scikit-multiflow/scikit-multiflow.
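    A minimal sketch of the prequential (test-then-train) evaluation workflow the framework exposes is shown below, assuming scikit-multiflow is installed; the generator, learner, and parameter values are placeholders chosen for illustration.

    from skmultiflow.data import SEAGenerator
    from skmultiflow.trees import HoeffdingTreeClassifier
    from skmultiflow.evaluation import EvaluatePrequential

    stream = SEAGenerator(random_state=7)
    learner = HoeffdingTreeClassifier()

    # evaluate incrementally: each sample is first used for testing, then for training
    evaluator = EvaluatePrequential(max_samples=10000,
                                    pretrain_size=200,
                                    metrics=['accuracy', 'kappa'])
    evaluator.evaluate(stream=stream, model=learner, model_names=['HT'])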

    Online GentleAdaBoost -- Technical Report

    We study the online variant of GentleAdaBoost, in which weak learners are combined into a strong learner in an online fashion. We provide an approach for extending the batch algorithm to the online setting, with theoretical justification through the application of line search. Finally, we compare our online boosting approach with other online approaches across a variety of benchmark datasets.
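    The following is a generic, simplified online boosting loop, not the paper's GentleAdaBoost variant or its line-search update: each incoming example passes through a fixed chain of incremental weak learners, and its sample weight grows whenever a weak learner misclassifies it. The learner choice and the weight multiplier are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    classes = np.array([0, 1])
    weak_learners = [SGDClassifier(loss="log_loss") for _ in range(5)]

    def boost_one(x, y):
        """x: array of shape (1, n_features); y: array of shape (1,)."""
        weight = 1.0
        for wl in weak_learners:
            wl.partial_fit(x, y, classes=classes, sample_weight=[weight])
            # put more weight on examples this weak learner still gets wrong
            if wl.predict(x)[0] != y[0]:
                weight *= 1.5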

    Text classification supervised algorithms with term frequency inverse document frequency and global vectors for word representation: a comparative study

    Over the course of the previous two decades, there has been a rise in the quantity of text documents stored digitally. Organizing and categorizing those documents automatically is known as text categorization: documents are classified into a set of predefined categories so they can be stored and retrieved more efficiently. Identifying appropriate structures, architectures, and methods for text classification presents a challenge for researchers, because this concept has a significant impact on content management, contextual search, opinion mining, product review analysis, spam filtering, and text sentiment mining. This study analyzes the generic categorization strategy and examines supervised machine learning approaches and their ability to comprehend complex models and nonlinear data interactions. Among these methods are k-nearest neighbors (KNN), support vector machine (SVM), and ensemble learning algorithms employing various evaluation techniques. Finally, the constraints of each technique and its applicability to real-life situations are evaluated.
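    One of the compared setups, term frequency-inverse document frequency features fed to a support vector machine, can be sketched with scikit-learn as below; the two-document toy corpus is a hypothetical placeholder for the datasets used in the study.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    train_docs = ["cheap meds online now", "quarterly meeting agenda attached"]
    train_labels = ["spam", "ham"]

    # TF-IDF weighting followed by a linear SVM classifier
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    model.fit(train_docs, train_labels)
    print(model.predict(["cheap agenda online"]))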

    Rebalancing Learning on Evolving Data Streams

    Nowadays, every device connected to the Internet generates an ever-growing (formally, unbounded) stream of data. Machine learning on unbounded data streams is a grand challenge due to resource constraints: standard machine learning techniques cannot deal with data whose statistics are subject to gradual or sudden changes without warning. Massive Online Analysis (MOA) is the collective name, as well as a software library, for learners that are able to manage data streams. In this paper, we present a research study on streaming rebalancing. Data streams can be imbalanced just as static data can, but there is no method to rebalance them incrementally, one element at a time. For this reason, we propose a new streaming approach able to rebalance data streams online. Our new methodology is evaluated against synthetically generated datasets using prequential evaluation in order to demonstrate that it outperforms the existing approaches.
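    A hedged sketch of one simple way to rebalance a stream one element at a time is given below; it is not the method proposed in the paper, merely an inverse-class-frequency weighting applied during incremental updates.

    from collections import Counter
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    counts = Counter()
    clf = SGDClassifier(loss="log_loss")
    classes = np.array([0, 1])

    def learn_one(x, y):
        """x: shape (1, n_features); y: shape (1,). Weight rare classes more heavily."""
        counts[y[0]] += 1
        total = sum(counts.values())
        weight = total / (len(counts) * counts[y[0]])   # inverse class frequency
        clf.partial_fit(x, y, classes=classes, sample_weight=[weight])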

    On the performance of deep learning models for time series classification in streaming

    Processing data streams arriving at high speed requires the development of models that can provide fast and accurate predictions. Although deep neural networks are the state-of-the-art for many machine learning tasks, their performance in real-time data streaming scenarios is a research area that has not yet been fully addressed. Nevertheless, there have been recent efforts to adapt complex deep learning models to streaming tasks by reducing their processing time. The design of the asynchronous dual-pipeline deep learning framework allows predictions to be made over incoming instances while the model is updated simultaneously, using two separate layers. The aim of this work is to assess the performance of different types of deep architectures for data stream classification using this framework. We evaluate models such as multi-layer perceptrons and recurrent, convolutional, and temporal convolutional neural networks over several time-series datasets that are simulated as streams. The obtained results indicate that convolutional architectures achieve higher performance in terms of accuracy and efficiency.
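    The dual-pipeline idea can be caricatured with two threads, as in the hedged sketch below: one path answers predictions with the most recent model state while the other consumes labelled instances and updates the model in the background. This is an illustrative assumption, not the framework's actual implementation, which the abstract describes only at a high level.

    import queue
    import threading
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    model = SGDClassifier(loss="log_loss")
    model.partial_fit(np.zeros((1, 4)), [0], classes=[0, 1])   # warm start so predict() works
    lock = threading.Lock()
    train_q = queue.Queue()

    def training_worker():
        while True:
            X, y = train_q.get()              # blocks until labelled instances arrive
            with lock:
                model.partial_fit(X, y)

    threading.Thread(target=training_worker, daemon=True).start()

    def predict_and_enqueue(X, y):
        with lock:
            pred = model.predict(X)           # fast path: predict immediately
        train_q.put((X, y))                   # slow path: update asynchronously
        return pred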

    Data streams classification using deep learning under different speeds and drifts

    Processing data streams arriving at high speed requires the development of models that can provide fast and accurate predictions. Although deep neural networks are the state-of-the-art for many machine learning tasks, their performance in real-time data streaming scenarios is a research area that has not yet been fully addressed. Nevertheless, much effort has been put into the adaptation of complex deep learning (DL) models to streaming tasks by reducing the processing time. The design of the asynchronous dual-pipeline DL framework allows predictions to be made on incoming instances while the model is updated simultaneously, using two separate layers. The aim of this work is to assess the performance of different types of DL architectures for data stream classification using this framework. We evaluate models such as multi-layer perceptrons and recurrent, convolutional, and temporal convolutional neural networks over several time-series datasets that are simulated as streams at different speeds. In addition, we evaluate how the different architectures react to the concept drifts typically found in evolving data streams. The obtained results indicate that convolutional architectures achieve higher performance in terms of accuracy and efficiency, but are also the most sensitive to concept drift.
    Funding: Ministerio de Ciencia, Innovación y Universidades PID2020-117954RB-C22; Junta de Andalucía US-1263341; Junta de Andalucía P18-RT-277.
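    Concept drifts of the kind evaluated here can be simulated with scikit-multiflow's drift stream wrapper, as in the small sketch below; the choice of generator, drift position, and width are illustrative assumptions rather than the study's configuration.

    from skmultiflow.data import ConceptDriftStream, SEAGenerator

    # an abrupt drift: switch from one SEA concept to another after 5,000 samples
    stream = ConceptDriftStream(stream=SEAGenerator(classification_function=0),
                                drift_stream=SEAGenerator(classification_function=2),
                                position=5000,
                                width=1)
    X, y = stream.next_sample(10)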