15 research outputs found

Advances on Concept Drift Detection in Regression Tasks using Social Networks Theory

Author: Barddal Jean Paul
Enembreck Fabrício
Gomes Heitor Murilo
Publication venue: 'IGI Global'
Publication date: 19/04/2023

Mining data streams is one of the main areas of study in machine learning due to its application in many knowledge areas. One of the major challenges in mining data streams is concept drift, which requires the learner to discard the current concept and adapt to a new one. Ensemble-based drift detection algorithms have been applied successfully to the classification task but usually maintain a fixed-size ensemble of learners, running the risk of needlessly spending processing time and memory. In this paper we present improvements to the Scale-free Network Regressor (SFNR), a dynamic ensemble-based method for regression that employs social networks theory. To detect concept drifts, SFNR uses the Adaptive Window (ADWIN) algorithm. Results show improvements in accuracy, especially in concept drift situations, and better performance compared to other state-of-the-art algorithms on both real and synthetic data.
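
The abstract above names ADWIN as the drift detector. As a rough illustration of the adaptive-window idea only, the sketch below keeps a window of recent values (for example, per-instance errors) and drops the oldest ones whenever two sub-windows differ in mean by more than a Hoeffding-style bound. It is a naive version that checks every split point and omits the bucket compression of the real algorithm; the class name `NaiveAdwin`, the default `delta`, and the assumption that values lie in [0, 1] are illustrative choices, not taken from the paper.

```python
import math
from collections import deque


class NaiveAdwin:
    """Simplified adaptive window; values are assumed to lie in [0, 1]."""

    def __init__(self, delta=0.002):
        self.delta = delta      # confidence parameter (hypothetical default)
        self.window = deque()   # most recent values, oldest on the left

    def _has_cut(self):
        """Return True if some split of the window shows a significant mean change."""
        n = len(self.window)
        total = sum(self.window)
        delta_prime = self.delta / n
        head_sum = 0.0
        # Try every split into an older part (n0 items) and a newer part (n1 items).
        for n0, value in enumerate(list(self.window)[:-1], start=1):
            head_sum += value
            n1 = n - n0
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(head_sum / n0 - (total - head_sum) / n1) > eps:
                return True
        return False

    def update(self, value):
        """Add a value; return True if old data was dropped (drift signalled)."""
        self.window.append(value)
        drift = False
        while len(self.window) > 1 and self._has_cut():
            self.window.popleft()
            drift = True
        return drift
```

Feeding the detector a stream of errors and resetting or replacing a learner when `update` returns `True` is one common way such a detector is wired into an ensemble; how SFNR does this is not described in the abstract.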

arXiv.org e-Print Archive

A survey on feature drift adaptation: Definition, benchmark, challenges and future directions

Author: Barddal Jean Paul
Enembreck Fabrício
Gomes Heitor Murilo
Pfahringer Bernhard
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

Data stream mining is a fast-growing research topic due to the ubiquity of data in several real-world problems. Given their ephemeral nature, data stream sources are expected to undergo changes in data distribution, a phenomenon called concept drift. This paper focuses on one specific type of drift that has not yet been thoroughly studied, namely feature drift. Feature drift occurs whenever a subset of features becomes, or ceases to be, relevant to the learning task; thus, learners must detect and adapt to these changes accordingly. We survey existing work on feature drift adaptation with both explicit and implicit approaches. Additionally, we benchmark several algorithms and a naive feature drift detection approach using synthetic and real-world datasets. The results from our experiments indicate the need for future research in this area, as even naive approaches produced gains in accuracy while reducing resource usage. Finally, we state current research topics, challenges and future directions for feature drift adaptation.

Crossref

Research Commons@Waikato

Evaluating k-NN in the Classification of Data Streams with Concept Drift

Author: Barddal Jean Paul
de Barros Roberto Souto Maior
Santos Silas Garrido Teixeira de Carvalho
Publication date: 05/10/2022

Data streams are often defined as large amounts of data flowing continuously at high speed. Moreover, these data are likely subject to changes in data distribution, known as concept drift. Given all the reasons mentioned above, learning from streams is often online and under restrictions of memory consumption and run-time. Although many classification algorithms exist, most of the works published in the area use Naive Bayes (NB) and Hoeffding Trees (HT) as base learners in their experiments. This article proposes an in-depth evaluation of k-Nearest Neighbors (k-NN) as a candidate for classifying data streams subjected to concept drift. It also analyses the complexity in time and the two main parameters of k-NN, i.e., the number of nearest neighbors used for predictions (k), and window size (w). We compare different parameter values for k-NN and contrast it to NB and HT both with and without a drift detector (RDDM) in many datasets. We formulated and answered 10 research questions which led to the conclusion that k-NN is a worthy candidate for data stream classification, especially when the run-time constraint is not too restrictive. Comment: 25 pages, 10 tables, 7 figures + 30 pages appendi
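
To make the roles of the two parameters concrete, here is a minimal sketch of a sliding-window k-NN classifier evaluated in a test-then-train loop. It assumes numeric feature tuples, squared Euclidean distance, and majority voting; the names `SlidingWindowKNN` and `prequential_accuracy` are hypothetical and are not the implementation evaluated in the article.

```python
from collections import Counter, deque


class SlidingWindowKNN:
    """k-NN over the w most recent labelled instances."""

    def __init__(self, k=3, w=100):
        self.k = k
        # A bounded deque discards the oldest instance once w is reached,
        # which is how the model forgets outdated concepts.
        self.window = deque(maxlen=w)

    def predict(self, x):
        """Return the majority label among the k nearest stored instances."""
        if not self.window:
            return None  # nothing learned yet
        nearest = sorted(
            self.window,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
        )[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    def learn(self, x, y):
        """Store one labelled instance."""
        self.window.append((tuple(x), y))


def prequential_accuracy(model, stream):
    """Test-then-train: predict each instance first, then learn from it."""
    correct = total = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        total += 1
        model.learn(x, y)
    return correct / total if total else 0.0
```

In this sketch each prediction scans the whole window, so run-time grows with w, while a smaller w forgets an old concept sooner after a drift.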

arXiv.org e-Print Archive

Adaptive random forests for evolving data stream classification

Author: Abdessalem Talel
Barddal Jean Paul
Bifet Albert
Enembreck Fabrício
Gomes Heitor Murilo
Holmes Geoffrey
Pfahringer Bernhard
Read Jesse
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Random forests is currently one of the most used machine learning algorithms in the non-streaming (batch) setting. This preference is attributable to its high learning performance and low demands with respect to input preparation and hyper-parameter tuning. However, in the challenging context of evolving data streams, there is no random forests algorithm that can be considered state-of-the-art in comparison to bagging and boosting based algorithms. In this work, we present the adaptive random forest (ARF) algorithm for classification of evolving data streams. In contrast to previous attempts at replicating random forests for data stream learning, ARF includes an effective resampling method and adaptive operators that can cope with different types of concept drifts without complex optimizations for different data sets. We present experiments with a parallel implementation of ARF which has no degradation in terms of classification performance in comparison to a serial implementation, since trees and adaptive operators are independent from one another. Finally, we compare ARF with state-of-the-art algorithms in a traditional test-then-train evaluation and a novel delayed labelling evaluation, and show that ARF is accurate and uses a feasible amount of resources.

Crossref

Research Commons@Waikato

HAL-Polytechnique

Deep Single Models vs. Ensembles: Insights for a Fast Deployment of Parking Monitoring Systems

Author: Barddal Jean Paul
de Almeida Paulo Ricardo Lisboa
Hochuli Andre Gustavo
Mendes Leonardo Matheus
Palhano Gillian Cezar
Publication date: 28/09/2023

Searching for available parking spots in high-density urban centers is a stressful task for drivers that can be mitigated by systems that know in advance the nearest parking space available. To this end, image-based systems offer cost advantages over other sensor-based alternatives (e.g., ultrasonic sensors), requiring less physical infrastructure for installation and maintenance. Despite recent deep learning advances, deploying intelligent parking monitoring is still a challenge since most approaches involve collecting and labeling large amounts of data, which is laborious and time-consuming. Our study aims to uncover the challenges in creating a global framework, trained using publicly available labeled parking lot images, that performs accurately across diverse scenarios, enabling the parking space monitoring as a ready-to-use system to deploy in a new environment. Through exhaustive experiments involving different datasets and deep learning architectures, including fusion strategies and ensemble methods, we found that models trained on diverse datasets can achieve 95% accuracy without the burden of data annotation and model training on the target parking lot. Comment: An improved version of this manuscript was submitted to IEEE ICMLA 2023 (Dec/23

arXiv.org e-Print Archive

Boosting decision stumps for dynamic feature selection on data streams

Author: Barddal Jean Paul
Bifet Albert
Enembreck Fabrício
Gomes Heitor Murilo
Pfahringer Bernhard
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

Feature selection targets the identification of which features of a dataset are relevant to the learning task. It is also widely known and used to improve computation times, reduce computation requirements, decrease the impact of the curse of dimensionality, and enhance the generalization rates of classifiers. In data streams, classifiers shall benefit from all the items above, but more importantly, from the fact that the relevant subset of features may drift over time. In this paper, we propose a novel dynamic feature selection method for data streams called Adaptive Boosting for Feature Selection (ABFS). ABFS chains decision stumps and drift detectors, and as a result, identifies which features are relevant to the learning task as the stream progresses with reasonable success. In addition to our proposed algorithm, we bring feature selection-specific metrics from batch learning to streaming scenarios. Next, we evaluate ABFS according to these metrics in both synthetic and real-world scenarios. As a result, ABFS improves the classification rates of different types of learners and eventually enhances computational resource usage.

Research Commons@Waikato

Random forest kernel for high-dimension low sample size classification

Author: Barddal Jean Paul
Bernard Simon
Cavalheiro Lucca Portes
Heutte Laurent
Publication venue: Springer Verlag (Germany)
Publication date: 17/11/2023

High dimension, low sample size (HDLSS) problems are numerous among real-world applications of machine learning. From medical images to text processing, traditional machine learning algorithms are usually unsuccessful in learning the best possible concept from such data. In a previous work, we proposed a dissimilarity-based approach for multi-view classification, the Random Forest Dissimilarity (RFD), that achieves state-of-the-art results for such problems. In this work, we transpose the core principle of this approach to solving HDLSS classification problems, by using the RF similarity measure as a learned precomputed SVM kernel (RFSVM). We show that such a learned similarity measure is particularly suited and accurate for this classification context. Experiments conducted on 40 public HDLSS classification datasets, supported by rigorous statistical analyses, show that the RFSVM method outperforms existing methods for the majority of HDLSS problems and remains at the same time very competitive for low or non-HDLSS problems.
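
One simple way to picture a forest-derived similarity used as a precomputed kernel is the classic leaf co-occurrence proximity: two samples are similar in proportion to the number of trees that route them to the same leaf. The sketch below computes that matrix from per-tree leaf indices. This is an assumption made for illustration; the similarity measure actually used by RFSVM may be defined differently, and the function name `proximity_kernel` is hypothetical.

```python
def proximity_kernel(leaves_a, leaves_b):
    """Leaf co-occurrence similarity between two sets of samples.

    leaves_a[i][t] is the index of the leaf reached by sample i in tree t
    (same layout for leaves_b). The result K has one row per sample in
    leaves_a and one column per sample in leaves_b, where K[i][j] is the
    fraction of trees in which the two samples fall into the same leaf.
    """
    n_trees = len(leaves_a[0])
    return [
        [
            sum(leaf_a == leaf_b for leaf_a, leaf_b in zip(row_a, row_b)) / n_trees
            for row_b in leaves_b
        ]
        for row_a in leaves_a
    ]
```

With scikit-learn, the leaf indices could come from `RandomForestClassifier.apply`, and the train-versus-train matrix could be passed to `SVC(kernel="precomputed")`, with the test-versus-train matrix used at prediction time.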

HAL - Normandie Université

arXiv.org e-Print Archive

Merit-guided dynamic feature selection filter for data streams

Author: Barddal Jean Paul
Bifet Albert
Enembreck Fabrício
Gomes Heitor Murilo
Pfahringer Bernhard
Publication venue: HAL CCSD
Publication date: 01/01/2019

HAL Descartes

Machine learning for streaming data

Author: Barddal Jean Paul
Bifet Albert
Gama João
Gomes Heitor Murilo
Read Jesse
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/11/2019

HAL-Polytechnique