137 research outputs found

Combining Stream Mining and Neural Networks for Short Term Delay Prediction

Author: A Bifet
CW Tsai
N Marz
Y Qin
Publication date: 26/02/2018

Systems that monitor the location of public transport vehicles rely on wireless transmission. Location readings from GPS-based devices arrive with some latency, caused by periodic data transmission and by temporary problems that prevent transmission. This hampers the identification of delayed vehicles. The primary objective of this work is to propose a hybrid method for short-term delay prediction. The method relies on adaptive selection between Hoeffding trees, a stream classification technique, and multilayer perceptrons. In this way, the proposed hybrid method provides anytime predictions and eliminates the need to collect extensive training data before any predictions can be made. Moreover, the use of neural networks increases the accuracy of the predictions compared with the use of Hoeffding trees alone.

arXiv.org e-Print Archive

Crossref

Fairness-enhancing interventions in stream classification

Author: A Bifet
A Romei
D Brzeziński
F Kamiran
J Gama
L Sweeney
R Klinkenberg
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/07/2019

The widespread use of automated data-driven decision support systems has raised many concerns regarding the accountability and fairness of the employed models in the absence of human supervision. Existing fairness-aware approaches tackle fairness as a batch learning problem and aim at learning a fair model which can then be applied to future instances of the problem. In many applications, however, the data comes sequentially and its characteristics might evolve with time. In such a setting, it is counter-intuitive to "fix" a (fair) model over the data stream, as changes in the data might incur changes in the underlying model, thereby affecting its fairness. In this work, we propose fairness-enhancing interventions that modify the input data so that the outcome of any stream classifier applied to that data will be fair. Experiments on real and synthetic data show that our approach achieves good predictive performance and low discrimination scores over the course of the stream. Comment: 15 pages, 7 figures. To appear in the proceedings of 30th International Conference on Database and Expert Systems Applications, Linz, Austria August 26 - 29, 201

arXiv.org e-Print Archive

Crossref

A cluster based prototype reduction for online classification

Author: A Bifet
I Zliobaite
J Demsar
J Gama
Publication venue: Springer
Publication date: 09/11/2018

Crossref

University of Twente Research Information

Exploiting a Stimuli Encoding Scheme of Spiking Neural Networks for Stream Learning

Author: Bifet A.
Del Ser J.
Lobo J.L.
Oregi I.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

Stream data processing has gained progressive momentum with the arrival of new stream applications and big data scenarios. One of the most promising techniques in stream learning is the Spiking Neural Network, and some of them use an interesting population encoding scheme to transform the incoming stimuli into spikes. This study sheds light on the key element of this encoding scheme, the Gaussian receptive fields, and focuses on applying them as a pre-processing technique to any dataset in order to gain representativeness and to boost the predictive performance of stream learning methods. Experiments with synthetic and real data sets are presented, and confirm that our approach can be applied successfully as a general pre-processing technique in many real cases.
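As a rough illustration of what population encoding with Gaussian receptive fields looks like when used as a pre-processing step, the sketch below expands one numeric feature into several Gaussian activations. It is not the authors' implementation: the function name `grf_encode`, the evenly spaced centers, and the default width (one center spacing) are assumptions chosen for simplicity.

```python
import math


def grf_encode(x, lo, hi, m=5, sigma=None):
    """Expand scalar x in [lo, hi] into m Gaussian receptive-field activations.

    Assumptions (not taken from the paper): centers are evenly spaced over
    [lo, hi], and the default width equals the spacing between centers.
    """
    if m < 2:
        raise ValueError("need at least two receptive fields")
    step = (hi - lo) / (m - 1)
    if sigma is None:
        sigma = step
    centers = [lo + i * step for i in range(m)]
    # Each field responds most strongly when x is close to its center.
    return [math.exp(-0.5 * ((x - c) / sigma) ** 2) for c in centers]


if __name__ == "__main__":
    # A value in the middle of the range excites the middle field the most.
    print(grf_encode(0.5, 0.0, 1.0, m=3))
```

Applied to every feature of an incoming instance, this turns a d-dimensional input into a d x m representation that any stream learner could then consume.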

arXiv.org e-Print Archive

BCAM's Institutional Repository Data

A Survey on Concept Drift Adaptation

Author: Bifet A.
Bouchachia Abdelhamid
Gama J.
Pechenizkiy M.
Zliobaite Indre
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

Concept drift primarily refers to an online supervised learning scenario in which the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning, in this paper we characterize the adaptive learning process, categorize existing strategies for handling concept drift, discuss the most representative, distinct and popular techniques and algorithms, discuss the evaluation methodology of adaptive algorithms, and present a set of illustrative applications. This introduction to concept drift adaptation presents state-of-the-art techniques and a collection of benchmarks for researchers, industry analysts and practitioners. The survey aims at covering the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art.

Repository TU/e

Crossref

Pure OAI Repository

Bournemouth University Research Online

Efficient estimation of AUC in a sliding window

Author: A Bifet
C Ferri
D Brzezinski
DJ Hand
I Žliobaitė
J Gama
Remco R. Bouckaert
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/02/2019

In many applications, monitoring area under the ROC curve (AUC) in a sliding window over a data stream is a natural way of detecting changes in the system. The drawback is that computing AUC in a sliding window is expensive, especially if the window size is large and the data flow is significant. In this paper we propose a scheme for maintaining an approximate AUC in a sliding window of length $k$. More specifically, we propose an algorithm that, given $\epsilon$, estimates AUC within $\epsilon / 2$, and can maintain this estimate in $O((\log k) / \epsilon)$ time, per update, as the window slides. This provides a speed-up over the exact computation of AUC, which requires $O(k)$ time, per update. The speed-up becomes more significant as the size of the window increases. Our estimate is based on grouping the data points together, and using these groups to calculate AUC. The grouping is designed carefully such that (i) the groups are small enough, so that the error stays small, (ii) the number of groups is small, so that enumerating them is not expensive, and (iii) the definition is flexible enough so that we can maintain the groups efficiently. Our experimental evaluation demonstrates that the average approximation error in practice is much smaller than the approximation guarantee $\epsilon / 2$, and that we can achieve significant speed-ups with only a modest sacrifice in accuracy.
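To make the cost of the exact approach concrete, here is a minimal sketch of a naive sliding-window AUC monitor. It is only a baseline for illustration, not the approximation scheme proposed in the paper: it re-sorts the whole window on every update, so it costs O(k log k) per update, which is even more than the O(k) exact maintenance the abstract refers to. The names `exact_auc` and `SlidingAUC` are hypothetical.

```python
from collections import deque


def exact_auc(pairs):
    """Exact AUC of (score, label) pairs; tied scores count as half a win.

    Returns None when the window holds only one class (AUC is undefined).
    """
    pos = sum(1 for _, y in pairs if y == 1)
    neg = len(pairs) - pos
    if pos == 0 or neg == 0:
        return None
    ordered = sorted(pairs, key=lambda p: p[0])
    wins = 0.0      # positive/negative pairs ranked correctly (ties = 0.5)
    neg_below = 0   # negatives with a strictly smaller score seen so far
    i = 0
    while i < len(ordered):
        j = i
        # Group all points sharing the same score.
        while j < len(ordered) and ordered[j][0] == ordered[i][0]:
            j += 1
        tie_pos = sum(1 for _, y in ordered[i:j] if y == 1)
        tie_neg = (j - i) - tie_pos
        wins += tie_pos * (neg_below + 0.5 * tie_neg)
        neg_below += tie_neg
        i = j
    return wins / (pos * neg)


class SlidingAUC:
    """Naive baseline: recompute AUC from scratch over the last k points."""

    def __init__(self, k):
        self.window = deque(maxlen=k)  # oldest point is dropped automatically

    def update(self, score, label):
        self.window.append((score, label))
        return exact_auc(list(self.window))
```

Feeding scored predictions one by one through `update` yields the current window AUC after each arrival; a sustained drop in that value is the kind of change signal the abstract describes.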

arXiv.org e-Print Archive

Crossref

On the performance of deep learning models for time series classification in streaming

Author: A Bifet
A Borovykh
A Cano
A Ghazikhani
FA Gers
HM Gomes
J Gama
JY Fernández-Rodríguez
Y Zhang
Publication date: 01/01/2020

Processing data streams arriving at high speed requires the development of models that can provide fast and accurate predictions. Although deep neural networks are the state of the art for many machine learning tasks, their performance in real-time data streaming scenarios is a research area that has not yet been fully addressed. Nevertheless, there have been recent efforts to adapt complex deep learning models for streaming tasks by reducing their processing rate. The design of the asynchronous dual-pipeline deep learning framework makes it possible to predict over incoming instances and update the model simultaneously using two separate layers. The aim of this work is to assess the performance of different types of deep architectures for data streaming classification using this framework. We evaluate models such as multi-layer perceptrons, recurrent, convolutional and temporal convolutional neural networks over several time-series datasets that are simulated as streams. The obtained results indicate that convolutional architectures achieve higher performance in terms of accuracy and efficiency. Comment: Paper submitted to the 15th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2020

arXiv.org e-Print Archive

Crossref

idUS. Depósito de Investigación Universidad de Sevilla

An Efficient Scheme for Prototyping kNN in the Context of Real-Time Human Activity Recognition

Author: A Bifet
J Calvo-Zaragoza
KD Garcia
S Garcia
S Zhang
TM Cover
X Su
Publication venue: Springer
Publication date: 01/01/2019

Crossref

University of Twente Research Information

Towards automated configuration of stream clustering algorithms

Author: A Bifet
F Hutter
JN Rijn van
M Carnein
M López-Ibáñez
P Kerschke
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Clustering is an important technique in data analysis which can reveal hidden patterns and unknown relationships in the data. A common problem in clustering is the proper choice of parameter settings. To tackle this, automated algorithm configuration is available, which can automatically find the best parameter settings. In practice, however, many of today's data sources are data streams due to the widespread deployment of sensors, the internet-of-things or (social) media. Stream clustering aims to tackle this challenge by identifying, tracking and updating clusters over time. Unfortunately, none of the existing approaches for automated algorithm configuration are directly applicable to the streaming scenario. In this paper, we explore the possibility of automated algorithm configuration for stream clustering algorithms using an ensemble of different configurations. In initial experiments, we demonstrate that our approach is able to automatically find superior configurations and refine them over time.

Crossref

Research Commons@Waikato

Hydrothermal alteration mapping of Siberian gold-ore fields based on satellite spectroscopy data

Author: Bifet A.
Holmes G.
Jansen Torsten
Kranen Philipp
Kremer Hardy
Pfahringer B.
Read J.
Seidl Thomas
Publication venue: 'IOP Publishing'
Publication date: 01/01/2011

Hydrothermal alterations in the Urjahskoe and Fedorov-Kedrov gold-ore fields were mapped by applying the channel relationship (band ratio) method to ASTER spectral-zonal satellite image data. It was determined that the calculated mineral indices in ore-bearing structures are zonal. Outer ore-bearing structures revealed increased ferric mineral index values, while inner structures showed high epidote-chlorite-calcite and muscovite-siderite mineral index values. The detected regularities could be used to identify potential gold-ore bearing areas within identical fields based on remote sensing survey data.

Electronic archive of Tomsk Polytechnic University

Crossref