Combining Stream Mining and Neural Networks for Short Term Delay Prediction
The systems monitoring the location of public transport vehicles rely on
wireless transmission. The location readings from GPS-based devices are
received with some latency caused by periodical data transmission and temporary
problems preventing data transmission. This negatively affects the identification
of delayed vehicles. The primary objective of this work is to propose a short-term
hybrid delay prediction method. The method relies on adaptive selection between
Hoeffding trees, a stream classification technique, and multilayer
perceptrons. In this way, the hybrid method proposed in this study provides
anytime predictions and eliminates the need to collect extensive training data
before any predictions can be made. Moreover, the use of neural networks
increases the accuracy of the predictions compared with the use of Hoeffding
trees only.
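The adaptive selection can be pictured as a wrapper that always answers with whichever model currently has the lower recent error. Because the fast model can answer from the first instance onward, predictions are available at any time. The sketch below is minimal and rests on several assumptions:

- a prequential (test-then-train) protocol and a fixed-size error window;
- two hypothetical stand-in learners (`MajorityClass`, `ThresholdLearner`) in place of a real Hoeffding tree and MLP.

The selection rule used in the paper may differ.

```python
from collections import deque


class MajorityClass:
    """Hypothetical stand-in for an anytime learner such as a Hoeffding tree."""

    def __init__(self):
        self.counts = {}

    def predict(self, x):
        if not self.counts:
            return None  # nothing seen yet
        return max(self.counts, key=self.counts.get)

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


class ThresholdLearner:
    """Hypothetical stand-in for a model that needs data before it is useful
    (the MLP role): splits binary labels at the midpoint of the class means
    of x[0], assuming class 1 has the larger mean."""

    def __init__(self, warmup=20):
        self.warmup = warmup
        self.sums = {0: 0.0, 1: 0.0}
        self.counts = {0: 0, 1: 0}

    def predict(self, x):
        if min(self.counts.values()) < self.warmup:
            return None  # not trained enough to answer
        midpoint = (self.sums[0] / self.counts[0] + self.sums[1] / self.counts[1]) / 2
        return 1 if x[0] > midpoint else 0

    def learn(self, x, y):
        self.sums[y] += x[0]
        self.counts[y] += 1


class AdaptiveSelector:
    """Answers with whichever model has the lower error over a recent window."""

    def __init__(self, fast, slow, window=50):
        self.models = {"fast": fast, "slow": slow}
        self.errors = {name: deque(maxlen=window) for name in self.models}

    def recent_error(self, name):
        errs = self.errors[name]
        return sum(errs) / len(errs) if errs else float("inf")

    def predict(self, x):
        slow_pred = self.models["slow"].predict(x)
        if slow_pred is not None and self.recent_error("slow") < self.recent_error("fast"):
            return slow_pred
        return self.models["fast"].predict(x)  # anytime fallback

    def learn(self, x, y):
        # test-then-train: score each model on x before updating it
        for name, model in self.models.items():
            pred = model.predict(x)
            if pred is not None:
                self.errors[name].append(int(pred != y))
            model.learn(x, y)
```

In use, call `predict(x)` when an instance arrives and `learn(x, y)` once its true label is known. Until the slower model has both warmed up and built a better error record, the fast model answers.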
Fairness-enhancing interventions in stream classification
The widespread usage of automated data-driven decision support systems has
raised many concerns regarding the accountability and fairness of the employed
models in the absence of human supervision. Existing fairness-aware approaches
tackle fairness as a batch learning problem and aim at learning a fair model
which can then be applied to future instances of the problem. In many
applications, however, the data comes sequentially and its characteristics
might evolve with time. In such a setting, it is counter-intuitive to "fix" a
(fair) model over the data stream, as changes in the data might incur changes in
the underlying model, therefore affecting its fairness. In this work, we
propose fairness-enhancing interventions that modify the input data so that the
outcome of any stream classifier applied to that data will be fair. Experiments
on real and synthetic data show that our approach achieves good predictive
performance and low discrimination scores over the course of the stream.
Comment: 15 pages, 7 figures. To appear in the proceedings of the 30th
International Conference on Database and Expert Systems Applications, Linz,
Austria, August 26-29, 201
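The sketch below makes "discrimination score" and "modify the input data" concrete with two illustrative choices:

- it tracks statistical parity, the difference in positive-outcome rates between a non-protected and a protected group, over a sliding window;
- for the same window, it computes Kamiran-Calders reweighing weights w(s, y) = P(s)P(y)/P(s, y), one well-known pre-processing intervention.

Both the measure and the intervention are assumptions for illustration; the interventions proposed in the paper may differ. Groups are encoded as s = 1 (protected) and s = 0 (non-protected), and labels as y = 1 (positive outcome).

```python
from collections import Counter, deque


class WindowedFairness:
    """Tracks (sensitive attribute, label) pairs over a sliding window."""

    def __init__(self, window=1000):
        self.buffer = deque(maxlen=window)

    def add(self, s, y):
        # s: 1 = protected group, 0 = non-protected; y: 1 = positive outcome
        self.buffer.append((s, y))

    def statistical_parity(self):
        """P(y=1 | s=0) - P(y=1 | s=1) over the window; 0 means parity."""
        pos, tot = Counter(), Counter()
        for s, y in self.buffer:
            tot[s] += 1
            pos[s] += y
        if tot[0] == 0 or tot[1] == 0:
            return 0.0  # one group absent: parity undefined, report 0
        return pos[0] / tot[0] - pos[1] / tot[1]

    def reweighing_weights(self):
        """Kamiran-Calders reweighing: w(s, y) = P(s) * P(y) / P(s, y)."""
        n = len(self.buffer)
        joint = Counter(self.buffer)
        s_counts = Counter(s for s, _ in self.buffer)
        y_counts = Counter(y for _, y in self.buffer)
        return {
            (s, y): (s_counts[s] * y_counts[y]) / (n * joint[(s, y)])
            for (s, y) in joint
        }
```

Suppose each instance is weighted by w(s, y) before it is passed to an incremental learner. Within the window, the weighted positive rate of each group then equals the overall positive rate, so the weighted statistical parity is zero.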
Exploiting a Stimuli Encoding Scheme of Spiking Neural Networks for Stream Learning
Stream data processing has gained progressive momentum with the arrival of new stream applications and big data scenarios. One of the most promising techniques in stream learning is the Spiking Neural Network, and some of these networks use an interesting population encoding scheme to transform the incoming stimuli into spikes. This study sheds light on the key element of this encoding scheme, the Gaussian receptive fields, and focuses on applying them as a pre-processing technique to any dataset in order to gain representativeness and to boost the predictive performance of stream learning methods. Experiments with synthetic and real data sets are presented, and they confirm that our approach can be applied successfully as a general pre-processing technique in many real cases.
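A minimal sketch of Gaussian receptive fields used as a pre-processing step: each numeric feature with range [lo, hi] is replaced by the activations of m Gaussian fields. The placement below follows a parameterisation common in the spiking-network literature, and it is assumed here:

- centers at lo + ((2i - 3)/2)·(hi - lo)/(m - 2), for i = 1..m;
- width (hi - lo)/(β(m - 2)), with β = 1.5.

Two further assumptions: the sketch outputs real-valued activations rather than spike times, and feature ranges are known in advance. In a stream, the ranges would have to be estimated or tracked.

```python
import math


def grf_centers(lo, hi, m, beta=1.5):
    """Centers and common width of m (> 2) Gaussian fields covering [lo, hi]."""
    step = (hi - lo) / (m - 2)
    centers = [lo + (2 * i - 3) / 2 * step for i in range(1, m + 1)]
    sigma = step / beta
    return centers, sigma


def grf_encode(x, lo, hi, m=5, beta=1.5):
    """Activation of each receptive field for a single value x."""
    centers, sigma = grf_centers(lo, hi, m, beta)
    return [math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in centers]


def grf_transform(rows, ranges, m=5):
    """Replace every feature of every row by its m field activations.

    `ranges` holds one (lo, hi) pair per feature (assumed known beforehand).
    """
    out = []
    for row in rows:
        encoded = []
        for value, (lo, hi) in zip(row, ranges):
            encoded.extend(grf_encode(value, lo, hi, m))
        out.append(encoded)
    return out
```

The transformed rows can be fed to any stream learner in place of the raw features. Each original feature becomes m features, so the number of fields trades representativeness against dimensionality.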
A Survey on Concept Drift Adaptation
Concept drift primarily refers to an online supervised learning scenario in which the relation between the input data and the target variable changes over time. Assuming general knowledge of supervised learning, in this paper we characterize the adaptive learning process, categorize existing strategies for handling concept drift, discuss the most representative, distinct and popular techniques and algorithms, discuss the evaluation methodology of adaptive algorithms, and present a set of illustrative applications. This introduction to concept drift adaptation presents state-of-the-art techniques and a collection of benchmarks for researchers, industry analysts and practitioners. The survey aims at covering the different facets of concept drift in an integrated way, to reflect on the existing, scattered state of the art.
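As one concrete example of a detector-based drift-handling strategy, the sketch below implements the Drift Detection Method (DDM) of Gama et al. (2004). It monitors the online error rate p and its standard deviation s = sqrt(p(1 - p)/n), and records the lowest p + s seen (p_min, s_min). It signals a warning when p + s reaches p_min + 2·s_min and a drift when it reaches p_min + 3·s_min. The thresholds and the 30-instance warm-up are commonly cited defaults, assumed here. This is an illustrative sketch, not code from the survey.

```python
import math


class DDM:
    """Sketch of the Drift Detection Method (Gama et al., 2004).

    Feed it a 0/1 error indicator per instance (1 = misclassified) from a
    prequential evaluation; it reports 'stable', 'warning' or 'drift'.
    """

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0                  # running error rate
        self.s = 0.0                  # its standard deviation
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        self.n += 1
        self.p += (error - self.p) / self.n       # incremental mean of errors
        self.s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"                       # too few instances to judge
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s  # best level seen so far
        if self.p + self.s >= self.p_min + self.drift_level * self.s_min:
            self.reset()                          # caller should retrain the model
            return "drift"
        if self.p + self.s >= self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"
```

A common way to use the two levels is to start buffering instances at "warning" and to rebuild the model from that buffer at "drift".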
Efficient estimation of AUC in a sliding window
In many applications, monitoring area under the ROC curve (AUC) in a sliding
window over a data stream is a natural way of detecting changes in the system.
The drawback is that computing AUC in a sliding window is expensive, especially
if the window size is large and the data flow is significant.
In this paper we propose a scheme for maintaining an approximate AUC in a
sliding window of length . More specifically, we propose an algorithm that,
given , estimates AUC within , and can maintain this
estimate in time, per update, as the window slides.
This provides a speed-up over the exact computation of AUC, which requires
time, per update. The speed-up becomes more significant as the size of
the window increases. Our estimate is based on grouping the data points
together, and using these groups to calculate AUC. The grouping is designed
carefully such that (i) the groups are small enough, so that the error stays
small, (ii) the number of groups is small, so that enumerating them is not
expensive, and (iii) the definition is flexible enough so that we can
maintain the groups efficiently.
Our experimental evaluation demonstrates that the average approximation error
in practice is much smaller than the approximation guarantee ,
and that we can achieve significant speed-ups with only a modest sacrifice in
accuracy.
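The grouping idea can be illustrated with a deliberately simpler scheme than the one described above. This is a hypothetical sketch, not the paper's adaptive grouping:

- scores, assumed to lie in [0, 1], are placed into fixed-width buckets;
- each bucket keeps counts of the positives and negatives in the current window;
- AUC is computed in one pass over the buckets, with positive-negative pairs that share a bucket counted as half-correct.

The error is at most half the fraction of positive-negative pairs that fall in the same bucket. Each update costs O(1) and each AUC query costs O(number of buckets). `exact_auc` is a quadratic reference computation included for comparison.

```python
from collections import deque


def exact_auc(scores, labels):
    """Reference AUC: fraction of (positive, negative) pairs ranked correctly,
    with ties counted as 1/2. Quadratic in the window size."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


class BinnedSlidingAUC:
    """Approximate AUC over the last `window` (score, label) pairs.

    Hypothetical fixed-width grouping: scores are assumed to lie in [0, 1]
    and are mapped to `bins` equal-width buckets; positive/negative pairs
    that share a bucket are counted as ties.
    """

    def __init__(self, window, bins=100):
        self.window = window
        self.bins = bins
        self.buffer = deque()
        self.pos = [0] * bins  # positives per bucket in the current window
        self.neg = [0] * bins  # negatives per bucket in the current window

    def _bucket(self, score):
        return min(int(score * self.bins), self.bins - 1)

    def add(self, score, label):
        """O(1) update: evict the oldest pair if the window is full, then insert."""
        if len(self.buffer) == self.window:
            old_score, old_label = self.buffer.popleft()
            counts = self.pos if old_label == 1 else self.neg
            counts[self._bucket(old_score)] -= 1
        self.buffer.append((score, label))
        counts = self.pos if label == 1 else self.neg
        counts[self._bucket(score)] += 1

    def auc(self):
        """O(bins) query: one sweep over the buckets from low to high score."""
        n_pos, n_neg = sum(self.pos), sum(self.neg)
        if n_pos == 0 or n_neg == 0:
            return float("nan")
        wins, neg_below = 0.0, 0
        for b in range(self.bins):
            # positives here beat every negative in lower buckets
            # and tie with the negatives in this bucket
            wins += self.pos[b] * (neg_below + 0.5 * self.neg[b])
            neg_below += self.neg[b]
        return wins / (n_pos * n_neg)
```

More buckets tighten the error bound but make each query slower. Keeping that trade-off efficient for arbitrary score distributions is what a careful, adaptive grouping addresses.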
On the performance of deep learning models for time series classification in streaming
Processing data streams arriving at high speed requires the development of
models that can provide fast and accurate predictions. Although deep neural
networks are the state-of-the-art for many machine learning tasks, their
performance in real-time data streaming scenarios is a research area that has
not yet been fully addressed. Nevertheless, there have been recent efforts to
adapt complex deep learning models for streaming tasks by reducing their
processing rate. The design of the asynchronous dual-pipeline deep learning
framework makes it possible to predict incoming instances and update the model
simultaneously, using two separate layers. The aim of this work is to assess the
performance of different types of deep architectures for data stream
classification using this framework. We evaluate models such as multi-layer
perceptrons and recurrent, convolutional and temporal convolutional neural
networks over several time-series datasets that are simulated as streams. The
obtained results indicate that convolutional architectures achieve higher
performance in terms of accuracy and efficiency.
Comment: Paper submitted to the 15th International Conference on Soft
Computing Models in Industrial and Environmental Applications (SOCO 2020)
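The asynchronous dual-pipeline idea is to predict with the current model while another pipeline updates it. It can be sketched with two threads:

- predictions read the most recently published model;
- a background trainer consumes labelled instances in mini-batches and publishes a new model object after each batch.

The following are hypothetical stand-ins for a deep network: the class and callback names, the mini-batch size, and the running-mean "model" in the test. The framework evaluated in the paper may be organised differently.

```python
import queue
import threading


class DualPipeline:
    """Predict with the latest published model while a separate thread trains."""

    def __init__(self, train_fn, predict_fn, init_model, batch_size=32):
        self.train_fn = train_fn          # (model, batch) -> new model object
        self.predict_fn = predict_fn      # (model, x) -> prediction
        self.batch_size = batch_size
        self._model = init_model
        self._lock = threading.Lock()
        self._train_q = queue.Queue()
        self._trainer = threading.Thread(target=self._train_loop, daemon=True)
        self._trainer.start()

    def predict(self, x):
        with self._lock:
            model = self._model           # grab the current published snapshot
        return self.predict_fn(model, x)

    def feed(self, x, y):
        """Hand a labelled instance to the training pipeline (non-blocking)."""
        self._train_q.put((x, y))

    def _train_loop(self):
        batch = []
        while True:
            item = self._train_q.get()
            if item is None:              # shutdown sentinel; a partial batch is dropped
                break
            batch.append(item)
            if len(batch) >= self.batch_size:
                with self._lock:
                    current = self._model
                new_model = self.train_fn(current, batch)  # train outside the lock
                with self._lock:
                    self._model = new_model              # publish atomically
                batch = []

    def close(self):
        self._train_q.put(None)
        self._trainer.join()
```

Training builds a new model object and swaps it in under the lock, so a prediction never sees a half-updated model. The trade-off is that predictions can lag behind the newest labels by up to one batch.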
Towards automated configuration of stream clustering algorithms
Clustering is an important technique in data analysis which can reveal hidden patterns and unknown relationships in the data. A common problem in clustering is the proper choice of parameter settings. To tackle this, automated algorithm configuration can be used to find the best parameter settings automatically. In practice, however, many of today's data sources are data streams, owing to the widespread deployment of sensors, the internet of things and (social) media. Stream clustering aims to tackle this challenge by identifying, tracking and updating clusters over time. Unfortunately, none of the existing approaches for automated algorithm configuration are directly applicable to the streaming scenario. In this paper, we explore the possibility of automated algorithm configuration for stream clustering algorithms using an ensemble of different configurations. In initial experiments, we demonstrate that our approach is able to automatically find superior configurations and refine them over time.
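A rough sketch of configuring a stream clusterer with an ensemble of configurations. Several parameter settings run side by side on each window of the stream. After every window, the worst-scoring configuration is replaced by a randomly perturbed copy of the best one, so the ensemble keeps refining itself as the stream evolves. The following are all hypothetical choices for illustration, not the method or measure used in the paper:

- the toy one-dimensional leader clusterer and its single `radius` parameter;
- the quality score, the mean distance to the nearest center plus a fixed cost per center;
- the 0.8 to 1.25 perturbation range.

```python
import random


def leader_cluster(points, radius):
    """Hypothetical toy 1-D stream clusterer: each point joins an existing
    center within `radius`, otherwise it becomes a new center."""
    centers = []
    for p in points:
        if not any(abs(p - c) <= radius for c in centers):
            centers.append(p)
    return centers


def window_score(points, centers, cost_per_center=0.5):
    """Hypothetical quality measure (lower is better): mean distance to the
    nearest center plus a fixed cost for every center."""
    spread = sum(min(abs(p - c) for c in centers) for p in points) / len(points)
    return spread + cost_per_center * len(centers)


class ConfigEnsemble:
    """Keeps several `radius` settings alive and refines them window by window."""

    def __init__(self, radii, seed=0):
        self.radii = list(radii)
        self.rng = random.Random(seed)

    def process_window(self, points):
        # evaluate every configuration on the same window
        ranked = sorted(self.radii,
                        key=lambda r: window_score(points, leader_cluster(points, r)))
        best, worst = ranked[0], ranked[-1]
        # replace the worst configuration with a perturbed copy of the best one
        self.radii.remove(worst)
        self.radii.append(max(1e-3, best * self.rng.uniform(0.8, 1.25)))
        return best
```

Calling `process_window` once per window both returns the configuration to use for that window and nudges the ensemble toward better settings for the next one.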
Hydrothermal alteration mapping of Siberian gold-ore fields based on satellite spectroscopy data
The hydrothermal alterations in the Urjahskoe and Fedorov-Kedrov gold-ore fields were mapped by applying the channel relationship (band ratio) method to ASTER multispectral satellite image data. It was determined that the calculated mineral indices in ore-bearing structures are zonal. Outer ore-bearing structures showed increased ferric mineral index values, while inner ones showed high epidote-chlorite-calcite and muscovite-siderite mineral index values. The detected regularities could be used to identify potential gold-bearing areas within similar fields based on remote sensing survey data.
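The band ratio method divides co-registered spectral bands pixel by pixel. Zonality can then be checked by comparing mean index values inside different structural zones. A minimal sketch, assuming bands are given as equal-sized 2-D lists of reflectance values. Which ASTER bands enter each mineral index is not stated here, so the band arguments are placeholders, and the zone masks are hypothetical inputs.

```python
def band_ratio(num_bands, den_bands, eps=1e-6):
    """Pixel-wise (sum of numerator bands) / (sum of denominator bands).

    Each band is a 2-D list of reflectance values on the same pixel grid;
    a simple two-band ratio passes one band in each list.
    """
    rows, cols = len(num_bands[0]), len(num_bands[0][0])
    index = []
    for i in range(rows):
        row = []
        for j in range(cols):
            num = sum(b[i][j] for b in num_bands)
            den = sum(b[i][j] for b in den_bands)
            row.append(num / max(den, eps))  # guard against zero reflectance
        index.append(row)
    return index


def zonal_mean(index, zone_mask):
    """Mean index value over pixels whose mask entry is truthy."""
    values = [v for row, mrow in zip(index, zone_mask)
              for v, m in zip(row, mrow) if m]
    return sum(values) / len(values) if values else float("nan")
```

Comparing `zonal_mean` for outer and inner zone masks is one simple way to express the kind of zonal contrast described in the abstract.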
- …