An ensemble based on neural networks with random weights for online data stream regression
Most information sources in the current technological world are generating data sequentially and
rapidly, in the form of data streams. The evolving nature of processes may often cause changes in
data distribution, also known as concept drift, which is difficult to detect and causes loss of
accuracy in supervised learning algorithms. As a consequence, online machine learning algorithms
that are able to update actively according to possible changes in the data distribution are required.
Although many strategies have been developed to tackle this problem, most of them are designed
for classification problems. Therefore, in the domain of regression problems, there is a need for the
development of accurate algorithms with dynamic updating mechanisms that can operate in a
computational time compatible with today’s demanding market. In this article, the authors propose
a new bagging ensemble approach based on Neural Networks with Random Weights for online data
stream regression. The proposed method improves prediction accuracy and reduces the required
computational time compared to a recent algorithm for online data stream regression from the
literature. The experiments are carried out using four synthetic datasets to evaluate
the algorithm's response to concept drift, along with four benchmark datasets from different
industries. The results indicate improved prediction accuracy, effective handling of
concept drift and much faster updating times compared to the existing approach.
Additionally, the use of Design of Experiments as an effective tool for hyperparameter tuning is
demonstrated.
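To illustrate the kind of base model involved, the sketch below implements a single-hidden-layer network with fixed random hidden weights (ELM-style) and a small bagging ensemble of such networks that is refitted chunk by chunk. This is a generic sketch of the technique under stated assumptions, not the authors' exact algorithm; all names and hyperparameter values are illustrative.

```python
import numpy as np

class RandomWeightNN:
    """Single-hidden-layer net with fixed random hidden weights:
    only the output weights are fitted, via regularised least squares."""
    def __init__(self, n_inputs, n_hidden=100, reg=1e-6, rng=None):
        rng = rng or np.random.default_rng()
        self.W = rng.normal(size=(n_inputs, n_hidden))  # random, never trained
        self.b = rng.normal(size=n_hidden)
        self.reg = reg

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        H = self._hidden(X)
        A = H.T @ H + self.reg * np.eye(H.shape[1])
        self.beta = np.linalg.solve(A, H.T @ y)  # closed-form output weights
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


class BaggedRandomWeightEnsemble:
    """Bagging ensemble of random-weight nets for chunked streams:
    update() refits each member on a bootstrap resample of the new chunk."""
    def __init__(self, n_inputs, n_models=10, seed=0):
        self.rng = np.random.default_rng(seed)
        self.models = [RandomWeightNN(n_inputs, rng=self.rng)
                       for _ in range(n_models)]

    def update(self, X, y):
        n = len(X)
        for m in self.models:
            idx = self.rng.integers(0, n, size=n)  # bootstrap resample
            m.fit(X[idx], y[idx])

    def predict(self, X):
        return np.mean([m.predict(X) for m in self.models], axis=0)
```

Because only the output weights are fitted (in closed form), each chunk-wise update costs a single least-squares solve per member, which is what makes this family of models attractive for streams.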
Toward a General-Purpose Heterogeneous Ensemble for Pattern Classification
We perform an extensive study of the performance of different classification approaches on twenty-five datasets (fourteen image datasets and eleven UCI data mining datasets). The aim is to find General-Purpose (GP) heterogeneous ensembles (requiring little to no parameter tuning) that perform competitively across multiple datasets. The state-of-the-art classifiers examined in this study include the support vector machine, Gaussian process classifiers, random subspace of AdaBoost, random subspace of rotation boosting, and deep learning classifiers. We demonstrate that a heterogeneous ensemble based on the simple fusion by sum rule of different classifiers performs consistently well across all twenty-five datasets. The most important result of our investigation is demonstrating that some very recent approaches, including the heterogeneous ensemble we propose in this paper, are capable of outperforming an SVM classifier (implemented with LibSVM), even when both kernel selection and SVM parameters are carefully tuned for each dataset.
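The sum-rule fusion at the core of such a heterogeneous ensemble is straightforward: each classifier outputs a posterior probability (or normalised score) vector per sample, the vectors are summed across classifiers, and the class with the largest sum wins. A minimal sketch, with hypothetical toy scores standing in for real classifier outputs:

```python
import numpy as np

def sum_rule_fusion(prob_matrices):
    """Fuse classifiers by the sum rule.
    prob_matrices: list of (n_samples, n_classes) arrays, one per
    classifier, each row a posterior/normalised score vector.
    Returns the predicted class index per sample."""
    fused = np.sum(prob_matrices, axis=0)  # elementwise sum across classifiers
    return fused.argmax(axis=1)

# Toy posteriors from two hypothetical classifiers over two classes
svm_probs = np.array([[0.9, 0.1], [0.4, 0.6]])
gp_probs  = np.array([[0.6, 0.4], [0.1, 0.9]])
labels = sum_rule_fusion([svm_probs, gp_probs])  # -> [0, 1]
```

For the fusion to be meaningful, each classifier's scores must be on a comparable scale (e.g. calibrated probabilities), which is why sum-rule ensembles typically normalise member outputs first.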
Neural Network Ensembles and Their Application to Traffic Flow Prediction in Telecommunications Networks
This series is dedicated to reporting our recent research in spatial science in general and economic
geography & geoinformatics in particular. It contains scientific studies focusing on spatial phenomena,
utilizing theoretical frameworks, analytical methods and empirical procedures specifically designed for
spatial analysis. The aim is to present the research at the Department to an informed readership in
universities, research organizations and policy-making institutions throughout the world. The type of
materials considered for publication in the series includes interim reports presenting work in progress
and papers which have been submitted for publication elsewhere. Series: Discussion Papers of the Institute for Economic Geography and GIScience
Ensemble based on randomised neural networks for online data stream regression in presence of concept drift
The big data paradigm has posed new challenges for Machine Learning algorithms, such as analysing continuous flows of data, in the form of data streams, and dealing with the evolving nature of the data, which causes a phenomenon often referred to in the literature as concept drift. Concept drift arises from inconsistencies between the optimal hypotheses in two subsequent chunks of data, whereby the concept underlying a given process evolves over time; this can happen due to several factors, including changes in consumer preference, economic dynamics, or environmental conditions. This thesis explores the problem of data stream regression in the presence of concept drift. This problem requires computationally efficient algorithms that are able to adapt to the various types of drift that may affect the data. The development of effective algorithms for data streams with concept drift requires several steps that are discussed in this research. The first is related to the datasets required to assess the algorithms. In general, it is not possible to determine the occurrence of concept drift in real-world datasets; therefore, synthetic datasets where the various types of concept drift can be simulated are required. The second issue is the choice of the algorithm. Ensemble algorithms show many advantages in dealing with concept-drifting data streams, including flexibility, computational efficiency and high accuracy. For the design of an effective ensemble, this research analyses the use of randomised Neural Networks as base models, along with their optimisation. Optimising the randomised Neural Networks involves designing and tuning hyperparameters that may substantially affect their performance. The optimisation of the base models is an important aspect of building highly accurate and computationally efficient ensembles.
To cope with concept drift, existing methods either require setting fixed updating points, which may result in unnecessary computations or slow reaction to concept drift, or rely on a drift detection mechanism, which may be ineffective due to the difficulty of detecting drift in real applications. Therefore, the research contributions of this thesis include the development of a new approach for synthetic dataset generation, the development of a new hyperparameter optimisation algorithm that reduces the search effort and the need for prior assumptions compared to existing methods, the analysis of the effects of randomised Neural Network hyperparameters, and the development of a new ensemble algorithm based on a bagging meta-model that reduces the computational effort over existing methods and uses an innovative updating mechanism to cope with concept drift. The algorithms have been tested on synthetic datasets and validated on four real-world datasets from various application domains.
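As an example of the first step, a synthetic regression stream with a controllable abrupt drift can be generated by switching the target function at a chosen point. The generator below is a minimal illustration of this idea; the specific functions and the abrupt drift type are assumptions, not the generator proposed in the thesis.

```python
import numpy as np

def drifting_stream(n_samples=1000, drift_point=500, seed=0):
    """Synthetic regression stream with one abrupt concept drift:
    the target function changes at index `drift_point`, so the exact
    drift location is known when evaluating an algorithm's reaction."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
    before = np.arange(n_samples) < drift_point
    y = np.where(before,
                 X[:, 0] + X[:, 1],         # concept A
                 X[:, 0] - 2.0 * X[:, 1])   # concept B
    return X, y
```

Gradual or recurring drifts can be simulated the same way, by blending the two concepts over a transition window or switching back and forth between them.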
Masksembles for Uncertainty Estimation
Deep neural networks have amply demonstrated their prowess but estimating the
reliability of their predictions remains challenging. Deep Ensembles are widely
considered as being one of the best methods for generating uncertainty
estimates but are very expensive to train and evaluate. MC-Dropout is another
popular alternative, which is less expensive, but also less reliable. Our
central intuition is that there is a continuous spectrum of ensemble-like
models of which MC-Dropout and Deep Ensembles are extreme examples. The first
uses an effectively infinite number of highly correlated models while the
second relies on a finite number of independent models.
To combine the benefits of both, we introduce Masksembles. Instead of
randomly dropping parts of the network as in MC-Dropout, Masksembles relies on a
fixed number of binary masks, which are parameterized in a way that allows the
correlations between individual models to be changed. Namely, by controlling the
overlap between the masks and their density, one can choose the optimal
configuration for the task at hand. This leads to a simple and easy-to-implement
method with performance on par with Ensembles at a fraction of the
cost. We experimentally validate Masksembles on two widely used datasets,
CIFAR10 and ImageNet.
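The fixed binary masks can be sketched as follows. This simplified version just samples masks at a given density; the published method constructs them with explicit overlap control via its own parameters, so treat the sampling scheme here as an assumption.

```python
import numpy as np

def make_masks(n_masks, n_channels, density, seed=0):
    """Fixed binary masks for a Masksembles-style layer (simplified):
    each mask keeps a `density` fraction of the channels. Lower density
    tends to reduce overlap between masks, giving more independent
    sub-models; density near 1 collapses them toward a single model."""
    rng = np.random.default_rng(seed)
    keep = max(1, int(round(density * n_channels)))
    masks = np.zeros((n_masks, n_channels))
    for i in range(n_masks):
        idx = rng.choice(n_channels, size=keep, replace=False)
        masks[i, idx] = 1.0
    return masks

def masked_activations(x, masks):
    """Apply every fixed mask to one activation vector, producing one
    set of activations per sub-model; downstream predictions from these
    can be averaged, with their spread serving as an uncertainty estimate."""
    return x[None, :] * masks
```

Unlike MC-Dropout, the masks are sampled once and then frozen, so the same finite set of sub-models is used at both training and inference time.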
- …