1,520 research outputs found
Adaptive Online Sequential ELM for Concept Drift Tackling
A machine learning method needs to adapt to over time changes in the
environment. Such changes are known as concept drift. In this paper, we propose
concept drift tackling method as an enhancement of Online Sequential Extreme
Learning Machine (OS-ELM) and Constructive Enhancement OS-ELM (CEOS-ELM) by
adding adaptive capability for classification and regression problem. The
scheme is named as adaptive OS-ELM (AOS-ELM). It is a single classifier scheme
that works well to handle real drift, virtual drift, and hybrid drift. The
AOS-ELM also works well for sudden drift and recurrent context change type. The
scheme is a simple unified method implemented in simple lines of code. We
evaluated AOS-ELM on regression and classification problem by using concept
drift public data set (SEA and STAGGER) and other public data sets such as
MNIST, USPS, and IDS. Experiments show that our method gives higher kappa value
compared to the multiclassifier ELM ensemble. Even though AOS-ELM in practice
does not need hidden nodes increase, we address some issues related to the
increasing of the hidden nodes such as error condition and rank values. We
propose taking the rank of the pseudoinverse matrix as an indicator parameter
to detect underfitting condition.Comment: Hindawi Publishing. Computational Intelligence and Neuroscience
Volume 2016 (2016), Article ID 8091267, 17 pages Received 29 January 2016,
Accepted 17 May 2016. Special Issue on "Advances in Neural Networks and
Hybrid-Metaheuristics: Theory, Algorithms, and Novel Engineering
Applications". Academic Editor: Stefan Hauf
Learning from Data Streams with Randomized Forests
Non-stationary streaming data poses a familiar challenge in machine learning: the need to
obtain fast and accurate predictions. A data stream is a continuously generated sequence of
data, with data typically arriving rapidly. They are often characterised by a non-stationary
generative process, with concept drift occurring as the process changes. Such processes are
commonly seen in the real world, such as in advertising, shopping trends, environmental
conditions, electricity monitoring and traffic monitoring.
Typical stationary algorithms are ill-suited for use with concept drifting data, thus necessitating
more targeted methods. Tree-based methods are a popular approach to this problem,
traditionally focussing on the use of the Hoeffding bound in order to guarantee performance
relative to a stationary scenario. However, there are limited single learners available for
regression scenarios, and those that do exist often struggle to choose between similarly
discriminative splits, leading to longer training times and worse performance. This limited
pool of single learners in turn hampers the performance of ensemble approaches in which
they act as base learners.
In this thesis we seek to remedy this gap in the literature, developing methods which
focus on increasing randomization to both improve predictive performance and reduce the
training times of tree-based ensemble methods. In particular, we have chosen to investigate
the use of randomization as it is known to be able to improve generalization error in
ensembles, and is also expected to lead to fast training times, thus being a natural method
of handling the problems typically experienced by single learners.
We begin in a regression scenario, introducing the Adaptive Trees for Streaming with
Extreme Randomization (ATSER) algorithm; a partially randomized approach based on
the concept of Extremely Randomized (extra) trees. The ATSER algorithm incrementally
trains trees, using the Hoeffding bound to select the best of a random selection of splits.
Simultaneously, the trees also detect and adapt to changes in the data stream. Unlike many
traditional streaming algorithms ATSER trees can easily be extended to include nominal
features. We find that compared to other contemporary methods ensembles of ATSER
trees lead to improved predictive performance whilst also reducing run times.
We then demonstrate the Adaptive Categorisation Trees for Streaming with Extreme
Randomization (ACTSER) algorithm, an adaption of the ATSER algorithm to the more
traditional categorization scenario, again showing improved predictive performance and
reduced runtimes. The inclusion of nominal features is particularly novel in this setting
since typical categorization approaches struggle to handle them.
Finally we examine a completely randomized scenario, where an ensemble of trees is generated
prior to having access to the data stream, while also considering multivariate splits
in addition to the traditional axis-aligned approach. We find that through the combination
of a forgetting mechanism in linear models and dynamic weighting for ensemble members,
we are able to avoid explicitly testing for concept drift. This leads to fast ensembles
with strong predictive performance, whilst also requiring fewer parameters than other
contemporary methods.
For each of the proposed methods in this thesis, we demonstrate empirically that they are
effective over a variety of different non-stationary data streams, including on multiple
types of concept drift. Furthermore, in comparison to other contemporary data streaming
algorithms, we find the biggest improvements in performance are on noisy data streams.Engineers Gat
How to Cope with Change? - Preserving Validity of Predictive Services over Time
Companies more and more rely on predictive services which are constantly monitoring and analyzing the available data streams for better service offerings. However, sudden or incremental changes in those streams are a challenge for the validity and proper functionality of the predictive service over time. We develop a framework which allows to characterize and differentiate predictive services with regard to their ongoing validity. Furthermore, this work proposes a research agenda of worthwhile research topics to improve the long-term validity of predictive services. In our work, we especially focus on different scenarios of true label availability for predictive services as well as the integration of expert knowledge. With these insights at hand, we lay an important foundation for future research in the field of valid predictive services
Adaptive estimation and change detection of correlation and quantiles for evolving data streams
Streaming data processing is increasingly playing a central role in enterprise data architectures due to an abundance of available measurement data from a wide variety of sources and advances in data capture and infrastructure technology. Data streams arrive, with high frequency, as never-ending sequences of events, where the underlying data generating process always has the potential to evolve. Business operations often demand real-time processing of data streams for keeping models up-to-date and timely decision-making. For example in cybersecurity contexts, analysing streams of network data can aid the detection of potentially malicious behaviour. Many tools for statistical inference cannot meet the challenging demands of streaming data, where the computational cost of updates to models must be constant to ensure continuous processing as data scales. Moreover, these tools are often not capable of adapting to changes, or drift, in the data. Thus, new tools for modelling data streams with efficient data processing and model updating capabilities, referred to as streaming analytics, are required. Regular intervention for control parameter configuration is prohibitive to the truly continuous processing constraints of streaming data. There is a notable absence of such tools designed with both temporal-adaptivity to accommodate drift and the autonomy to not rely on control parameter tuning. Streaming analytics with these properties can be developed using an Adaptive Forgetting (AF) framework, with roots in adaptive filtering. The fundamental contributions of this thesis are to extend the streaming toolkit by using the AF framework to develop autonomous and temporally-adaptive streaming analytics.
The first contribution uses the AF framework to demonstrate the development of a model, and validation procedure, for estimating time-varying parameters of bivariate data streams from cyber-physical systems. This is accompanied by a novel continuous monitoring change detection system that compares adaptive and non-adaptive estimates. The second contribution is the development of a streaming analytic for the correlation coefficient and an associated change detector to monitor changes to correlation structures across streams. This is demonstrated on cybersecurity network data. The third contribution is a procedure for estimating time-varying binomial data with thorough exploration of the nuanced behaviour of this estimator. The final contribution is a framework to enhance extant streaming quantile estimators with autonomous, temporally-adaptive properties. In addition, a novel streaming quantile procedure is developed and demonstrated, in an extensive simulation study, to show appealing performance.Open Acces
- …