Reservoir of Diverse Adaptive Learners and Stacking Fast Hoeffding Drift Detection Methods for Evolving Data Streams
The last decade has seen a surge of interest in adaptive learning algorithms
for data stream classification, with applications ranging from predicting ozone
level peaks and learning stock market indicators to detecting computer security
violations. In addition, a number of methods have been developed to detect
concept drifts in these streams. Consider a scenario where we have a number of
classifiers with diverse learning styles and different drift detectors.
Intuitively, the current 'best' (classifier, detector) pair is
application-dependent and may change as the stream evolves. Our research
builds on this observation. We introduce the Tornado framework that
implements a reservoir of diverse classifiers, together with a variety of drift
detection algorithms. In our framework, all (classifier, detector) pairs
proceed, in parallel, to construct models against the evolving data streams. At
any point in time, we select the pair which currently yields the best
performance. We further incorporate two novel stacking-based drift detection
methods, namely the FHDDMS and FHDDMS_add approaches. The
experimental evaluation confirms that the current 'best' (classifier, detector)
pair is not only heavily dependent on the characteristics of the stream, but
also that this selection evolves as the stream flows. Further, our
FHDDMS variants detect concept drifts accurately in a timely fashion
while outperforming the state-of-the-art.
Comment: 42 pages and 14 figures
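The selection idea in the abstract above (run every (classifier, detector) pair on the stream and pick the currently best one) can be sketched as follows. This is a minimal illustration, not the Tornado implementation: the two toy classifiers, the `WindowDetector` class, the fading-accuracy score, and all parameter values are assumptions introduced here, and the pairs are updated in a simple loop rather than truly in parallel.

```python
import math
from collections import deque


class MajorityClassifier:
    """Toy learner (assumption): predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {}

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else 0

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

    def reset(self):
        self.counts = {}


class LastLabelClassifier:
    """Toy learner (assumption): predicts the label of the previous instance."""

    def __init__(self):
        self.last = 0

    def predict(self, x):
        return self.last

    def learn(self, x, y):
        self.last = y

    def reset(self):
        self.last = 0


class WindowDetector:
    """Stand-in detector (assumption): signals drift when the windowed accuracy
    falls below the best windowed accuracy so far by a Hoeffding-style margin."""

    def __init__(self, n=25, delta=1e-3):
        self.n = n
        self.eps = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
        self.window = deque(maxlen=n)
        self.best = 0.0

    def update(self, correct):
        self.window.append(1 if correct else 0)
        if len(self.window) < self.n:
            return False
        acc = sum(self.window) / self.n
        self.best = max(self.best, acc)
        if self.best - acc >= self.eps:
            self.window.clear()
            self.best = 0.0
            return True
        return False


def run_reservoir(stream, pairs, fade=0.9):
    """Test-then-train every pair on each instance; record the best pair per step."""
    scores = {name: 0.0 for name in pairs}
    selected = []
    for x, y in stream:
        for name, (clf, det) in pairs.items():
            correct = clf.predict(x) == y
            # Exponentially faded accuracy, so recent behaviour dominates.
            scores[name] = fade * scores[name] + (1 - fade) * correct
            if det.update(correct):
                clf.reset()  # rebuild the model after a detected drift
            clf.learn(x, y)
        selected.append(max(scores, key=scores.get))
    return selected


pairs = {
    "majority/n25": (MajorityClassifier(), WindowDetector(n=25)),
    "last/n25": (LastLabelClassifier(), WindowDetector(n=25)),
}
stream = [(None, y) for y in [0, 0, 0, 1] * 10]
selected = run_reservoir(stream, pairs)
print(selected[-1])
```

The faded score is one possible way to express "currently yields the best performance"; the abstract does not say which performance measure the framework uses.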
Adaptive Online Sequential ELM for Concept Drift Tackling
A machine learning method needs to adapt to changes in its environment over
time. Such changes are known as concept drift. In this paper, we propose a
concept drift tackling method that enhances the Online Sequential Extreme
Learning Machine (OS-ELM) and the Constructive Enhancement OS-ELM (CEOS-ELM) by
adding adaptive capability for classification and regression problems. The
scheme is named adaptive OS-ELM (AOS-ELM). It is a single-classifier scheme
that handles real drift, virtual drift, and hybrid drift well. AOS-ELM
also works well for sudden drift and recurrent context change. The
scheme is a simple unified method that can be implemented in simple code. We
evaluated AOS-ELM on regression and classification problems using public
concept drift data sets (SEA and STAGGER) and other public data sets such as
MNIST, USPS, and IDS. Experiments show that our method gives a higher kappa value
than the multiclassifier ELM ensemble. Even though AOS-ELM in practice
does not need an increase in hidden nodes, we address some issues related to
increasing the hidden nodes, such as error condition and rank values. We
propose taking the rank of the pseudoinverse matrix as an indicator parameter
to detect an underfitting condition.
Comment: Hindawi Publishing. Computational Intelligence and Neuroscience
Volume 2016 (2016), Article ID 8091267, 17 pages Received 29 January 2016,
Accepted 17 May 2016. Special Issue on "Advances in Neural Networks and
Hybrid-Metaheuristics: Theory, Algorithms, and Novel Engineering
Applications". Academic Editor: Stefan Hauf
Learning Mazes with Aliasing States: An LCS Algorithm with Associative Perception
Learning classifier systems (LCSs) belong to a class of algorithms based on the principle of self-organization and have frequently been applied to the task of solving mazes, an important type of reinforcement learning (RL) problem. Maze problems represent a simplified virtual model of real environments that can be used for developing core algorithms of many real-world applications related to the problem of navigation. However, the best achievements of LCSs in maze problems are still mostly limited to non-aliasing environments, while LCS complexity seems to obstruct a proper analysis of the reasons for failure. We construct a new LCS agent that has a simpler and more transparent performance mechanism, but that can still solve mazes better than existing algorithms. We use the structure of a predictive LCS model, strip out the evolutionary mechanism, simplify the reinforcement learning procedure, and equip the agent with the ability of associative perception, adopted from psychology. To improve our understanding of the nature and structure of maze environments, we analyze mazes used in research for the last two decades, introduce a set of maze complexity characteristics, and develop a set of new maze environments. We then run our new LCS with associative perception through the old and new aliasing mazes, which represent partially observable Markov decision problems (POMDPs), and demonstrate that it performs at least as well as, and in some cases better than, other published systems.
Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling
Spambot detection in online social networks is a long-lasting challenge
involving the study and design of detection techniques capable of efficiently
identifying ever-evolving spammers. Recently, a new wave of social spambots has
emerged, with advanced human-like characteristics that allow them to go
undetected even by current state-of-the-art algorithms. In this paper, we show
that efficient spambot detection can be achieved via an in-depth analysis of
their collective behaviors exploiting the digital DNA technique for modeling
the behaviors of social network users. Inspired by its biological counterpart,
in the digital DNA representation the behavioral lifetime of a digital account
is encoded in a sequence of characters. Then, we define a similarity measure
for such digital DNA sequences. We build upon digital DNA and the similarity
between groups of users to characterize both genuine accounts and spambots.
Leveraging such characterization, we design the Social Fingerprinting
technique, which is able to discriminate among spambots and genuine accounts in
both a supervised and an unsupervised fashion. We finally evaluate the
effectiveness of Social Fingerprinting and we compare it with three
state-of-the-art detection algorithms. Among the peculiarities of our approach
are the ability to apply off-the-shelf DNA analysis techniques to study
online users' behaviors and to rely efficiently on a limited number of
lightweight account characteristics.
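The encoding step described in the abstract above (an account's behavior written as a sequence of characters, then compared with a similarity measure) can be sketched as follows. The three-letter alphabet and the action names are hypothetical, and the abstract does not say which similarity measure is used; the normalized longest common substring below is only one plausible choice.

```python
# Hypothetical alphabet: one character per action type (an assumption).
ACTION_TO_BASE = {"tweet": "A", "reply": "C", "retweet": "T"}


def encode(actions):
    """Encode a chronological list of actions as a 'digital DNA' string."""
    return "".join(ACTION_TO_BASE[a] for a in actions)


def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by a and b (first found)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def similarity(a, b):
    """Assumed measure: shared substring length over the shorter sequence length."""
    if not a or not b:
        return 0.0
    return len(longest_common_substring(a, b)) / min(len(a), len(b))


print(encode(["tweet", "reply", "retweet"]))
print(similarity("ACTTA", "CCTTC"))
```

Accounts that act in a coordinated way would be expected to share long substrings under this kind of measure, while unrelated accounts would share only short ones.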
Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values
This work is motivated by the needs of predictive analytics on healthcare
data as represented by Electronic Medical Records. Such data is invariably
problematic: noisy, with missing entries and with imbalance in the classes of
interest, leading to serious bias in predictive modeling. Since standard data
mining methods often produce poor performance measures, we argue for
development of specialized techniques of data-preprocessing and classification.
In this paper, we propose a new method to simultaneously classify large
datasets and reduce the effects of missing values. It is based on a multilevel
framework of the cost-sensitive SVM and the expectation maximization imputation
method for missing values, which relies on iterated regression analyses. We
compare classification results of multilevel SVM-based algorithms on public
benchmark datasets with imbalanced classes and missing values as well as real
data in health applications, and show that our multilevel SVM-based method
produces fast, more accurate, and more robust classification results.
Comment: arXiv admin note: substantial text overlap with arXiv:1503.0625
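The phrase "iterated regression analyses" in the abstract above can be illustrated with a small sketch. This is a simplified, deterministic regression imputation on two variables, not the paper's method: a full expectation maximization procedure would also account for residual variance. The function names and the two-column restriction are assumptions made for brevity, and each column is assumed to have at least one observed value.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def impute_by_iterated_regression(rows, n_iter=50):
    """Fill None entries in two-column rows by alternating regressions."""
    missing = [(i, j) for i, r in enumerate(rows)
               for j, v in enumerate(r) if v is None]
    filled = [list(r) for r in rows]  # work on a copy, keep the input intact

    # Start from column means of the observed values.
    for j in (0, 1):
        observed = [r[j] for r in rows if r[j] is not None]
        mean = sum(observed) / len(observed)
        for r in filled:
            if r[j] is None:
                r[j] = mean

    # Alternately regress each column on the other and refresh the imputations.
    for _ in range(n_iter):
        for target in (0, 1):
            other = 1 - target
            slope, intercept = fit_line([r[other] for r in filled],
                                        [r[target] for r in filled])
            for i, j in missing:
                if j == target:
                    filled[i][j] = slope * filled[i][other] + intercept
    return filled


rows = [[1, 2], [2, 4], [3, None], [4, 8]]
filled = impute_by_iterated_regression(rows)
print(filled[2])
```

In this toy data the observed points lie on the line y = 2x, so the imputed value moves from the column mean toward the value on that line as the iterations proceed.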