A reduced labeled samples (RLS) framework for classification of imbalanced concept-drifting streaming data.
Stream processing frameworks are designed to process streaming data that arrives over time. An example of such data is the stream of emails that a user receives every day. Most real-world data streams are also imbalanced, as in the email stream, which contains few spam emails compared with many legitimate ones. Classifying an imbalanced data stream is challenging for several reasons. First, data streams are huge and cannot be stored in memory for one-time processing. Second, if the data is imbalanced, the accuracy on the majority class dominates the results. Third, data streams change over time, which degrades model performance, so the model should be updated when such changes are detected. Finally, the true labels of all samples are not available immediately after classification, and in real-world applications only a fraction of the data can be labeled, because labeling is expensive and time-consuming. In this thesis, a framework for modeling streaming data with imbalanced classes is proposed. This framework is called Reduced Labeled Samples (RLS). RLS is a chunk-based learning framework that builds a model from a partially labeled data stream when the characteristics of the data change. In RLS, only a fraction of the samples are labeled and used in modeling, and the performance is not significantly different from that obtained with 100% labeling. RLS maintains an ensemble of classifiers to boost performance. RLS uses the information from labeled data in a supervised fashion and is also extended to use the information from unlabeled data in a semi-supervised fashion. RLS addresses both binary and multi-class partially labeled data streams, and the results show that the basis of RLS is effective even in the context of multi-class classification problems.
Overall, RLS is shown to be an effective framework for processing imbalanced and partially labeled data streams.
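The chunk-based, partially labeled ensemble idea described in this abstract can be illustrated with a minimal sketch. This is not the RLS algorithm itself: the nearest-centroid base learner, the random choice of which samples to label, the majority vote, and all names (`NearestCentroid`, `process_stream`, `make_chunk`, `label_fraction`, `max_models`) are assumptions made for illustration only.

```python
import random
from collections import Counter


class NearestCentroid:
    """Toy base learner (an assumption, not the RLS base classifier)."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
        # One centroid per class seen in the labeled subset.
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, x):
        def dist2(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=dist2)


def ensemble_predict(ensemble, x, default=0):
    """Majority vote over the ensemble; `default` is used before any model exists."""
    if not ensemble:
        return default
    votes = Counter(m.predict_one(x) for m in ensemble)
    return votes.most_common(1)[0][0]


def process_stream(chunks, label_fraction=0.2, max_models=5, seed=0):
    """Predict each chunk, then train a new model on a labeled fraction of it."""
    rng = random.Random(seed)
    ensemble, accuracies = [], []
    for X, y in chunks:
        preds = [ensemble_predict(ensemble, x) for x in X]
        accuracies.append(sum(p == t for p, t in zip(preds, y)) / len(y))
        # Only a fraction of the chunk is assumed to receive true labels.
        k = max(1, int(label_fraction * len(X)))
        idx = rng.sample(range(len(X)), k)
        model = NearestCentroid().fit([X[i] for i in idx], [y[i] for i in idx])
        # Keep only the most recent models so the ensemble can follow changes.
        ensemble = (ensemble + [model])[-max_models:]
    return ensemble, accuracies


def make_chunk(rng, n, minority_rate=0.1):
    """Hypothetical imbalanced 2-D data: class 1 is the rare class."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < minority_rate else 0
        center = 3.0 if label else 0.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y
```

Calling `process_stream([make_chunk(random.Random(1), 200) for _ in range(10)])` returns the final ensemble and the per-chunk accuracy measured before each chunk's labels are used. Plain accuracy is shown for brevity; on imbalanced data a per-class metric would be more informative, as the abstract itself notes.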
Self-Organizing Fuzzy Inference Ensemble System for Big Streaming Data Classification
An evolving intelligent system (EIS) is able to self-update its system structure and meta-parameters from streaming data. However, since the majority of EISs are implemented on a single-model architecture, their performance on large-scale, complex data streams is often limited. To address this deficiency, a novel self-organizing fuzzy inference ensemble framework is proposed in this paper. As the base learner of the proposed ensemble system, the self-organizing fuzzy inference system is capable of self-learning a highly transparent predictive model from streaming data on a chunk-by-chunk basis through a human-interpretable process. Importantly, the base learner can continuously self-adjust its decision boundaries based on the inter-class and intra-class distances between prototypes identified from successive data chunks for higher classification precision. Thanks to its parallel distributed computing architecture, the proposed ensemble framework can achieve high classification precision while maintaining high computational efficiency on large-scale problems. Numerical examples based on popular benchmark big data problems demonstrate the superior performance of the proposed approach over state-of-the-art alternatives in terms of both classification accuracy and computational efficiency.
A survey on machine learning for recurring concept drifting data streams
The problem of concept drift has gained a lot of attention in recent years. This aspect is key in many domains exhibiting non-stationary as well as cyclic patterns and structural breaks affecting their generative processes. In this survey, we review the relevant literature on dealing with regime changes in the behaviour of continuous data streams. The study starts with a general introduction to the field of data stream learning, describing recent works on passive or active mechanisms to adapt to or detect concept drifts, frequent challenges in this area, and related performance metrics. Then, different supervised and unsupervised approaches, such as online ensembles, meta-learning and model-based clustering, that can be used to deal with seasonalities in a data stream are covered. The aim is to point out new research trends and give future research directions on the usage of machine learning techniques for data streams, which can help in the event of shifts and recurrences in continuous learning scenarios in near real-time.
An Incremental Construction of Deep Neuro Fuzzy System for Continual Learning of Non-stationary Data Streams
Existing fuzzy neural networks (FNNs) are mostly developed under a shallow
network configuration, which has lower generalization power than deep structures. This paper
proposes a novel self-organizing deep FNN, namely DEVFNN. Fuzzy rules can be
automatically extracted from data streams or removed if they play a limited role
during their lifespan. The structure of the network can be deepened on demand
by stacking additional layers using a drift detection method which not only
detects the covariate drift, variations of input space, but also accurately
identifies the real drift, dynamic changes of both feature space and target
space. DEVFNN is developed under the stacked generalization principle via the
feature augmentation concept where a recently developed algorithm, namely
gClass, drives the hidden layer. It is equipped with an automatic feature
selection method which controls activation and deactivation of input attributes
to induce varying subsets of input features. A deep network simplification
procedure is put forward using the concept of hidden layer merging to prevent
uncontrollable growth of the input space dimensionality due to the nature of
the feature augmentation approach in building a deep network structure. DEVFNN
works in a sample-wise fashion and is compatible with data stream
applications. The efficacy of DEVFNN has been thoroughly evaluated using seven
datasets with non-stationary properties under the prequential test-then-train
protocol. It has been compared with four popular continual learning algorithms
and its shallow counterpart, over which DEVFNN demonstrates improved
classification accuracy. Moreover, it is also shown that the concept drift
detection method is an effective tool to control the depth of network structure
while the hidden layer merging scenario is capable of simplifying the network
complexity of a deep network with negligible compromise of generalization
performance. Comment: This paper has been published in IEEE Transactions on Fuzzy Systems.
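The prequential test-then-train protocol mentioned in this abstract can be sketched as follows: each incoming sample is first used to test the current model and only afterwards to update it. The learner below is a plain online perceptron standing in for DEVFNN, which is not reproduced here; `OnlinePerceptron`, `prequential_accuracy`, and `make_sample` are hypothetical names chosen for illustration.

```python
import random


class OnlinePerceptron:
    """Stand-in sample-wise learner (an assumption; not DEVFNN)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_one(self, x):
        score = sum(w * v for w, v in zip(self.w, x)) + self.b
        return 1 if score > 0 else 0

    def learn_one(self, x, y):
        # Classic perceptron rule: update only when the prediction is wrong.
        err = y - self.predict_one(x)
        if err:
            self.w = [w + self.lr * err * v for w, v in zip(self.w, x)]
            self.b += self.lr * err


def prequential_accuracy(model, stream):
    """Test-then-train: score each sample before the model learns from it."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict_one(x) == y)  # test first
        model.learn_one(x, y)                      # then train
        total += 1
    return correct / total if total else 0.0


def make_sample(rng, flipped=False):
    """Hypothetical separable data; `flipped=True` simulates an abrupt drift."""
    label = rng.randint(0, 1)
    sign = 1.0 if label == 1 else -1.0
    if flipped:
        sign = -sign
    return [sign * rng.uniform(0.5, 1.5), rng.uniform(-1.0, 1.0)], label
```

Because every sample is scored before it is learned, the resulting accuracy reflects performance on unseen data without a separate hold-out set. Concatenating samples generated with `flipped=False` and then `flipped=True` gives a simple way to observe how an abrupt drift affects the running score.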
Neural Dynamics of Autistic Behaviors: Cognitive, Emotional, and Timing Substrates
What brain mechanisms underlie autism and how do they give rise to autistic behavioral symptoms? This article describes a neural model, called the iSTART model, which proposes how cognitive, emotional, timing, and motor processes may interact together to create and perpetuate autistic symptoms. These model processes were originally developed to explain data concerning how the brain controls normal behaviors. The iSTART model shows how autistic behavioral symptoms may arise from prescribed breakdowns in these brain processes. Air Force Office of Scientific Research (F49620-01-1-0397); Office of Naval Research (N00014-01-1-0624)