1,594 research outputs found
Evolving fuzzy and neuro-fuzzy approaches in clustering, regression, identification, and classification: A Survey
Major assumptions in computational intelligence and machine learning are that a historical dataset is available for model development and that the resulting model will, to some extent, handle similar instances during its online operation. However, in many real-world applications these assumptions may not hold, because the previously available data may be insufficient to represent the underlying system, and the environment and the system may change over time. As the amount of data increases, it is no longer feasible to process data efficiently using iterative algorithms, which typically require multiple passes over the same portions of data. Evolving modeling from data streams has emerged as a framework to address these issues through self-adaptation, single-pass learning steps, and the evolution as well as contraction of model components on demand and on the fly. This survey focuses on evolving fuzzy rule-based models and neuro-fuzzy networks for clustering, classification, regression, and system identification in online, real-time environments where learning and model development should be performed incrementally. (C) 2019 Published by Elsevier Inc. Igor Škrjanc, Jose Antonio Iglesias and Araceli Sanchis would like to thank the Chair of Excellence of Universidad Carlos III de Madrid and the Bank of Santander Program for their support. Igor Škrjanc is grateful to the Slovenian Research Agency for the research program P2-0219, Modeling, simulation and control. Daniel Leite acknowledges the Minas Gerais Foundation for Research and Development (FAPEMIG), process APQ-03384-18. Igor Škrjanc and Edwin Lughofer acknowledge the support of the "LCM - K2 Center for Symbiotic Mechatronics" within the framework of the Austrian COMET-K2 program. Fernando Gomide is grateful to the Brazilian National Council for Scientific and Technological Development (CNPq) for grant 305906/2014-3
Evolving Ensemble Fuzzy Classifier
The concept of ensemble learning offers a promising avenue in learning from
data streams under complex environments because it addresses the bias and
variance dilemma better than its single model counterpart and features a
reconfigurable structure, which is well suited to the given context. While
various extensions of ensemble learning for mining non-stationary data streams
can be found in the literature, most of them are crafted under a static base
classifier and revisit preceding samples in the sliding window for a
retraining step. This feature causes computationally prohibitive complexity and
is not flexible enough to cope with rapidly changing environments. Their
complexity is often demanding because they involve a large collection of
offline classifiers, owing to the absence of structural complexity reduction
mechanisms and the lack of an online feature selection mechanism. A novel evolving
ensemble classifier, namely Parsimonious Ensemble (pENsemble), is proposed in
this paper. pENsemble differs from existing architectures in that it
is built upon an evolving classifier from data streams, termed Parsimonious
Classifier (pClass). pENsemble is equipped with an ensemble pruning mechanism,
which estimates a localized generalization error of a base classifier. A
dynamic online feature selection scenario is integrated into the pENsemble.
This method allows for dynamic selection and deselection of input features on
the fly. pENsemble adopts a dynamic ensemble structure to output a final
classification decision and features a novel drift detection scenario to
grow the ensemble structure. The efficacy of pENsemble has been
demonstrated through rigorous numerical studies with dynamic and evolving data
streams, where it delivers the most encouraging performance in attaining a
tradeoff between accuracy and complexity. Comment: this paper has been published in IEEE Transactions on Fuzzy Systems
Unsupervised tracking of time-evolving data streams and an application to short-term urban traffic flow forecasting
I am indebted to many people for the help and support I received during my Ph.D. study and research at DIBRIS, University of Genoa. First and foremost, I would like to express my sincere thanks to my supervisors, Prof. Dr. Masulli and Prof. Dr. Rovetta, for their invaluable guidance, frequent meetings and discussions, and their encouragement and support along my path of research. I thank all the members of DIBRIS for their support and kindness during my four years of Ph.D. study. I would also like to acknowledge the contribution of the projects Piattaforma per la mobilità Urbana con Gestione delle INformazioni da sorgenti eterogenee (PLUG-IN) and COST Action IC1406 High Performance Modelling and Simulation for Big Data Applications (cHiPSet). Last and most importantly, I wish to thank my family: my wife Shaimaa, who stays with me through the joys and pains; my daughter and son, who give me happiness every day; and my parents, for their constant love and encouragement
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
application domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on the data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework.
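As a minimal illustration of the definition above (an observation that differs significantly from the other values in a data set), the sketch below flags outliers with a modified z-score built from the median and the median absolute deviation. It is one simple unsupervised technique chosen for illustration, not a method taken from the paper's taxonomy; the function name `mad_outliers`, the sample readings, and the threshold of 3.5 are all assumptions.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`.

    Hypothetical helper for illustration only; the 3.5 cut-off is a
    common rule of thumb, not a value taken from the abstract.
    """
    med = median(values)
    # Median absolute deviation: a spread estimate that is itself
    # robust to the outliers we are trying to find.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half of the values equal the median, so the score
        # is undefined; report nothing rather than divide by zero.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a
    # standard z-score under a normal distribution.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


readings = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0]
print(mad_outliers(readings))
```

Using the median rather than the mean keeps a single extreme value from inflating the spread estimate and thereby masking itself.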
Clustering of nonstationary data streams: a survey of fuzzy partitional methods
Data streams have arisen as a relevant research topic during the past decade. They are real-time, incremental in nature, temporally ordered, massive, contain outliers, and the objects in a data stream may evolve over time (concept drift). Clustering is often one of the earliest and most important steps in the streaming data analysis workflow. A comprehensive literature is available on stream data clustering; however, less attention has been devoted to the fuzzy clustering approach, even though the nonstationary nature of many data streams makes it especially appealing. This survey discusses relevant data stream clustering algorithms, focusing mainly on fuzzy methods, including their treatment of outliers and of concept drift and shift. Ministero dell'Istruzione, dell'Università e della Ricerca
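To make the appeal of fuzzy partitions for nonstationary streams more concrete, the sketch below computes standard fuzzy c-means membership degrees for one incoming sample and then nudges every center toward it in proportion to its membership. The membership formula is the usual fuzzy c-means one; the single-pass update rule, the `rate` parameter, and the function names are assumptions made for illustration and are not taken from any specific algorithm in the survey.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of `point` in each cluster (sums to 1)."""
    dists = [math.dist(point, c) for c in centers]
    zeros = [d == 0 for d in dists]
    if any(zeros):
        # The point coincides with one or more centers: share the
        # full membership among those centers only.
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    inv = [d ** -exponent for d in dists]
    total = sum(inv)
    return [v / total for v in inv]


def update_centers(point, centers, rate=0.1, m=2.0):
    """One single-pass step: move each center toward `point`.

    Assumed heuristic update (not from the survey): the step size is
    `rate` scaled by the membership raised to the fuzzifier `m`.
    """
    u = fuzzy_memberships(point, centers, m)
    return [
        tuple(c_k + rate * (u_i ** m) * (x_k - c_k)
              for c_k, x_k in zip(c, point))
        for c, u_i in zip(centers, u)
    ]


centers = [(0.0, 0.0), (4.0, 0.0)]
sample = (1.0, 0.0)
print(fuzzy_memberships(sample, centers))
print(update_centers(sample, centers))
```

Because every sample contributes to every center with a graded weight, a slowly drifting cluster can be tracked without hard reassignment of points, which is the property the abstract alludes to.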
Scalable Teacher Forcing Network for Semi-Supervised Large Scale Data Streams
The large-scale data stream problem refers to high-speed information flow
which cannot be processed in a scalable manner under a traditional computing
platform. This problem also imposes expensive labelling cost, making the
deployment of fully supervised algorithms unfeasible. On the other hand, the
problem of semi-supervised large-scale data streams is little explored in the
literature because most works are designed in the traditional single-node
computing environments while also being fully supervised approaches. This paper
offers Weakly Supervised Scalable Teacher Forcing Network (WeScatterNet) to
cope with the scarcity of labelled samples and the large-scale data streams
simultaneously. WeScatterNet is crafted under the distributed computing platform of
Apache Spark with a data-free model fusion strategy for model compression after
the parallel computing stage. It features an open network structure to address the
global and local drift problems while integrating a data augmentation,
annotation and auto-correction () method for handling partially labelled
data streams. The performance of WeScatterNet is numerically evaluated in the
six large-scale data stream problems with only label proportions. It
shows highly competitive performance even if compared with fully supervised
learners with label proportions. Comment: This paper has been accepted for publication in Information Sciences