575 research outputs found

    Clustering of nonstationary data streams: a survey of fuzzy partitional methods

    Get PDF
    YesData streams have arisen as a relevant research topic during the past decade. They are real‐time, incremental in nature, temporally ordered, massive, contain outliers, and the objects in a data stream may evolve over time (concept drift). Clustering is often one of the earliest and most important steps in the streaming data analysis workflow. A comprehensive literature is available about stream data clustering; however, less attention is devoted to the fuzzy clustering approach, even though the nonstationary nature of many data streams makes it especially appealing. This survey discusses relevant data stream clustering algorithms focusing mainly on fuzzy methods, including their treatment of outliers and concept drift and shift.Ministero dell‘Istruzione, dell‘Universitá e della Ricerca

    Unsupervised tracking of time-evolving data streams and an application to short-term urban traffic flow forecasting

    Get PDF
    I am indebted to many people for their help and support I receive during my Ph.D. study and research at DIBRIS-University of Genoa. First and foremost, I would like to express my sincere thanks to my supervisors Prof.Dr. Masulli, and Prof.Dr. Rovetta for the invaluable guidance, frequent meetings, and discussions, and the encouragement and support on my way of research. I thanks all the members of the DIBRIS for their support and kindness during my 4 years Ph.D. I would like also to acknowledge the contribution of the projects Piattaforma per la mobili\ue0 Urbana con Gestione delle INformazioni da sorgenti eterogenee (PLUG-IN) and COST Action IC1406 High Performance Modelling and Simulation for Big Data Applications (cHiPSet). Last and most importantly, I wish to thanks my family: my wife Shaimaa who stays with me through the joys and pains; my daughter and son whom gives me happiness every-day; and my parents for their constant love and encouragement

    Tracking time evolving data streams for short-term traffic forecasting

    Get PDF
    YesData streams have arisen as a relevant topic during the last few years as an efficient method for extracting knowledge from big data. In the robust layered ensemble model (RLEM) proposed in this paper for short-term traffic flow forecasting, incoming traffic flow data of all connected road links are organized in chunks corresponding to an optimal time lag. The RLEM model is composed of two layers. In the first layer, we cluster the chunks by using the Graded Possibilistic c-Means method. The second layer is made up by an ensemble of forecasters, each of them trained for short-term traffic flow forecasting on the chunks belonging to a specific cluster. In the operational phase, as a new chunk of traffic flow data presented as input to the RLEM, its memberships to all clusters are evaluated, and if it is not recognized as an outlier, the outputs of all forecasters are combined in an ensemble, obtaining in this a way a forecasting of traffic flow for a short-term time horizon. The proposed RLEM model is evaluated on a synthetic data set, on a traffic flow data simulator and on two real-world traffic flow data sets. The model gives an accurate forecasting of the traffic flow rates with outlier detection and shows a good adaptation to non-stationary traffic regimes. Given its characteristics of outlier detection, accuracy, and robustness, RLEM can be fruitfully integrated in traffic flow management systems

    Evolving fuzzy and neuro-fuzzy approaches in clustering, regression, identification, and classification: A Survey

    Get PDF
    Major assumptions in computational intelligence and machine learning consist of the availability of a historical dataset for model development, and that the resulting model will, to some extent, handle similar instances during its online operation. However, in many real world applications, these assumptions may not hold as the amount of previously available data may be insufficient to represent the underlying system, and the environment and the system may change over time. As the amount of data increases, it is no longer feasible to process data efficiently using iterative algorithms, which typically require multiple passes over the same portions of data. Evolving modeling from data streams has emerged as a framework to address these issues properly by self-adaptation, single-pass learning steps and evolution as well as contraction of model components on demand and on the fly. This survey focuses on evolving fuzzy rule-based models and neuro-fuzzy networks for clustering, classification and regression and system identification in online, real-time environments where learning and model development should be performed incrementally. (C) 2019 Published by Elsevier Inc.Igor Ơkrjanc, Jose Antonio Iglesias and Araceli Sanchis would like to thank to the Chair of Excellence of Universidad Carlos III de Madrid, and the Bank of Santander Program for their support. Igor Ơkrjanc is grateful to Slovenian Research Agency with the research program P2-0219, Modeling, simulation and control. Daniel Leite acknowledges the Minas Gerais Foundation for Research and Development (FAPEMIG), process APQ-03384-18. Igor Ơkrjanc and Edwin Lughofer acknowledges the support by the ”LCM — K2 Center for Symbiotic Mechatronics” within the framework of the Austrian COMET-K2 program. Fernando Gomide is grateful to the Brazilian National Council for Scientific and Technological Development (CNPq) for grant 305906/2014-3

    Evidential Evolving Gustafson-Kessel Algorithm For Online Data Streams Partitioning Using Belief Function Theory.

    Get PDF
    International audienceA new online clustering method called E2GK (Evidential Evolving Gustafson-Kessel) is introduced. This partitional clustering algorithm is based on the concept of credal partition defined in the theoretical framework of belief functions. A credal partition is derived online by applying an algorithm resulting from the adaptation of the Evolving Gustafson-Kessel (EGK) algorithm. Online partitioning of data streams is then possible with a meaningful interpretation of the data structure. A comparative study with the original online procedure shows that E2GK outperforms EGK on different entry data sets. To show the performance of E2GK, several experiments have been conducted on synthetic data sets as well as on data collected from a real application problem. A study of parameters' sensitivity is also carried out and solutions are proposed to limit complexity issues

    Early detection of health changes in the elderly using in-home multi-sensor data streams

    Get PDF
    The rapid aging of the population worldwide requires increased attention from health care providers and the entire society. For the elderly to live independently, many health issues related to old age, such as frailty and risk of falling, need increased attention and monitoring. When monitoring daily routines for older adults, it is desirable to detect the early signs of health changes before serious health events, such as hospitalizations, happen, so that timely and adequate preventive care may be provided. By deploying multi-sensor systems in homes of the elderly, we can track trajectories of daily behaviors in a feature space defined using the sensor data. In this work, we investigate a methodology for learning data distribution from streaming data and tracking the evolution of the behavior trajectories over long periods (years) using high dimensional streaming clustering and provide very early indicators of changes in health. If we assume that habitual behaviors correspond to clusters in feature space and diseases produce a change in behavior, albeit not highly specific, tracking trajectory deviations can provide hints of early illness. Retrospectively, we visualize the streaming clustering results and track how the behavior clusters evolve in feature space with the help of two dimension-reduction algorithms, Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE). Moreover, our tracking algorithm in the original high dimensional feature space generates early health warning alerts if a negative trend is detected in the behavior trajectory. We validated our algorithm on synthetic data, real-world data and tested it on a pilot dataset of four TigerPlace residents monitored with a collection of motion, bed, and depth sensors over ten years. We used the TigerPlace electronic health records (EHR) to understand the residents' behavior patterns and to evaluate and explain the health warnings generated by our algorithm. The results obtained on the TigerPlace dataset show that most of the warnings produced by our algorithm can be linked to health events documented in the EHR, providing strong support for a prospective deployment of the approach.Includes bibliographical references

    Informational Paradigm, management of uncertainty and theoretical formalisms in the clustering framework: A review

    Get PDF
    Fifty years have gone by since the publication of the first paper on clustering based on fuzzy sets theory. In 1965, L.A. Zadeh had published “Fuzzy Sets” [335]. After only one year, the first effects of this seminal paper began to emerge, with the pioneering paper on clustering by Bellman, Kalaba, Zadeh [33], in which they proposed a prototypal of clustering algorithm based on the fuzzy sets theory
    • 

    corecore