79 research outputs found

    Robust techniques and applications in fuzzy clustering

    This dissertation addresses issues central to fuzzy classification. The issue of sensitivity to noise and outliers of least squares minimization based clustering techniques, such as Fuzzy c-Means (FCM) and its variants, is addressed. In this work, two novel and robust clustering schemes are presented and analyzed in detail. They approach the problem of robustness from different perspectives. The first scheme scales down the FCM memberships of data points based on the distance of the points from the cluster centers. Scaling done on outliers reduces their membership in true clusters. This scheme, known as Mega-clustering, defines a conceptual mega-cluster which is a collective cluster of all data points but views outliers and good points differently (as opposed to the concept of Dave's Noise cluster). The scheme is presented and validated with experiments, and similarities with Noise Clustering (NC) are also presented. The other scheme is based on the feasible solution algorithm that implements the Least Trimmed Squares (LTS) estimator. The LTS estimator is known to be resistant to noise and has a high breakdown point. The feasible solution approach also guarantees convergence of the solution set to a global optimum. Experiments show the practicability of the proposed schemes in terms of computational requirements and in the attractiveness of their simplistic frameworks. The issue of validation of clustering results has often received less attention than clustering itself. Fuzzy and non-fuzzy cluster validation schemes are reviewed and a novel methodology for cluster validity using a test for random position hypothesis is developed. The random position hypothesis is tested against an alternative clustered hypothesis on every cluster produced by the partitioning algorithm. The Hopkins statistic is used as a basis to accept or reject the random position hypothesis, which is also the null hypothesis in this case. 
The Hopkins statistic is known to be a fair estimator of randomness in a data set. The concept is borrowed from the clustering tendency domain and its applicability to validating clusters is shown here. A unique feature selection procedure for use with large molecular conformational datasets with high dimensionality is also developed. The intelligent feature extraction scheme not only helps in reducing dimensionality of the feature space but also helps in eliminating contentious issues such as the ones associated with labeling of symmetric atoms in the molecule. The feature vector is converted to a proximity matrix, and is used as an input to the relational fuzzy clustering (FRC) algorithm with very promising results. Results are also validated using several cluster validity measures from the literature. Another application of fuzzy clustering considered here is image segmentation. Image analysis on extremely noisy images is carried out as a precursor to the development of an automated real time condition state monitoring system for underground pipelines. A two-stage FCM with intelligent feature selection is implemented as the segmentation procedure and results on a test image are presented. A conceptual framework for automated condition state assessment is also developed.
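
The abstract above uses the Hopkins statistic to test a random position hypothesis on each cluster. As a rough illustration only, the following Python sketch computes one common form of the statistic. The function names, the bounding-box sampling window, the number of probes, and the use of distances raised to the data dimension are all assumptions made for this sketch; the dissertation's exact formulation is not given in the abstract.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distances between every row of `a` and every row of `b`."""
    diffs = a[:, None, :] - b[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2))


def hopkins_statistic(data, n_probes=50, seed=0):
    """One common form of the Hopkins statistic (hypothetical helper).

    Values near 0.5 are consistent with spatially random data;
    values near 1 suggest clustered data.
    """
    rng = np.random.default_rng(seed)
    n, dim = data.shape

    # u: distance from uniformly placed probe locations to the nearest data point.
    # Assumption: the sampling window is the bounding box of the data.
    low, high = data.min(axis=0), data.max(axis=0)
    probes = rng.uniform(low, high, size=(n_probes, dim))
    u = pairwise_distances(probes, data).min(axis=1)

    # w: distance from randomly chosen data points to their nearest other data point.
    idx = rng.choice(n, size=n_probes, replace=False)
    d_w = pairwise_distances(data[idx], data)
    d_w[np.arange(n_probes), idx] = np.inf  # mask each point's distance to itself
    w = d_w.min(axis=1)

    u_d, w_d = u ** dim, w ** dim
    return u_d.sum() / (u_d.sum() + w_d.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    uniform = rng.uniform(0, 1, size=(500, 2))
    blobs = np.vstack([
        rng.normal(0, 0.05, size=(250, 2)),
        rng.normal(5, 0.05, size=(250, 2)),
    ])
    print("uniform data:", hopkins_statistic(uniform))
    print("two tight blobs:", hopkins_statistic(blobs))
```

In a validation setting like the one described, such a function would be applied to the points of each cluster separately, and the null hypothesis of random position would be accepted or rejected against a chosen threshold.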

    Unsupervised tracking of time-evolving data streams and an application to short-term urban traffic flow forecasting

    I am indebted to many people for the help and support I received during my Ph.D. study and research at DIBRIS-University of Genoa. First and foremost, I would like to express my sincere thanks to my supervisors, Prof. Dr. Masulli and Prof. Dr. Rovetta, for their invaluable guidance, frequent meetings and discussions, and encouragement and support throughout my research. I thank all the members of DIBRIS for their support and kindness during my four years of Ph.D. study. I would also like to acknowledge the contribution of the projects Piattaforma per la mobilità Urbana con Gestione delle INformazioni da sorgenti eterogenee (PLUG-IN) and COST Action IC1406 High Performance Modelling and Simulation for Big Data Applications (cHiPSet). Last and most importantly, I wish to thank my family: my wife Shaimaa, who stays with me through the joys and pains; my daughter and son, who give me happiness every day; and my parents for their constant love and encouragement.

    Temporal decision making using unsupervised learning

    With the explosion of ubiquitous continuous sensing, on-line streaming clustering continues to attract attention. The requirements are that the streaming clustering algorithm recognize and adapt clusters as the data evolves, that anomalies are detected, and that new clusters are automatically formed as incoming data dictate. In this dissertation, we develop a streaming clustering algorithm, MU Streaming Clustering (MUSC), that is based on coupling a Gaussian mixture model (GMM) with possibilistic clustering to build an adaptive system for analyzing streaming multi-dimensional activity feature vectors. For this reason, the possibilistic C-Means (PCM) and Automatic Merging Possibilistic Clustering Method (AMPCM) are combined to cluster the initial data points, detect anomalies and initialize the GMM. MUSC achieves our goals when tested on synthetic and real-life datasets. We also compare MUSC's performance with Sequential k-means (sk-means), Basic Sequential Clustering Algorithm (BSAS), and Modified BSAS (MBSAS), where MUSC shows superior performance and accuracy. The performance of a streaming clustering algorithm needs to be monitored over time to understand the behavior of the streaming data in terms of new emerging clusters and number of outlier data points. Incremental internal cluster validity indices (iCVIs) are used to monitor the performance of an on-line clustering algorithm. We study the internal incremental Davies-Bouldin (DB), Xie-Beni (XB), and Dunn internal cluster validity indices in the context of streaming data analysis. We extend the original incremental DB (iDB) to a more general version parameterized by the exponent of membership weights. Then we illustrate how the iDB can be used to analyze and understand the performance of the MUSC algorithm. 
We give examples that illustrate the appearance of a new cluster, the effect of different cluster sizes, handling of outlier data samples, and the effect of the input order on the resultant cluster history. In addition, we investigate the internal incremental Davies-Bouldin (iDB) cluster validity index in the context of big streaming data analysis. We analyze the effect of large numbers of samples on the values of the iCVI (iDB). We also develop online versions of two modified generalized Dunn's indices that can be used for dynamic evaluation of evolving (cluster) structure in streaming data. We argue that this method is a good way to monitor the ongoing performance of online clustering algorithms and we illustrate several types of inferences that can be drawn from such indices. We compare the two new indices to the incremental Xie-Beni and Davies-Bouldin indices, which to our knowledge offer the only comparable approach, with numerical examples on a variety of synthetic and real data sets. We also study the performance of MUSC and iCVIs with big streaming data applications. We show the advantage of iCVIs in monitoring large streaming datasets and in providing useful information about the data stream in terms of emergence of a new structure, amount of outlier data, size of the clusters, and order of data samples in each cluster. We also propose a way to project streaming data into a lower-dimensional space for cases where the distance measure does not perform as expected in the high dimensional space. Another example of streaming data is the activity data coming from TigerPlace and other elderly residents' apartments in and around Columbia, MO. TigerPlace is an eldercare facility that promotes aging-in-place in Columbia, Missouri. Eldercare monitoring using non-wearable sensors is a candidate solution for improving care and reducing costs. Abnormal sensor patterns produced by certain resident behaviors could be linked to early signs of illness. 
We propose an unsupervised method for detecting abnormal behavior patterns based on a new context preserving representation of daily activities. A preliminary analysis of the method was conducted on data collected in TigerPlace. Sensor firings of each day are converted into sequences of daily activities. Then, building a histogram from the daily sequences of a resident, we generate a single data vector representing that day. Using the proposed method, a day with hundreds of sequences is converted into a single data point representing that day and preserving the context of the daily routine at the same time. We obtained an average Area Under the Curve (AUC) of 0.9 in detecting days where older adults need to be assessed. Our approach outperforms other approaches on the same dataset. Using the context preserving representation, we developed a multi-dimensional alert system to improve the existing single-dimensional alert system in TigerPlace. Also, this representation is used to develop a framework that utilizes sensor sequence similarity and medical concepts extracted from the EHR to automatically inform the nursing staff when health problems are detected. Our context preserving representation of daily activities is used to measure the similarity between the sensor sequences of different days. The medical concepts are extracted from the nursing notes using MetamapLite, an NLP tool included in the Unified Medical Language System (UMLS). The proposed idea is validated on two pilot datasets from twelve TigerPlace residents, with a total of 5810 sensor days out of which 1966 had nursing notes.
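
The abstract above describes turning a day's many activity sequences into a single histogram vector. The Python sketch below shows one way such a step could look. It is an illustration under stated assumptions, not the authors' method: the function name `day_histogram`, the fixed vocabulary of sequence types, the room labels, and the normalization to relative frequencies are all hypothetical.

```python
from collections import Counter


def day_histogram(day_sequences, vocabulary):
    """Map one day's activity sequences to a fixed-length histogram vector.

    day_sequences: list of activity sequences observed on one day.
    vocabulary: ordered list of sequence types (tuples) that define the vector axes.
    Assumption: counts are normalized to relative frequencies.
    """
    counts = Counter(tuple(seq) for seq in day_sequences)
    total = sum(counts[s] for s in vocabulary)
    if total == 0:
        # No known sequence type occurred on this day.
        return [0.0] * len(vocabulary)
    return [counts[s] / total for s in vocabulary]


# Hypothetical sequence types; each tuple keeps the order of activities,
# which is how this sketch preserves some context of the daily routine.
VOCAB = [
    ("bedroom", "bathroom"),
    ("kitchen", "living_room"),
    ("bathroom", "bedroom"),
]

day = [
    ("bedroom", "bathroom"),
    ("kitchen", "living_room"),
    ("bedroom", "bathroom"),
    ("bedroom", "bathroom"),
    ("kitchen", "living_room"),
]

vector = day_histogram(day, VOCAB)
print(vector)
```

Vectors built this way have the same length for every day, so days can be compared with any standard distance or fed to an unsupervised anomaly detector.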

    Advanced Signal Processing and Control in Anaesthesia

    This thesis comprises three major stages: classification of depth of anaesthesia (DOA); modelling a typical patient’s behaviour during a surgical procedure; and control of DOA with simultaneous administration of propofol and remifentanil. Clinical data gathered in the operating theatre was used in this project. Multiresolution wavelet analysis was used to extract meaningful features from the auditory evoked potentials (AEP). These features were classified into different DOA levels using a fuzzy relational classifier (FRC). The FRC uses fuzzy clustering and fuzzy relational composition. The FRC performed well and was able to distinguish between the DOA levels. A hybrid patient model was developed for the induction and maintenance phase of anaesthesia. An adaptive network-based fuzzy inference system was used to adapt Takagi-Sugeno-Kang (TSK) fuzzy models relating systolic arterial pressure (SAP), heart rate (HR), and the wavelet extracted AEP features with the effect concentrations of propofol and remifentanil. The effect of surgical stimuli on SAP and HR, and the analgesic properties of remifentanil were described by Mamdani fuzzy models, constructed with anaesthetist cooperation. The model proved to be adequate, reflecting the effect of drugs and surgical stimuli. A multivariable fuzzy controller was developed for the simultaneous administration of propofol and remifentanil. The controller is based on linguistic rules that interact with three decision tables, one of which represents a fuzzy PI controller. The infusion rates of the two drugs are determined according to the DOA level and surgical stimulus. Remifentanil is titrated according to the required analgesia level and its synergistic interaction with propofol. The controller was able to adequately achieve and maintain the target DOA level, under different conditions. 
Overall, it was possible to model the interaction between propofol and remifentanil, and to successfully use this model to develop a closed-loop system in anaesthesia.

    Integration of Auxiliary Data Knowledge in Prototype Based Vector Quantization and Classification Models

    This thesis deals with the integration of auxiliary data knowledge into machine learning methods, especially prototype based classification models. The problem of classification is diverse, and evaluating the result by accuracy alone is not adequate in many applications. Therefore, the classification tasks are analyzed more deeply. Possibilities to extend prototype based methods to integrate extra knowledge about the data or the classification goal are presented to obtain problem adequate models. One of the proposed extensions is Generalized Learning Vector Quantization for direct optimization of statistical measurements besides the classification accuracy. Modifying the metric adaptation of the Generalized Learning Vector Quantization for functional data, i.e. data with lateral dependencies in the features, is also considered.

    Contents:
    Symbols and Abbreviations
    1 Introduction
        1.1 Motivation and Problem Description
        1.2 Utilized Data Sets
    2 Prototype Based Methods
        2.1 Unsupervised Vector Quantization
            2.1.1 C-means
            2.1.2 Self-Organizing Map
            2.1.3 Neural Gas
            2.1.4 Common Generalizations
        2.2 Supervised Vector Quantization
            2.2.1 The Family of Learning Vector Quantizers - LVQ
            2.2.2 Generalized Learning Vector Quantization
        2.3 Semi-Supervised Vector Quantization
            2.3.1 Learning Associations by Self-Organization
            2.3.2 Fuzzy Labeled Self-Organizing Map
            2.3.3 Fuzzy Labeled Neural Gas
        2.4 Dissimilarity Measures
            2.4.1 Differentiable Kernels in Generalized LVQ
            2.4.2 Dissimilarity Adaptation for Performance Improvement
    3 Deeper Insights into Classification Problems - From the Perspective of Generalized LVQ -
        3.1 Classification Models
        3.2 The Classification Task
        3.3 Evaluation of Classification Results
        3.4 The Classification Task as an Ill-Posed Problem
    4 Auxiliary Structure Information and Appropriate Dissimilarity Adaptation in Prototype Based Methods
        4.1 Supervised Vector Quantization for Functional Data
            4.1.1 Functional Relevance/Matrix LVQ
            4.1.2 Enhancement Generalized Relevance/Matrix LVQ
        4.2 Fuzzy Information About the Labels
            4.2.1 Fuzzy Semi-Supervised Self-Organizing Maps
            4.2.2 Fuzzy Semi-Supervised Neural Gas
    5 Variants of Classification Costs and Class Sensitive Learning
        5.1 Border Sensitive Learning in Generalized LVQ
            5.1.1 Border Sensitivity by Additive Penalty Function
            5.1.2 Border Sensitivity by Parameterized Transfer Function
        5.2 Optimizing Different Validation Measures by the Generalized LVQ
            5.2.1 Attention Based Learning Strategy
            5.2.2 Optimizing Statistical Validation Measurements for Binary Class Problems in the GLVQ
        5.3 Integration of Structural Knowledge about the Labeling in Fuzzy Supervised Neural Gas
    6 Conclusion and Future Work
    My Publications
    A Appendix
        A.1 Stochastic Gradient Descent (SGD)
        A.2 Support Vector Machine
        A.3 Fuzzy Supervised Neural Gas Algorithm Solved by SGD
    Bibliography
    Acknowledgements

    Proceedings of the 5th International Workshop "What can FCA do for Artificial Intelligence?", FCA4AI 2016 (co-located with ECAI 2016, The Hague, Netherlands, August 30th 2016)

    These are the proceedings of the fifth edition of the FCA4AI workshop (http://www.fca4ai.hse.ru/). Formal Concept Analysis (FCA) is a mathematically well-founded theory aimed at data analysis and classification that can be used for many purposes, especially for Artificial Intelligence (AI) needs. The objective of the FCA4AI workshop is to investigate two main issues: how FCA can support various AI activities (knowledge discovery, knowledge representation and reasoning, learning, data mining, NLP, information retrieval), and how FCA can be extended to help AI researchers solve new and complex problems in their domain. Accordingly, topics of interest are related to the following: (i) Extensions of FCA for AI: pattern structures, projections, abstractions. (ii) Knowledge discovery based on FCA: classification, data mining, pattern mining, functional dependencies, biclustering, stability, visualization. (iii) Knowledge processing based on concept lattices: modeling, representation, reasoning. (iv) Application domains: natural language processing, information retrieval, recommendation, mining of web of data and of social networks, etc.