11 research outputs found

    On the consistency of a spatial-type interval-valued median for random intervals

    The sample $d_\theta$-median is a robust estimator of the central tendency or location of an interval-valued random variable. While the interval-valued sample mean can be highly influenced by outliers, this spatial-type interval-valued median remains much more reliable. In this paper, we show that under general conditions the sample $d_\theta$-median is a strongly consistent estimator of the $d_\theta$-median of an interval-valued random variable. Comment: 14 pages

    Entropy-based fuzzy clustering of interval-valued time series

    This paper proposes a fuzzy C-medoids-based clustering method with entropy regularization to address the problem of grouping complex data such as interval-valued time series. The dual nature of the data, which are both time-varying and interval-valued, needs to be considered and embedded into clustering techniques. In this work, a new dissimilarity measure, based on Dynamic Time Warping, is proposed. The performance of the new clustering procedure is evaluated through a simulation study and an application to financial time series.
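
    As a rough illustration of entropy-regularized fuzzy C-medoids, the sketch below works on a precomputed dissimilarity matrix. It assumes the commonly used entropy-regularized objective, in which memberships take the softmax form u_ic = exp(-d_ic^2 / p) / sum_c' exp(-d_ic'^2 / p); the abstract does not state the update rules, so this form, the function names, and the parameter `p` are assumptions. The paper's DTW-based dissimilarity is not reproduced here: `dist` stands in for any symmetric dissimilarity matrix.

```python
import math


def entropy_memberships(dist_to_medoids, p):
    """Softmax-type memberships: u_ic is proportional to exp(-d_ic^2 / p)."""
    memberships = []
    for row in dist_to_medoids:
        scores = [-(d ** 2) / p for d in row]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        memberships.append([e / total for e in exps])
    return memberships


def update_medoids(dist, memberships, n_clusters):
    """Pick, per cluster, the unit minimising the membership-weighted
    sum of squared dissimilarities; medoids are kept distinct."""
    n = len(dist)
    medoids = []
    for c in range(n_clusters):
        costs = [
            sum(memberships[i][c] * dist[i][j] ** 2 for i in range(n))
            for j in range(n)
        ]
        candidates = [j for j in range(n) if j not in medoids]
        medoids.append(min(candidates, key=costs.__getitem__))
    return medoids


def fuzzy_c_medoids_entropy(dist, medoids, p=1.0, max_iter=100):
    """Alternate membership and medoid updates until the medoids stop changing."""
    memberships = []
    for _ in range(max_iter):
        to_medoids = [[dist[i][m] for m in medoids] for i in range(len(dist))]
        memberships = entropy_memberships(to_medoids, p)
        new_medoids = update_medoids(dist, memberships, len(medoids))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, memberships


# Toy data: six points on a line forming two well-separated groups.
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
dist = [[abs(a - b) for b in points] for a in points]
medoids, memberships = fuzzy_c_medoids_entropy(dist, medoids=[0, 3], p=1.0)
```

    Larger values of `p` make the partition fuzzier, while values close to zero push the memberships toward a crisp assignment.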

    Informational Paradigm, management of uncertainty and theoretical formalisms in the clustering framework: A review

    Fifty years have gone by since the publication of the first paper on clustering based on fuzzy sets theory. In 1965, L.A. Zadeh published “Fuzzy Sets” [335]. Only one year later, the first effects of this seminal paper began to emerge with the pioneering paper on clustering by Bellman, Kalaba and Zadeh [33], in which they proposed a prototypical clustering algorithm based on fuzzy sets theory.

    Fuzzy C-ordered medoids clustering of interval-valued data

    Fuzzy clustering for interval-valued data helps us to find natural vague boundaries in such data. The Fuzzy c-Medoids Clustering (FcMdC) method is one of the most popular clustering methods based on a partitioning around medoids approach. However, one of the greatest disadvantages of this method is its sensitivity to the presence of outliers in the data. This paper introduces a new robust fuzzy clustering method named Fuzzy c-Ordered-Medoids clustering for interval-valued data (FcOMdC-ID). Huber's M-estimators and Yager's Ordered Weighted Averaging (OWA) operators are used in the proposed method to make it robust to outliers. The described algorithm is compared with the fuzzy c-medoids method in experiments performed on synthetic data with different types of outliers. A real application of FcOMdC-ID is also provided.
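
    The two robustness ingredients named in the abstract can be sketched in isolation. The code below shows a generic OWA operator (weights attached to ranks rather than to particular values) and the standard Huber weight function. How FcOMdC-ID combines them is not described in the abstract, so the snippet illustrates only the building blocks; the function names and the default tuning constant `k=1.345` are assumptions chosen for illustration.

```python
def owa(values, weights):
    """Yager's OWA operator: the j-th weight multiplies the j-th largest value."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def huber_weight(residual, k=1.345):
    """Huber weight: 1 for small residuals, k / |r| once |r| exceeds k."""
    r = abs(residual)
    return 1.0 if r <= k else k / r


# Special cases of OWA: all weight on the first rank gives the maximum,
# all weight on the last rank gives the minimum, equal weights give the mean.
largest = owa([3.0, 1.0, 2.0], [1.0, 0.0, 0.0])
smallest = owa([3.0, 1.0, 2.0], [0.0, 0.0, 1.0])
average = owa([3.0, 1.0, 2.0], [1 / 3, 1 / 3, 1 / 3])
```

    Putting little or no weight on the top ranks of an ordered list of distances is one way such an operator can discount the most distant, possibly outlying, units.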

    A spatial-type interval-valued median for random intervals

    © 2018 Informa UK Limited, trading as Taylor & Francis Group. To estimate the central tendency or location of a sample of interval-valued data, a standard statistic is the interval-valued sample mean. Its strong sensitivity to outliers or data changes motivates the search for more robust alternatives. In this respect, a more robust location statistic is studied in this paper. This measure is inspired by the concept of spatial median and makes use of the versatile generalized Bertoluzza's metric between intervals, the so-called dθ distance. The problem of minimizing the mean dθ distance to the values the random interval takes, which defines the spatial-type dθ-median, is analysed. Existence and uniqueness of the sample version are shown. Furthermore, the robustness of this proposal is investigated by deriving its finite sample breakdown point. Finally, a real-life example from the Economics field illustrates the robustness of the sample dθ-median, and simulation studies show some comparisons with respect to the mean and several recently introduced robust location measures for interval-valued data. Status: published
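
    The minimization that defines the sample dθ-median can be sketched numerically. The code below assumes the mid/spread form of the distance, dθ(A, B) = sqrt((mid A - mid B)^2 + θ (spr A - spr B)^2), and uses a Weiszfeld-type fixed-point iteration, the classical scheme for spatial medians. The abstract does not say which algorithm the authors use, so this is a swapped-in technique, and the function names, tolerance, and toy data are illustrative assumptions.

```python
import math


def d_theta(a, b, theta=1.0):
    """d_theta distance between intervals a = (lo, hi) and b = (lo, hi)."""
    mid_a, spr_a = (a[0] + a[1]) / 2, (a[1] - a[0]) / 2
    mid_b, spr_b = (b[0] + b[1]) / 2, (b[1] - b[0]) / 2
    return math.sqrt((mid_a - mid_b) ** 2 + theta * (spr_a - spr_b) ** 2)


def mean_distance(intervals, candidate, theta=1.0):
    """Objective minimised by the sample d_theta-median."""
    return sum(d_theta(x, candidate, theta) for x in intervals) / len(intervals)


def d_theta_median(intervals, theta=1.0, tol=1e-10, max_iter=1000):
    """Weiszfeld-type iteration on (mid, spread), started at the sample mean."""
    mids = [(lo + hi) / 2 for lo, hi in intervals]
    sprs = [(hi - lo) / 2 for lo, hi in intervals]
    m = sum(mids) / len(mids)
    s = sum(sprs) / len(sprs)
    for _ in range(max_iter):
        # Weight each observation by the inverse of its distance to the iterate.
        weights = []
        for mi, si in zip(mids, sprs):
            dist = math.sqrt((mi - m) ** 2 + theta * (si - s) ** 2)
            weights.append(1.0 / max(dist, 1e-12))  # guard against division by zero
        total = sum(weights)
        new_m = sum(w * mi for w, mi in zip(weights, mids)) / total
        new_s = sum(w * si for w, si in zip(weights, sprs)) / total
        shift = math.sqrt((new_m - m) ** 2 + theta * (new_s - s) ** 2)
        m, s = new_m, new_s
        if shift < tol:
            break
    return (m - s, m + s)


# Four similar intervals plus one gross outlier (toy data).
sample = [(0.0, 2.0), (1.0, 3.0), (2.0, 4.0), (1.0, 3.0), (100.0, 300.0)]
median = d_theta_median(sample)
mean = (sum(lo for lo, _ in sample) / len(sample),
        sum(hi for _, hi in sample) / len(sample))
```

    On this toy sample the outlier drags the interval-valued mean far from the bulk of the data, whereas the iterate settles close to the repeated interval (1, 3). Each iterate is a convex combination of the observed mids and spreads, so the resulting spread stays non-negative and the output is a valid interval.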

    Fuzzy clustering of spatial interval-valued data

    In this paper, two fuzzy clustering methods for spatial interval-valued data are proposed, i.e. the fuzzy C-Medoids clustering of spatial interval-valued data with and without entropy regularization. Both methods are based on the Partitioning Around Medoids (PAM) algorithm, inheriting the great advantage of obtaining non-fictitious representative units for each cluster. In both methods, the units are endowed with a relation of contiguity, represented by a symmetric binary matrix. This can be understood both as contiguity in a physical space and as a more abstract notion of contiguity. The performance of the methods is demonstrated by simulation, testing the methods with different contiguity matrices associated with natural clusters of units. In order to show the effectiveness of the methods in empirical studies, three applications are presented: the clustering of municipalities based on interval-valued pollutant levels, the clustering of European fact-checkers based on interval-valued data on the average number of impressions received by their tweets, and the clustering of the residential zones of the city of Rome based on the interval of price values.

    Mining Extremes through Fuzzy Clustering

    Archetypes are extreme points that synthesize data representing "pure" individual types. Archetypes are assigned by the most discriminating features of data points, and are almost always useful in applications when one is interested in extremes and not in commonalities. Recent applications include talent analysis in sports and science, fraud detection, profiling of users and products in recommendation systems, climate extremes, as well as other machine learning applications. The furthest-sum Archetypal Analysis (FS-AA) (Mørup and Hansen, 2012) and the Fuzzy Clustering with Proportional Membership (FCPM) (Nascimento, 2005) propose distinct models to find clusters with extreme prototypes. Even though the FCPM model does not constrain its prototypes to lie in the convex hull of the data, it belongs to the framework of data recovery from clustering (Mirkin, 2005), a powerful property for unsupervised cluster analysis. The baseline version of FCPM, FCPM-0, provides central prototypes, whereas its smooth version, FCPM-2, provides extreme prototypes like AA archetypes. The comparative study between the FS-AA and FCPM algorithms conducted in this dissertation covers the following aspects. First, the analysis of FS-AA on data recovery from clustering, using a collection of 100 data sets of diverse dimensionalities generated with a proper data generator (FCPM-DG), as well as 14 real-world data sets. Second, testing the robustness of the clustering algorithms in the presence of outliers, with the peculiar behaviour of FCPM-0 of removing the proper number of prototypes from data. Third, a collection of five popular fuzzy validation indices is explored for assessing the quality of clustering results. Fourth, the algorithms undergo a study to evaluate how different initializations affect their convergence as well as the quality of the clustering partitions. The Iterative Anomalous Pattern (IAP) algorithm makes it possible to improve the convergence of the FCPM algorithm as well as to fine-tune the level of resolution at which clustering results are viewed, which is an advantage over FS-AA. Proper visualization functionalities for FS-AA and FCPM support the easy interpretation of the clustering results.

    Trimmed fuzzy clustering for interval-valued data

    In this paper, following a partitioning around medoids approach, a fuzzy clustering model for interval-valued data, i.e., FCMd-ID, is introduced. Subsequently, to avoid the disruptive effects of possible outlier interval-valued data in the clustering process, a robust fuzzy clustering model with a trimming rule, called Trimmed Fuzzy C-medoids for interval-valued data (TrFCMd-ID), is proposed. In order to show the good performance of the robust clustering model, a simulation study and two applications are provided.
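
    The general idea behind a trimming rule can be sketched as follows: a fixed fraction of the units that lie farthest from every medoid is set aside so that it cannot influence the cluster prototypes. The abstract does not give the exact rule used in TrFCMd-ID, so the criterion below (distance to the nearest medoid), the function name `trim_units`, and the trimming fraction `alpha` are hypothetical stand-ins.

```python
import math


def trim_units(dist_to_medoids, alpha):
    """Split unit indices into (kept, trimmed).

    dist_to_medoids[i][c] is the dissimilarity between unit i and medoid c.
    The floor(alpha * n) units whose nearest medoid is farthest away are trimmed.
    """
    n = len(dist_to_medoids)
    n_trim = math.floor(alpha * n + 1e-9)  # small offset guards against rounding
    nearest = [min(row) for row in dist_to_medoids]
    order = sorted(range(n), key=lambda i: nearest[i])  # closest units first
    kept = sorted(order[: n - n_trim])
    trimmed = sorted(order[n - n_trim:])
    return kept, trimmed


# Five units, two medoids; unit 3 is far from both medoids (toy values).
distances = [[0.1, 5.0], [0.2, 4.0], [6.0, 0.3], [9.0, 8.0], [5.0, 0.1]]
kept, trimmed = trim_units(distances, alpha=0.2)
```

    In an iterative procedure, a step of this kind would typically be repeated at every iteration, with memberships and medoids then updated on the kept units only.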
