1,541 research outputs found

    Fuzzy clustering with Minkowski distance

    Distances in the well-known fuzzy c-means algorithm of Bezdek (1973) are measured by the squared Euclidean distance. Other distances have been used as well in fuzzy clustering. For example, Jajuga (1991) proposed to use the L_1-distance and Bobrowski and Bezdek (1991) also used the L_infty-distance. For the more general case of Minkowski distance and the case of using a root of the squared Minkowski distance, Groenen and Jajuga (2001) introduced a majorization algorithm to minimize the error. One of the advantages of iterative majorization is that it is a guaranteed descent algorithm, so that every iteration reduces the error until convergence is reached. However, their algorithm was limited to the case of Minkowski parameter between 1 and 2, that is, between the L_1-distance and the Euclidean distance. Here, we extend their majorization algorithm to any Minkowski distance with Minkowski parameter greater than or equal to 1. This extension also includes the case of the L_infty-distance. We also investigate how well this algorithm performs and present an empirical application.
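
    As a rough illustration of the ingredients this abstract names, the sketch below computes a Minkowski distance for a parameter p >= 1 (including the L_infty case) and plugs its square into the standard fuzzy c-means membership update. It is not the majorization algorithm of Groenen and Jajuga; the function names, the fuzziness exponent `s`, and the choice of the squared Minkowski distance as the dissimilarity are assumptions made for illustration only.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance between two equal-length sequences.

    p must be >= 1; pass math.inf for the L_infty (maximum) distance.
    """
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def fuzzy_memberships(point, centers, p=2.0, s=2.0):
    """Standard fuzzy c-means membership update for a single point.

    Uses the squared Minkowski distance as dissimilarity (an assumption
    of this sketch); s > 1 is the fuzziness exponent.
    """
    d2 = [minkowski(point, c, p) ** 2 for c in centers]
    # If the point coincides with one or more centers, split full
    # membership among those centers to avoid dividing by zero.
    zeros = [i for i, d in enumerate(d2) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(centers))]
    inv = [d ** (-1.0 / (s - 1.0)) for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


if __name__ == "__main__":
    centers = [[0.0, 0.0], [4.0, 4.0]]
    for p in (1.0, 2.0, 3.0, math.inf):
        print(p, fuzzy_memberships([1.0, 0.5], centers, p=p))
```

    The memberships for a point always sum to one, and changing p alters how strongly a point is drawn toward each center, which is the degree of freedom the abstract is concerned with.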

    Self-organization and clustering algorithms

    Kohonen's feature maps approach to clustering is often likened to the k-means or c-means clustering algorithms. Here, the author identifies some similarities and differences between the hard and fuzzy c-means (HCM/FCM) or ISODATA algorithms and Kohonen's self-organizing approach. The author concludes that some differences are significant, but at the same time there may be some important unknown relationships between the two methodologies. Several avenues of research are proposed.

    Distances in evidence theory: Comprehensive survey and generalizations

    The purpose of the present work is to survey the dissimilarity measures defined so far in the mathematical framework of evidence theory, and to propose a classification of these measures based on their formal properties. This research is motivated by the fact that while dissimilarity measures have been widely studied and surveyed in the fields of probability theory and fuzzy set theory, no comprehensive survey is yet available for evidence theory. The main results presented herein include a synthesis of the properties of the measures defined so far in the scientific literature; the generalizations proposed naturally lead to additions to the body of the previously known measures, leading to the definition of numerous new measures. Building on this analysis, we have highlighted the fact that Dempster’s conflict cannot be considered as a genuine dissimilarity measure between two belief functions and have proposed an alternative based on a cosine function. Other original results include the justification of the use of two-dimensional indexes as (cosine; distance) couples and a general formulation for this class of new indexes. We base our exposition on a geometrical interpretation of evidence theory and show that most of the dissimilarity measures so far published are based on inner products, in some cases degenerated. Experimental results based on Monte Carlo simulations illustrate interesting relationships between existing measures.

    Assessing the Number of Components in Mixture Models: a Review.

    Despite the widespread application of finite mixture models, the decision of how many classes are required to adequately represent the data is, according to many authors, an important, but unsolved issue. This work aims to review, describe and organize the available approaches designed to help the selection of the adequate number of mixture components (including Monte Carlo test procedures, information criteria and classification-based criteria); we also provide some published simulation results about their relative performance, with the purpose of identifying the scenarios where each criterion is more effective (adequate).
    Keywords: finite mixture; number of mixture components; information criteria; simulation studies.

    Distribution-based entropy weighting clustering of skewed and heavy tailed time series

    The goal of clustering is to identify common structures in a data set by forming groups of homogeneous objects. The observed characteristics of many economic time series motivated the development of classes of distributions that can accommodate properties such as heavy tails and skewness. Thanks to its flexibility, the skewed exponential power distribution (also called skewed generalized error distribution) ensures a unified and general framework for clustering possibly skewed and heavy tailed time series. This paper develops a clustering procedure of model-based type, assuming that the time series are generated by the same underlying probability distribution but with different parameters. Moreover, we propose to optimally combine the estimated parameters to form the clusters with an entropy weighting k-means approach. The usefulness of the proposal is shown by means of application to financial time series, demonstrating also how the obtained clusters can be used to form portfolios of stocks.
    Peer reviewed. Postprint (published version).