
    Self-organization and clustering algorithms

    Kohonen's feature-map approach to clustering is often likened to the k-means or c-means clustering algorithms. Here, the author identifies some similarities and differences between the hard and fuzzy c-means (HCM/FCM) or ISODATA algorithms and Kohonen's self-organizing approach. The author concludes that some of the differences are significant, but that there may also be important, as yet unknown, relationships between the two methodologies. Several avenues of research are proposed.
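
    As a reference point for the two families being compared, the sketch below contrasts a batch hard c-means (k-means) update with a winner-take-all online update of the kind used in Kohonen's networks. It is an illustrative simplification (no neighbourhood function, no learning-rate schedule), not the formulation analysed in the paper.

```python
import numpy as np

def kmeans_update(X, centers):
    """One batch k-means (HCM) step: hard-assign points, then recompute centroids."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    labels = d.argmin(axis=1)
    return np.stack([X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                     for k in range(len(centers))])

def kohonen_update(x, centers, lr=0.1):
    """One online, winner-take-all step: move only the closest prototype toward x.
    (A full self-organizing map would also pull the winner's lattice neighbours.)"""
    winner = ((centers - x) ** 2).sum(axis=-1).argmin()
    centers = centers.copy()
    centers[winner] += lr * (x - centers[winner])
    return centers
```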

    Generalized mean for robust principal component analysis

    In this paper, we propose a robust principal component analysis (PCA) to overcome the problem that PCA is prone to outliers in the training set. Unlike other alternatives, which commonly replace the L2-norm with other distance measures, the proposed method alleviates the negative effect of outliers by exploiting the characteristics of the generalized mean while retaining the Euclidean distance. The optimization problem based on the generalized mean is solved by a novel method. We also present a generalized sample mean, a generalization of the sample mean, to estimate a robust mean in the presence of outliers. The proposed method shows performance better than or equivalent to conventional PCA variants in various problems such as face reconstruction, clustering, and object categorization.
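
    As a rough illustration of why a generalized (power) mean can damp outliers while keeping Euclidean reconstruction errors, consider the sketch below; the exponent p and the toy error values are assumptions made for illustration, not the paper's exact objective.

```python
import numpy as np

def generalized_mean(errors, p):
    """Power mean M_p(e) = (mean(e**p)) ** (1/p); for p < 1, large errors count for less."""
    e = np.asarray(errors, dtype=float)
    return np.mean(e ** p) ** (1.0 / p)

# Toy reconstruction errors with one outlier.
errors = [0.5, 0.6, 0.4, 9.0]
print(generalized_mean(errors, 1.0))  # arithmetic mean (standard squared-error view), pulled up by the outlier
print(generalized_mean(errors, 0.5))  # generalized mean with p = 0.5, far less affected
```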

    A hill-sliding strategy for initialization of Gaussian clusters in the multidimensional space

    A hill-sliding technique was devised to extract Gaussian clusters from the multivariate probability density estimate of sample data as the first step of iterative unsupervised classification. Each cluster was assumed to possess a unimodal normal distribution. A proposed clustering function distinguished the elements of a cluster under formation from the rest of the data in the feature space. Initial clusters were extracted one by one according to the hill-sliding tactic. A dimensionless cluster-compactness parameter was proposed as a universal measure of cluster goodness and used satisfactorily in test runs with LANDSAT multispectral scanner data. The normalized divergence, defined as the cluster divergence divided by the entropy of the entire sample data, was utilized as a general separability measure between clusters. An overall clustering objective function was set forth in terms of cluster covariance matrices, from which the cluster-compactness measure could be deduced. The marginal improvement of the initial data partitioning obtained by eliminating scattered, sparse data points was evaluated with this objective function. The hill-sliding clustering technique developed herein is potentially applicable to decomposing any multivariate mixture distribution into a number of unimodal distributions when an appropriate distribution function for the data set is employed.
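
    The following sketch gives one plausible reading of the separability measure described above: a symmetric divergence between two Gaussian clusters scaled by the entropy of the pooled sample, assumed Gaussian. The exact definitions used in the paper may differ.

```python
import numpy as np

def symmetric_kl(m1, S1, m2, S2):
    """Symmetric KL divergence between two multivariate normals N(m1, S1) and N(m2, S2)."""
    d = len(m1)
    iS1, iS2 = np.linalg.inv(S1), np.linalg.inv(S2)
    dm = m1 - m2
    return 0.5 * (np.trace(iS1 @ S2 + iS2 @ S1) - 2 * d + dm @ (iS1 + iS2) @ dm)

def gaussian_entropy(S):
    """Differential entropy of a normal distribution with covariance S."""
    d = S.shape[0]
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.log(np.linalg.det(S)))

def normalized_divergence(m1, S1, m2, S2, pooled_cov):
    """Cluster divergence divided by the entropy of the entire sample (here taken as Gaussian)."""
    return symmetric_kl(m1, S1, m2, S2) / gaussian_entropy(pooled_cov)
```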

    Possibilistic and fuzzy clustering methods for robust analysis of non-precise data

    This work focuses on robust clustering of data affected by imprecision. The imprecision is managed in terms of fuzzy sets. The clustering process is based on the fuzzy and possibilistic approaches. In both approaches the observations are assigned to the clusters by means of membership degrees. In fuzzy clustering the membership degrees express the degrees of sharing of the observations among the clusters. In contrast, in possibilistic clustering the membership degrees are degrees of typicality. These two sources of information are complementary: the former helps to discover the best fuzzy partition of the observations, while the latter reflects how well the observations are described by the centroids and is therefore helpful for identifying outliers. First, a fully possibilistic k-means clustering procedure is suggested. Then, to exploit the benefits of both approaches, a joint possibilistic and fuzzy clustering method for fuzzy data is proposed. A selection procedure for choosing the parameters of the new clustering method is introduced. The effectiveness of the proposal is investigated by means of simulated and real-life data.
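
    To make the distinction between the two membership notions concrete, here is a small sketch using the standard fuzzy c-means and possibilistic c-means update formulas; the paper's own method for non-precise (fuzzy) data is more elaborate, so this is only a reference point. It assumes no point coincides exactly with a centroid.

```python
import numpy as np

def fuzzy_memberships(D, m=2.0):
    """FCM 'degrees of sharing' from a (points x clusters) distance matrix D.
    Each row sums to 1: u[i, k] = 1 / sum_j (D[i, k] / D[i, j]) ** (2 / (m - 1))."""
    ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

def possibilistic_typicalities(D, eta, m=2.0):
    """Possibilistic 'degrees of typicality'; rows need not sum to 1.
    eta is a per-cluster scale, so points far from every centroid (outliers) get low typicality everywhere."""
    return 1.0 / (1.0 + (D ** 2 / eta) ** (1.0 / (m - 1.0)))
```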

    Weighted Mahalanobis Distance for Hyper-Ellipsoidal Clustering

    Cluster analysis is widely used in many applications, ranging from image and speech coding to pattern recognition. This thesis presents a new method that uses a weighted Mahalanobis distance (WMD), computed from the covariance matrix of each individual cluster, as the basis for grouping. In this algorithm, the Mahalanobis distance serves as the measure of similarity between the samples in each cluster. The thesis discusses some difficulties associated with using the Mahalanobis distance in clustering, and the proposed method provides solutions to these problems. The new algorithm is an approximation to the well-known expectation-maximization (EM) procedure used to find the maximum-likelihood estimates of a Gaussian mixture model. Unlike the EM procedure, WMD eliminates the need for initial parameters such as the cluster means and variances, since it starts from the raw data set. Properties of the new clustering method are presented by examining the clustering quality of codebooks designed with the proposed and competing methods on a variety of data sets. The competing methods are the Linde-Buzo-Gray (LBG) algorithm and the fuzzy c-means (FCM) algorithm, both of which use the Euclidean distance. The neural network for hyperellipsoidal clustering (HEC), which uses the Mahalanobis distance, is also studied and compared with the WMD method and the other techniques. The new method provides better results than the competing methods and thus becomes another useful tool for clustering.
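
    The core idea, assigning each sample using a per-cluster Mahalanobis distance rather than a shared Euclidean one, can be sketched as follows; the thesis's specific weighting of the distance is not reproduced here.

```python
import numpy as np

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of sample x from a cluster with the given mean and covariance."""
    diff = x - mean
    return diff @ np.linalg.inv(cov) @ diff

def assign(X, means, covs):
    """Assign each row of X to the cluster whose Mahalanobis distance is smallest."""
    d = np.array([[mahalanobis_sq(x, m, S) for m, S in zip(means, covs)] for x in X])
    return d.argmin(axis=1)
```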

    Anytime Hierarchical Clustering

    We propose a new anytime hierarchical clustering method that iteratively transforms an arbitrary initial hierarchy on the configuration of measurements along a sequence of trees which, we prove, must terminate for a fixed data set in a chain of nested partitions satisfying a natural homogeneity requirement. Each recursive step re-edits the tree so as to improve a local measure of cluster homogeneity that is compatible with a number of commonly used (e.g., single, average, complete) linkage functions. As an alternative to the standard batch algorithms, we present numerical evidence suggesting that appropriate adaptations of this method can yield decentralized, scalable algorithms suitable for distributed/parallel computation of clustering hierarchies and for online tracking of clustering trees, applicable to large, dynamically changing databases and to anomaly detection. Comment: 13 pages, 6 figures, 5 tables, in preparation for submission to a conference.
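
    For reference, the linkage functions mentioned above (single, average, complete) are the ones used by ordinary batch agglomerative clustering, sketched below; the anytime method itself re-edits an existing tree rather than building one bottom-up, so this serves only as context.

```python
import numpy as np

def agglomerate(X, linkage="single"):
    """Plain batch agglomerative clustering; returns the sequence of merged index groups."""
    clusters = [[i] for i in range(len(X))]
    combine = {"single": np.min, "complete": np.max, "average": np.mean}[linkage]
    merges = []
    while len(clusters) > 1:
        best, pair = np.inf, None
        # Find the pair of current clusters with the smallest linkage distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = combine([np.linalg.norm(X[i] - X[j])
                             for i in clusters[a] for j in clusters[b]])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        merges.append((list(clusters[a]), list(clusters[b])))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges
```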