13,381 research outputs found

    The Diver\u27s Distance

    Get PDF
    The visual assessment of clustering tendency (VAT) method, which was developed by J. C. Bezdek, R. J. Hathaway and J. M. Huband uses a reordering of the rows and columns of a dissimilarity matrix; it then displays the ordered dissimilarity matrix (ODM) as a 2D gray-level image called an ordered dissimilarity image (ODI). Al- though successful in determining potential clustering structure of various data sets, the technique offers room for improvement. In this thesis, we propose a new proximity measure called the diver\u27s distance which is defined based on concepts in graph theory. We then theoretically study the diver\u27s distance and its properties. From the theoretical results, we develop an algorithm (ddVAT) to efficiently compute an ODM of diver\u27s distances; its corresponding ODI proves to be more informative than the ODI obtained from VAT. Moreover, ddVAT turns out to be very efficient with linear clusters and very useful in cases where there is difficulty to satisfactorily represent cluster point representatives

    Kernel Metric Learning for Clustering Mixed-type Data

    Full text link
    Distance-based clustering and classification are widely used in various fields to group mixed numeric and categorical data. A predefined distance measurement is used to cluster data points based on their dissimilarity. While there exist numerous distance-based measures for data with pure numerical attributes and several ordered and unordered categorical metrics, an optimal distance for mixed-type data is an open problem. Many metrics convert numerical attributes to categorical ones or vice versa. They handle the data points as a single attribute type or calculate a distance between each attribute separately and add them up. We propose a metric that uses mixed kernels to measure dissimilarity, with cross-validated optimal kernel bandwidths. Our approach improves clustering accuracy when utilized for existing distance-based clustering algorithms on simulated and real-world datasets containing pure continuous, categorical, and mixed-type data.Comment: 23 pages, 5 tables, 2 figure

    On morphological hierarchical representations for image processing and spatial data clustering

    Full text link
    Hierarchical data representations in the context of classi cation and data clustering were put forward during the fties. Recently, hierarchical image representations have gained renewed interest for segmentation purposes. In this paper, we briefly survey fundamental results on hierarchical clustering and then detail recent paradigms developed for the hierarchical representation of images in the framework of mathematical morphology: constrained connectivity and ultrametric watersheds. Constrained connectivity can be viewed as a way to constrain an initial hierarchy in such a way that a set of desired constraints are satis ed. The framework of ultrametric watersheds provides a generic scheme for computing any hierarchical connected clustering, in particular when such a hierarchy is constrained. The suitability of this framework for solving practical problems is illustrated with applications in remote sensing

    Anytime Hierarchical Clustering

    Get PDF
    We propose a new anytime hierarchical clustering method that iteratively transforms an arbitrary initial hierarchy on the configuration of measurements along a sequence of trees we prove for a fixed data set must terminate in a chain of nested partitions that satisfies a natural homogeneity requirement. Each recursive step re-edits the tree so as to improve a local measure of cluster homogeneity that is compatible with a number of commonly used (e.g., single, average, complete) linkage functions. As an alternative to the standard batch algorithms, we present numerical evidence to suggest that appropriate adaptations of this method can yield decentralized, scalable algorithms suitable for distributed/parallel computation of clustering hierarchies and online tracking of clustering trees applicable to large, dynamically changing databases and anomaly detection.Comment: 13 pages, 6 figures, 5 tables, in preparation for submission to a conferenc

    Fast Algorithm and Implementation of Dissimilarity Self-Organizing Maps

    Get PDF
    In many real world applications, data cannot be accurately represented by vectors. In those situations, one possible solution is to rely on dissimilarity measures that enable sensible comparison between observations. Kohonen's Self-Organizing Map (SOM) has been adapted to data described only through their dissimilarity matrix. This algorithm provides both non linear projection and clustering of non vector data. Unfortunately, the algorithm suffers from a high cost that makes it quite difficult to use with voluminous data sets. In this paper, we propose a new algorithm that provides an important reduction of the theoretical cost of the dissimilarity SOM without changing its outcome (the results are exactly the same as the ones obtained with the original algorithm). Moreover, we introduce implementation methods that result in very short running times. Improvements deduced from the theoretical cost model are validated on simulated and real world data (a word list clustering problem). We also demonstrate that the proposed implementation methods reduce by a factor up to 3 the running time of the fast algorithm over a standard implementation
    • …
    corecore