3,416 research outputs found

Fast redshift clustering with the Baire (ultra) metric

Author: Contreras Pedro
Murtagh Fionn
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 20/04/2011

The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we develop a clusterwise nearest neighbor regression procedure for this. Comment: 14 pages, 6 figures

arXiv.org e-Print Archive

Crossref

On morphological hierarchical representations for image processing and spatial data clustering

Author: A. Baraldi
A. Rosenfeld
C. Jardine
C. Mattiussi
C. Ronse
C. Zahn
D. Wishart
E. Breen
F. Dias
F. Meyer
G. Bertrand
G. Estabrook
G. Matheron
G. Ouzounis
J. Cousty
J. Gower
J. Kruskal
J. Serra
J. Shi
J.P. Barthélemy
J.P. Benzécri
K. Florek
K. Spärck Jones
L. Gueguen
L. Guigues
L. Hubert
L. Najman
L. Vincent
M. Nagao
N. Ahuja
N. Jardine
O. Morris
P. Arbeláez
P. Felzenszwalb
P. Nacken
P. Salembier
P. Sneath
P. Soille
R. Adams
R. Cormack
R. Graham
R. Jones
R. Levillain
R. Marfil
R. Sokal
S. Beucher
S. Horowitz
S. Johnson
S. Zucker
T. Kong
T. Sørensen
W.G. Kropatsch
Z. Wu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Hierarchical data representations in the context of classification and data clustering were put forward during the fifties. Recently, hierarchical image representations have gained renewed interest for segmentation purposes. In this paper, we briefly survey fundamental results on hierarchical clustering and then detail recent paradigms developed for the hierarchical representation of images in the framework of mathematical morphology: constrained connectivity and ultrametric watersheds. Constrained connectivity can be viewed as a way to constrain an initial hierarchy in such a way that a set of desired constraints are satisfied. The framework of ultrametric watersheds provides a generic scheme for computing any hierarchical connected clustering, in particular when such a hierarchy is constrained. The suitability of this framework for solving practical problems is illustrated with applications in remote sensing.

arXiv.org e-Print Archive

JRC Publications Repository

Crossref

Anytime Hierarchical Clustering

Author: Arslan Omur
Koditschek Daniel E.
Publication date: 13/04/2014

We propose a new anytime hierarchical clustering method that iteratively transforms an arbitrary initial hierarchy on the configuration of measurements along a sequence of trees that, we prove, must terminate for a fixed data set in a chain of nested partitions satisfying a natural homogeneity requirement. Each recursive step re-edits the tree so as to improve a local measure of cluster homogeneity that is compatible with a number of commonly used (e.g., single, average, complete) linkage functions. As an alternative to the standard batch algorithms, we present numerical evidence to suggest that appropriate adaptations of this method can yield decentralized, scalable algorithms suitable for distributed/parallel computation of clustering hierarchies and online tracking of clustering trees applicable to large, dynamically changing databases and anomaly detection. Comment: 13 pages, 6 figures, 5 tables, in preparation for submission to a conference

arXiv.org e-Print Archive

CiteSeerX

ScholarlyCommons@Penn

Fast, Linear Time Hierarchical Clustering using the Baire Metric

Author: A Fernández-Soto
ACM Van Rooij
AK Seda
BA Davey
F Murtagh
Fionn Murtagh
IC Lerman
J-P Benzécri
JA Hartigan
JK Adelman-Mccarthy
MF Janowitz
P Contreras
P Hitzler
PE Bradley
Pedro Contreras
R D’abrusco
SC Johnson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/06/2011

The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. In this work we evaluate empirically this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in-depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we use clusterwise regression for this. Comment: 27 pages, 6 tables, 10 figures
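
The abstract does not spell out the distance itself, so the following is only a minimal sketch of the general idea of a longest-common-prefix (Baire-style) distance and the single-pass bucketing it allows. It assumes values are given as decimal strings in [0, 1); the names `fractional_digits`, `baire_distance` and `baire_clusters`, the default `base` of 2, the default `precision` of 4, and the choice to return 0 for identical digit strings are all assumptions made for illustration, not taken from the paper.

```python
from collections import defaultdict


def fractional_digits(value: str, precision: int) -> str:
    """Return the first `precision` digits after the decimal point, zero-padded."""
    _, _, frac = value.partition(".")
    return frac[:precision].ljust(precision, "0")


def baire_distance(a: str, b: str, precision: int = 4, base: float = 2.0) -> float:
    """Distance base**(-k), where k is the length of the longest common digit prefix.

    Assumption: identical digit strings (at the chosen precision) get distance 0.
    """
    da = fractional_digits(a, precision)
    db = fractional_digits(b, precision)
    k = 0
    for ca, cb in zip(da, db):
        if ca != cb:
            break
        k += 1
    return 0.0 if k == precision else base ** (-k)


def baire_clusters(values, precision: int = 4):
    """Group values by shared digit prefix at each depth.

    One pass over the data per depth, so the cost grows linearly with the
    number of values for a fixed precision; no pairwise distances are computed.
    """
    levels = {}
    for depth in range(1, precision + 1):
        buckets = defaultdict(list)
        for v in values:
            buckets[fractional_digits(v, precision)[:depth]].append(v)
        levels[depth] = dict(buckets)
    return levels


if __name__ == "__main__":
    redshifts = ["0.0371", "0.0375", "0.0412", "0.1200"]  # made-up sample values
    print(baire_distance("0.0371", "0.0375"))
    print(baire_clusters(redshifts)[2])
```

Each depth of `baire_clusters` corresponds to one level of a hierarchy: deeper prefixes give finer clusters nested inside the coarser ones, which is the sense in which a prefix-based distance yields a tree without an agglomerative pass.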

arXiv.org e-Print Archive

Royal Holloway Research Online

Crossref

Royal Holloway - Pure

De Montfort University Open Research Archive

Analyzing and Visualizing State Sequences in R with TraMineR

Author: Alexis Gabadinho
Gilbert Ritschard
Matthias Studer
Nicolas S Müller

This article describes the many capabilities offered by the TraMineR toolbox for categorical sequence data. It focuses more specifically on the analysis and rendering of state sequences. Addressed features include the description of sets of sequences by means of transversal aggregated views, the computation of longitudinal characteristics of individual sequences and the measure of pairwise dissimilarities. Special emphasis is put on the multiple ways of visualizing sequences. The core element of the package is the state sequence object in which we store the set of sequences together with attributes such as the alphabet, state labels and the color palette. The functions can then easily retrieve this information to ensure presentation homogeneity across all printed and graphical displays. The article also demonstrates how TraMineR's outcomes give access to advanced analyses such as clustering and statistical modeling of sequence data.

Research Papers in Economics

Deep Unsupervised Similarity Learning using Partially Ordered Sets

Author: Bautista Miguel A
Ommer Björn
Sanakoyeu Artsiom
Publication date: 11/04/2017

Unsupervised learning of visual similarities is of paramount importance to computer vision, particularly due to lacking training data for fine-grained similarities. Deep learning of similarities is often based on relationships between pairs or triplets of samples. Many of these relations are unreliable and mutually contradicting, implying inconsistencies when trained without supervision information that relates different tuples or triplets to each other. To overcome this problem, we use local estimates of reliable (dis-)similarities to initially group samples into compact surrogate classes and use local partial orders of samples to classes to link classes to each other. Similarity learning is then formulated as a partial ordering task with soft correspondences of all samples to classes. Adopting a strategy of self-supervision, a CNN is trained to optimally represent samples in a mutually consistent manner while updating the classes. The similarity learning and grouping procedure are integrated in a single model and optimized jointly. The proposed unsupervised approach shows competitive performance on detailed pose estimation and object classification. Comment: Accepted for publication at IEEE Computer Vision and Pattern Recognition 201

arXiv.org e-Print Archive

Crossref

New approaches for clustering high dimensional data

Author: Liu Jinze
Publication date: 01/12/2006

Clustering is one of the most effective methods for analyzing datasets that contain a large number of objects with numerous attributes. Clustering seeks to identify groups, or clusters, of similar objects. In low dimensional space, the similarity between objects is often evaluated by summing the difference across all of their attributes. High dimensional data, however, may contain irrelevant attributes which mask the existence of clusters. The discovery of groups of objects that are highly similar within some subsets of relevant attributes becomes an important but challenging task. My thesis focuses on various models and algorithms for this task. We first present a flexible clustering model, namely OP-Cluster (Order Preserving Cluster). Under this model, two objects are similar on a subset of attributes if the values of these two objects induce the same relative ordering of these attributes. The OPClustering algorithm has been demonstrated to be useful for identifying co-regulated genes in gene expression data. We also propose a semi-supervised approach to discover biologically meaningful OP-Clusters by incorporating existing gene function classifications into the clustering process. This semi-supervised algorithm yields only OP-Clusters that are significantly enriched by genes from specific functional categories. Real datasets are often noisy. We propose a noise-tolerant clustering algorithm for mining frequently occurring itemsets. This algorithm is called approximate frequent itemsets (AFI). Both the theoretical and experimental results demonstrate that our AFI mining algorithm has higher recoverability of real clusters than any other existing itemset mining approach. Pair-wise dissimilarities are often derived from original data to reduce the complexities of high dimensional data. Traditional clustering algorithms taking pair-wise dissimilarities as input often generate disjoint clusters from pair-wise dissimilarities. It is well known that the classification model represented by disjoint clusters is inconsistent with many real classifications, such as gene function classifications. We develop a Poclustering algorithm, which generates overlapping clusters from pair-wise dissimilarities. We prove that by allowing overlapping clusters, Poclustering fully preserves the information of any dissimilarity matrix while traditional partitioning algorithms may cause significant information loss.
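
The order-preserving similarity described in this abstract can be illustrated with a small check. This is only a sketch of the stated definition (two objects agree on a subset of attributes if their values induce the same relative ordering of those attributes), not the thesis's OP-Cluster mining algorithm; the function names, the dictionary representation of objects, and the stable tie-breaking for equal values are assumptions made here for illustration.

```python
def attribute_ordering(obj: dict, attributes: list) -> tuple:
    """Return the attributes sorted by the object's value on each of them.

    Ties are broken by the position of the attribute in `attributes`
    (Python's sort is stable); this is a simplification.
    """
    return tuple(sorted(attributes, key=lambda name: obj[name]))


def order_preserving(x: dict, y: dict, attributes: list) -> bool:
    """True if x and y induce the same relative ordering of the given attributes."""
    return attribute_ordering(x, attributes) == attribute_ordering(y, attributes)


if __name__ == "__main__":
    # Made-up expression-like profiles over three hypothetical attributes.
    g1 = {"a": 1, "b": 5, "c": 3}
    g2 = {"a": 10, "b": 90, "c": 40}
    g3 = {"a": 7, "b": 2, "c": 9}
    print(order_preserving(g1, g2, ["a", "b", "c"]))
    print(order_preserving(g1, g3, ["a", "b", "c"]))
    print(order_preserving(g1, g3, ["a", "c"]))
```

Note that `g1` and `g2` agree even though their magnitudes differ widely, and `g1` and `g3` agree only on the subset `["a", "c"]`, which is the kind of subspace similarity the abstract refers to.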

Carolina Digital Repository