Squarepants in a Tree: Sum of Subtree Clustering and Hyperbolic Pants Decomposition
We provide efficient constant factor approximation algorithms for the
problems of finding a hierarchical clustering of a point set in any metric
space, minimizing the sum of minimum spanning tree lengths within each
cluster, and in the hyperbolic or Euclidean planes, minimizing the sum of
cluster perimeters. Our algorithms for the hyperbolic and Euclidean planes can
also be used to provide a pants decomposition, that is, a set of disjoint
simple closed curves partitioning the plane minus the input points into subsets
with exactly three boundary components, with approximately minimum total
length. In the Euclidean case, these curves are squares; in the hyperbolic
case, they combine our Euclidean square pants decomposition with our tree
clustering method for general metric spaces.
Comment: 22 pages, 14 figures. This version replaces the proof of what is now
Lemma 5.2, as the previous proof was erroneous.
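The general-metric objective above can be made concrete with a small sketch that only evaluates the cost of a given hierarchy; it is not the authors' constant-factor approximation algorithm. It assumes points in the Euclidean plane, a hierarchy written as nested tuples of point indices (a hypothetical encoding), and that every subtree counts as a cluster, including the root and the singleton leaves (whose spanning trees have length zero). The root cluster's minimum spanning tree is the same for every hierarchy, so including it shifts all costs by the same amount.

```python
import math


def mst_length(points):
    """Length of a minimum spanning tree of the complete Euclidean graph (Prim, O(k^2))."""
    k = len(points)
    if k < 2:
        return 0.0
    in_tree = [False] * k
    best = [math.inf] * k  # cheapest known edge from each point into the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(k):
        u = min((i for i in range(k) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(k):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return total


def hierarchy_cost(tree, points):
    """Sum of MST lengths over every cluster (subtree) of a nested-tuple hierarchy."""

    def walk(node):
        if isinstance(node, int):  # a leaf is a singleton cluster of cost 0
            return [node], 0.0
        leaves, cost = [], 0.0
        for child in node:
            child_leaves, child_cost = walk(child)
            leaves += child_leaves
            cost += child_cost
        # this subtree is itself a cluster: add the MST length of its points
        return leaves, cost + mst_length([points[i] for i in leaves])

    return walk(tree)[1]
```

For four points on a line at x = 0, 1, 10, 11, the hierarchy `((0, 1), (2, 3))` costs 1 + 1 + 11 = 13, while `((0, 2), (1, 3))` costs 10 + 10 + 11 = 31, so the objective favors grouping nearby points low in the tree.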
Clustering with shallow trees
We propose a new method for hierarchical clustering based on the optimisation
of a cost function over trees of limited depth, and we derive a
message-passing method that solves it efficiently. The method and
algorithm can be interpreted as a natural interpolation between two well-known
approaches, namely single linkage and the recently presented Affinity
Propagation. Using this general scheme, we analyze three structured
biological/medical datasets (human population based on genetic information,
proteins based on sequences, and verbal autopsies) and show that the
interpolation technique provides new insight.
Comment: 11 pages, 7 figures
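A cost over trees of limited depth can be illustrated with a brute-force sketch. The specific cost here is an assumption, not taken from the abstract: a hypothetical virtual root is added, attaching a point directly to it costs a penalty `lam`, attaching a point to another point costs their distance, and each subtree hanging from the root is one cluster. Exhaustive search is exponential and only suits toy inputs; the paper instead derives a message-passing method. The names `shallow_tree`, `tree_depths`, and `clusters` are hypothetical.

```python
import itertools
import math


def tree_depths(parent, root):
    """Depth of every node under `root`, or None if `parent` contains a cycle."""
    depth = {root: 0}
    for v in range(len(parent)):
        path, seen, u = [], set(), v
        while u not in depth:  # climb until reaching a node of known depth
            if u in seen:
                return None  # revisited a node: not a tree
            seen.add(u)
            path.append(u)
            u = parent[u]
        d = depth[u]
        for w in reversed(path):  # assign depths on the way back down
            d += 1
            depth[w] = d
    return depth


def shallow_tree(dist, lam, max_depth):
    """Brute-force minimum of the assumed cost over trees of depth <= max_depth.

    Points are 0..n-1 and node n is the hypothetical virtual root. Attaching a
    point to the root costs `lam`; attaching it to another point costs dist.
    """
    n = len(dist)
    root = n
    best_cost, best_parent = math.inf, None
    for parent in itertools.product(range(n + 1), repeat=n):
        if any(parent[v] == v for v in range(n)):
            continue  # no self-loops
        depth = tree_depths(parent, root)
        if depth is None or max(depth.values()) > max_depth:
            continue  # cyclic, or deeper than allowed
        cost = sum(lam if parent[v] == root else dist[v][parent[v]] for v in range(n))
        if cost < best_cost:
            best_cost, best_parent = cost, parent
    return best_cost, best_parent


def clusters(parent, root):
    """Group points by the subtree hanging directly from the virtual root."""
    groups = {}
    for v in range(len(parent)):
        u = v
        while parent[u] != root:
            u = parent[u]
        groups.setdefault(u, []).append(v)
    return sorted(groups.values())
```

On 1-D points at 0, 1, 2, 10, 11 with `lam=5`, `max_depth=1` forces every point onto the root (five singleton clusters, cost 25), while `max_depth=2` already yields the clusters {0, 1, 2} and {10, 11} at cost 13. In this assumed formulation, depth 2 makes each cluster a star around one exemplar, reminiscent of Affinity Propagation, and loosening the bound lets clusters grow as chains, closer to single linkage.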
Multivariate Approaches to Classification in Extragalactic Astronomy
Clustering objects into synthetic groups is a natural activity of any
science. Astrophysics is no exception and now faces a deluge of data.
For galaxies, the century-old Hubble classification and the Hubble tuning
fork are still largely in use, together with numerous mono- or bivariate
classifications most often made by eye. However, a classification must be
driven by the data, and sophisticated multivariate statistical tools are used
more and more often. In this paper we review these different approaches in
order to situate them in the general context of unsupervised and supervised
learning. We emphasize the astrophysical outcomes of these studies to show that
multivariate analyses provide an obvious path toward a renewal of our
classification of galaxies and are invaluable tools to investigate the physics
and evolution of galaxies.
Comment: Open Access paper.
http://www.frontiersin.org/milky_way_and_galaxies/10.3389/fspas.2015.00003/abstract
DOI: 10.3389/fspas.2015.00003