Degenerating families of dendrograms
Dendrograms used in data analysis are ultrametric spaces, hence objects of
nonarchimedean geometry. It is known that there exist $p$-adic representations
of dendrograms. Completed by a point at infinity, they can be viewed as
subtrees of the Bruhat-Tits tree associated to the $p$-adic projective line.
The implications are that certain moduli spaces known in algebraic geometry are
$p$-adic parameter spaces of (families of) dendrograms, and stochastic
classification can also be handled within this framework. At the end, we
calculate the topology of the hidden part of a dendrogram.
Comment: 13 pages, 8 figures
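The opening claim, that a dendrogram is an ultrametric space, can be checked numerically. The following is a minimal sketch, not taken from the paper: the merge-list format `(cluster_a, cluster_b, height)` and both function names are assumptions made for illustration. It derives the merge-height (cophenetic) distance between leaves and tests the strong triangle inequality d(x, z) <= max(d(x, y), d(y, z)).

```python
from itertools import permutations


def cophenetic_distances(n_leaves, merges):
    """Distance between two leaves = height of the merge that first joins them.

    Leaves are numbered 0..n_leaves-1; the cluster created by merge number
    `step` gets the id n_leaves + step (an assumed, illustrative convention).
    """
    members = {i: [i] for i in range(n_leaves)}
    dist = {(i, i): 0.0 for i in range(n_leaves)}
    for step, (a, b, height) in enumerate(merges):
        for x in members[a]:
            for y in members[b]:
                dist[(x, y)] = dist[(y, x)] = height
        members[n_leaves + step] = members.pop(a) + members.pop(b)
    return dist


def is_ultrametric(n_leaves, dist):
    """Check the strong triangle inequality on every ordered triple of leaves."""
    return all(
        dist[(x, z)] <= max(dist[(x, y)], dist[(y, z)])
        for x, y, z in permutations(range(n_leaves), 3)
    )


# Toy dendrogram on four leaves: {0,1} join at 1.0, {2,3} at 2.0, all at 3.0.
merges = [(0, 1, 1.0), (2, 3, 2.0), (4, 5, 3.0)]
dist = cophenetic_distances(4, merges)
ultrametric = is_ultrametric(4, dist)
```

Because merge heights in a dendrogram never decrease on the way to the root, the check succeeds for any valid merge list.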
Mumford dendrograms and discrete p-adic symmetries
In this article, we present an effective encoding of dendrograms by embedding
them into the Bruhat-Tits trees associated to $p$-adic number fields. As an
application, we show how strings over a finite alphabet can be encoded in
cyclotomic extensions of $\mathbb{Q}_p$ and discuss $p$-adic DNA encoding. The
application leads to fast $p$-adic agglomerative hierarchic algorithms similar
to the ones recently used e.g. by A. Khrennikov and others. From the viewpoint
of $p$-adic geometry, to encode a dendrogram $X$ in a $p$-adic field $K$ means
to fix a set $S$ of $K$-rational punctures on the $p$-adic projective line
$\mathbb{P}^1$. To $\mathbb{P}^1 \setminus S$ is associated in a natural way a
subtree inside the Bruhat-Tits tree which recovers $X$, a method first used by
F. Kato in 1999 in the classification of discrete subgroups of
$\mathrm{PGL}_2(K)$.
Next, we show how the $p$-adic moduli space $M_{0,n}$ of $\mathbb{P}^1$
with $n$ punctures can be applied to the study of time series of
dendrograms and those symmetries arising from hyperbolic actions on
$\mathbb{P}^1$. In this way, we can associate to certain classes of dynamical
systems a Mumford curve, i.e. a $p$-adic algebraic curve with totally
degenerate reduction modulo $p$.
Finally, we indicate some of our results in the study of general discrete
actions on $\mathbb{P}^1$, and their relation to $p$-adic Hurwitz spaces.
Comment: 14 pages, 6 figures
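To make the idea of encoding strings p-adically concrete, here is a simplified sketch. It is not the cyclotomic-extension encoding described in the abstract: it assumes the plain integers with p = 5 and a hypothetical digit table mapping the four nucleotides to the nonzero digits 1 to 4. Strings sharing a longer prefix then lie p-adically closer, which is the property a hierarchical algorithm can exploit.

```python
from fractions import Fraction

P = 5  # assumed prime; the digits 1..4 leave 0 free as "no symbol"
DIGIT = {"A": 1, "C": 2, "G": 3, "T": 4}  # hypothetical digit assignment


def encode(s, p=P):
    """Encode a string as the integer sum of digit_i * p**i (first symbol = lowest digit)."""
    return sum(DIGIT[ch] * p**i for i, ch in enumerate(s))


def valuation(n, p=P):
    """p-adic valuation of a nonzero integer: the largest v with p**v dividing n."""
    if n == 0:
        raise ValueError("the valuation of 0 is infinite")
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v


def padic_distance(a, b, p=P):
    """|a - b|_p = p**(-v), returned as an exact fraction."""
    if a == b:
        return Fraction(0)
    return Fraction(1, p ** valuation(a - b, p))


x, y, z = encode("ACGT"), encode("ACGA"), encode("ACTT")
d_xy = padic_distance(x, y)  # strings agree on the first three symbols
d_xz = padic_distance(x, z)  # strings agree on the first two symbols
d_yz = padic_distance(y, z)
```

With this digit convention the distance between two equal-length strings is p raised to minus the length of their common prefix, so the three distances above satisfy the strong triangle inequality.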
Fast redshift clustering with the Baire (ultra) metric
The Baire metric induces an ultrametric on a dataset and is of linear
computational complexity, contrasted with the standard quadratic time
agglomerative hierarchical clustering algorithm. We apply the Baire distance to
spectrometric and photometric redshifts from the Sloan Digital Sky Survey
using, in this work, about half a million astronomical objects. We want to know
how well the (more costly to determine) spectrometric redshifts can predict
the (more easily obtained) photometric redshifts, i.e. we seek to regress the
spectrometric on the photometric redshifts, and we develop a clusterwise
nearest neighbor regression procedure for this.
Comment: 14 pages, 6 figures
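As an illustration of the distance the abstract relies on, the sketch below computes a longest-common-prefix (Baire) distance between two values. Several details are assumptions rather than the authors' definitions: values are taken to lie in [0, 1), digits are truncated (not rounded) to a fixed `precision`, the distance is `base ** -k` for k shared leading digits with `base=2` as one possible convention, and values agreeing to full precision are given distance 0.

```python
from decimal import Decimal


def leading_digits(x, precision):
    """First `precision` decimal digits of x in [0, 1), truncated, as a string."""
    scaled = int(Decimal(str(x)) * 10**precision)  # int() truncates toward zero
    return str(scaled).zfill(precision)


def common_prefix_length(x, y, precision=6):
    """Number of leading decimal digits shared by x and y."""
    k = 0
    for a, b in zip(leading_digits(x, precision), leading_digits(y, precision)):
        if a != b:
            break
        k += 1
    return k


def baire_distance(x, y, precision=6, base=2):
    """base**(-k) for k shared leading digits; 0 if all `precision` digits agree."""
    k = common_prefix_length(x, y, precision)
    return 0.0 if k == precision else float(base) ** -k


d_close = baire_distance(0.1234, 0.1256, precision=4)  # shares "12"
d_far = baire_distance(0.5, 0.6, precision=4)          # shares nothing
d_same = baire_distance(0.1234, 0.1234, precision=4)
```

Each distance depends only on the digits of the two values involved, which is why grouping by prefix can replace the pairwise comparisons of a standard agglomerative algorithm.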
A p-adic RanSaC algorithm for stereo vision using Hensel lifting
A $p$-adic variation of the Ran(dom) Sa(mple) C(onsensus) method for solving
the relative pose problem in stereo vision is developed. From two 2-adically
encoded images a random sample of five pairs of corresponding points is taken,
and the equations for the essential matrix are solved by lifting solutions
modulo 2 to the 2-adic integers. A recently devised $p$-adic hierarchical
classification algorithm imitating the known LBG quantisation method classifies
the solutions for all the samples after having determined the number of
clusters using the known intra-inter validity of clusterings. In the successful
case, a cluster ranking will determine the cluster containing a 2-adic
approximation to the "true" solution of the problem.
Comment: 15 pages; typos removed, abstract changed, computation error removed
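The lifting step mentioned in the abstract can be illustrated in one variable. The paper solves a system of equations for the essential matrix; the sketch below is only an assumed, simplified analogue that lifts a simple root of a single polynomial from modulo p to modulo p**k. The function name and the example polynomial f(x) = x^2 - x - 2 are chosen for illustration.

```python
def hensel_lift(f, df, x0, p, k):
    """Lift a simple root x0 of f modulo p to a root modulo p**k.

    Requires f(x0) = 0 (mod p) and f'(x0) != 0 (mod p). Each pass gains one
    p-adic digit via x <- x - f(x) * inv (mod p**(j+1)), inv = f'(x0)^(-1) mod p.
    """
    if f(x0) % p != 0:
        raise ValueError("x0 is not a root modulo p")
    if df(x0) % p == 0:
        raise ValueError("x0 is not a simple root modulo p")
    inv = pow(df(x0), -1, p)  # modular inverse; needs Python 3.8+
    x, modulus = x0 % p, p
    for _ in range(k - 1):
        modulus *= p
        x = (x - f(x) * inv) % modulus
    return x


def f(x):
    return x * x - x - 2  # integer roots 2 and -1


def df(x):
    return 2 * x - 1  # always odd, so every root modulo 2 is simple


root_from_0 = hensel_lift(f, df, 0, 2, 8)  # approximates the 2-adic root 2
root_from_1 = hensel_lift(f, df, 1, 2, 8)  # approximates the 2-adic root -1
```

Both residues modulo 2 lift, and the two results agree with the integer roots 2 and -1 reduced modulo 2**8.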
Fast, Linear Time Hierarchical Clustering using the Baire Metric
The Baire metric induces an ultrametric on a dataset and is of linear
computational complexity, contrasted with the standard quadratic time
agglomerative hierarchical clustering algorithm. In this work we evaluate
empirically this new approach to hierarchical clustering. We compare
hierarchical clustering based on the Baire metric with (i) agglomerative
hierarchical clustering, in terms of algorithm properties; (ii) generalized
ultrametrics, in terms of definition; and (iii) fast clustering through k-means
partitioning, in terms of quality of results. For the latter, we carry out an
in-depth astronomical study. We apply the Baire distance to spectrometric and
photometric redshifts from the Sloan Digital Sky Survey using, in this work,
about half a million astronomical objects. We want to know how well the (more
costly to determine) spectrometric redshifts can predict the (more easily
obtained) photometric redshifts, i.e. we seek to regress the spectrometric on
the photometric redshifts, and we use clusterwise regression for this.
Comment: 27 pages, 6 tables, 10 figures
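The linear-time claim can be made tangible with a prefix-bucketing sketch. This is an assumed illustration, not the authors' implementation: values are taken to lie in [0, 1), digits are truncated to a fixed `precision`, and level k of the hierarchy groups values that share their first k decimal digits. Building all levels costs on the order of n times `precision` operations and never computes a pairwise distance.

```python
from collections import defaultdict
from decimal import Decimal


def leading_digits(x, precision):
    """First `precision` decimal digits of x in [0, 1), truncated, as a string."""
    scaled = int(Decimal(str(x)) * 10**precision)
    return str(scaled).zfill(precision)


def baire_hierarchy(values, precision=3):
    """Return one dict per level; level k maps each k-digit prefix to its members."""
    keyed = [(leading_digits(v, precision), v) for v in values]
    levels = []
    for k in range(1, precision + 1):
        buckets = defaultdict(list)
        for digits, v in keyed:  # a single pass over the data per level
            buckets[digits[:k]].append(v)
        levels.append(dict(buckets))
    return levels


values = [0.123, 0.127, 0.151, 0.482]  # made-up sample values
levels = baire_hierarchy(values, precision=3)
```

Reading the levels from coarse to fine gives a tree: every bucket at level k + 1 is contained in exactly one bucket at level k.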