Rank-based linkage I: triplet comparisons and oriented simplicial complexes

Abstract

Rank-based linkage is a new tool for summarizing a collection $S$ of objects according to their relationships. These objects are not mapped to vectors, and ``similarity'' between objects need be neither numerical nor symmetrical. All an object needs to do is rank nearby objects by similarity to itself, using a Comparator which is transitive, but need not be consistent with any metric on the whole set. Call this a ranking system on $S$. Rank-based linkage is applied to the $K$-nearest neighbor digraph derived from a ranking system. Computations occur on a 2-dimensional abstract oriented simplicial complex whose faces are among the points, edges, and triangles of the line graph of the undirected $K$-nearest neighbor graph on $S$. In $|S| K^2$ steps it builds an edge-weighted linkage graph $(S, \mathcal{L}, \sigma)$ where $\sigma(\{x, y\})$ is called the in-sway between objects $x$ and $y$. Take $\mathcal{L}_t$ to be the links whose in-sway is at least $t$, and partition $S$ into components of the graph $(S, \mathcal{L}_t)$, for varying $t$. Rank-based linkage is a functor from a category of out-ordered digraphs to a category of partitioned sets, with the practical consequence that augmenting the set of objects in a rank-respectful way gives a fresh clustering which does not ``rip apart'' the previous one. The same holds for single linkage clustering in the metric space context, but not for typical optimization-based methods. Open combinatorial problems are presented in the last section.

Comment: 37 pages, 12 figures
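
The final thresholding step described in the abstract (keep the links whose in-sway is at least t, then take connected components) can be illustrated with a minimal sketch. It assumes the in-sway values have already been computed by some other means; the function name `partition_at_threshold`, the dictionary representation of the in-sway, and the toy weights are hypothetical choices made for illustration and are not taken from the paper. A union-find structure is used here only as one convenient way to obtain components.

```python
from collections import defaultdict


def partition_at_threshold(objects, in_sway, t):
    """Partition `objects` into components of the graph whose links
    are the pairs with in-sway >= t.

    `in_sway` maps an unordered pair, stored here as a tuple (x, y),
    to its weight. This representation is an assumption of the sketch.
    """
    parent = {x: x for x in objects}

    def find(x):
        # Follow parent pointers to the root, halving paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (x, y), weight in in_sway.items():
        if weight >= t:  # the link survives the threshold
            root_x, root_y = find(x), find(y)
            if root_x != root_y:
                parent[root_x] = root_y

    blocks = defaultdict(set)
    for x in objects:
        blocks[find(x)].add(x)
    # Sort for a deterministic, readable result.
    return sorted(sorted(block) for block in blocks.values())


# Toy data: made-up in-sway values on a path a-b-c-d-e.
objects = ["a", "b", "c", "d", "e"]
in_sway = {("a", "b"): 5, ("b", "c"): 3, ("c", "d"): 1, ("d", "e"): 4}

for t in (6, 4, 2, 1):
    print(t, partition_at_threshold(objects, in_sway, t))
```

Lowering t can only add links, so each partition is a coarsening of the one obtained at any higher threshold; in the toy data the blocks merge step by step as t decreases from 6 to 1.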
