Penalized K-Nearest-Neighbor-Graph Based Metrics for Clustering
A difficult problem in clustering is how to handle data with a manifold
structure, i.e. data that does not form compact clouds of points but instead
traces arbitrary shapes or paths embedded in a high-dimensional space.
In this work we introduce the Penalized k-Nearest-Neighbor-Graph (PKNNG) based
metric, a new tool for evaluating distances in such cases. The new metric can
be used in combination with most clustering algorithms. The PKNNG metric is
based on a two-step procedure: first it constructs the k-Nearest-Neighbor-Graph
of the dataset of interest using a low k-value and then it adds edges with an
exponentially penalized weight for connecting the sub-graphs produced by the
first step. We discuss several possible schemes for connecting the different
sub-graphs. We use three artificial datasets in four different embedding
situations to evaluate the behavior of the new metric, including a comparison
among different clustering methods. We also evaluate the new metric in a
real-world application, clustering the MNIST digits dataset. In all cases the
PKNNG metric shows promising clustering results.
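
The two-step procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the penalty form d * exp(d / mu), with mu taken as the mean edge length of the k-NN graph; the connection scheme, which links every pair of sub-graphs through their closest pair of points (only one of the several schemes the abstract mentions); and the names `pknng_distances` and `shortest_paths`, which are hypothetical. The sketch expects at least two points and k >= 1.

```python
import heapq
import math
from itertools import combinations


def shortest_paths(graph, src):
    """Dijkstra over a dict-of-dicts weighted graph; returns {node: distance}."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = du + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def pknng_distances(points, k=2):
    """Return (distance matrix, number of k-NN sub-graphs) for a list of points."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]

    # Step 1: symmetric k-nearest-neighbor graph with a low k.
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
        for j in nearest:
            graph[i][j] = graph[j][i] = d[i][j]

    # Label the connected sub-graphs produced by step 1.
    labels = [-1] * n
    n_comp = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = n_comp
        stack = [start]
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if labels[v] == -1:
                    labels[v] = n_comp
                    stack.append(v)
        n_comp += 1

    # Step 2: connect sub-graphs with exponentially penalized edges.
    # ASSUMPTION: penalty d * exp(d / mu), mu = mean k-NN edge length.
    knn_edges = [w for i in graph for j, w in graph[i].items() if i < j]
    mu = sum(knn_edges) / len(knn_edges)
    for a, b in combinations(range(n_comp), 2):
        gap, p, q = min(
            (d[i][j], i, j)
            for i in range(n) if labels[i] == a
            for j in range(n) if labels[j] == b
        )
        graph[p][q] = graph[q][p] = gap * math.exp(gap / mu)

    # The metric is the shortest-path length in the augmented graph.
    all_dist = [shortest_paths(graph, i) for i in range(n)]
    return [[all_dist[i][j] for j in range(n)] for i in range(n)], n_comp
```

The resulting matrix can be passed to any clustering algorithm that accepts precomputed distances. Points inside the same sub-graph stay close along the manifold, while crossing a gap between sub-graphs costs far more than the Euclidean distance would suggest.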