New Algorithms andMethodology for

Analysing Distances

Kettleborough, George

thesis

New Algorithms andMethodology for Analysing Distances

Authors: George Kettleborough
Publication date: 1 June 2014
Publisher

Abstract

Distances arise in a wide variety of di�erent contexts, one of which is partitional clustering, that is, the problem of �nding groups of similar objects within a set of objects.¿ese groups are seemingly very easy to �nd for humans, but very di�cult to �nd for machines as there are two major di�culties to be overcome: the �rst de�ning an objective criterion for the vague notion of “groups of similar objects”, and the second is the computational complexity of �nding such groups given a criterion. In the �rst part of this thesis, we focus on the �rst di�culty and show that even seemingly similar optimisation criteria used for partitional clustering can produce vastly di�erent results. In the process of showing this we develop a new metric for comparing clustering solutions called the assignment metric. We then prove some new NP-completeness results for problems using two related “sum-of-squares” clustering criteria. Closely related to partitional clustering is the problem of hierarchical clustering. We extend and formalise this problem to the problem of constructing rooted edge-weighted X-trees, that is trees with a leafset X. It is well known that an X-tree can be uniquely reconstructed from a distance on X if the distance is an ultrametric. But in practice the complete distance on X may not always be available. In the second part of this thesis we look at some of the circumstances under which a tree can be uniquely reconstructed from incomplete distance information. We use a concept called a lasso and give some theoretical properties of a special type of lasso. We then develop an algorithm which can construct a tree together with a lasso from partial distance information and show how this can be applied to various incomplete datasets

Similar works

Full text

Open in the Core reader

Download PDF

Available Versions

University of East Anglia digital repository

oai:ueaeprints.uea.ac.uk:53463

Last time updated on 28/06/2016