In many applications of clustering (for example, ontologies or clusterings of
animal or plant species), hierarchical clusterings are more descriptive than a
flat clustering. A hierarchical clustering over n elements is represented by
a rooted binary tree with n leaves, each corresponding to one element. The
subtrees rooted at interior nodes capture the clusters. In this paper, we study
active learning of a hierarchical clustering using only ordinal queries. An
ordinal query consists of a set of three elements, and the response to a query
reveals the two elements (among the three elements in the query) which are
"closer" to each other than to the third one. We say that elements x and x′
are closer to each other than to x′′ if there exists a cluster containing x and
x′, but not x′′.
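To make the query model concrete, the following is a minimal sketch of an ordinal-query oracle over a known ground-truth hierarchy, together with a noisy variant that errs with probability 1−p (random errors stand in for the adversary here, purely for illustration). The nested-tuple tree encoding and the function names are illustrative assumptions, not notation from the paper.

```python
import random

# A hierarchical clustering over n elements, encoded as a rooted binary tree:
# a leaf is an element label, an internal node is a pair (left, right).
# This nested-tuple encoding is an illustrative choice only.
EXAMPLE_TREE = ((("a", "b"), "c"), ("d", ("e", "f")))

def leaves(tree):
    """Set of elements at the leaves of `tree`."""
    if not isinstance(tree, tuple):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])

def ordinal_query(tree, x, y, z):
    """Answer the ordinal query {x, y, z}: return (as a frozenset) the two
    elements contained in some cluster (subtree) that excludes the third."""
    for child in tree:
        members = [e for e in (x, y, z) if e in leaves(child)]
        if len(members) == 3:
            return ordinal_query(child, x, y, z)  # all three lie deeper down
        if len(members) == 2:
            return frozenset(members)             # these two are "closer"
    # Unreachable when the tree is binary and contains all three elements.
    raise ValueError("query elements not found in the tree")

def noisy_ordinal_query(tree, x, y, z, p, rng=random):
    """Return the correct answer with probability p, and one of the two
    incorrect pairs otherwise (a stand-in for the adversarial noise model)."""
    correct = ordinal_query(tree, x, y, z)
    if rng.random() < p:
        return correct
    wrong = [frozenset(pair) for pair in ((x, y), (y, z), (x, z))
             if frozenset(pair) != correct]
    return rng.choice(wrong)

# Example: {a, b} form a cluster excluding d, so they are the closer pair.
assert ordinal_query(EXAMPLE_TREE, "a", "b", "d") == frozenset({"a", "b"})
```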
When all the query responses are correct, there is a deterministic algorithm
that learns the underlying hierarchical clustering using at most n log₂ n
adaptive ordinal queries. We generalize this algorithm to be robust in a model
in which each query response is correct independently with probability p > 1/2, and adversarially incorrect with probability 1−p. We show
that in the presence of noise, our algorithm outputs the correct hierarchical
clustering with probability at least 1−δ, using O(n log n + n log(1/δ)) adaptive ordinal queries. For our results, adaptivity is
crucial: we prove that even in the absence of noise, every non-adaptive
algorithm requires Ω(n³) ordinal queries in the worst case.
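As a rough illustration of how adaptive queries can drive the learning in the noiseless setting, the sketch below inserts elements one at a time: each new element is routed down the current tree by querying it against one representative leaf from each child of the current node. It reuses `ordinal_query` and `EXAMPLE_TREE` from the sketch above. This is not the paper's algorithm: without balancing, an insertion can cost a number of queries proportional to the depth of the current tree, so it does not by itself match the n log₂ n bound, and it has no tolerance for noisy responses.

```python
import functools

def any_leaf(tree):
    """An arbitrary element of `tree`, used as a representative of its cluster."""
    while isinstance(tree, tuple):
        tree = tree[0]
    return tree

def insert(current, x, query):
    """Insert element x into the partial hierarchy `current`, assuming `current`
    is the true hierarchy restricted to the elements seen so far and that
    `query(a, b, c)` truthfully returns the closer pair as a frozenset."""
    if not isinstance(current, tuple):
        return (current, x)          # single element: x becomes its sibling
    left, right = current
    a, b = any_leaf(left), any_leaf(right)
    answer = query(x, a, b)
    if answer == frozenset({a, b}):
        return (current, x)          # x splits off above this node
    if a in answer:
        return (insert(left, x, query), right)   # x lies inside the left cluster
    return (left, insert(right, x, query))       # x lies inside the right cluster

def learn_hierarchy(elements, query):
    """Recover the hierarchy over `elements` by repeated adaptive insertion;
    each insertion asks one query per level it descends."""
    tree = elements[0]
    for x in elements[1:]:
        tree = insert(tree, x, query)
    return tree

# Recovers the clusters of EXAMPLE_TREE (up to the order of children).
oracle = functools.partial(ordinal_query, EXAMPLE_TREE)
recovered = learn_hierarchy(["d", "a", "f", "c", "b", "e"], oracle)
```

Handling noisy responses by naively repeating and majority-voting each query would inflate the query count beyond the bound stated above, which is why a more careful noise-robust strategy is needed.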