Adaptive Hierarchical Clustering Using Ordinal Queries

Abstract

In many applications of clustering (for example, ontologies or clusterings of animal or plant species), hierarchical clusterings are more descriptive than a flat clustering. A hierarchical clustering over $n$ elements is represented by a rooted binary tree with $n$ leaves, each corresponding to one element. The subtrees rooted at interior nodes capture the clusters. In this paper, we study active learning of a hierarchical clustering using only ordinal queries. An ordinal query consists of a set of three elements, and the response to a query reveals the two elements (among the three elements in the query) which are "closer" to each other than to the third one. We say that elements $x$ and $x'$ are closer to each other than $x''$ if there exists a cluster containing $x$ and $x'$, but not $x''$. When all the query responses are correct, there is a deterministic algorithm that learns the underlying hierarchical clustering using at most $n \log_2 n$ adaptive ordinal queries. We generalize this algorithm to be robust in a model in which each query response is correct independently with probability $p > \frac{1}{2}$, and adversarially incorrect with probability $1 - p$. We show that in the presence of noise, our algorithm outputs the correct hierarchical clustering with probability at least $1 - \delta$, using $O(n \log n + n \log(1/\delta))$ adaptive ordinal queries. For our results, adaptivity is crucial: we prove that even in the absence of noise, every non-adaptive algorithm requires $\Omega(n^3)$ ordinal queries in the worst case. Comment: In SODA 201
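To make the query model concrete, here is a minimal Python sketch of a noiseless ordinal-query oracle and a naive adaptive learner that inserts elements one at a time by descending from the root. It is not the $n \log_2 n$ algorithm from the paper: each insertion here can cost as many queries as the current tree is deep, so it illustrates the definitions only. The nested-tuple tree encoding, the function names (`make_oracle`, `insert`, `reconstruct`), and the example elements are all assumptions made for illustration.

```python
def leaves(tree):
    """Return the leaves of a nested-tuple binary tree, left to right."""
    if not isinstance(tree, tuple):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def clusters(tree):
    """Return every cluster, i.e. the leaf set of every subtree."""
    if not isinstance(tree, tuple):
        return {frozenset([tree])}
    left, right = tree
    return clusters(left) | clusters(right) | {frozenset(leaves(tree))}


def make_oracle(true_tree):
    """Build a noiseless ordinal-query oracle for a hidden tree (hypothetical helper)."""
    all_clusters = clusters(true_tree)

    def query(x, y, z):
        # The closer pair is the one that shares a cluster excluding the third element.
        for u, v, w in ((x, y, z), (x, z, y), (y, z, x)):
            if any(u in c and v in c and w not in c for c in all_clusters):
                return frozenset((u, v))
        raise ValueError("query needs three distinct elements of the tree")

    return query


def insert(tree, x, query):
    """Naively insert x by walking down from the root, one query per level."""
    if not isinstance(tree, tuple):
        return (tree, x)            # x becomes the sibling of this leaf
    left, right = tree
    a, b = leaves(left)[0], leaves(right)[0]
    closer = query(a, b, x)
    if closer == frozenset((a, b)):
        return (tree, x)            # x lies outside this cluster
    if closer == frozenset((a, x)):
        return (insert(left, x, query), right)
    return (left, insert(right, x, query))


def reconstruct(elements, query):
    """Learn the hierarchy over `elements` using adaptive ordinal queries."""
    tree = elements[0]
    for x in elements[1:]:
        tree = insert(tree, x, query)
    return tree


# Hidden ground truth, used only to answer queries.
true_tree = ((("cat", "lion"), "dog"), ("oak", "pine"))
oracle = make_oracle(true_tree)
learned = reconstruct(["oak", "dog", "lion", "pine", "cat"], oracle)
```

Each query's elements depend on earlier responses, which is what makes the learner adaptive. Comparing `clusters(learned)` with `clusters(true_tree)` checks that the two hierarchies agree regardless of the left/right order of children.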
