In many applications of clustering (for example, ontologies or clusterings of
animal or plant species), hierarchical clusterings are more descriptive than a
flat clustering. A hierarchical clustering over n elements is represented by
a rooted binary tree with n leaves, each corresponding to one element. The
subtrees rooted at interior nodes capture the clusters. In this paper, we study
active learning of a hierarchical clustering using only ordinal queries. An
ordinal query consists of a set of three elements, and the response to a query
reveals the two elements (among the three elements in the query) which are
"closer" to each other than to the third one. We say that elements x and x′
are closer to each other than to x′′ if there exists a cluster containing x and
x′, but not x′′.
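To make the query model concrete, the following is a minimal sketch of an ordinal-query oracle over a known ground-truth hierarchy, together with a noisy variant that errs with probability 1−p (random errors stand in for the adversary here, purely for illustration). The nested-tuple tree encoding and the function names are illustrative assumptions, not notation from the paper.

```python
import random

# A hierarchical clustering over n elements, encoded as a rooted binary tree:
# a leaf is an element label, an internal node is a pair (left, right).
# This nested-tuple encoding is an illustrative choice only.
EXAMPLE_TREE = ((("a", "b"), "c"), ("d", ("e", "f")))

def leaves(tree):
    """Set of elements at the leaves of `tree`."""
    if not isinstance(tree, tuple):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])

def ordinal_query(tree, x, y, z):
    """Answer the ordinal query {x, y, z}: return (as a frozenset) the two
    elements contained in some cluster (subtree) that excludes the third."""
    for child in tree:
        members = [e for e in (x, y, z) if e in leaves(child)]
        if len(members) == 3:
            return ordinal_query(child, x, y, z)  # all three lie deeper down
        if len(members) == 2:
            return frozenset(members)             # these two are "closer"
    # Unreachable when the tree is binary and contains all three elements.
    raise ValueError("query elements not found in the tree")

def noisy_ordinal_query(tree, x, y, z, p, rng=random):
    """Return the correct answer with probability p, and one of the two
    incorrect pairs otherwise (a stand-in for the adversarial noise model)."""
    correct = ordinal_query(tree, x, y, z)
    if rng.random() < p:
        return correct
    wrong = [frozenset(pair) for pair in ((x, y), (y, z), (x, z))
             if frozenset(pair) != correct]
    return rng.choice(wrong)

# Example: {a, b} form a cluster excluding d, so they are the closer pair.
assert ordinal_query(EXAMPLE_TREE, "a", "b", "d") == frozenset({"a", "b"})
```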
When all the query responses are correct, there is a deterministic algorithm
that learns the underlying hierarchical clustering using at most n log₂ n
adaptive ordinal queries. We generalize this algorithm to be robust in a model
in which each query response is correct independently with probability p > 1/2, and adversarially incorrect with probability 1−p. We show
that in the presence of noise, our algorithm outputs the correct hierarchical
clustering with probability at least 1−δ, using O(n log n + n log(1/δ)) adaptive ordinal queries. For our results, adaptivity is
crucial: we prove that even in the absence of noise, every non-adaptive
algorithm requires Ω(n³) ordinal queries in the worst case.
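As a rough illustration of how adaptive queries can drive the learning in the noiseless setting, the sketch below inserts elements one at a time: each new element is routed down the current tree by querying it against one representative leaf from each child of the current node. It reuses `ordinal_query` and `EXAMPLE_TREE` from the sketch above. This is not the paper's algorithm: without balancing, an insertion can cost a number of queries proportional to the depth of the current tree, so it does not by itself match the n log₂ n bound, and it has no tolerance for noisy responses.

```python
import functools

def any_leaf(tree):
    """An arbitrary element of `tree`, used as a representative of its cluster."""
    while isinstance(tree, tuple):
        tree = tree[0]
    return tree

def insert(current, x, query):
    """Insert element x into the partial hierarchy `current`, assuming `current`
    is the true hierarchy restricted to the elements seen so far and that
    `query(a, b, c)` truthfully returns the closer pair as a frozenset."""
    if not isinstance(current, tuple):
        return (current, x)          # single element: x becomes its sibling
    left, right = current
    a, b = any_leaf(left), any_leaf(right)
    answer = query(x, a, b)
    if answer == frozenset({a, b}):
        return (current, x)          # x splits off above this node
    if a in answer:
        return (insert(left, x, query), right)   # x lies inside the left cluster
    return (left, insert(right, x, query))       # x lies inside the right cluster

def learn_hierarchy(elements, query):
    """Recover the hierarchy over `elements` by repeated adaptive insertion;
    each insertion asks one query per level it descends."""
    tree = elements[0]
    for x in elements[1:]:
        tree = insert(tree, x, query)
    return tree

# Recovers the clusters of EXAMPLE_TREE (up to the order of children).
oracle = functools.partial(ordinal_query, EXAMPLE_TREE)
recovered = learn_hierarchy(["d", "a", "f", "c", "b", "e"], oracle)
```

Handling noisy responses by naively repeating and majority-voting each query would inflate the query count beyond the bound stated above, which is why a more careful noise-robust strategy is needed.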