Hierarchical clustering based on pairwise similarities is a common tool used
in a broad range of scientific applications. However, in many problems it may
be expensive to obtain or compute similarities between the items to be
clustered. This paper investigates the hierarchical clustering of N items based
on a small subset of pairwise similarities, significantly fewer than the
complete set of N(N-1)/2 similarities. First, we show that if the intracluster
similarities exceed intercluster similarities, then it is possible to correctly
determine the hierarchical clustering from as few as 3N log N similarities. We
demonstrate that this order-of-magnitude savings in the number of pairwise
similarities necessitates sequentially selecting which similarities to obtain
in an adaptive fashion, rather than picking them at random. We then propose an
active clustering method that is robust to a limited fraction of anomalous
similarities, and show that, even in the presence of these noisy similarity
values, we can resolve the hierarchical clustering using only O(N log^2 N)
pairwise similarities.
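
To give a sense of scale for the counts quoted above, the following minimal sketch compares the complete set of N(N-1)/2 similarities with the 3N log N budget and with an N log^2 N budget. It is an illustration only: the abstract does not state the base of the logarithm or the constant hidden in the O(.) bound, so the natural logarithm and the hypothetical constant `c = 1.0` are assumptions, and the function names are not taken from the paper.

```python
import math


def full_count(n: int) -> int:
    """Number of pairwise similarities among n items: n(n-1)/2."""
    return n * (n - 1) // 2


def adaptive_budget(n: int) -> float:
    """The 3 N log N budget; natural logarithm is an assumption."""
    return 3 * n * math.log(n)


def robust_budget(n: int, c: float = 1.0) -> float:
    """A c * N log^2 N budget; the constant c is hypothetical."""
    return c * n * math.log(n) ** 2


if __name__ == "__main__":
    for n in (100, 1_000, 10_000, 100_000):
        print(
            f"N={n:>7}  full={full_count(n):>13}  "
            f"3NlogN={adaptive_budget(n):>12.0f}  "
            f"Nlog^2N={robust_budget(n):>12.0f}"
        )
```

Under these assumptions the gap widens with N, because the complete set grows quadratically while both budgets grow only slightly faster than linearly.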