Motivated by many applications, we study clustering with a faulty oracle. In
this problem, there are n items belonging to k unknown clusters, and the
algorithm is allowed to ask the oracle whether two items belong to the same
cluster or not. However, the answer from the oracle is correct only with
probability $\frac{1}{2}+\frac{\delta}{2}$. The goal is to recover the hidden
clusters with the minimum number of noisy queries. Previous works have shown that
the problem can be solved with $O(\frac{nk\log n}{\delta^2}+\mathrm{poly}(k,\frac{1}{\delta},\log n))$ queries, while
$\Omega(\frac{nk}{\delta^2})$ queries are known to be necessary. So, for any
values of k and δ, there is still a non-trivial gap between upper and
lower bounds. In this work, we obtain the first matching upper and lower bounds
for a wide range of parameters. In particular, we propose a new polynomial-time algorithm
that uses $O(\frac{n(k+\log n)}{\delta^2}+\mathrm{poly}(k,\frac{1}{\delta},\log n))$ queries. Moreover, we prove a new lower bound of
$\Omega(\frac{n\log n}{\delta^2})$, which, combined with the existing
$\Omega(\frac{nk}{\delta^2})$ bound, matches our upper bound up to an additive
$\mathrm{poly}(k,\frac{1}{\delta},\log n)$ term. To obtain the new results, our
main ingredient is an interesting connection between our problem and the
multi-armed bandit problem, which might provide useful insights for other similar
problems.
Comment: ICML 202
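The abstract does not spell out how the bandit connection works, so the following is only a rough illustration of one natural reading, under hypothetical assumptions. Suppose one representative item per cluster is already known. Assigning a new item to its cluster can then be viewed as best-arm identification: each representative is an arm, and pulling an arm means asking the faulty oracle whether the item and that representative share a cluster. The arm of the true cluster answers "same" with probability $\frac{1}{2}+\frac{\delta}{2}$, and every other arm does so with probability $\frac{1}{2}-\frac{\delta}{2}$, a gap of $\delta$. The elimination rule below is textbook successive elimination with Hoeffding confidence radii, swapped in for illustration; it is not the authors' algorithm. The names `make_faulty_oracle`, `assign_by_successive_elimination`, and the per-item failure parameter `eta` are hypothetical.

```python
import math
import random


def make_faulty_oracle(labels, delta, rng):
    """Hypothetical faulty oracle: reports whether items u and v share a cluster,
    answering correctly with probability 1/2 + delta/2 and wrongly otherwise."""
    def oracle(u, v):
        truth = labels[u] == labels[v]
        return truth if rng.random() < 0.5 + delta / 2 else not truth
    return oracle


def assign_by_successive_elimination(item, reps, oracle, eta=0.01):
    """Treat each cluster representative as a bandit arm; pulling arm j queries
    oracle(item, reps[j]) and yields reward 1 for a 'same cluster' answer.
    Successive elimination: pull every surviving arm once per round and drop
    arms whose upper confidence bound falls below the best lower bound.
    Returns (index of the chosen representative, number of queries used)."""
    k = len(reps)
    active = list(range(k))
    sums = [0] * k
    t = 0  # pulls of each surviving arm so far (all survivors share the same t)
    queries = 0
    while len(active) > 1:
        t += 1
        for j in active:
            sums[j] += 1 if oracle(item, reps[j]) else 0
            queries += 1
        # Hoeffding radius with a union bound over k arms and all rounds t
        # (the sum of 1/t^2 over t is below 2, so total failure prob. < eta).
        radius = math.sqrt(math.log(4 * k * t * t / eta) / (2 * t))
        best = max(sums[j] / t for j in active)
        active = [j for j in active if sums[j] / t + radius >= best - radius]
    return active[0], queries


# Usage on synthetic data: item j (for j < k) serves as the representative of cluster j.
rng = random.Random(0)
n, k, delta = 100, 5, 0.4
labels = [i % k for i in range(n)]
oracle = make_faulty_oracle(labels, delta, rng)
reps = list(range(k))
correct = 0
total_queries = 0
for v in range(k, n):
    j, q = assign_by_successive_elimination(v, reps, oracle)
    correct += labels[reps[j]] == labels[v]
    total_queries += q
print(correct, "of", n - k, "items assigned correctly using", total_queries, "queries")
```

As a back-of-the-envelope estimate, successive elimination spends roughly $\frac{k\log(k/(\eta\delta))}{\delta^2}$ queries per item, so choosing $\eta \approx 1/n$ to make all $n$ assignments succeed gives a total on the order of $\frac{nk\log n}{\delta^2}$. This naive scheme therefore only matches the older upper bound quoted above; reaching the improved $O(\frac{n(k+\log n)}{\delta^2})$ term requires the more refined techniques developed in the paper.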