We consider the question of Gaussian mean testing, a fundamental task in
high-dimensional distribution testing and signal processing, subject to
adversarial corruptions of the samples. We focus on the relative power of
different adversaries, and show that, in contrast to the common wisdom in
robust statistics, there exists a strict separation between adaptive
adversaries (strong contamination) and oblivious ones (weak contamination) for
this task. Specifically, we resolve both the information-theoretic and
computational landscapes for robust mean testing. In the exponential-time
setting, we establish the tight sample complexity of testing $\mathcal{N}(0, I)$
against $\mathcal{N}(\alpha v, I)$, where $\|v\|_2 = 1$, with an
$\varepsilon$-fraction of adversarial corruptions, to be
$\tilde{\Theta}\!\left(\max\!\left(\sqrt{d}/\alpha^2,\ d\varepsilon^3/\alpha^4,\ \min\!\left(d^{2/3}\varepsilon^{2/3}/\alpha^{8/3},\ d\varepsilon/\alpha^2\right)\right)\right)$,
while the complexity against adaptive adversaries is
$\tilde{\Theta}\!\left(\max\!\left(\sqrt{d}/\alpha^2,\ d\varepsilon^2/\alpha^4\right)\right)$, which is strictly worse
for a large range of vanishing $\varepsilon, \alpha$. To the best of our
knowledge, ours is the first separation in sample complexity between the strong
and weak contamination models.
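To make the separation concrete, the two rates can be compared numerically. The following sketch (parameter values are illustrative, chosen by us rather than taken from the paper) evaluates both sample-complexity bounds, ignoring polylogarithmic factors:

```python
import math

def oblivious_rate(d, alpha, eps):
    """Oblivious (weak-contamination) rate: the tight sample complexity
    from the abstract, up to polylogarithmic factors."""
    return max(
        math.sqrt(d) / alpha**2,
        d * eps**3 / alpha**4,
        min(d**(2/3) * eps**(2/3) / alpha**(8/3),
            d * eps / alpha**2),
    )

def adaptive_rate(d, alpha, eps):
    """Adaptive (strong-contamination) rate."""
    return max(math.sqrt(d) / alpha**2, d * eps**2 / alpha**4)

# Illustrative regime with small eps and alpha (hypothetical choices),
# where the adaptive rate strictly dominates the oblivious one.
d, alpha, eps = 10**6, 0.05, 0.01
print(adaptive_rate(d, alpha, eps) / oblivious_rate(d, alpha, eps))
```

For these values the adaptive bound exceeds the oblivious bound by roughly an order of magnitude, illustrating the claimed strict separation between the two contamination models.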
In the polynomial-time setting, we close a gap in the literature by providing
a polynomial-time algorithm against adaptive adversaries achieving the above
sample complexity $\tilde{\Theta}\!\left(\max\!\left(\sqrt{d}/\alpha^2,\ d\varepsilon^2/\alpha^4\right)\right)$, and a low-degree lower bound (which
complements an existing reduction from planted clique) suggesting that all
efficient algorithms require this many samples, even in the oblivious-adversary
setting.

Comment: To appear in FOCS 202