Detection of a sparse submatrix of a high-dimensional noisy matrix
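We observe an $N\times M$ matrix $Y_{ij}=s_{ij}+\xi_{ij}$ with
$\xi_{ij}\sim\mathcal{N}(0,1)$ i.i.d. in $i,j$, and $s_{ij}\in\mathbb{R}$. We test the
null hypothesis $s_{ij}=0$ for all $i,j$ against the alternative that there
exists some submatrix of size $n\times m$ with significant elements in the
sense that $s_{ij}\ge a>0$. We propose a test procedure and compute the
asymptotic detection boundary $a$ so that the maximal testing risk tends to 0
as $M\to\infty$, $N\to\infty$, $p=n/N\to 0$, $q=m/M\to 0$. We prove that this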
boundary is asymptotically sharp minimax under some additional constraints.
Relations with other testing problems are discussed. We propose a testing
procedure which adapts to unknown $(n,m)$ within some given set and compute the
adaptive sharp rates. The implementation of our test procedure on synthetic
data shows excellent behavior for sparse, not necessarily square, matrices. We
extend our sharp minimax results in different directions: first, to Gaussian
matrices with unknown variance, next, to matrices of random variables having a
distribution from an exponential family (non-Gaussian) and, finally, to a
two-sided alternative for matrices with Gaussian elements.
Comment: Published at http://dx.doi.org/10.3150/12-BEJ470 in Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
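The testing problem in this abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure: it assumes unit-variance Gaussian noise, a planted block whose entries all have mean exactly `a`, and a simple total-sum statistic chosen only for illustration. The names `planted_matrix` and `linear_statistic` are hypothetical.

```python
import numpy as np

def planted_matrix(N, M, n, m, a, rng):
    """N x M matrix of N(0, 1) noise with mean `a` added on a random n x m submatrix."""
    Y = rng.standard_normal((N, M))
    rows = rng.choice(N, size=n, replace=False)
    cols = rng.choice(M, size=m, replace=False)
    Y[np.ix_(rows, cols)] += a  # elevate the planted block (assumed constant mean)
    return Y

def linear_statistic(Y):
    """Total sum scaled by sqrt(N * M); distributed as N(0, 1) when all means are zero."""
    return Y.sum() / np.sqrt(Y.size)

rng = np.random.default_rng(0)
null_stat = linear_statistic(planted_matrix(200, 300, 20, 30, 0.0, rng))
alt_stat = linear_statistic(planted_matrix(200, 300, 20, 30, 1.0, rng))
print(null_stat, alt_stat)
```

Under these assumptions the statistic is shifted by `a * n * m / sqrt(N * M)` when a block is present, so a total-sum test of this kind only has power when the block is not too small relative to the whole matrix.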
Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization
We study the problem of detecting a structured, low-rank signal matrix
corrupted with additive Gaussian noise. This includes clustering in a Gaussian
mixture model, sparse PCA, and submatrix localization. Each of these problems
is conjectured to exhibit a sharp information-theoretic threshold, below which
the signal is too weak for any algorithm to detect. We derive upper and lower
bounds on these thresholds by applying the first and second moment methods to
the likelihood ratio between these "planted models" and null models where the
signal matrix is zero. Our bounds differ by at most a factor of root two when
the rank is large (in the clustering and submatrix localization problems, when
the number of clusters or blocks is large) or the signal matrix is very sparse.
Moreover, our upper bounds show that for each of these problems there is a
significant regime where reliable detection is information-theoretically
possible but where known algorithms such as PCA fail completely, since the
spectrum of the observed matrix is uninformative. This regime is analogous to
the conjectured 'hard but detectable' regime for community detection in sparse
graphs.
Comment: For sparse PCA and submatrix localization, we determine the
information-theoretic threshold exactly in the limit where the number of
blocks is large or the signal matrix is very sparse based on a conditional
second moment method, closing the factor of root two gap in the first versio
- …
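The remark that the spectrum can be uninformative may be illustrated with a rank-one toy model. The sketch below is an assumption-laden illustration, not the paper's setup: it uses a symmetric Gaussian (Wigner) noise matrix scaled so that its bulk spectrum lies roughly in [-2, 2], a hypothetical k-sparse unit vector as the signal direction, and a signal strength `lam` in that normalization. The function name `top_eigenvalue` is hypothetical.

```python
import numpy as np

def top_eigenvalue(n, k, lam, rng):
    """Largest eigenvalue of W + lam * x x^T, with W a Wigner matrix and x a k-sparse unit vector."""
    G = rng.standard_normal((n, n))
    W = (G + G.T) / np.sqrt(2 * n)  # symmetric noise, bulk spectrum roughly [-2, 2]
    x = np.zeros(n)
    x[rng.choice(n, size=k, replace=False)] = 1 / np.sqrt(k)  # hypothetical sparse spike
    return np.linalg.eigvalsh(W + lam * np.outer(x, x))[-1]   # eigvalsh sorts ascending

rng = np.random.default_rng(0)
for lam in (0.0, 0.5, 3.0):
    print(lam, top_eigenvalue(400, 20, lam, rng))
```

In this normalization the top eigenvalue is expected to stay near the bulk edge for weak signals and to separate from the bulk only once `lam` exceeds 1, so a test based on the top eigenvalue alone cannot see weaker signals. The sketch shows only this spectral side and says nothing about the information-theoretic thresholds derived in the abstract.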