
    Detection of a sparse submatrix of a high-dimensional noisy matrix

    We observe an $N\times M$ matrix $Y_{ij}=s_{ij}+\xi_{ij}$ with $\xi_{ij}\sim\mathcal{N}(0,1)$ i.i.d. in $i,j$, and $s_{ij}\in\mathbb{R}$. We test the null hypothesis $s_{ij}=0$ for all $i,j$ against the alternative that there exists some submatrix of size $n\times m$ with significant elements in the sense that $s_{ij}\ge a>0$. We propose a test procedure and compute the asymptotical detection boundary $a$ so that the maximal testing risk tends to 0 as $M\to\infty$, $N\to\infty$, $p=n/N\to 0$, $q=m/M\to 0$. We prove that this boundary is asymptotically sharp minimax under some additional constraints. Relations with other testing problems are discussed. We propose a testing procedure which adapts to unknown $(n,m)$ within some given set and compute the adaptive sharp rates. The implementation of our test procedure on synthetic data shows excellent behavior for sparse, not necessarily square matrices. We extend our sharp minimax results in different directions: first, to Gaussian matrices with unknown variance, next, to matrices of random variables having a distribution from an exponential family (non-Gaussian) and, finally, to a two-sided alternative for matrices with Gaussian elements.
    Comment: Published in at http://dx.doi.org/10.3150/12-BEJ470 the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm
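
The observation model in this abstract can be simulated in a few lines. The sketch below is only an illustration under stated assumptions: the abstract does not spell out the test procedure, so the normalized total-sum statistic used here is a simple stand-in chosen for the example, not the authors' test. The function names `planted_matrix` and `linear_statistic`, the block height being exactly $a$ (the alternative only requires $s_{ij}\ge a$), and all numeric values are hypothetical.

```python
import numpy as np

def planted_matrix(N, M, n, m, a, rng):
    """Draw Y = s + xi, where s equals a on a random n x m submatrix and 0 elsewhere."""
    s = np.zeros((N, M))
    rows = rng.choice(N, size=n, replace=False)  # hidden row set
    cols = rng.choice(M, size=m, replace=False)  # hidden column set
    s[np.ix_(rows, cols)] = a
    return s + rng.standard_normal((N, M))       # add i.i.d. N(0, 1) noise

def linear_statistic(Y):
    """Sum of all entries divided by sqrt(N * M); exactly N(0, 1) when s = 0."""
    return Y.sum() / np.sqrt(Y.size)

rng = np.random.default_rng(0)
N, M, n, m, a = 200, 300, 40, 60, 1.0             # illustrative values only

null_stat = linear_statistic(planted_matrix(N, M, n, m, 0.0, rng))  # a = 0: pure noise
alt_stat = linear_statistic(planted_matrix(N, M, n, m, a, rng))     # planted block

print(f"null: {null_stat:.2f}, alternative: {alt_stat:.2f}")
```

Under this model the statistic has mean $a\,nm/\sqrt{NM}$ and unit variance, so it separates the two hypotheses only when that mean is large; a statistic that scans over candidate submatrices would be needed for smaller blocks, which this sketch does not attempt.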

    Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization

    We study the problem of detecting a structured, low-rank signal matrix corrupted with additive Gaussian noise. This includes clustering in a Gaussian mixture model, sparse PCA, and submatrix localization. Each of these problems is conjectured to exhibit a sharp information-theoretic threshold, below which the signal is too weak for any algorithm to detect. We derive upper and lower bounds on these thresholds by applying the first and second moment methods to the likelihood ratio between these "planted models" and null models where the signal matrix is zero. Our bounds differ by at most a factor of root two when the rank is large (in the clustering and submatrix localization problems, when the number of clusters or blocks is large) or the signal matrix is very sparse. Moreover, our upper bounds show that for each of these problems there is a significant regime where reliable detection is information-theoretically possible but where known algorithms such as PCA fail completely, since the spectrum of the observed matrix is uninformative. This regime is analogous to the conjectured 'hard but detectable' regime for community detection in sparse graphs.
    Comment: For sparse PCA and submatrix localization, we determine the information-theoretic threshold exactly in the limit where the number of blocks is large or the signal matrix is very sparse based on a conditional second moment method, closing the factor of root two gap in the first versio
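
The remark that the spectrum can be uninformative can be illustrated numerically. The sketch below is an assumption-laden toy, not the setting analyzed in the abstract: it uses a dense rank-one signal added to a symmetric Gaussian noise matrix, for which a standard random-matrix result says the top eigenvalue leaves the noise bulk (edge near 2) only when the signal strength exceeds 1. The function name `top_eigenvalue`, the normalization, and the chosen sizes are hypothetical.

```python
import numpy as np

def top_eigenvalue(n, lam, rng):
    """Largest eigenvalue of lam * v v^T + W, with unit vector v and symmetric noise W."""
    g = rng.standard_normal((n, n))
    w = (g + g.T) / np.sqrt(2 * n)         # off-diagonal variance 1/n, bulk approx [-2, 2]
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)                 # random unit-norm signal direction
    return np.linalg.eigvalsh(lam * np.outer(v, v) + w)[-1]  # eigvalsh sorts ascending

rng = np.random.default_rng(1)
strong = top_eigenvalue(400, 3.0, rng)     # above the spectral threshold
weak = top_eigenvalue(400, 0.5, rng)       # below it: hidden inside the bulk

print(f"lam = 3.0: {strong:.2f}, lam = 0.5: {weak:.2f}")
```

For the weak signal the top eigenvalue stays near the bulk edge, so a spectral method sees nothing even though the signal is present.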