
Detection of a sparse submatrix of a high-dimensional noisy matrix

Author: Butucea Cristina
Ingster Yuri I.
Publication venue: 'Bernoulli Society for Mathematical Statistics and Probability'
Publication date: 01/01/2011

We observe an $N\times M$ matrix $Y_{ij}=s_{ij}+\xi_{ij}$ with $\xi_{ij}\sim\mathcal{N}(0,1)$ i.i.d. in $i,j$, and $s_{ij}\in\mathbb{R}$. We test the null hypothesis $s_{ij}=0$ for all $i,j$ against the alternative that there exists some submatrix of size $n\times m$ with significant elements, in the sense that $s_{ij}\ge a>0$. We propose a test procedure and compute the asymptotic detection boundary $a$ such that the maximal testing risk tends to 0 as $M\to\infty$, $N\to\infty$, $p=n/N\to 0$ and $q=m/M\to 0$. We prove that this boundary is asymptotically sharp minimax under some additional constraints. Relations with other testing problems are discussed. We propose a testing procedure which adapts to unknown $(n,m)$ within some given set and compute the adaptive sharp rates. The implementation of our test procedure on synthetic data shows excellent behavior for sparse, not necessarily square matrices. We extend our sharp minimax results in different directions: first, to Gaussian matrices with unknown variance; next, to matrices of random variables having a distribution from an exponential family (non-Gaussian); and, finally, to a two-sided alternative for matrices with Gaussian elements.

Comment: Published at http://dx.doi.org/10.3150/12-BEJ470 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
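The observation model above is easy to simulate. The following is a minimal sketch, assuming a planted block whose entries all equal exactly $a$ (the abstract only requires $s_{ij}\ge a$). It computes two statistics often used for this kind of problem: a linear statistic (the normalized total sum) and a scan statistic (the largest normalized sum over all $n\times m$ submatrices). The function names `simulate`, `linear_stat` and `scan_stat` are hypothetical. This is not claimed to be the exact test of Butucea and Ingster, and their thresholds and combination rule are not reproduced here.

```python
import itertools
import math
import random


def simulate(N, M, n, m, a, seed=0):
    """Draw Y = s + xi with xi_ij i.i.d. N(0, 1).

    Assumption (illustration only): s_ij = a on one randomly placed
    n x m submatrix and 0 elsewhere; a = 0 gives the null hypothesis.
    """
    rng = random.Random(seed)
    rows = set(rng.sample(range(N), n))
    cols = set(rng.sample(range(M), m))
    return [[(a if (i in rows and j in cols) else 0.0) + rng.gauss(0.0, 1.0)
             for j in range(M)] for i in range(N)]


def linear_stat(Y):
    """Total sum divided by sqrt(NM); exactly N(0, 1) under the null."""
    N, M = len(Y), len(Y[0])
    return sum(map(sum, Y)) / math.sqrt(N * M)


def scan_stat(Y, n, m):
    """Maximum over all n x m submatrices of (submatrix sum) / sqrt(nm).

    For a fixed set of rows, the best set of columns is just the m
    columns with the largest partial column sums, so only the row
    subsets need to be enumerated exhaustively.
    """
    N, M = len(Y), len(Y[0])
    best = -math.inf
    for rows in itertools.combinations(range(N), n):
        col_sums = [sum(Y[i][j] for i in rows) for j in range(M)]
        best = max(best, sum(sorted(col_sums, reverse=True)[:m]))
    return best / math.sqrt(n * m)


if __name__ == "__main__":
    # Compare the scan statistic under the null and under a planted block.
    Y0 = simulate(12, 12, 2, 3, a=0.0, seed=1)
    Y1 = simulate(12, 12, 2, 3, a=3.0, seed=1)
    print(scan_stat(Y0, 2, 3), scan_stat(Y1, 2, 3))
```

The sketch enumerates all $\binom{N}{n}$ row subsets, so it is only practical for small $N$ and $n$. Sorting the partial column sums for each row subset avoids enumerating the $\binom{M}{m}$ column subsets as well.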

arXiv.org e-Print Archive

CiteSeerX

Crossref

HAL - UPEC / UPEM

Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization

Author: Banks Jess
Moore Cristopher
Vershynin Roman
Verzelen Nicolas
Xu Jiaming
Publication date: 23/01/2017

We study the problem of detecting a structured, low-rank signal matrix corrupted with additive Gaussian noise. This includes clustering in a Gaussian mixture model, sparse PCA, and submatrix localization. Each of these problems is conjectured to exhibit a sharp information-theoretic threshold, below which the signal is too weak for any algorithm to detect. We derive upper and lower bounds on these thresholds by applying the first and second moment methods to the likelihood ratio between these "planted models" and null models where the signal matrix is zero. Our bounds differ by at most a factor of root two when the rank is large (in the clustering and submatrix localization problems, when the number of clusters or blocks is large) or when the signal matrix is very sparse. Moreover, our upper bounds show that for each of these problems there is a significant regime where reliable detection is information-theoretically possible but where known algorithms such as PCA fail completely, since the spectrum of the observed matrix is uninformative. This regime is analogous to the conjectured 'hard but detectable' regime for community detection in sparse graphs.

Comment: For sparse PCA and submatrix localization, we determine the information-theoretic threshold exactly in the limit where the number of blocks is large or the signal matrix is very sparse, based on a conditional second moment method, closing the factor of root two gap in the first version.
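The second moment method mentioned above rests on a standard identity for the likelihood ratio. The derivation below is a sketch under stated assumptions: a non-symmetric $N\times M$ observation with i.i.d. $\mathcal{N}(0,1)$ noise on every entry, as in the first abstract, and a signal $X$ drawn from a prior $\pi$ independently of the noise. It is not the derivation of Banks et al., whose models and normalizations may differ (for example, for symmetric matrices). Here $X'$ is an independent copy of $X$, and the submatrix prior $X=a\,\mathbf{1}_R\mathbf{1}_C^{\top}$, with $R$ and $C$ uniformly random $n$- and $m$-subsets, is an illustrative assumption.

```latex
% Null: Y = \xi;  planted: Y = X + \xi,  X \sim \pi independent of \xi.
L(Y) = \mathbb{E}_{X\sim\pi}\exp\!\Big(\langle X, Y\rangle - \tfrac12\|X\|_F^2\Big)

% Using \mathbb{E}_0 \exp(\langle u, Y\rangle) = \exp(\|u\|_F^2/2):
\mathbb{E}_0\big[L(Y)^2\big]
  = \mathbb{E}_{X,X'}\exp\!\Big(\tfrac12\|X+X'\|_F^2 - \tfrac12\|X\|_F^2 - \tfrac12\|X'\|_F^2\Big)
  = \mathbb{E}_{X,X'}\exp\big(\langle X, X'\rangle\big)

% Submatrix prior X = a 1_R 1_C^T (|R| = n, |C| = m):
\mathbb{E}_0\big[L(Y)^2\big] = \mathbb{E}\exp\big(a^2\,|R\cap R'|\,|C\cap C'|\big)

% Cauchy--Schwarz then bounds the total variation distance:
d_{\mathrm{TV}}(\mathbb{P}_0,\mathbb{P}_1) \le \tfrac12\sqrt{\mathbb{E}_0[L^2]-1}
```

If $\mathbb{E}_0[L^2]\to 1$, the total variation distance tends to 0, so the sum of type I and type II errors of any test tends to 1 and detection is impossible. Here $|R\cap R'|$ and $|C\cap C'|$ are hypergeometric overlaps. Controlling this expectation is what makes the resulting lower bounds problem-specific.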

arXiv.org e-Print Archive

Crossref

HAL Descartes

eScholarship - University of California

Hal-Diderot