Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization
We study the problem of detecting a structured, low-rank signal matrix
corrupted with additive Gaussian noise. This includes clustering in a Gaussian
mixture model, sparse PCA, and submatrix localization. Each of these problems
is conjectured to exhibit a sharp information-theoretic threshold, below which
the signal is too weak for any algorithm to detect. We derive upper and lower
bounds on these thresholds by applying the first and second moment methods to
the likelihood ratio between these "planted models" and null models where the
signal matrix is zero. Our bounds differ by at most a factor of root two when
the rank is large (in the clustering and submatrix localization problems, when
the number of clusters or blocks is large) or the signal matrix is very sparse.
Moreover, our upper bounds show that for each of these problems there is a
significant regime where reliable detection is information-theoretically
possible but where known algorithms such as PCA fail completely, since the
spectrum of the observed matrix is uninformative. This regime is analogous to
the conjectured 'hard but detectable' regime for community detection in sparse
graphs.
Comment: For sparse PCA and submatrix localization, we determine the information-theoretic threshold exactly in the limit where the number of blocks is large or the signal matrix is very sparse, based on a conditional second moment method, closing the factor of root two gap in the first version.
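The regime where the spectrum is uninformative can be illustrated numerically. The following is a minimal sketch (not the authors' code), assuming a rank-one spiked-Wigner normalization in which the noise bulk edge sits near 2 and the spectral transition occurs at signal-to-noise ratio 1; all variable names and parameter values are ours.

```python
# Minimal numerical sketch (not the paper's code): a rank-one planted signal
# in Gaussian noise, using a spiked-Wigner normalization where the noise bulk
# edge sits near 2 and the spectral transition is at snr = 1.
import numpy as np

rng = np.random.default_rng(0)
n, snr = 2000, 0.8                      # snr < 1: below the spectral transition

x = rng.choice([-1.0, 1.0], size=n)     # planted +/-1 signal vector
W = rng.normal(size=(n, n))
W = (W + W.T) / np.sqrt(2 * n)          # symmetric Gaussian noise, bulk edge ~ 2

Y = (snr / n) * np.outer(x, x) + W      # observed "planted model" matrix

# Below the spectral threshold the top eigenvalue stays at the bulk edge, so
# PCA sees nothing even when the planted signal may be detectable otherwise.
top = np.linalg.eigvalsh(Y)[-1]
print(f"top eigenvalue = {top:.3f} (noise bulk edge ~ 2)")
```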
Detection and Feature Selection in Sparse Mixture Models
We consider Gaussian mixture models in high dimensions and concentrate on the
twin tasks of detection and feature selection. Under sparsity assumptions on
the difference in means, we derive information bounds and establish the
performance of various procedures, including the top sparse eigenvalue of the
sample covariance matrix and other projection tests based on moments, such as
the skewness and kurtosis tests of Malkovich and Afifi (1973), and other
variants that we were better able to control under the null.
Comment: 70 pages
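As an illustration of the moment-based tests, here is a rough sketch of our own (not the paper's exact procedure): it applies coordinate-wise skewness and excess-kurtosis statistics to a sparse two-component Gaussian mixture, rather than maximizing over all projection directions as in Malkovich and Afifi (1973); the sample size, sparsity, and signal strength are arbitrary choices.

```python
# Illustrative sketch only, not the paper's exact procedure: coordinate-wise
# skewness and excess-kurtosis statistics for a sparse two-component Gaussian
# mixture (a simplification of projection tests that maximize over directions).
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(1)
n, p, s, mu = 1000, 200, 5, 3.0         # s features carry a mean shift of size mu

labels = rng.integers(0, 2, size=n)     # hidden mixture component of each sample
X = rng.normal(size=(n, p))
X[:, :s] += mu * labels[:, None]        # sparse difference in means

# Detection: compare the largest per-coordinate statistic to its null
# (pure Gaussian) distribution, e.g. calibrated by Monte Carlo.
skew_stat = np.max(np.abs(skew(X, axis=0)))
kurt_stat = np.max(np.abs(kurtosis(X, axis=0)))      # excess kurtosis

# Feature selection: a roughly balanced two-component mixture tends to have
# negative excess kurtosis, so rank features by |excess kurtosis|.
selected = np.argsort(-np.abs(kurtosis(X, axis=0)))[:s]
print(skew_stat, kurt_stat, sorted(selected))
```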
Block-diagonal covariance selection for high-dimensional Gaussian graphical models
Gaussian graphical models are widely utilized to infer and visualize networks
of dependencies between continuous variables. However, inferring the graph is
difficult when the sample size is small compared to the number of variables. To
reduce the number of parameters to estimate in the model, we propose a
non-asymptotic model selection procedure supported by strong theoretical
guarantees based on an oracle inequality and a minimax lower bound. The
covariance matrix of the model is approximated by a block-diagonal matrix. The
structure of this matrix is detected by thresholding the sample covariance
matrix, where the threshold is selected using the slope heuristic. Based on the
block-diagonal structure of the covariance matrix, the estimation problem is
divided into several independent problems; subsequently, the network of
dependencies between variables is inferred by running the graphical lasso algorithm
in each block. The performance of the procedure is illustrated on simulated
data. An application to a real gene expression dataset with a limited sample
size is also presented: the dimension reduction allows attention to be
objectively focused on interactions among smaller subsets of genes, leading to
a more parsimonious and interpretable modular network.
Comment: Accepted in JAS
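A rough sketch of the pipeline described above, with the slope-heuristic threshold replaced by a fixed cutoff for simplicity; the function name, tuning parameters, and synthetic data are ours, not the paper's.

```python
# Rough sketch of the described pipeline (simplified: a fixed correlation
# cutoff instead of the slope heuristic).  Names and parameters are ours.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.covariance import GraphicalLasso

def block_diagonal_glasso(X, threshold=0.4, alpha=0.1):
    """Run the graphical lasso inside each block of a thresholded
    sample correlation matrix."""
    p = X.shape[1]
    corr = np.corrcoef(X, rowvar=False)
    adj = (np.abs(corr) > threshold).astype(int)         # thresholded structure
    n_blocks, labels = connected_components(csr_matrix(adj), directed=False)

    precision = np.zeros((p, p))
    for b in range(n_blocks):
        idx = np.flatnonzero(labels == b)
        if len(idx) == 1:                                 # singleton block
            precision[idx[0], idx[0]] = 1.0 / np.var(X[:, idx[0]])
            continue
        gl = GraphicalLasso(alpha=alpha).fit(X[:, idx])   # estimate within block
        precision[np.ix_(idx, idx)] = gl.precision_
    return precision, labels

# Example on synthetic data with two independent groups of correlated variables:
rng = np.random.default_rng(2)
X = np.hstack([rng.normal(size=(100, 1)) + 0.5 * rng.normal(size=(100, 5)),
               rng.normal(size=(100, 1)) + 0.5 * rng.normal(size=(100, 5))])
prec, labels = block_diagonal_glasso(X)
print("block labels:", labels)
```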