Optimal linear estimation under unknown nonlinear transform
Linear regression studies the problem of estimating a model parameter $\beta^*$ from observations $(x_i, y_i)$ generated by the linear model $y = \langle x, \beta^* \rangle + \epsilon$. We consider a significant
generalization in which the relationship between $\langle x, \beta^* \rangle$ and $y$ is noisy, quantized to a single bit, potentially nonlinear,
noninvertible, as well as unknown. This model is known as the single-index
model in statistics and, among other things, it represents a significant
generalization of one-bit compressed sensing. We propose a novel spectral-based
estimation procedure and show that we can recover $\beta^*$ in settings (i.e.,
classes of link function $f$) where previous algorithms fail. In general, our
algorithm requires only very mild restrictions on the (unknown) functional
relationship between $\langle x, \beta^* \rangle$ and $y$. We also
consider the high-dimensional setting where $\beta^*$ is sparse, and introduce
a two-stage nonconvex framework that addresses estimation challenges in
high-dimensional regimes where $p \gg n$. For a broad class of link functions
between $\langle x, \beta^* \rangle$ and $y$, we establish minimax
lower bounds that demonstrate the optimality of our estimators in both the
classical and high-dimensional regimes.
Comment: 25 pages, 3 figures
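To make the spectral idea concrete, the sketch below forms the label-weighted second-moment matrix $M = \frac{1}{n}\sum_i y_i x_i x_i^\top$ and takes its leading eigenvector as the direction estimate. This is a minimal illustration of a generic second-order spectral estimator under an assumed even, noninvertible link (absolute value), not the paper's exact procedure; the function name and link choice are illustrative assumptions.

```python
import numpy as np

def spectral_estimate(X, y):
    """Illustrative second-order spectral estimator for a single-index
    model y_i ~ f(<x_i, beta*>): build the y-weighted second-moment
    matrix and return its leading eigenvector (up to sign).
    Hypothetical helper, not the paper's exact algorithm."""
    n, _ = X.shape
    # M = (1/n) * sum_i y_i * x_i x_i^T
    M = (X * y[:, None]).T @ X / n
    # Eigenvector of largest-magnitude eigenvalue estimates beta*'s direction
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, np.argmax(np.abs(vals))]
```

With Gaussian designs and an even link such as $f(t) = |t|$, the population matrix $M$ has its top eigenvector aligned with $\beta^*$, so the estimate recovers the direction up to sign.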
Model Assisted Variable Clustering: Minimax-optimal Recovery and Algorithms
Model-based clustering defines population level clusters relative to a model
that embeds notions of similarity. Algorithms tailored to such models yield
estimated clusters with a clear statistical interpretation. We take this view
here and introduce the class of G-block covariance models as a background model
for variable clustering. In such models, two variables in a cluster are deemed
similar if they have similar associations with all other variables. This can
arise, for instance, when groups of variables are noise corrupted versions of
the same latent factor. We quantify the difficulty of clustering data generated
from a G-block covariance model in terms of cluster proximity, measured with
respect to two related, but different, cluster separation metrics. We derive
minimax cluster separation thresholds, which are the metric values below which
no algorithm can recover the model-defined clusters exactly, and show that they
are different for the two metrics. We therefore develop two algorithms, COD and
PECOK, tailored to G-block covariance models, and study their
minimax-optimality with respect to each metric. Of independent interest is the
fact that the analysis of the PECOK algorithm, which is based on a corrected
convex relaxation of the popular K-means algorithm, provides the first
statistical analysis of such algorithms for variable clustering. Additionally,
we contrast our methods with another popular clustering method, spectral
clustering, specialized to variable clustering, and show that ensuring exact
cluster recovery via this method requires clusters to have a higher separation,
relative to the minimax threshold. Extensive simulation studies, as well as our
data analyses, confirm the applicability of our approach.
Comment: Main text: 38 pages; supplementary information: 37 pages
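To make the covariance-difference notion concrete, the sketch below thresholds the metric $\mathrm{COD}(a,b) = \max_{c \neq a,b} |\Sigma_{ac} - \Sigma_{bc}|$ and greedily groups variables whose COD falls below a tolerance. This is a hypothetical simplification of a COD-style procedure; the function name, greedy grouping rule, and threshold are assumptions, not the paper's exact algorithm or tuning.

```python
import numpy as np

def cod_clusters(S, alpha):
    """Greedy COD-style variable clustering sketch: variables a and b
    are deemed similar when max over c != a,b of |S[a,c] - S[b,c]|
    is at most alpha. Hypothetical simplification for illustration."""
    p = S.shape[0]
    unassigned = set(range(p))
    clusters = []
    while unassigned:
        a = min(unassigned)
        group = {a}
        for b in unassigned - {a}:
            others = [c for c in range(p) if c not in (a, b)]
            # COD metric between variables a and b
            cod = max(abs(S[a, c] - S[b, c]) for c in others)
            if cod <= alpha:
                group.add(b)
        clusters.append(sorted(group))
        unassigned -= group
    return clusters
```

On a population G-block covariance (variables within a block sharing a latent factor), within-cluster COD is exactly zero, so any small tolerance recovers the model-defined partition.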