Optimal linear estimation under unknown nonlinear transform
Linear regression studies the problem of estimating a model parameter $\beta^*$ from observations $(x_i, y_i)$ generated by the linear model $y = \langle x, \beta^* \rangle + \epsilon$. We consider a significant
generalization in which the relationship between $\langle x, \beta^* \rangle$ and $y$ is noisy, quantized to a single bit, potentially nonlinear,
noninvertible, as well as unknown. This model is known as the single-index
model in statistics and, among other things, it represents a significant
generalization of one-bit compressed sensing. We propose a novel spectral-based
estimation procedure and show that we can recover $\beta^*$ in settings (i.e.,
classes of link function $f$) where previous algorithms fail. In general, our
algorithm requires only very mild restrictions on the (unknown) functional
relationship between $\langle x, \beta^* \rangle$ and $y$. We also
consider the high-dimensional setting where $\beta^*$ is sparse, and introduce
a two-stage nonconvex framework that addresses estimation challenges in
high-dimensional regimes where $p \gg n$. For a broad class of link functions
between $\langle x, \beta^* \rangle$ and $y$, we establish minimax
lower bounds that demonstrate the optimality of our estimators in both the
classical and high-dimensional regimes.
Comment: 25 pages, 3 figures
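To make the spectral idea concrete, the sketch below forms the label-weighted second-moment matrix $M = \frac{1}{n}\sum_i y_i x_i x_i^\top$ and takes its leading eigenvector as the direction estimate. This is a minimal illustration of a generic second-order spectral estimator under an assumed even, noninvertible link (absolute value), not the paper's exact procedure; the function name and link choice are illustrative assumptions.

```python
import numpy as np

def spectral_estimate(X, y):
    """Illustrative second-order spectral estimator for a single-index
    model y_i ~ f(<x_i, beta*>): build the y-weighted second-moment
    matrix and return its leading eigenvector (up to sign).
    Hypothetical helper, not the paper's exact algorithm."""
    n, _ = X.shape
    # M = (1/n) * sum_i y_i * x_i x_i^T
    M = (X * y[:, None]).T @ X / n
    # Eigenvector of largest-magnitude eigenvalue estimates beta*'s direction
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, np.argmax(np.abs(vals))]
```

With Gaussian designs and an even link such as $f(t) = |t|$, the population matrix $M$ has its top eigenvector aligned with $\beta^*$, so the estimate recovers the direction up to sign.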
Model Assisted Variable Clustering: Minimax-optimal Recovery and Algorithms
Model-based clustering defines population level clusters relative to a model
that embeds notions of similarity. Algorithms tailored to such models yield
estimated clusters with a clear statistical interpretation. We take this view
here and introduce the class of G-block covariance models as a background model
for variable clustering. In such models, two variables in a cluster are deemed
similar if they have similar associations with all other variables. This can
arise, for instance, when groups of variables are noise corrupted versions of
the same latent factor. We quantify the difficulty of clustering data generated
from a G-block covariance model in terms of cluster proximity, measured with
respect to two related, but different, cluster separation metrics. We derive
minimax cluster separation thresholds, which are the metric values below which
no algorithm can recover the model-defined clusters exactly, and show that they
are different for the two metrics. We therefore develop two algorithms, COD and
PECOK, tailored to G-block covariance models, and study their
minimax-optimality with respect to each metric. Of independent interest is the
fact that the analysis of the PECOK algorithm, which is based on a corrected
convex relaxation of the popular K-means algorithm, provides the first
statistical analysis of such algorithms for variable clustering. Additionally,
we contrast our methods with another popular clustering method, spectral
clustering, specialized to variable clustering, and show that ensuring exact
cluster recovery via this method requires clusters to have a higher separation,
relative to the minimax threshold. Extensive simulation studies, as well as our
data analyses, confirm the applicability of our approach.
Comment: Main text: 38 pages; supplementary information: 37 pages
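To make the covariance-difference notion concrete, the sketch below thresholds the metric $\mathrm{COD}(a,b) = \max_{c \neq a,b} |\Sigma_{ac} - \Sigma_{bc}|$ and greedily groups variables whose COD falls below a tolerance. This is a hypothetical simplification of a COD-style procedure; the function name, greedy grouping rule, and threshold are assumptions, not the paper's exact algorithm or tuning.

```python
import numpy as np

def cod_clusters(S, alpha):
    """Greedy COD-style variable clustering sketch: variables a and b
    are deemed similar when max over c != a,b of |S[a,c] - S[b,c]|
    is at most alpha. Hypothetical simplification for illustration."""
    p = S.shape[0]
    unassigned = set(range(p))
    clusters = []
    while unassigned:
        a = min(unassigned)
        group = {a}
        for b in unassigned - {a}:
            others = [c for c in range(p) if c not in (a, b)]
            # COD metric between variables a and b
            cod = max(abs(S[a, c] - S[b, c]) for c in others)
            if cod <= alpha:
                group.add(b)
        clusters.append(sorted(group))
        unassigned -= group
    return clusters
```

On a population G-block covariance (variables within a block sharing a latent factor), within-cluster COD is exactly zero, so any small tolerance recovers the model-defined partition.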