Exponential Family Matrix Completion under Structural Constraints
We consider the matrix completion problem of recovering a structured matrix
from noisy and partial measurements. Recent works have proposed tractable
estimators with strong statistical guarantees for the case where the underlying
matrix is low-rank, and the measurements consist of a subset, either of the
exact individual entries, or of the entries perturbed by additive Gaussian
noise, which is thus implicitly suited for thin-tailed continuous data.
Arguably, common applications of matrix completion require estimators for (a)
heterogeneous data types, such as skewed-continuous, count, binary, etc., (b)
heterogeneous noise models (beyond Gaussian), which capture varied
uncertainty in the measurements, and (c) heterogeneous structural constraints
beyond low-rank, such as block-sparsity, or a superposition structure of
low-rank plus elementwise sparseness, among others. In this paper, we provide
a vastly unified framework for generalized matrix completion by considering a
matrix completion setting wherein the matrix entries are sampled from any
member of the rich family of exponential family distributions; and impose
general structural constraints on the underlying matrix, as captured by a
general regularizer. We propose a simple convex regularized M-estimator for
the generalized framework, and provide a unified and novel
statistical analysis for this general class of estimators. We finally
corroborate our theoretical results on simulated datasets.
Comment: 20 pages, 9 figures
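As a concrete illustration of such a regularized M-estimator, here is a minimal sketch for the Poisson (count-data) member of the exponential family with a nuclear-norm regularizer, fit by proximal gradient descent. The function names, step size, and iteration count are illustrative choices for this sketch, not the paper's algorithm.

```python
import numpy as np

def svd_soft_threshold(Z, tau):
    """Prox of tau * nuclear norm: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def poisson_completion(Y, mask, lam=1.0, step=0.1, iters=300):
    """Nuclear-norm regularized M-estimator for Poisson-distributed entries.

    Models observed entries Y[i, j] ~ Poisson(exp(Theta[i, j])) and minimizes
    the negative log-likelihood over observed entries plus lam * ||Theta||_*.
    """
    Theta = np.zeros_like(Y, dtype=float)
    for _ in range(iters):
        # Gradient of the Poisson negative log-likelihood on observed entries.
        grad = mask * (np.exp(Theta) - Y)
        # Gradient step, then the nuclear-norm proximal step.
        Theta = svd_soft_threshold(Theta - step * grad, step * lam)
    return Theta
```

The same template covers other exponential families by swapping in the gradient of the corresponding negative log-likelihood, and other structural constraints (e.g. block-sparsity) by swapping the proximal operator of the regularizer.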
Low Rank Matrix Completion with Exponential Family Noise
The matrix completion problem consists in reconstructing a matrix from a
sample of entries, possibly observed with noise. A popular class of estimators,
known as nuclear norm penalized estimators, is based on minimizing the sum of
a data fitting term and a nuclear norm penalization. Here, we investigate the
case where the noise distribution belongs to the exponential family and is
sub-exponential. Our framework allows for a general sampling scheme. We first
consider an estimator defined as the minimizer of the sum of a log-likelihood
term and a nuclear norm penalization and prove an upper bound on the Frobenius
prediction risk. The rate obtained improves on previous work on matrix
completion with exponential family noise. When the sampling distribution is known, we
propose another estimator and prove an oracle inequality w.r.t. the
Kullback-Leibler prediction risk, which translates immediately into an upper
bound on the Frobenius prediction risk. Finally, we show that all the rates
obtained are minimax optimal up to a logarithmic factor.
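The data-fitting term in such nuclear norm penalized estimators is the negative log-likelihood of the assumed exponential family, which per entry takes the form b(theta) - y * theta for the family's log-partition function b. A minimal sketch of the penalized objective, with three common families written out explicitly (the dictionary interface and uniform entrywise mask are assumptions of this sketch):

```python
import numpy as np

# Log-partition functions b(theta) for common exponential families, so the
# per-entry negative log-likelihood is b(theta) - y * theta (up to a term
# that does not depend on theta).
LOG_PARTITION = {
    "gaussian":  lambda t: 0.5 * t**2,           # unit variance
    "poisson":   lambda t: np.exp(t),
    "bernoulli": lambda t: np.log1p(np.exp(t)),
}

def penalized_objective(Theta, Y, mask, family, lam):
    """Nuclear-norm penalized negative log-likelihood over observed entries."""
    b = LOG_PARTITION[family]
    nll = np.sum(mask * (b(Theta) - Y * Theta))
    nuc = np.sum(np.linalg.svd(Theta, compute_uv=False))
    return nll + lam * nuc
```

Since b is convex for every exponential family and the nuclear norm is convex, the whole objective is convex in Theta, which is what makes these estimators tractable.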
High-dimensional estimation with geometric constraints
Consider measuring an n-dimensional vector x through the inner product with
several measurement vectors, a_1, a_2, ..., a_m. It is common in both signal
processing and statistics to assume the linear response model y_i = <a_i, x> +
e_i, where e_i is a noise term. However, in practice the precise relationship
between the signal x and the observations y_i may not follow the linear model,
and in some cases it may not even be known. To address this challenge, in this
paper we propose a general model where it is only assumed that each observation
y_i may depend on a_i only through <a_i, x>. We do not assume that the
dependence is known. This is a form of the semiparametric single index model,
and it includes the linear model as well as many forms of the generalized
linear model as special cases. We further assume that the signal x has some
structure, and we formulate this as a general assumption that x belongs to some
known (but arbitrary) feasible set K. We carefully detail the benefit of using
the signal structure to improve estimation. The theory is based on the mean
width of K, a geometric parameter which can be used to understand its effective
dimension in estimation problems. We determine a simple, efficient two-step
procedure for estimating the signal based on this model -- a linear estimation
followed by metric projection onto K. We give general conditions under which
the estimator is minimax optimal up to a constant. This leads to the intriguing
conclusion that in the high noise regime, an unknown non-linearity in the
observations does not significantly reduce one's ability to determine the
signal, even when the non-linearity may be non-invertible. Our results may be
specialized to understand the effect of non-linearities in compressed sensing.
Comment: This version incorporates minor revisions suggested by referee
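The two-step procedure described above can be sketched as follows, taking K to be the set of s-sparse vectors for concreteness (the choice of K, the function name, and the sparsity interface are assumptions of this sketch):

```python
import numpy as np

def two_step_estimate(A, y, s):
    """Linear estimation followed by metric projection onto K.

    Step 1: the linear estimator x_lin = (1/m) * A^T y.
    Step 2: metric projection onto K. Here K is taken to be the set of
    s-sparse vectors (an illustrative assumption), whose projection keeps
    the s largest-magnitude coordinates and zeroes the rest.
    """
    m = A.shape[0]
    x_lin = A.T @ y / m
    keep = np.argsort(np.abs(x_lin))[-s:]
    x_hat = np.zeros_like(x_lin)
    x_hat[keep] = x_lin[keep]
    return x_hat
```

With Gaussian measurement vectors and an unknown non-linearity such as y_i = sign(<a_i, x>), the linear step already estimates the direction of x up to scale, and the projection exploits the structure of K; this is exactly the setting in which the abstract's minimax claims apply.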
Total positivity in exponential families with application to binary variables
We study exponential families of distributions that are multivariate totally
positive of order 2 (MTP2), show that these are convex exponential families,
and derive conditions for existence of the MLE. Quadratic exponential families
of MTP2 distributions contain attractive Gaussian graphical models and
ferromagnetic Ising models as special examples. We show that these are defined
by intersecting the space of canonical parameters with a polyhedral cone whose
faces correspond to conditional independence relations. Hence MTP2 serves as an
implicit regularizer for quadratic exponential families and leads to sparsity
in the estimated graphical model. We prove that the maximum likelihood
estimator (MLE) in an MTP2 binary exponential family exists if and only if both
of the sign patterns (+,-) and (-,+) are represented in the sample for every
pair of variables; in particular, this implies that the MLE may exist with
very few observations, in stark contrast to unrestricted binary exponential
families, where far more observations are required. Finally, we provide a novel and
globally convergent algorithm for computing the MLE for MTP2 Ising models
similar to iterative proportional scaling and apply it to the analysis of data
from two psychological disorders.
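A pairwise sign-pattern condition of this type can be checked directly from a sample. A sketch, assuming observations coded in {-1, +1} and that the relevant patterns are the two discordant ones (both assumptions of this illustration):

```python
import numpy as np
from itertools import combinations

def mle_existence_check(X):
    """Check a pairwise sign-pattern condition on a binary sample.

    X is an (n, d) array with entries in {-1, +1}. For every pair of
    variables (i, j) we require that both discordant patterns (+1, -1)
    and (-1, +1) occur somewhere in the sample; the function returns
    True exactly when the condition holds for all pairs.
    """
    n, d = X.shape
    for i, j in combinations(range(d), 2):
        observed = set(map(tuple, X[:, [i, j]]))
        if (1, -1) not in observed or (-1, 1) not in observed:
            return False
    return True
```

The check costs O(n * d^2) time, so it is cheap to run before attempting the iterative-proportional-scaling-style MLE computation the abstract describes.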
Restricted strong convexity and weighted matrix completion: Optimal bounds with noise
We consider the matrix completion problem under a form of row/column weighted
entrywise sampling, including the case of uniform entrywise sampling as a
special case. We analyze the associated random observation operator, and prove
that with high probability, it satisfies a form of restricted strong convexity
with respect to weighted Frobenius norm. Using this property, we obtain as
corollaries a number of error bounds on matrix completion in the weighted
Frobenius norm under noisy sampling and for both exact and near low-rank
matrices. Our results are based on measures of the "spikiness" and
"low-rankness" of matrices that are less restrictive than the incoherence
conditions imposed in previous work. Our technique involves an M-estimator
that includes controls on both the rank and spikiness of the solution, and we
establish non-asymptotic error bounds in weighted Frobenius norm for recovering
matrices lying within ℓ_q-"balls" of bounded spikiness. Using
information-theoretic methods, we show that no algorithm can achieve better
estimates (up to a logarithmic factor) over these same sets, showing that our
conditions on matrices and associated rates are essentially optimal.
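The "spikiness" of a matrix in this line of work is, in one standard formulation, the scaled ratio of the elementwise maximum to the Frobenius norm; a small sketch:

```python
import numpy as np

def spikiness(Theta):
    """Spikiness ratio: sqrt(d1 * d2) * ||Theta||_inf / ||Theta||_F.

    Equals 1 for a perfectly flat matrix (all entries equal in magnitude)
    and sqrt(d1 * d2) for a matrix with a single nonzero entry, so larger
    values mean the mass is concentrated in few entries.
    """
    d1, d2 = Theta.shape
    return np.sqrt(d1 * d2) * np.max(np.abs(Theta)) / np.linalg.norm(Theta)
```

Bounding this ratio is a milder requirement than classical incoherence conditions: it rules out matrices whose mass hides in a few entries (which uniform sampling would almost surely miss) without constraining the singular vectors directly.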