Sparse Matrix Inversion with Scaled Lasso
We propose a new method of learning a sparse nonnegative-definite target
matrix. Our primary example of the target matrix is the inverse of a population
covariance or correlation matrix. The algorithm first estimates each column of
the target matrix by the scaled Lasso and then adjusts the matrix estimator to
be symmetric. The penalty level of the scaled Lasso for each column is
completely determined by data via convex minimization, without using
cross-validation.
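To make the column-by-column construction concrete, the following Python sketch estimates each column by a scaled lasso and then symmetrizes the result. It is an illustration rather than the authors' implementation: the inner solver reuses scikit-learn's Lasso, the penalty level sqrt(2 log(p)/n) is a common default rather than the paper's data-driven choice, and the symmetrization simply averages the two estimates of each entry.

```python
# Illustrative sketch (not the authors' code): column-wise scaled lasso
# followed by a simple symmetrization, giving a sparse precision-matrix estimate.
import numpy as np
from sklearn.linear_model import Lasso

def scaled_lasso(X, y, lam0, n_iter=20, tol=1e-6):
    """Alternate a lasso fit at penalty sigma*lam0 with the noise-level update
    sigma = ||y - X beta||_2 / sqrt(n)."""
    n = X.shape[0]
    sigma = np.std(y)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta = Lasso(alpha=sigma * lam0, fit_intercept=False).fit(X, y).coef_
        sigma_new = np.linalg.norm(y - X @ beta) / np.sqrt(n)
        if abs(sigma_new - sigma) < tol * sigma:
            sigma = sigma_new
            break
        sigma = sigma_new
    return beta, sigma

def precision_scaled_lasso(X, lam0=None):
    """Estimate a sparse precision matrix column by column, then symmetrize."""
    n, p = X.shape
    lam0 = lam0 or np.sqrt(2.0 * np.log(p) / n)   # illustrative default penalty level
    Omega = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        beta, sigma = scaled_lasso(X[:, others], X[:, j], lam0)
        Omega[j, j] = 1.0 / sigma ** 2            # diagonal: inverse residual variance
        Omega[others, j] = -beta / sigma ** 2     # off-diagonal: rescaled negative coefficients
    return (Omega + Omega.T) / 2.0                # simple symmetrization (one common choice)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    print(np.round(precision_scaled_lasso(X), 2))
```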
We prove that this scaled Lasso method guarantees the fastest proven rate of
convergence in the spectrum norm under conditions of weaker form than those in
the existing analyses of other regularized algorithms, and has a faster
guaranteed rate of convergence when the ratio of the ℓ1 and spectrum
norms of the target inverse matrix diverges to infinity. A simulation study
demonstrates the computational feasibility and superb performance of the
proposed method.
Our analysis also provides new performance bounds for the Lasso and scaled
Lasso to guarantee higher concentration of the error at a smaller threshold
level than previous analyses, and to allow the use of the union bound in
column-by-column applications of the scaled Lasso without an adjustment of the
penalty level. In addition, the least squares estimation after the scaled Lasso
selection is considered and proven to guarantee performance bounds similar to
those of the scaled Lasso.
Scaled Sparse Linear Regression
Scaled sparse linear regression jointly estimates the regression coefficients
and noise level in a linear model. It chooses an equilibrium with a sparse
regression method by iteratively estimating the noise level via the mean
residual square and scaling the penalty in proportion to the estimated noise
level. The iterative algorithm costs little beyond the computation of a path or
grid of the sparse regression estimator for penalty levels above a proper
threshold. For the scaled lasso, the algorithm is a gradient descent in a
convex minimization of a penalized joint loss function for the regression
coefficients and noise level. Under mild regularity conditions, we prove that
the scaled lasso simultaneously yields an estimator for the noise level and an
estimated coefficient vector satisfying certain oracle inequalities for
prediction, the estimation of the noise level and the regression coefficients.
These inequalities provide sufficient conditions for the consistency and
asymptotic normality of the noise level estimator, including certain cases
where the number of variables is of greater order than the sample size.
Parallel results are provided for the least squares estimation after model
selection by the scaled lasso. Numerical results demonstrate the superior
performance of the proposed methods over an earlier proposal of joint convex
minimization.
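The iteration sketched below is an illustrative reimplementation of this idea, not the authors' code: it computes a lasso path once with scikit-learn's lasso_path, alternates a noise-level update with a lookup of the path solution closest to the current penalty, and finishes with a least squares refit on the selected support. The penalty level sqrt(2 log(p)/n), the grid size, and the helper names are assumptions.

```python
# Hypothetical sketch of the scaled-lasso iteration, reusing a precomputed
# lasso path so each noise-level update is a lookup plus a residual norm.
import numpy as np
from sklearn.linear_model import lasso_path

def scaled_lasso_path(X, y, lam0=None, n_iter=50, tol=1e-8):
    """Jointly estimate coefficients and noise level by reusing a lasso path."""
    n, p = X.shape
    lam0 = lam0 or np.sqrt(2.0 * np.log(p) / n)        # illustrative penalty level
    alphas, coefs, _ = lasso_path(X, y, n_alphas=200)  # path computed once
    sigma = np.std(y)
    beta = np.zeros(p)
    for _ in range(n_iter):
        k = np.argmin(np.abs(alphas - sigma * lam0))   # path solution nearest to sigma * lam0
        beta = coefs[:, k]
        sigma_new = np.linalg.norm(y - X @ beta) / np.sqrt(n)  # root mean residual square
        if abs(sigma_new - sigma) < tol:
            sigma = sigma_new
            break
        sigma = sigma_new
    return beta, sigma

def lse_after_scaled_lasso(X, y, **kw):
    """Least squares refit on the support selected by the scaled lasso."""
    beta, _ = scaled_lasso_path(X, y, **kw)
    support = np.flatnonzero(beta)
    beta_lse = np.zeros_like(beta)
    if support.size:
        beta_lse[support], *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
    resid = y - X @ beta_lse
    dof = max(len(y) - support.size, 1)
    return beta_lse, np.linalg.norm(resid) / np.sqrt(dof)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((150, 300))
    beta_true = np.zeros(300)
    beta_true[:5] = 2.0
    y = X @ beta_true + rng.standard_normal(150)
    beta_hat, sigma_hat = scaled_lasso_path(X, y)
    print("estimated noise level:", round(float(sigma_hat), 2))
```

Because the path is computed once, each iteration of the noise-level update costs only a lookup and a residual norm, which is the sense in which the procedure costs little beyond the path itself.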
Calibrated Elastic Regularization in Matrix Completion
This paper concerns the problem of matrix completion, which is to estimate a
matrix from observations in a small subset of indices. We propose a calibrated
spectrum elastic net method with a sum of the nuclear and Frobenius penalties
and develop an iterative algorithm to solve the convex minimization problem.
The iterative algorithm alternates between imputing the missing entries in the
incomplete matrix by the current guess and estimating the matrix by a scaled
soft-thresholding singular value decomposition of the imputed matrix until the
resulting matrix converges. A calibration step follows to correct the bias
caused by the Frobenius penalty. Under proper coherence conditions and for
suitable penalty levels, we prove that the proposed estimator achieves an
error bound of nearly optimal order and in proportion to the noise level. This
provides a unified analysis of the noisy and noiseless matrix completion
problems. Simulation results are presented to compare our proposal with
previous ones.
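A minimal sketch of the impute-and-shrink iteration, assuming illustrative penalty levels and a simplified calibration step (rescaling to undo the Frobenius shrinkage), is given below; it is not the paper's reference implementation.

```python
# Illustrative matrix-completion sketch: alternate imputation of missing
# entries with a scaled soft-thresholded SVD, then a simplified calibration.
import numpy as np

def svt(Z, tau):
    """Soft-threshold the singular values of Z at level tau."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def complete_matrix(Y, observed, lam_nuc, lam_fro, n_iter=200, tol=1e-6):
    """Y holds arbitrary values at unobserved entries; observed is a boolean mask."""
    M = np.where(observed, Y, 0.0)
    for _ in range(n_iter):
        Z = np.where(observed, Y, M)                # impute missing entries with the current guess
        M_new = svt(Z, lam_nuc) / (1.0 + lam_fro)   # scaled soft-thresholded SVD of the imputed matrix
        if np.linalg.norm(M_new - M) <= tol * (1.0 + np.linalg.norm(M)):
            M = M_new
            break
        M = M_new
    return (1.0 + lam_fro) * M                      # simplified calibration: undo the Frobenius shrinkage

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
    mask = rng.random(truth.shape) < 0.4            # roughly 40% of entries observed
    Y = truth + 0.1 * rng.standard_normal(truth.shape)
    M_hat = complete_matrix(Y, mask, lam_nuc=1.0, lam_fro=0.1)
    print("relative error:", round(np.linalg.norm(M_hat - truth) / np.linalg.norm(truth), 3))
```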
A statistical analysis of vaccine-adverse event data
Vaccination has been one of the most successful public health interventions to date, and the U.S. FDA/CDC Vaccine Adverse Event Reporting System (VAERS) currently contains more than 500,000 reports of adverse events that occurred after the administration of vaccines licensed in the United States. The VAERS dataset is huge, contains nominal variables of very high dimension, and is complex because a single report can list multiple vaccines and multiple adverse symptoms. So far, no statistical analysis has attempted to identify across-the-board patterns of how all reported adverse symptoms relate to the vaccines. https://doi.org/10.1186/s12911-019-0818-
Statistical methods for high-dimensional data and continuous glucose monitoring
This thesis contains two parts. The first part concerns three connected problems with high-dimensional data in Chapters 2-4. The second part, Chapter 5, provides dynamic Bayes models to improve continuous glucose monitoring.

In the first part, we propose a unified scale-invariant method for the estimation of parameters in linear regression, the precision matrix, and partial correlations. In Chapter 2, the scaled Lasso is introduced to jointly estimate regression coefficients and the noise level with a gradient descent algorithm. Under mild regularity conditions, we derive oracle inequalities for prediction and for the estimation of the noise level and regression coefficients. These oracle inequalities provide sufficient conditions for the consistency and asymptotic normality of the noise level estimator, including certain cases where the number of variables is of greater order than the sample size. Chapter 3 considers the estimation of the precision matrix, which is closely related to linear regression. The proposed estimator is constructed via the scaled Lasso and guarantees the fastest convergence rate under the spectrum norm. Besides the estimation of high-dimensional objects, the estimation of low-dimensional functionals of high-dimensional objects is also of great interest; a rate-minimax estimator of a high-dimensional parameter does not automatically yield rate-minimax estimates of its low-dimensional functionals. We consider efficient estimation of the partial correlation between individual pairs of variables in Chapter 4. Numerical results demonstrate the superior performance of the proposed methods.

In the second part, we develop statistical methods to produce more accurate and precise estimates for continuous glucose monitoring. The continuous glucose monitor measures the glucose level via an electrochemical glucose biosensor inserted into subcutaneous fat tissue, called the interstitial space. We use dynamic Bayes models to incorporate the linear relationship between the blood glucose level and the interstitial signal, the time series aspects of the data, and the variability depending on sensor age. The Bayes method has been tested and evaluated with an important large dataset, called "Star I", from Medtronic, Inc., composed of continuous monitoring of glucose and other measurements. The results show that the Bayesian blood glucose prediction outperforms the output of the continuous glucose monitor in the STAR 1 trial.
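As a rough illustration of the dynamic-Bayes idea in the second part, the sketch below runs a linear-Gaussian Kalman filter in which the latent blood glucose follows a random walk, the sensor signal is an affine function of glucose, and the observation-noise variance grows with sensor age. It is not the thesis model; every parameter value and the specific noise-growth form are assumptions.

```python
# Minimal dynamic-Bayes (Kalman filter) sketch for glucose filtering.
# All calibration and noise parameters are illustrative assumptions.
import numpy as np

def filter_glucose(signal, sensor_age_hours,
                   slope=1.0, intercept=0.0,   # assumed linear sensor calibration
                   q=4.0,                      # assumed glucose innovation variance per step
                   r0=9.0, r_age=0.05):        # assumed baseline noise and growth with sensor age
    """Kalman-filter posterior means and variances of blood glucose given the sensor signal."""
    mean, var = (signal[0] - intercept) / slope, 100.0   # diffuse starting prior
    means, variances = [], []
    for z, age in zip(signal, sensor_age_hours):
        # Predict: random-walk evolution of the latent glucose level.
        mean_pred, var_pred = mean, var + q
        # Update: z = intercept + slope * glucose + noise, noise variance growing with sensor age.
        r = r0 + r_age * age
        gain = var_pred * slope / (slope ** 2 * var_pred + r)
        mean = mean_pred + gain * (z - intercept - slope * mean_pred)
        var = (1.0 - gain * slope) * var_pred
        means.append(mean)
        variances.append(var)
    return np.array(means), np.array(variances)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.arange(288)                                   # one day of 5-minute readings
    glucose = 120 + 30 * np.sin(t / 40) + np.cumsum(rng.normal(0.0, 1.0, t.size))
    signal = glucose + rng.normal(0.0, 3.0 + 0.02 * t)   # noise grows with sensor age
    est, _ = filter_glucose(signal, sensor_age_hours=t / 12.0)
    print(np.round(est[:5], 1))
```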
