Asymmetry Helps: Eigenvalue and Eigenvector Analyses of Asymmetrically Perturbed Low-Rank Matrices
This paper is concerned with the interplay between statistical asymmetry and
spectral methods. Suppose we are interested in estimating a rank-1 and
symmetric matrix $M^{\star} \in \mathbb{R}^{n \times n}$, yet only a
randomly perturbed version $M = M^{\star} + H$ is observed. The noise matrix
$H$ is composed of zero-mean independent (but not
necessarily homoscedastic) entries and is, therefore, not symmetric in general.
This might arise, for example, when we have two independent samples for each
entry of $M^{\star}$ and arrange them into an {\em asymmetric} data
matrix $M$. The aim is to estimate the leading eigenvalue and
eigenvector of $M^{\star}$. We demonstrate that the leading eigenvalue
of the data matrix $M$ can be $\sqrt{n}$ times more accurate --- up
to some log factor --- than its (unadjusted) leading singular value in
eigenvalue estimation. Further, the perturbation of any linear form of the
leading eigenvector of $M$ --- say, entrywise eigenvector perturbation
--- is provably well-controlled. This eigen-decomposition approach is fully
adaptive to heteroscedasticity of noise without the need of careful bias
correction or any prior knowledge about the noise variance. We also provide
partial theory for the more general rank-$r$ case. The takeaway message is
this: arranging the data samples in an asymmetric manner and performing
eigen-decomposition could sometimes be beneficial.
Comment: Accepted to the Annals of Statistics, 2020. 37 pages.
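To make the claim concrete, here is a minimal, self-contained Python simulation (our own sketch, not code from the paper): a rank-1 symmetric matrix with leading eigenvalue 1 is perturbed by independent asymmetric noise, and the leading eigenvalue of the perturbed matrix is compared against its leading singular value as an estimate of the truth. The dimension and noise level are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
sigma = 0.2 / np.sqrt(n)                 # noise level (illustrative choice)

u = rng.standard_normal(n)
u /= np.linalg.norm(u)
M_star = np.outer(u, u)                  # rank-1 symmetric, leading eigenvalue 1
H = sigma * rng.standard_normal((n, n))  # independent entries, not symmetric
M = M_star + H

evals = np.linalg.eigvals(M)
lam = evals[np.argmax(evals.real)].real      # leading eigenvalue of M
sig = np.linalg.svd(M, compute_uv=False)[0]  # leading singular value of M

print(f"|lambda_1(M) - 1| = {abs(lam - 1):.2e}")  # small, with no bias correction
print(f"|sigma_1(M) - 1|  = {abs(sig - 1):.2e}")  # noticeably larger bias
```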
Spectral Method and Regularized MLE Are Both Optimal for Top-$K$ Ranking
This paper is concerned with the problem of top-$K$ ranking from pairwise
comparisons. Given a collection of $n$ items and a few pairwise comparisons
across them, one wishes to identify the set of $K$ items that receive the
highest ranks. To tackle this problem, we adopt the logistic parametric model
--- the Bradley-Terry-Luce model, where each item is assigned a latent
preference score, and where the outcome of each pairwise comparison depends
solely on the relative scores of the two items involved. Recent works have made
significant progress towards characterizing the performance (e.g. the mean
square error for estimating the scores) of several classical methods, including
the spectral method and the maximum likelihood estimator (MLE). However, where
they stand regarding top-$K$ ranking remains unsettled.
We demonstrate that under a natural random sampling model, the spectral
method alone, or the regularized MLE alone, is minimax optimal in terms of the
sample complexity --- the number of paired comparisons needed to ensure exact
top-$K$ identification, for the fixed dynamic range regime. This is
accomplished via optimal control of the entrywise error of the score estimates.
We complement our theoretical studies by numerical experiments, confirming that
both methods yield low entrywise errors for estimating the underlying scores.
Our theory is established via a novel leave-one-out trick, which proves
effective for analyzing both iterative and non-iterative procedures. Along the
way, we derive an elementary eigenvector perturbation bound for probability
transition matrices, which parallels the Davis-Kahan $\sin\Theta$ theorem for
symmetric matrices. This also allows us to close the gap between the
$\ell_2$ error upper bound for the spectral method and the minimax lower limit.
Comment: Added discussions on the setting of the general condition number.
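To make the spectral side concrete, here is a short Python sketch (ours, not the authors' implementation) of Rank Centrality-style spectral ranking under the Bradley-Terry-Luce model: empirical win frequencies define a reversible Markov chain whose stationary distribution is approximately proportional to the latent scores, and the top-$K$ items are read off from it. The dense sampling scheme (every pair compared $L$ times) and all constants are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, L = 200, 10, 50            # items, top-K target, comparisons per pair
w = np.exp(rng.uniform(0, 1, n)) # latent BTL scores (fixed dynamic range)

# P(j beats i) = w_j / (w_i + w_j); compare every pair L times (dense regime).
p_win = w[None, :] / (w[:, None] + w[None, :])
y = np.zeros((n, n))             # y[i, j] = empirical fraction of j beating i
iu = np.triu_indices(n, k=1)
frac = rng.binomial(L, p_win[iu]) / L
y[iu] = frac
y.T[iu] = 1.0 - frac             # complementary outcomes of the same games

# Rank Centrality: random walk whose stationary law is roughly proportional to w.
P = y / n                        # normalize by an upper bound on the degree
P[np.diag_indices(n)] = 1.0 - P.sum(axis=1)
pi = np.ones(n) / n
for _ in range(2000):            # power iteration for the stationary distribution
    pi = pi @ P
pi /= pi.sum()

overlap = np.intersect1d(np.argsort(pi)[-K:], np.argsort(w)[-K:]).size
print(f"recovered {overlap} of the top {K} items")
```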
Influence of parameters on flame expansion in a high-speed flow: experimental and numerical study
Flameholder-stabilized flames are conventional and commonly used in propulsion and various power-generation fields to maintain the combustion process. The characteristics of flame expansion were obtained for various blockage ratios and were observed to be highly sensitive to inlet conditions such as temperature and velocity. A combined experimental and numerical methodology was adopted, and the flame images were processed automatically by a program written in MATLAB. It was found that increasing the fuel supply contributed to the growth of the flame expansion angle in lean premixed combustion. Moreover, the influence of inlet velocity on the flame expansion angle varies with the blockage ratio: under a small blockage ratio (BR = 0.1), the flame expansion angle declined as velocity increased, whereas under a larger blockage ratio (BR = 0.2 or 0.3) it first increased and then decreased with increasing velocity. Likewise, the flame expansion angle first increased and then decreased with increasing temperature for BR = 0.2 and 0.3. In addition, the flame expansion angle was almost the same for BR = 0.2 and BR = 0.3 at a higher temperature (900 K), and both were larger than for BR = 0.1. Overall, BR = 0.2 is the best choice for increasing the flame expansion angle while reducing total pressure loss. The influence of velocity and temperature on the flame expansion angle found in this research is vital for engineering practice and for developing a further image-processing method to measure the flame boundary.
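The MATLAB program itself is not given in the abstract; the following Python sketch is a hypothetical analogue of the kind of boundary-extraction step described: binarize a flame luminosity image, trace the upper flame boundary downstream of the flameholder, and read the expansion angle off a fitted line. The image geometry, the threshold, and the synthetic test case are all our own illustrative assumptions.

```python
import numpy as np

def flame_expansion_angle(image: np.ndarray, threshold: float) -> float:
    """Estimate the flame expansion half-angle (in degrees) from a grayscale
    image: rows are the transverse direction, columns the streamwise
    direction, with the flameholder at the left edge."""
    flame = image >= threshold                # binarize: flame vs. background
    cols = np.where(flame.any(axis=0))[0]     # columns that contain flame
    # Upper boundary: first flame pixel from the top in each such column.
    upper = np.array([np.argmax(flame[:, c]) for c in cols], dtype=float)
    slope, _ = np.polyfit(cols, upper, 1)     # boundary row vs. downstream distance
    return float(np.degrees(np.arctan(abs(slope))))

# Self-check on a synthetic wedge-shaped "flame" with a known 15-degree angle.
h, w = 200, 300
rows = np.arange(h)[:, None]
cols = np.arange(w)[None, :]
img = (np.abs(rows - h / 2) <= np.tan(np.radians(15.0)) * cols).astype(float)
print(f"{flame_expansion_angle(img, 0.5):.1f} degrees")  # approx. 15.0
```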
Bridging Convex and Nonconvex Optimization in Robust PCA: Noise, Outliers, and Missing Data
This paper delivers improved theoretical guarantees for the convex
programming approach in low-rank matrix estimation, in the presence of (1)
random noise, (2) gross sparse outliers, and (3) missing data. This problem,
often dubbed as robust principal component analysis (robust PCA), finds
applications in various domains. Despite the wide applicability of convex
relaxation, the available statistical support (particularly the stability
analysis vis-a-vis random noise) remains highly suboptimal, which we strengthen
in this paper. When the unknown matrix is well-conditioned, incoherent, and of
constant rank, we demonstrate that a principled convex program achieves
near-optimal statistical accuracy, in terms of both the Euclidean loss and the
$\ell_\infty$ loss. All of this happens even when nearly a constant fraction
of observations are corrupted by outliers with arbitrary magnitudes. The key
analysis idea lies in bridging the convex program in use and an auxiliary
nonconvex optimization algorithm, and hence the title of this paper.
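As a concrete reference point for the convex approach discussed here, the sketch below (ours, with common default parameters rather than the paper's choices) solves the standard principal component pursuit program, $\min_{L,S} \|L\|_* + \lambda \|S\|_1$ subject to $L + S = M$, by ADMM with singular-value and entrywise soft thresholding, in the fully observed case; missing data would add a sampling operator, omitted here for brevity.

```python
import numpy as np

def svt(X, tau):
    """Singular-value thresholding: prox of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft(X, tau):
    """Entrywise soft thresholding: prox of tau * (l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def robust_pca(M, lam=None, iters=300):
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))      # classical choice of lambda
    mu = 0.25 * m * n / np.abs(M).sum()     # common step-size heuristic
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    for _ in range(iters):
        L = svt(M - S + Y / mu, 1.0 / mu)   # low-rank update
        S = soft(M - L + Y / mu, lam / mu)  # sparse-outlier update
        Y = Y + mu * (M - L - S)            # dual ascent on L + S = M
    return L, S

# Synthetic check: rank-5 matrix plus 5% gross outliers.
rng = np.random.default_rng(2)
L0 = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 100))
S0 = 10.0 * rng.standard_normal((100, 100)) * (rng.random((100, 100)) < 0.05)
L_hat, S_hat = robust_pca(L0 + S0)
print(np.linalg.norm(L_hat - L0) / np.linalg.norm(L0))  # small relative error
```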
Inference and Uncertainty Quantification for Noisy Matrix Completion
Noisy matrix completion aims at estimating a low-rank matrix given only
partial and corrupted entries. Despite substantial progress in designing
efficient estimation algorithms, it remains largely unclear how to assess the
uncertainty of the obtained estimates and how to perform statistical inference
on the unknown matrix (e.g.~constructing a valid and short confidence interval
for an unseen entry).
This paper takes a step towards inference and uncertainty quantification for
noisy matrix completion. We develop a simple procedure to compensate for the
bias of the widely used convex and nonconvex estimators. The resulting
de-biased estimators admit nearly precise non-asymptotic distributional
characterizations, which in turn enable optimal construction of confidence
intervals\,/\,regions for, say, the missing entries and the low-rank factors.
Our inferential procedures do not rely on sample splitting, thus avoiding
unnecessary loss of data efficiency. As a byproduct, we obtain a sharp
characterization of the estimation accuracy of our de-biased estimators, which,
to the best of our knowledge, are the first tractable algorithms that provably
achieve full statistical efficiency (including the preconstant). The analysis
herein is built upon the intimate link between convex and nonconvex
optimization --- an appealing feature recently discovered by
\cite{chen2019noisy}.
Comment: Published in Proceedings of the National Academy of Sciences, Nov. 2019, 116 (46), 22931-22937.
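To show what such an inferential procedure can look like downstream, here is a schematic Python sketch (ours; the exact de-biasing formula and variance expression should be taken from the paper, and the variance proxy below reflects our reading of it). It builds a $(1-\alpha)$ confidence interval for a single entry from a rank-$r$ de-biased estimate and its factors.

```python
import numpy as np
from scipy.stats import norm

def entry_ci(M_d, U, V, i, j, sigma, p, alpha=0.05):
    """(1 - alpha) confidence interval for entry (i, j) of the unknown matrix,
    given a rank-r de-biased estimate M_d with singular-vector factors
    U (m x r) and V (n x r), observation probability p, and noise level sigma
    (both treated as known here for simplicity)."""
    # Entrywise variance proxy (our reading of the paper):
    # sigma^2 / p * (||U_{i,.}||^2 + ||V_{j,.}||^2).
    v_ij = sigma**2 / p * (np.sum(U[i] ** 2) + np.sum(V[j] ** 2))
    half = norm.ppf(1.0 - alpha / 2.0) * np.sqrt(v_ij)
    return M_d[i, j] - half, M_d[i, j] + half

# Hypothetical usage, given a de-biased estimate and its factors:
#   lo, hi = entry_ci(M_debiased, U_hat, V_hat, i=3, j=7, sigma=0.1, p=0.2)
```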
Gradient Descent with Random Initialization: Fast Global Convergence for Nonconvex Phase Retrieval
This paper considers the problem of solving systems of quadratic equations,
namely, recovering an object of interest $x^{\natural} \in \mathbb{R}^{n}$
from $m$ quadratic equations/samples
$y_i = (a_i^{\top} x^{\natural})^2$, $1 \leq i \leq m$. This
problem, also dubbed as phase retrieval, spans multiple domains including
physical sciences and machine learning.
We investigate the efficiency of gradient descent (or Wirtinger flow)
designed for the nonconvex least squares problem. We prove that under Gaussian
designs, gradient descent --- when randomly initialized --- yields an
$\epsilon$-accurate solution in $O(\log n + \log(1/\epsilon))$ iterations
given nearly minimal samples, thus achieving near-optimal computational and
sample complexities at once. This provides the first global convergence
guarantee concerning vanilla gradient descent for phase retrieval, without the
need of (i) carefully-designed initialization, (ii) sample splitting, or (iii)
sophisticated saddle-point escaping schemes. All of these are achieved by
exploiting the statistical models in analyzing optimization algorithms, via a
leave-one-out approach that enables the decoupling of certain statistical
dependency between the gradient descent iterates and the data.
Comment: Accepted to Mathematical Programming.
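Since the procedure analyzed here is plain gradient descent on the quadratic loss, it is easy to exercise numerically. The following Python sketch (ours, not the authors' code; step size, dimensions, and iteration count are arbitrary illustrative choices) runs randomly initialized gradient descent on $f(x) = \frac{1}{4m}\sum_i ((a_i^{\top} x)^2 - y_i)^2$ under a Gaussian design.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 100, 1000                     # ambient dimension, number of samples
x_star = rng.standard_normal(n)
x_star /= np.linalg.norm(x_star)     # planted signal with unit norm
A = rng.standard_normal((m, n))      # Gaussian design vectors a_i (rows)
y = (A @ x_star) ** 2                # phaseless samples y_i = (a_i^T x)^2

x = rng.standard_normal(n) / np.sqrt(n)  # random initialization, no spectral init
eta = 0.1                                # constant step size (heuristic)
for _ in range(2000):
    Ax = A @ x
    grad = A.T @ ((Ax**2 - y) * Ax) / m  # gradient of the quartic least squares
    x -= eta * grad

# Success is only defined up to a global sign flip.
dist = min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star))
print(f"relative error: {dist:.2e}")
```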