51 research outputs found
Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution
Recent years have seen a flurry of activities in designing provably efficient
nonconvex procedures for solving statistical estimation problems. Due to the
highly nonconvex nature of the empirical loss, state-of-the-art procedures
often require proper regularization (e.g. trimming, regularized cost,
projection) in order to guarantee fast convergence. For vanilla procedures such
as gradient descent, however, prior theory either recommends highly
conservative learning rates to avoid overshooting, or completely lacks
performance guarantees.
This paper uncovers a striking phenomenon in nonconvex optimization: even in
the absence of explicit regularization, gradient descent enforces proper
regularization implicitly under various statistical models. In fact, gradient
descent follows a trajectory staying within a basin that enjoys nice geometry,
consisting of points incoherent with the sampling mechanism. This "implicit
regularization" feature allows gradient descent to proceed in a far more
aggressive fashion without overshooting, which in turn results in substantial
computational savings. Focusing on three fundamental statistical estimation
problems, i.e. phase retrieval, low-rank matrix completion, and blind
deconvolution, we establish that gradient descent achieves near-optimal
statistical and computational guarantees without explicit regularization. In
particular, by marrying statistical modeling with generic optimization theory,
we develop a general recipe for analyzing the trajectories of iterative
algorithms via a leave-one-out perturbation argument. As a byproduct, for noisy
matrix completion, we demonstrate that gradient descent achieves near-optimal
error control --- measured entrywise and by the spectral norm --- which might
be of independent interest. Comment: accepted to Foundations of Computational Mathematics (FOCM)
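As a rough illustration of what "vanilla" gradient descent means for phase retrieval, the sketch below minimizes the least-squares loss f(x) = (1/4m) Σ ((a_i · x)² − y_i)² with a fixed step size and no trimming, projection, or penalty term. The real-valued Gaussian design, the problem sizes, the step size, and the warm start near the true signal are all assumptions made for this example; the abstract does not specify the paper's initialization or step-size schedule.

```python
import random

def gradient(x, A, y):
    """Gradient of f(x) = (1/4m) * sum_i ((a_i . x)^2 - y_i)^2."""
    m, n = len(A), len(x)
    g = [0.0] * n
    for a, yi in zip(A, y):
        ax = sum(ak * xk for ak, xk in zip(a, x))
        coef = (ax * ax - yi) * ax / m
        for k in range(n):
            g[k] += coef * a[k]
    return g

def gradient_descent(x0, A, y, step, iters):
    """Plain gradient descent: no trimming, projection, or penalty term."""
    x = list(x0)
    for _ in range(iters):
        g = gradient(x, A, y)
        x = [xk - step * gk for xk, gk in zip(x, g)]
    return x

# Hypothetical toy instance: n = 3 unknowns, m = 200 noiseless measurements.
rng = random.Random(0)
n, m = 3, 200
x_true = [0.6, -0.8, 0.0]  # unit-norm signal, chosen for illustration
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = [sum(ak * xk for ak, xk in zip(a, x_true)) ** 2 for a in A]

x0 = [xk + 0.05 for xk in x_true]  # assumed warm start near the truth
x_hat = gradient_descent(x0, A, y, step=0.1, iters=300)
err = max(abs(p - q) for p, q in zip(x_hat, x_true))
print("max coordinate error:", err)
```

Because the measurements are noiseless and the iterate starts close to the signal, the error should shrink geometrically. This toy run shows local linear convergence only and says nothing about the leave-one-out analysis itself.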
Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking
This paper is concerned with the problem of top-K ranking from pairwise
comparisons. Given a collection of items and a few pairwise comparisons
across them, one wishes to identify the set of K items that receive the
highest ranks. To tackle this problem, we adopt the logistic parametric model
--- the Bradley-Terry-Luce model, where each item is assigned a latent
preference score, and where the outcome of each pairwise comparison depends
solely on the relative scores of the two items involved. Recent works have made
significant progress towards characterizing the performance (e.g. the mean
square error for estimating the scores) of several classical methods, including
the spectral method and the maximum likelihood estimator (MLE). However, where
they stand regarding top-K ranking remains unsettled.
We demonstrate that under a natural random sampling model, the spectral
method alone, or the regularized MLE alone, is minimax optimal in terms of the
sample complexity --- the number of paired comparisons needed to ensure exact
top-K identification, for the fixed dynamic range regime. This is
accomplished via optimal control of the entrywise error of the score estimates.
We complement our theoretical studies by numerical experiments, confirming that
both methods yield low entrywise errors for estimating the underlying scores.
Our theory is established via a novel leave-one-out trick, which proves
effective for analyzing both iterative and non-iterative procedures. Along the
way, we derive an elementary eigenvector perturbation bound for probability
transition matrices, which parallels the Davis-Kahan theorem for
symmetric matrices. This also allows us to close the gap between the
error upper bound for the spectral method and the minimax lower limit. Comment: Add discussions on the setting of the general condition number
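To make the spectral approach concrete, here is a minimal sketch in the style of Rank Centrality: build a Markov chain whose transitions follow the observed win fractions, take its stationary distribution as the score estimate, and report the K largest entries. The function names, the normalization constant `d`, and the use of exact Bradley-Terry-Luce win probabilities in place of sampled comparisons are assumptions for this example, not details taken from the abstract.

```python
def spectral_scores(win_frac, d, iters=2000):
    """Stationary distribution of a comparison-driven Markov chain.

    win_frac[i][j] is the fraction of comparisons between i and j won by j,
    or None if the pair was never compared. d must be at least the largest
    number of opponents of any item so that every row stays a valid
    probability distribution.
    """
    n = len(win_frac)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and win_frac[i][j] is not None:
                P[i][j] = win_frac[i][j] / d
        P[i][i] = 1.0 - sum(P[i])  # self-loop absorbs the remaining mass
    pi = [1.0 / n] * n
    for _ in range(iters):  # power iteration: pi <- pi P
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def top_k(scores, k):
    """Indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

# Hypothetical latent scores; every pair is compared, with exact BTL odds.
w = [1.0, 3.0, 2.0, 0.5, 4.0]
n = len(w)
win_frac = [[None if i == j else w[j] / (w[i] + w[j]) for j in range(n)]
            for i in range(n)]

pi = spectral_scores(win_frac, d=n)
print(top_k(pi, 2))
```

With exact win probabilities the chain satisfies detailed balance with respect to the normalized scores, so the stationary distribution is proportional to `w` and the two top-scoring items (indices 4 and 1) are recovered. With finitely many noisy comparisons the estimate is only approximate, which is where the entrywise error control described above matters.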
A Stability Principle for Learning under Non-Stationarity
We develop a versatile framework for statistical learning in non-stationary
environments. In each time period, our approach applies a stability principle
to select a look-back window that maximizes the utilization of historical data
while keeping the cumulative bias within an acceptable range relative to the
stochastic error. Our theory showcases the adaptability of this approach to
unknown non-stationarity. The regret bound is minimax optimal up to logarithmic
factors when the population losses are strongly convex, or Lipschitz only. At
the heart of our analysis lie two novel components: a measure of similarity
between functions and a segmentation technique for dividing the non-stationary
data sequence into quasi-stationary pieces. Comment: 47 pages, 1 figure
Adaptive and Robust Multi-task Learning
We study the multi-task learning problem that aims to simultaneously analyze
multiple datasets collected from different sources and learn one model for each
of them. We propose a family of adaptive methods that automatically utilize
possible similarities among those tasks while carefully handling their
differences. We derive sharp statistical guarantees for the methods and prove
their robustness against outlier tasks. Numerical experiments on synthetic and
real datasets demonstrate the efficacy of our new methods. Comment: 69 pages, 2 figures
Cascaded information enhancement and cross-modal attention feature fusion for multispectral pedestrian detection
Multispectral pedestrian detection detects and locates pedestrians in Color and Thermal images, and it has been widely used in autonomous driving, video surveillance, and similar applications. So far, most available multispectral pedestrian detection algorithms have achieved only limited success because they do not account for the confusion between pedestrian information and background noise in Color and Thermal images. Here we propose a multispectral pedestrian detection algorithm that consists mainly of a cascaded information enhancement module and a cross-modal attention feature fusion module. The cascaded information enhancement module applies channel and spatial attention to weight the features fused by the cascaded feature fusion block. It then multiplies the single-modal features by the attention weights element by element, which enhances the pedestrian features in each modality and suppresses interference from the background. The cross-modal attention feature fusion module mines the features of the Color and Thermal modalities so that they complement each other. Global features are then constructed by adding the cross-modal complemented features element by element, and these are attention-weighted to fuse the two modalities effectively. Finally, the fused features are fed into the detection head to detect and locate pedestrians. Extensive experiments were performed on two improved annotation sets (sanitized annotations and paired annotations) of the public KAIST dataset. The results show that our method achieves a lower pedestrian miss rate and more accurate detection boxes than the compared methods. The ablation experiments also confirm the effectiveness of each module designed in this paper.