Search CORE

51 research outputs found

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

Author: Chen Yuxin
Chi Yuejie
Ma Cong
Wang Kaizheng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/07/2019
Field of study

Recent years have seen a flurry of activities in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often require proper regularization (e.g. trimming, regularized cost, projection) in order to guarantee fast convergence. For vanilla procedures such as gradient descent, however, prior theory either recommends highly conservative learning rates to avoid overshooting, or completely lacks performance guarantees. This paper uncovers a striking phenomenon in nonconvex optimization: even in the absence of explicit regularization, gradient descent enforces proper regularization implicitly under various statistical models. In fact, gradient descent follows a trajectory staying within a basin that enjoys nice geometry, consisting of points incoherent with the sampling mechanism. This "implicit regularization" feature allows gradient descent to proceed in a far more aggressive fashion without overshooting, which in turn results in substantial computational savings. Focusing on three fundamental statistical estimation problems, i.e. phase retrieval, low-rank matrix completion, and blind deconvolution, we establish that gradient descent achieves near-optimal statistical and computational guarantees without explicit regularization. In particular, by marrying statistical modeling with generic optimization theory, we develop a general recipe for analyzing the trajectories of iterative algorithms via a leave-one-out perturbation argument. As a byproduct, for noisy matrix completion, we demonstrate that gradient descent achieves near-optimal error control --- measured entrywise and by the spectral norm --- which might be of independent interest.Comment: accepted to Foundations of Computational Mathematics (FOCM

arXiv.org e-Print Archive

Princeton University Open Access Repository

Spectral Method and Regularized MLE Are Both Optimal for Top- $K$ Ranking

Author: Chen Yuxin
Fan Jianqing
Ma Cong
Wang Kaizheng
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 04/07/2018
Field of study

This paper is concerned with the problem of top-

K

ranking from pairwise comparisons. Given a collection of

n

items and a few pairwise comparisons across them, one wishes to identify the set of

K

items that receive the highest ranks. To tackle this problem, we adopt the logistic parametric model --- the Bradley-Terry-Luce model, where each item is assigned a latent preference score, and where the outcome of each pairwise comparison depends solely on the relative scores of the two items involved. Recent works have made significant progress towards characterizing the performance (e.g. the mean square error for estimating the scores) of several classical methods, including the spectral method and the maximum likelihood estimator (MLE). However, where they stand regarding top-

K

ranking remains unsettled. We demonstrate that under a natural random sampling model, the spectral method alone, or the regularized MLE alone, is minimax optimal in terms of the sample complexity --- the number of paired comparisons needed to ensure exact top-

K

identification, for the fixed dynamic range regime. This is accomplished via optimal control of the entrywise error of the score estimates. We complement our theoretical studies by numerical experiments, confirming that both methods yield low entrywise errors for estimating the underlying scores. Our theory is established via a novel leave-one-out trick, which proves effective for analyzing both iterative and non-iterative procedures. Along the way, we derive an elementary eigenvector perturbation bound for probability transition matrices, which parallels the Davis-Kahan

\sin\Theta

theorem for symmetric matrices. This also allows us to close the gap between the

\ell_2

error upper bound for the spectral method and the minimax lower limit.Comment: Add discussions on the setting of the general condition numbe

arXiv.org e-Print Archive

Princeton University Open Access Repository

A Stability Principle for Learning under Non-Stationarity

Author: Huang Chengpiao
Wang Kaizheng
Publication venue
Publication date: 27/10/2023
Field of study

We develop a versatile framework for statistical learning in non-stationary environments. In each time period, our approach applies a stability principle to select a look-back window that maximizes the utilization of historical data while keeping the cumulative bias within an acceptable range relative to the stochastic error. Our theory showcases the adaptability of this approach to unknown non-stationarity. The regret bound is minimax optimal up to logarithmic factors when the population losses are strongly convex, or Lipschitz only. At the heart of our analysis lie two novel components: a measure of similarity between functions and a segmentation technique for dividing the non-stationary data sequence into quasi-stationary pieces.Comment: 47 pages, 1 figur

arXiv.org e-Print Archive

Adaptive and Robust Multi-task Learning

Author: Duan Yaqi
Wang Kaizheng
Publication venue
Publication date: 01/04/2022
Field of study

We study the multi-task learning problem that aims to simultaneously analyze multiple datasets collected from different sources and learn one model for each of them. We propose a family of adaptive methods that automatically utilize possible similarities among those tasks while carefully handling their differences. We derive sharp statistical guarantees for the methods and prove their robustness against outlier tasks. Numerical experiments on synthetic and real datasets demonstrate the efficacy of our new methods.Comment: 69 pages, 2 figure

arXiv.org e-Print Archive

Cascaded information enhancement and cross-modal attention feature fusion for multispectral pedestrian detection

Author: Kaixiong Xu
Kaizheng Wang
Yang Yang
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2023
Field of study

Multispectral pedestrian detection is a technology designed to detect and locate pedestrians in Color and Thermal images, which has been widely used in automatic driving, video surveillance, etc. So far most available multispectral pedestrian detection algorithms only achieved limited success in pedestrian detection because of the lacking take into account the confusion of pedestrian information and background noise in Color and Thermal images. Here we propose a multispectral pedestrian detection algorithm, which mainly consists of a cascaded information enhancement module and a cross-modal attention feature fusion module. On the one hand, the cascaded information enhancement module adopts the channel and spatial attention mechanism to perform attention weighting on the features fused by the cascaded feature fusion block. Moreover, it multiplies the single-modal features with the attention weight element by element to enhance the pedestrian features in the single-modal and thus suppress the interference from the background. On the other hand, the cross-modal attention feature fusion module mines the features of both Color and Thermal modalities to complement each other, then the global features are constructed by adding the cross-modal complemented features element by element, which are attentionally weighted to achieve the effective fusion of the two modal features. Finally, the fused features are input into the detection head to detect and locate pedestrians. Extensive experiments have been performed on two improved versions of annotations (sanitized annotations and paired annotations) of the public dataset KAIST. The experimental results show that our method demonstrates a lower pedestrian miss rate and more accurate pedestrian detection boxes compared to the comparison method. Additionally, the ablation experiment also proved the effectiveness of each module designed in this paper

Directory of Open Access Journals