On the optimality of misspecified spectral algorithms
In the misspecified spectral algorithms problem, researchers usually assume
the underlying true function $f_{\rho}^{*} \in [\mathcal{H}]^{s}$, a
less-smooth interpolation space of a reproducing kernel Hilbert space (RKHS)
$\mathcal{H}$, for some $s \in (0,1)$. The existing minimax optimal results
require $s > \alpha_{0}$, where $\alpha_{0} \in (0,1)$ is the embedding index, a constant
depending on $\mathcal{H}$. Whether the spectral algorithms are optimal for all
$s \in (0,1)$ is an outstanding problem that has lasted for years. In this paper, we
show that spectral algorithms are minimax optimal for any
$\alpha_{0} - \frac{1}{\beta} < s < 1$, where $\beta$ is the eigenvalue decay
rate of $\mathcal{H}$. We also give several classes of RKHSs whose embedding
index satisfies $\alpha_{0} = \frac{1}{\beta}$. Thus, the spectral algorithms
are minimax optimal for all $s \in (0,1)$ on these RKHSs.
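For readers unfamiliar with the notation, the display below is a hedged sketch of how the interpolation space $[\mathcal{H}]^{s}$ is conventionally defined from the Mercer decomposition of the kernel; the eigenvalues $\lambda_i$, the $L^2(\mu)$-orthonormal eigenfunctions $e_i$, and the sampling measure $\mu$ are standard spectral quantities that the abstract itself does not spell out.

$$
k(x, x') = \sum_{i \ge 1} \lambda_i\, e_i(x)\, e_i(x'), \qquad
[\mathcal{H}]^{s} = \Big\{ \sum_{i \ge 1} a_i \lambda_i^{s/2} e_i \;:\; \sum_{i \ge 1} a_i^2 < \infty \Big\}, \quad s \in (0,1),
$$

so that smaller $s$ corresponds to a less smooth target, $[\mathcal{H}]^{1} = \mathcal{H}$, and the eigenvalue decay rate $\beta$ refers to $\lambda_i \asymp i^{-\beta}$.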
Kernel interpolation generalizes poorly
One of the most interesting problems in the recent renaissance of the study
of kernel regression might be whether kernel interpolation can generalize
well, since it may help us understand the `benign overfitting phenomenon'
reported in the literature on deep networks. In this paper, under mild
conditions, we show that for any $\varepsilon > 0$, the generalization error of
kernel interpolation is lower bounded by $\Omega(n^{-\varepsilon})$. In other
words, kernel interpolation generalizes poorly for a large class of
kernels. As a direct corollary, we can show that overfitted wide neural
networks defined on the sphere generalize poorly.
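As a concrete reference point, here is a minimal sketch of the kernel interpolation estimator discussed above (the minimum-norm interpolant, i.e. kernel ridge regression with the ridge parameter sent to zero). The Gaussian kernel and the toy data are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def gaussian_kernel(X, Z, bandwidth=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - z_j||^2 / (2 * bandwidth^2))."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def kernel_interpolation(X_train, y_train, X_test, bandwidth=1.0):
    """Minimum-norm interpolant: f(x) = k(x, X) K^{-1} y (no regularization)."""
    K = gaussian_kernel(X_train, X_train, bandwidth)
    alpha = np.linalg.solve(K, y_train)                    # fits training data exactly
    return gaussian_kernel(X_test, X_train, bandwidth) @ alpha

# toy illustration: noisy samples of a smooth function on the circle S^1
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=200)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
y = np.sin(3 * theta) + 0.1 * rng.standard_normal(200)
y_hat = kernel_interpolation(X, y, X)                      # interpolates the noise
```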
On the Asymptotic Learning Curves of Kernel Ridge Regression under Power-law Decay
The widely observed 'benign overfitting phenomenon' in the neural network
literature challenges the 'bias-variance trade-off' doctrine in
statistical learning theory. Since the generalization ability of the 'lazy
trained' over-parametrized neural network can be well approximated by that of
the neural tangent kernel regression, the curve of the excess risk (namely, the
learning curve) of kernel ridge regression has recently attracted increasing
attention. However, most recent arguments on the learning curve are heuristic
and are based on the 'Gaussian design' assumption. In this paper, under mild
and more realistic assumptions, we rigorously provide a full characterization
of the learning curve, elaborating the effect and the interplay of the choice
of the regularization parameter, the source condition and the noise. In
particular, our results suggest that the 'benign overfitting phenomenon' exists
in very wide neural networks only when the noise level is small.
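To make the quantities in this abstract concrete, the display below is a hedged sketch of the standard kernel ridge regression estimator and the usual bias-variance style bound on its excess risk $\mathcal{E}(\hat f_\lambda)$; the notation ($\lambda$ for the regularization parameter, $s$ for the source condition, $\sigma^2$ for the noise level, $\mathcal{N}(\lambda)$ for the effective dimension, $T$ for the kernel integral operator) is conventional and not taken verbatim from the paper.

$$
\hat f_\lambda = \arg\min_{f \in \mathcal{H}} \frac{1}{n}\sum_{i=1}^{n} \big(y_i - f(x_i)\big)^2 + \lambda \|f\|_{\mathcal{H}}^2,
\qquad
\mathbb{E}\big[\mathcal{E}(\hat f_\lambda)\big] \;\lesssim\; \underbrace{\lambda^{s}}_{\text{bias}} + \underbrace{\frac{\sigma^2 \mathcal{N}(\lambda)}{n}}_{\text{variance}},
\qquad
\mathcal{N}(\lambda) = \mathrm{tr}\big[(T + \lambda)^{-1} T\big].
$$

Balancing the two terms over $\lambda$ yields the familiar rate $n^{-s\beta/(s\beta+1)}$ when the eigenvalues decay as $i^{-\beta}$, which is the kind of trade-off between regularization, source condition and noise that the abstract refers to.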
Breaking of brightness consistency in optical flow with a lightweight CNN network
Sparse optical flow is widely used in various computer vision tasks; however,
the brightness consistency assumption limits its performance in High Dynamic Range
(HDR) environments. In this work, a lightweight network is used to extract
illumination-robust convolutional features and corners with strong invariance.
Modifying the typical brightness consistency of the optical flow method to
convolutional feature consistency yields a light-robust hybrid optical flow
method. The proposed network runs at 190 FPS on a commercial CPU because it
uses only four convolutional layers to extract feature maps and score maps
simultaneously. Since the shallow network is difficult to train directly, a
deeper network is designed to compute a reliability map that assists training. An
end-to-end unsupervised training mode is used for both networks. To validate
the proposed method, we compare corner repeatability and matching performance
with the original optical flow under dynamic illumination. In addition, a more
accurate visual-inertial system is constructed by replacing the optical flow
method in VINS-Mono. On a public HDR dataset, it reduces translation errors by
93\%. The code is publicly available at https://github.com/linyicheng1/LET-NET.
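A minimal sketch of the kind of architecture the abstract describes is given below: a four-convolution backbone that produces a dense feature map and a per-pixel score map in a single pass. The layer widths, activations, and the shared-trunk design are assumptions for illustration only; the actual LET-NET layout should be taken from the linked repository.

```python
import torch
import torch.nn as nn

class LightweightExtractor(nn.Module):
    """Illustrative four-conv network producing a feature map and a score map."""
    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.trunk = nn.Sequential(                              # shared shallow trunk
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(32, feat_dim + 1, 3, padding=1)   # 4th conv: features + score

    def forward(self, image: torch.Tensor):
        x = self.head(self.trunk(image))
        features = x[:, :-1]                  # illumination-robust per-pixel descriptors
        scores = torch.sigmoid(x[:, -1:])     # corner/keypoint score map in [0, 1]
        return features, scores

# usage: one forward pass yields both maps at input resolution
model = LightweightExtractor()
feat, score = model(torch.randn(1, 3, 480, 640))
```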
Statistical Optimality of Deep Wide Neural Networks
In this paper, we consider the generalization ability of deep wide
feedforward ReLU neural networks defined on a bounded domain
$\mathcal{X} \subset \mathbb{R}^{d}$. We first demonstrate that the generalization ability of
the neural network can be fully characterized by that of the corresponding deep
neural tangent kernel (NTK) regression. We then investigate the spectral
properties of the deep NTK and show that the deep NTK is positive definite on
$\mathcal{X}$ and its eigenvalue decay rate is $(d+1)/d$. Thanks to the well-established
theories in kernel regression, we then conclude that multilayer
wide neural networks trained by gradient descent with proper early stopping
achieve the minimax rate, provided that the regression function lies in the
reproducing kernel Hilbert space (RKHS) associated with the corresponding NTK.
Finally, we illustrate that overfitted multilayer wide neural networks
cannot generalize well on $\mathcal{X}$. We believe our technical contributions
in determining the eigenvalue decay rate of the NTK on $\mathcal{X}$ might be of
independent interest.
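To spell out the step from the eigenvalue decay rate to the minimax rate, the display below is a hedged sketch of the standard kernel regression argument; the decay exponent is written generically as $\beta$, and restricting the target to the RKHS ball corresponds to source parameter $s = 1$ in the usual convention, which the abstract does not state explicitly.

$$
\lambda_i \asymp i^{-\beta} \;\Longrightarrow\;
\inf_{\hat f}\,\sup_{\|f^*\|_{\mathcal{H}} \le R} \mathbb{E}\big\|\hat f - f^*\big\|_{L^2}^2 \asymp n^{-\frac{\beta}{\beta+1}},
\qquad \text{so } \beta = \tfrac{d+1}{d} \text{ gives the rate } n^{-\frac{d+1}{2d+1}}.
$$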
Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation
Weakly supervised point cloud segmentation, i.e. semantically segmenting a
point cloud with only a few labeled points in the whole 3D scene, is highly
desirable due to the heavy burden of collecting abundant dense annotations for
model training. However, it remains challenging for existing methods to accurately
segment 3D point clouds, since limited annotated data may provide insufficient
guidance for label propagation to unlabeled data. Considering that
smoothness-based methods have achieved promising progress, in this paper we
advocate applying the consistency constraint under various perturbations to
effectively regularize unlabeled 3D points. Specifically, we propose a novel
DAT (\textbf{D}ual \textbf{A}daptive \textbf{T}ransformations) model for weakly
supervised point cloud segmentation, where the dual adaptive transformations
are performed via an adversarial strategy at both the point level and the region level,
aiming at enforcing the local and structural smoothness constraints on 3D point
clouds. We evaluate our proposed DAT model with two popular backbones on the
large-scale S3DIS and ScanNet-V2 datasets. Extensive experiments demonstrate
that our model can effectively leverage the unlabeled 3D points and achieve
significant performance gains on both datasets, setting new state-of-the-art
performance for weakly supervised point cloud segmentation.
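The core training signal described above, a consistency constraint between predictions on a point cloud and on a perturbed copy of it, can be sketched as follows. The random jitter here is a simple stand-in for the paper's adaptive adversarial transformations, and the loss names, stop-gradient target, and weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def consistency_training_step(model, points, labels, labeled_mask, lambda_u=1.0):
    """One step of partial supervision plus consistency regularization.

    points:       (N, 3) coordinates; labels: (N,) class ids;
    labeled_mask: (N,) bool, True for the few annotated points.
    """
    logits = model(points)                                    # (N, num_classes)

    # supervised loss on the handful of labeled points
    sup_loss = F.cross_entropy(logits[labeled_mask], labels[labeled_mask])

    # perturbed view: random jitter as a stand-in for the adaptive transformation
    perturbed = points + 0.01 * torch.randn_like(points)
    logits_pert = model(perturbed)

    # consistency on unlabeled points: predictions should agree across views
    p_clean = F.softmax(logits[~labeled_mask].detach(), dim=-1)    # stop-gradient target
    log_p_pert = F.log_softmax(logits_pert[~labeled_mask], dim=-1)
    cons_loss = F.kl_div(log_p_pert, p_clean, reduction="batchmean")

    return sup_loss + lambda_u * cons_loss
```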
Optimal Rate of Kernel Regression in Large Dimensions
We perform a study on kernel regression for large-dimensional data (where the
sample size $n$ depends polynomially on the dimension $d$ of the samples,
i.e., $n \asymp d^{\gamma}$ for some $\gamma > 0$). We first build a general
tool to characterize the upper bound and the minimax lower bound of kernel
regression for large-dimensional data through the Mendelson complexity $\varepsilon_{n}^{2}$
and the metric entropy $\bar{\varepsilon}_{n}^{2}$,
respectively. When the target function falls into the RKHS associated with a
(general) inner product model defined on $\mathbb{S}^{d}$, we utilize the new
tool to show that the minimax rate of the excess risk of kernel regression is
$n^{-1/2}$ when $n \asymp d^{\gamma}$ for $\gamma = 2, 4, 6, 8, \ldots$. We then
further determine the optimal rate of the excess risk of kernel regression for
all $\gamma > 0$ and find that the curve of the optimal rate varying along $\gamma$
exhibits several new phenomena, including the {\it multiple descent
behavior} and the {\it periodic plateau behavior}. As an application, for the
neural tangent kernel (NTK), we also provide a similar explicit description of
the curve of the optimal rate. As a direct corollary, we know these claims hold for
wide neural networks as well.
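For concreteness, the short sketch below sets up the large-dimensional regime the abstract studies: sample size growing polynomially with dimension, data on the sphere $\mathbb{S}^{d}$, and an inner-product kernel $K(x, x') = \Phi(\langle x, x' \rangle)$ fitted by kernel ridge regression. The specific kernel profile $\Phi$, the toy target function, and the ridge level are illustrative assumptions, not the paper's choices.

```python
import numpy as np

def sphere_data(n, d, rng):
    """n points drawn uniformly on the sphere S^d (unit vectors in R^{d+1})."""
    x = rng.standard_normal((n, d + 1))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def inner_product_kernel(X, Z):
    """Inner-product kernel K(x, z) = Phi(<x, z>) with the illustrative profile Phi(t) = e^t."""
    return np.exp(X @ Z.T)

d, gamma = 50, 2
n = d ** gamma                              # large-dimensional regime: n ~ d^gamma
rng = np.random.default_rng(0)

X = sphere_data(n, d, rng)
y = X[:, 0] ** 2 + 0.1 * rng.standard_normal(n)     # toy target plus noise

K = inner_product_kernel(X, X)
ridge = 1e-3
alpha = np.linalg.solve(K + ridge * np.eye(n), y)   # kernel ridge regression coefficients
```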
Stable dual-wavelength oscillation of an erbium-doped fiber ring laser at room temperature
We propose a simple Er-doped fiber laser configuration for achieving stable dual-wavelength oscillation at room temperature, in which a high-birefringence fiber Bragg grating was used as the wavelength-selective component. Stable dual-wavelength oscillation at room temperature, with a wavelength spacing of 0.23 nm and mutually orthogonal polarisation states, was achieved by utilising the polarisation hole burning effect. An amplitude variation of less than 0.7 dB over an 80 s period was obtained for both wavelengths.