On the optimality of misspecified spectral algorithms
In the misspecified spectral algorithms problem, researchers usually assume
the underlying true function $f_{\rho}^{*} \in [\mathcal{H}]^{s}$, a
less-smooth interpolation space of a reproducing kernel Hilbert space (RKHS)
$\mathcal{H}$, for some $s \in (0,1)$. The existing minimax optimal results
require $s > \alpha_{0}$, where $\alpha_{0} \in (0,1)$ is the embedding index, a constant
depending on $\mathcal{H}$. Whether the spectral algorithms are optimal for all
$s \in (0,1)$ is an outstanding problem that has lasted for years. In this paper, we
show that spectral algorithms are minimax optimal for any
$\alpha_{0} - \frac{1}{\beta} < s < 1$, where $\beta$ is the eigenvalue decay
rate of $\mathcal{H}$. We also give several classes of RKHSs whose embedding
index satisfies $\alpha_{0} = \frac{1}{\beta}$. Thus, the spectral algorithms
are minimax optimal for all $s \in (0,1)$ on these RKHSs.
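For readers unfamiliar with the notation, the display below is a hedged sketch of how the interpolation space $[\mathcal{H}]^{s}$ is conventionally defined from the Mercer decomposition of the kernel; the eigenvalues $\lambda_i$, the $L^2(\mu)$-orthonormal eigenfunctions $e_i$, and the sampling measure $\mu$ are standard spectral quantities that the abstract itself does not spell out.

$$
k(x, x') = \sum_{i \ge 1} \lambda_i\, e_i(x)\, e_i(x'), \qquad
[\mathcal{H}]^{s} = \Big\{ \sum_{i \ge 1} a_i \lambda_i^{s/2} e_i \;:\; \sum_{i \ge 1} a_i^2 < \infty \Big\}, \quad s \in (0,1),
$$

so that smaller $s$ corresponds to a less smooth target, $[\mathcal{H}]^{1} = \mathcal{H}$, and the eigenvalue decay rate $\beta$ refers to $\lambda_i \asymp i^{-\beta}$.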
Kernel interpolation generalizes poorly
One of the most interesting problems in the recent renaissance of the study
of kernel regression might be whether kernel interpolation can generalize
well, since it may help us understand the `benign overfitting phenomenon'
reported in the literature on deep networks. In this paper, under mild
conditions, we show that for any $\varepsilon > 0$, the generalization error of
kernel interpolation is lower bounded by $\Omega(n^{-\varepsilon})$. In other
words, kernel interpolation generalizes poorly for a large class of
kernels. As a direct corollary, we can show that overfitted wide neural
networks defined on the sphere generalize poorly.
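As a concrete reference point, here is a minimal sketch of the kernel interpolation estimator discussed above (the minimum-norm interpolant, i.e. kernel ridge regression with the ridge parameter sent to zero). The Gaussian kernel and the toy data are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def gaussian_kernel(X, Z, bandwidth=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - z_j||^2 / (2 * bandwidth^2))."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def kernel_interpolation(X_train, y_train, X_test, bandwidth=1.0):
    """Minimum-norm interpolant: f(x) = k(x, X) K^{-1} y (no regularization)."""
    K = gaussian_kernel(X_train, X_train, bandwidth)
    alpha = np.linalg.solve(K, y_train)                    # fits training data exactly
    return gaussian_kernel(X_test, X_train, bandwidth) @ alpha

# toy illustration: noisy samples of a smooth function on the circle S^1
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=200)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
y = np.sin(3 * theta) + 0.1 * rng.standard_normal(200)
y_hat = kernel_interpolation(X, y, X)                      # interpolates the noise
```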
On the Asymptotic Learning Curves of Kernel Ridge Regression under Power-law Decay
The widely observed 'benign overfitting phenomenon' in the neural network
literature challenges the 'bias-variance trade-off' doctrine in
statistical learning theory. Since the generalization ability of the 'lazy
trained' over-parametrized neural network can be well approximated by that of
the neural tangent kernel regression, the curve of the excess risk (namely, the
learning curve) of kernel ridge regression has recently attracted increasing
attention. However, most recent arguments on the learning curve are heuristic
and are based on the 'Gaussian design' assumption. In this paper, under mild
and more realistic assumptions, we rigorously provide a full characterization
of the learning curve, elaborating the effect and the interplay of the choice
of the regularization parameter, the source condition and the noise. In
particular, our results suggest that the 'benign overfitting phenomenon' exists
in very wide neural networks only when the noise level is small.
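To make the quantities in this abstract concrete, the display below is a hedged sketch of the standard kernel ridge regression estimator and the usual bias-variance style bound on its excess risk $\mathcal{E}(\hat f_\lambda)$; the notation ($\lambda$ for the regularization parameter, $s$ for the source condition, $\sigma^2$ for the noise level, $\mathcal{N}(\lambda)$ for the effective dimension, $T$ for the kernel integral operator) is conventional and not taken verbatim from the paper.

$$
\hat f_\lambda = \arg\min_{f \in \mathcal{H}} \frac{1}{n}\sum_{i=1}^{n} \big(y_i - f(x_i)\big)^2 + \lambda \|f\|_{\mathcal{H}}^2,
\qquad
\mathbb{E}\big[\mathcal{E}(\hat f_\lambda)\big] \;\lesssim\; \underbrace{\lambda^{s}}_{\text{bias}} + \underbrace{\frac{\sigma^2 \mathcal{N}(\lambda)}{n}}_{\text{variance}},
\qquad
\mathcal{N}(\lambda) = \mathrm{tr}\big[(T + \lambda)^{-1} T\big].
$$

Balancing the two terms over $\lambda$ yields the familiar rate $n^{-s\beta/(s\beta+1)}$ when the eigenvalues decay as $i^{-\beta}$, which is the kind of trade-off between regularization, source condition and noise that the abstract refers to.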
Breaking of brightness consistency in optical flow with a lightweight CNN network
Sparse optical flow is widely used in various computer vision tasks; however,
the brightness consistency assumption limits its performance in High Dynamic Range
(HDR) environments. In this work, a lightweight network is used to extract
illumination-robust convolutional features and corners with strong invariance.
Modifying the typical brightness consistency of the optical flow method to
convolutional feature consistency yields a light-robust hybrid optical flow
method. The proposed network runs at 190 FPS on a commercial CPU because it
uses only four convolutional layers to extract feature maps and score maps
simultaneously. Since the shallow network is difficult to train directly, a
deeper network is designed to compute a reliability map that assists training. An
end-to-end unsupervised training mode is used for both networks. To validate
the proposed method, we compare corner repeatability and matching performance
with the original optical flow under dynamic illumination. In addition, a more
accurate visual-inertial system is constructed by replacing the optical flow
method in VINS-Mono. On a public HDR dataset, it reduces translation errors by
93\%. The code is publicly available at https://github.com/linyicheng1/LET-NET.
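A minimal sketch of the kind of architecture the abstract describes is given below: a four-convolution backbone that produces a dense feature map and a per-pixel score map in a single pass. The layer widths, activations, and the shared-trunk design are assumptions for illustration only; the actual LET-NET layout should be taken from the linked repository.

```python
import torch
import torch.nn as nn

class LightweightExtractor(nn.Module):
    """Illustrative four-conv network producing a feature map and a score map."""
    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.trunk = nn.Sequential(                              # shared shallow trunk
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(32, feat_dim + 1, 3, padding=1)   # 4th conv: features + score

    def forward(self, image: torch.Tensor):
        x = self.head(self.trunk(image))
        features = x[:, :-1]                  # illumination-robust per-pixel descriptors
        scores = torch.sigmoid(x[:, -1:])     # corner/keypoint score map in [0, 1]
        return features, scores

# usage: one forward pass yields both maps at input resolution
model = LightweightExtractor()
feat, score = model(torch.randn(1, 3, 480, 640))
```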
Statistical Optimality of Deep Wide Neural Networks
In this paper, we consider the generalization ability of deep wide
feedforward ReLU neural networks defined on a bounded domain
$\mathcal{X} \subset \mathbb{R}^{d}$. We first demonstrate that the generalization ability of
the neural network can be fully characterized by that of the corresponding deep
neural tangent kernel (NTK) regression. We then investigate the spectral
properties of the deep NTK and show that the deep NTK is positive definite on
$\mathcal{X}$ and its eigenvalue decay rate is $(d+1)/d$. Thanks to the well-established
theories in kernel regression, we then conclude that multilayer
wide neural networks trained by gradient descent with proper early stopping
achieve the minimax rate, provided that the regression function lies in the
reproducing kernel Hilbert space (RKHS) associated with the corresponding NTK.
Finally, we illustrate that overfitted multilayer wide neural networks
cannot generalize well on $\mathcal{X}$. We believe our technical contributions
in determining the eigenvalue decay rate of the NTK on $\mathcal{X}$ might be of
independent interest.
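To spell out the step from the eigenvalue decay rate to the minimax rate, the display below is a hedged sketch of the standard kernel regression argument; the decay exponent is written generically as $\beta$, and restricting the target to the RKHS ball corresponds to source parameter $s = 1$ in the usual convention, which the abstract does not state explicitly.

$$
\lambda_i \asymp i^{-\beta} \;\Longrightarrow\;
\inf_{\hat f}\,\sup_{\|f^*\|_{\mathcal{H}} \le R} \mathbb{E}\big\|\hat f - f^*\big\|_{L^2}^2 \asymp n^{-\frac{\beta}{\beta+1}},
\qquad \text{so } \beta = \tfrac{d+1}{d} \text{ gives the rate } n^{-\frac{d+1}{2d+1}}.
$$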
Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation
Weakly supervised point cloud segmentation, i.e. semantically segmenting a
point cloud with only a few labeled points in the whole 3D scene, is highly
desirable due to the heavy burden of collecting abundant dense annotations for
model training. However, it remains challenging for existing methods to accurately
segment 3D point clouds, since limited annotated data may provide insufficient
guidance for label propagation to unlabeled data. Considering that
smoothness-based methods have achieved promising progress, in this paper we
advocate applying the consistency constraint under various perturbations to
effectively regularize unlabeled 3D points. Specifically, we propose a novel
DAT (\textbf{D}ual \textbf{A}daptive \textbf{T}ransformations) model for weakly
supervised point cloud segmentation, where the dual adaptive transformations
are performed via an adversarial strategy at both the point level and the region level,
aiming at enforcing the local and structural smoothness constraints on 3D point
clouds. We evaluate our proposed DAT model with two popular backbones on the
large-scale S3DIS and ScanNet-V2 datasets. Extensive experiments demonstrate
that our model can effectively leverage the unlabeled 3D points and achieve
significant performance gains on both datasets, setting new state-of-the-art
performance for weakly supervised point cloud segmentation.
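The core training signal described above, a consistency constraint between predictions on a point cloud and on a perturbed copy of it, can be sketched as follows. The random jitter here is a simple stand-in for the paper's adaptive adversarial transformations, and the loss names, stop-gradient target, and weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def consistency_training_step(model, points, labels, labeled_mask, lambda_u=1.0):
    """One step of partial supervision plus consistency regularization.

    points:       (N, 3) coordinates; labels: (N,) class ids;
    labeled_mask: (N,) bool, True for the few annotated points.
    """
    logits = model(points)                                    # (N, num_classes)

    # supervised loss on the handful of labeled points
    sup_loss = F.cross_entropy(logits[labeled_mask], labels[labeled_mask])

    # perturbed view: random jitter as a stand-in for the adaptive transformation
    perturbed = points + 0.01 * torch.randn_like(points)
    logits_pert = model(perturbed)

    # consistency on unlabeled points: predictions should agree across views
    p_clean = F.softmax(logits[~labeled_mask].detach(), dim=-1)    # stop-gradient target
    log_p_pert = F.log_softmax(logits_pert[~labeled_mask], dim=-1)
    cons_loss = F.kl_div(log_p_pert, p_clean, reduction="batchmean")

    return sup_loss + lambda_u * cons_loss
```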
Optimal Rate of Kernel Regression in Large Dimensions
We perform a study on kernel regression for large-dimensional data (where the
sample size $n$ depends polynomially on the dimension $d$ of the samples,
i.e., $n \asymp d^{\gamma}$ for some $\gamma > 0$). We first build a general
tool to characterize the upper bound and the minimax lower bound of kernel
regression for large-dimensional data through the Mendelson complexity $\varepsilon_{n}^{2}$
and the metric entropy $\bar{\varepsilon}_{n}^{2}$,
respectively. When the target function falls into the RKHS associated with a
(general) inner product model defined on $\mathbb{S}^{d}$, we utilize the new
tool to show that the minimax rate of the excess risk of kernel regression is
$n^{-1/2}$ when $n \asymp d^{\gamma}$ for $\gamma = 2, 4, 6, 8, \ldots$. We then
further determine the optimal rate of the excess risk of kernel regression for
all $\gamma > 0$ and find that the curve of the optimal rate varying along $\gamma$
exhibits several new phenomena, including the {\it multiple descent
behavior} and the {\it periodic plateau behavior}. As an application, for the
neural tangent kernel (NTK), we also provide a similar explicit description of
the curve of the optimal rate. As a direct corollary, we know these claims hold for
wide neural networks as well.
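For concreteness, the short sketch below sets up the large-dimensional regime the abstract studies: sample size growing polynomially with dimension, data on the sphere $\mathbb{S}^{d}$, and an inner-product kernel $K(x, x') = \Phi(\langle x, x' \rangle)$ fitted by kernel ridge regression. The specific kernel profile $\Phi$, the toy target function, and the ridge level are illustrative assumptions, not the paper's choices.

```python
import numpy as np

def sphere_data(n, d, rng):
    """n points drawn uniformly on the sphere S^d (unit vectors in R^{d+1})."""
    x = rng.standard_normal((n, d + 1))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def inner_product_kernel(X, Z):
    """Inner-product kernel K(x, z) = Phi(<x, z>) with the illustrative profile Phi(t) = e^t."""
    return np.exp(X @ Z.T)

d, gamma = 50, 2
n = d ** gamma                              # large-dimensional regime: n ~ d^gamma
rng = np.random.default_rng(0)

X = sphere_data(n, d, rng)
y = X[:, 0] ** 2 + 0.1 * rng.standard_normal(n)     # toy target plus noise

K = inner_product_kernel(X, X)
ridge = 1e-3
alpha = np.linalg.solve(K + ridge * np.eye(n), y)   # kernel ridge regression coefficients
```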
Stable dual-wavelength oscillation of an erbium-doped fiber ring laser at room temperature
We propose a simple Er-doped fiber laser configuration for achieving stable dual-wavelength oscillation at room temperature, in which a high-birefringence fiber Bragg grating was used as the wavelength-selective component. Stable dual-wavelength oscillation at room temperature, with a wavelength spacing of 0.23 nm and mutually orthogonal polarisation states, was achieved by utilising the polarisation hole burning effect. An amplitude variation of less than 0.7 dB over an 80 s period was obtained for both wavelengths.