
    On the optimality of misspecified spectral algorithms

    In the misspecified spectral algorithms problem, researchers usually assume that the ground truth function $f_{\rho}^{*}$ lies in $[\mathcal{H}]^{s}$, a less-smooth interpolation space of a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$, for some $s\in (0,1)$. The existing minimax optimality results require $\|f_{\rho}^{*}\|_{L^{\infty}} < \infty$, which implicitly requires $s > \alpha_{0}$, where $\alpha_{0}\in (0,1)$ is the embedding index, a constant depending on $\mathcal{H}$. Whether spectral algorithms are optimal for all $s\in (0,1)$ is an outstanding problem that has remained open for years. In this paper, we show that spectral algorithms are minimax optimal for any $\alpha_{0}-\frac{1}{\beta} < s < 1$, where $\beta$ is the eigenvalue decay rate of $\mathcal{H}$. We also give several classes of RKHSs whose embedding index satisfies $\alpha_0 = \frac{1}{\beta}$. Thus, spectral algorithms are minimax optimal for all $s\in (0,1)$ on these RKHSs. Comment: 48 pages, 2 figures
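
    A minimal illustration (not from the paper): kernel ridge regression is the prototypical spectral algorithm, and the NumPy sketch below shows its basic form. The Gaussian kernel, the synthetic target, and the $\lambda = n^{-1/2}$ schedule are arbitrary choices made purely for demonstration.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=0.5):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq_dists / (2 * bandwidth**2))

def krr_fit(X, y, lam, bandwidth=0.5):
    """Kernel ridge regression: the prototypical spectral algorithm, which shrinks
    each empirical eigencomponent sigma of K by sigma / (sigma + n*lam)."""
    K = gaussian_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + len(X) * lam * np.eye(len(X)), y)  # (K + n*lam*I)^{-1} y
    return lambda X_new: gaussian_kernel(X_new, X, bandwidth) @ alpha

# Illustrative data: a target of limited smoothness observed with noise.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-1, 1, size=(n, 1))
y = np.abs(X[:, 0]) ** 0.6 + 0.1 * rng.standard_normal(n)

f_hat = krr_fit(X, y, lam=n ** (-0.5))   # regularization decaying with n
X_test = np.linspace(-1, 1, 50)[:, None]
print(f_hat(X_test)[:5])
```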

    Kernel interpolation generalizes poorly

    One of the most interesting problems in the recent renaissance of kernel regression studies is whether kernel interpolation can generalize well, since it may help us understand the 'benign overfitting phenomenon' reported in the literature on deep networks. In this paper, under mild conditions, we show that for any $\varepsilon>0$, the generalization error of kernel interpolation is lower bounded by $\Omega(n^{-\varepsilon})$. In other words, kernel interpolation generalizes poorly for a large class of kernels. As a direct corollary, we show that overfitted wide neural networks defined on the sphere generalize poorly.
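
    For intuition only (a toy simulation, not the paper's argument): kernel interpolation is ridgeless kernel regression, and a small experiment along the following lines, with an arbitrary Gaussian kernel and noisy one-dimensional data, lets one watch the test error stall rather than decay as $n$ grows.

```python
import numpy as np

def rbf(X, Y, h=0.3):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * h**2))

def interpolation_test_error(n, noise=0.3, n_test=2000, seed=0):
    """Fit the minimum-norm kernel interpolant (regularization -> 0) on noisy data
    and return its mean squared error against the noiseless target on fresh points."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, (n, 1))
    f = lambda x: np.sin(4 * x[:, 0])
    y = f(X) + noise * rng.standard_normal(n)
    K = rbf(X, X)
    alpha = np.linalg.solve(K + 1e-8 * np.eye(n), y)   # tiny jitter only for numerics
    X_test = rng.uniform(-1, 1, (n_test, 1))
    pred = rbf(X_test, X) @ alpha
    return np.mean((pred - f(X_test)) ** 2)

for n in (100, 400, 1600):
    print(n, interpolation_test_error(n))
```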

    On the Asymptotic Learning Curves of Kernel Ridge Regression under Power-law Decay

    The widely observed 'benign overfitting phenomenon' in the neural network literature challenges the 'bias-variance trade-off' doctrine in statistical learning theory. Since the generalization ability of a 'lazy trained' over-parametrized neural network can be well approximated by that of the corresponding neural tangent kernel regression, the curve of the excess risk (namely, the learning curve) of kernel ridge regression has attracted increasing attention recently. However, most recent arguments on the learning curve are heuristic and based on the 'Gaussian design' assumption. In this paper, under mild and more realistic assumptions, we rigorously provide a full characterization of the learning curve, elaborating the effect and the interplay of the choice of the regularization parameter, the source condition, and the noise. In particular, our results suggest that the 'benign overfitting phenomenon' exists in very wide neural networks only when the noise level is small.
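
    As an empirical companion (the Gaussian kernel, the synthetic target, and the $\lambda \asymp n^{-0.8}$ schedule below are assumptions made for illustration, not the paper's setting), one can trace learning curves of kernel ridge regression under two noise levels as follows.

```python
import numpy as np

def rbf(X, Y, h=0.4):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * h**2))

def krr_excess_risk(n, lam, noise, seed=0, n_test=2000):
    """Excess risk (test MSE against the noiseless target) of kernel ridge regression."""
    rng = np.random.default_rng(seed)
    f = lambda x: np.sign(x[:, 0]) * np.abs(x[:, 0]) ** 1.5   # target of moderate smoothness
    X = rng.uniform(-1, 1, (n, 1))
    y = f(X) + noise * rng.standard_normal(n)
    alpha = np.linalg.solve(rbf(X, X) + n * lam * np.eye(n), y)
    X_test = rng.uniform(-1, 1, (n_test, 1))
    return np.mean((rbf(X_test, X) @ alpha - f(X_test)) ** 2)

# Empirical learning curves: excess risk versus n for small and large noise.
for noise in (0.05, 0.5):
    curve = [(n, round(krr_excess_risk(n, lam=n ** (-0.8), noise=noise), 5))
             for n in (100, 200, 400, 800)]
    print(f"noise={noise}:", curve)
```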

    Breaking of brightness consistency in optical flow with a lightweight CNN network

    Sparse optical flow is widely used in various computer vision tasks; however, the brightness-consistency assumption limits its performance in High Dynamic Range (HDR) environments. In this work, a lightweight network is used to extract illumination-robust convolutional features and corners with strong invariance. Replacing the typical brightness-consistency constraint of the optical flow method with convolutional feature consistency yields a light-robust hybrid optical flow method. The proposed network runs at 190 FPS on a commercial CPU because it uses only four convolutional layers to extract feature maps and score maps simultaneously. Since the shallow network is difficult to train directly, a deeper network is designed to compute a reliability map that assists its training. An end-to-end unsupervised training mode is used for both networks. To validate the proposed method, we compare corner repeatability and matching performance with the original optical flow method under dynamic illumination. In addition, a more accurate visual-inertial system is constructed by replacing the optical flow method in VINS-Mono. On a public HDR dataset, it reduces translation error by 93%. The code is publicly available at https://github.com/linyicheng1/LET-NET. Comment: 7 pages, 7 figures
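
    The authors' implementation is in the linked repository; the PyTorch sketch below is only a guess at the general shape of such a network (the channel widths, the sigmoid score head, and the input size are assumptions) to illustrate how four convolutions can emit a feature map and a score map in a single pass.

```python
import torch
import torch.nn as nn

class TinyFeatureNet(nn.Module):
    """Illustrative four-convolution network producing a dense feature map
    (for feature-consistency optical flow) and a per-pixel corner score map
    in one forward pass. Channel widths are arbitrary guesses."""
    def __init__(self, feat_dim=8):
        super().__init__()
        self.feat_dim = feat_dim
        self.layers = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, feat_dim + 1, 3, padding=1),  # last conv: features + score
        )

    def forward(self, img):
        out = self.layers(img)
        feat = out[:, : self.feat_dim]                   # illumination-robust feature map
        score = torch.sigmoid(out[:, self.feat_dim :])   # corner score map
        return feat, score

net = TinyFeatureNet()
feat, score = net(torch.rand(1, 3, 240, 320))
print(feat.shape, score.shape)   # torch.Size([1, 8, 240, 320]) torch.Size([1, 1, 240, 320])
```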

    Statistical Optimality of Deep Wide Neural Networks

    In this paper, we consider the generalization ability of deep wide feedforward ReLU neural networks defined on a bounded domain $\mathcal X \subset \mathbb R^{d}$. We first demonstrate that the generalization ability of the neural network can be fully characterized by that of the corresponding deep neural tangent kernel (NTK) regression. We then investigate the spectral properties of the deep NTK and show that the deep NTK is positive definite on $\mathcal{X}$ and its eigenvalue decay rate is $(d+1)/d$. Thanks to the well-established theories in kernel regression, we then conclude that multilayer wide neural networks trained by gradient descent with proper early stopping achieve the minimax rate, provided that the regression function lies in the reproducing kernel Hilbert space (RKHS) associated with the corresponding NTK. Finally, we illustrate that overfitted multilayer wide neural networks cannot generalize well on $\mathbb S^{d}$. We believe our technical contributions in determining the eigenvalue decay rate of the NTK on $\mathbb R^{d}$ might be of independent interest.
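
    To make the 'kernel regression with early stopping' connection concrete, here is a generic sketch (the RBF kernel stands in for the actual deep NTK, and the data on $\mathbb S^{2}$ are synthetic, so none of this is the paper's construction) of functional gradient descent in an RKHS, where the stopping time plays the role of the regularization parameter.

```python
import numpy as np

def rbf(X, Y, h=0.5):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * h**2))

def kernel_gd(K, y, lr=1.0, n_steps=200):
    """Functional gradient descent in the RKHS on the empirical squared loss.
    The update a <- a - (lr/n)(K a - y), stopped early, is a spectral algorithm:
    it applies the filter (1 - (1 - lr*sigma/n)^t) to each eigenvalue sigma of K."""
    n = len(y)
    a = np.zeros(n)
    for _ in range(n_steps):
        a -= lr * (K @ a - y) / n
    return a

rng = np.random.default_rng(0)
n = 300
X = rng.standard_normal((n, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)         # data on the sphere S^2
f = lambda Z: Z[:, 0] * Z[:, 1]
y = f(X) + 0.1 * rng.standard_normal(n)

K = rbf(X, X)
X_test = rng.standard_normal((2000, 3))
X_test /= np.linalg.norm(X_test, axis=1, keepdims=True)

for steps in (10, 100, 1000, 10000):                   # later stopping = weaker regularization
    a = kernel_gd(K, y, n_steps=steps)
    err = np.mean((rbf(X_test, X) @ a - f(X_test)) ** 2)
    print(steps, round(err, 4))
```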

    Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation

    Weakly supervised point cloud segmentation, i.e., semantically segmenting a point cloud with only a few labeled points in the whole 3D scene, is highly desirable because collecting abundant dense annotations for model training is a heavy burden. However, it remains challenging for existing methods to accurately segment 3D point clouds, since limited annotated data may provide insufficient guidance for label propagation to unlabeled data. Considering that smoothness-based methods have achieved promising progress, in this paper we advocate applying the consistency constraint under various perturbations to effectively regularize unlabeled 3D points. Specifically, we propose a novel DAT (Dual Adaptive Transformations) model for weakly supervised point cloud segmentation, where the dual adaptive transformations are performed via an adversarial strategy at both the point level and the region level, aiming to enforce local and structural smoothness constraints on 3D point clouds. We evaluate the proposed DAT model with two popular backbones on the large-scale S3DIS and ScanNet-V2 datasets. Extensive experiments demonstrate that our model can effectively leverage unlabeled 3D points and achieve significant performance gains on both datasets, setting new state-of-the-art performance for weakly supervised point cloud segmentation. Comment: ECCV 202
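
    For readers unfamiliar with consistency regularization, the following sketch shows the generic mechanism of asking predictions on perturbed unlabeled points to agree with those on the originals. The random jitter, the placeholder per-point MLP, and the equal loss weights are assumptions for illustration; they are not the DAT model's adaptive adversarial transformations.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, points, labels, labeled_mask, noise_scale=0.01):
    """Cross-entropy on the few labeled points plus a KL consistency term that
    asks predictions on perturbed unlabeled points to match the originals."""
    logits = model(points)                                        # (N, num_classes)
    sup = F.cross_entropy(logits[labeled_mask], labels[labeled_mask])

    perturbed = points + noise_scale * torch.randn_like(points)   # simple jitter stands in
    logits_pert = model(perturbed)                                # for adaptive transforms
    unl = ~labeled_mask
    cons = F.kl_div(F.log_softmax(logits_pert[unl], dim=-1),
                    F.softmax(logits[unl].detach(), dim=-1),
                    reduction="batchmean")
    return sup + cons

# Toy usage with a per-point MLP standing in for a segmentation backbone.
model = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(), torch.nn.Linear(64, 13))
points = torch.rand(4096, 3)
labels = torch.randint(0, 13, (4096,))
labeled_mask = torch.zeros(4096, dtype=torch.bool)
labeled_mask[torch.randperm(4096)[:40]] = True        # only ~1% of the points are labeled
loss = consistency_loss(model, points, labels, labeled_mask)
loss.backward()
print(loss.item())
```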

    Optimal Rate of Kernel Regression in Large Dimensions

    We study kernel regression for large-dimensional data, where the sample size $n$ depends polynomially on the dimension $d$ of the samples, i.e., $n\asymp d^{\gamma}$ for some $\gamma > 0$. We first build a general tool to characterize the upper bound and the minimax lower bound of kernel regression for large-dimensional data through the Mendelson complexity $\varepsilon_{n}^{2}$ and the metric entropy $\bar{\varepsilon}_{n}^{2}$, respectively. When the target function falls into the RKHS associated with a (general) inner product model defined on $\mathbb{S}^{d}$, we utilize the new tool to show that the minimax rate of the excess risk of kernel regression is $n^{-1/2}$ when $n\asymp d^{\gamma}$ for $\gamma = 2, 4, 6, 8, \cdots$. We then further determine the optimal rate of the excess risk of kernel regression for all $\gamma>0$ and find that the curve of the optimal rate as a function of $\gamma$ exhibits several new phenomena, including the multiple descent behavior and the periodic plateau behavior. As an application, for the neural tangent kernel (NTK), we also provide a similar explicit description of the curve of the optimal rate. As a direct corollary, these claims hold for wide neural networks as well.
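
    To make the large-dimensional regime concrete, here is an illustrative simulation only (the exponential inner-product kernel, the degree-two target, and the grid of $\gamma$ values are arbitrary choices, not the paper's construction): sample $n \asymp d^{\gamma}$ points on $\mathbb S^{d}$ and record the excess risk of kernel ridge regression as $\gamma$ varies.

```python
import numpy as np

def sphere_sample(n, d, rng):
    X = rng.standard_normal((n, d + 1))
    return X / np.linalg.norm(X, axis=1, keepdims=True)   # roughly uniform on S^d

def inner_product_kernel(X, Y):
    """A simple inner-product kernel k(x, y) = exp(<x, y>) on the sphere."""
    return np.exp(X @ Y.T)

def krr_risk(d, gamma, noise=0.2, lam=1e-3, seed=0):
    """Excess risk of kernel ridge regression with n = d**gamma samples."""
    rng = np.random.default_rng(seed)
    n = int(d ** gamma)
    X = sphere_sample(n, d, rng)
    f = lambda Z: Z[:, 0] * Z[:, 1]                        # a degree-two target
    y = f(X) + noise * rng.standard_normal(n)
    alpha = np.linalg.solve(inner_product_kernel(X, X) + n * lam * np.eye(n), y)
    X_test = sphere_sample(2000, d, rng)
    return np.mean((inner_product_kernel(X_test, X) @ alpha - f(X_test)) ** 2)

d = 10
for gamma in (1.0, 1.5, 2.0, 2.5):
    print(gamma, round(krr_risk(d, gamma), 4))
```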

    Stable dual-wavelength oscillation of an erbium-doped fiber ring laser at room temperature

    We propose a simple Er-doped fiber laser configuration for achieving stable dual-wavelength oscillation at room temperature, in which a high-birefringence fiber Bragg grating was used as the wavelength-selective component. Stable dual-wavelength oscillation at room temperature, with a wavelength spacing of 0.23 nm and mutually orthogonal polarisation states, was achieved by utilising the polarisation hole-burning effect. An amplitude variation of less than 0.7 dB over an 80 s period was obtained for both wavelengths.