
    Estimating conditional quantiles with the help of the pinball loss

    The so-called pinball loss for estimating conditional quantiles is a well-known tool in both statistics and machine learning. So far, however, little work has been done to quantify the efficiency of this tool for nonparametric approaches. We fill this gap by establishing inequalities that describe how close approximate pinball risk minimizers are to the corresponding conditional quantile. These inequalities, which hold under mild assumptions on the data-generating distribution, are then used to establish so-called variance bounds, which have recently turned out to play an important role in the statistical analysis of (regularized) empirical risk minimization approaches. Finally, we use both types of inequalities to establish an oracle inequality for support vector machines that use the pinball loss. The resulting learning rates are min-max optimal under some standard regularity assumptions on the conditional quantile.
    Comment: Published at http://dx.doi.org/10.3150/10-BEJ267 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
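
    As a small illustration of the pinball loss itself (not of the paper's inequalities or its SVM estimator), the Python sketch below, assuming only NumPy, implements the tau-pinball loss and checks numerically that minimizing the empirical pinball risk over a constant recovers the empirical tau-quantile. The helper names `pinball_loss` and `empirical_pinball_minimizer` are hypothetical, and the grid search is a deliberately crude stand-in for the regularized kernel (SVM) minimization the abstract analyzes.

```python
import numpy as np

def pinball_loss(y, t, tau):
    """Average pinball (quantile) loss of prediction t for targets y.

    For residual r = y - t the loss is tau * r if r >= 0, else (tau - 1) * r.
    """
    r = np.asarray(y, dtype=float) - t
    return float(np.mean(np.where(r >= 0, tau * r, (tau - 1) * r)))

def empirical_pinball_minimizer(y, tau, grid_size=2001):
    """Minimize the empirical pinball risk over constant predictions t.

    Hypothetical helper: a grid search over [min(y), max(y)]. In the
    conditional setting t would instead be a function of x.
    """
    grid = np.linspace(np.min(y), np.max(y), grid_size)
    risks = [pinball_loss(y, t, tau) for t in grid]
    return float(grid[int(np.argmin(risks))])

rng = np.random.default_rng(0)
y = rng.standard_normal(10_000)
for tau in (0.1, 0.5, 0.9):
    t_hat = empirical_pinball_minimizer(y, tau)
    print(f"tau={tau}: minimizer {t_hat:.3f}, quantile {np.quantile(y, tau):.3f}")
```

    Replacing the constant t with a function of x, for example a kernel expansion fitted with a regularizer, gives the conditional-quantile setting that the abstract studies.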

    Regularized Regression Problem in hyper-RKHS for Learning Kernels

    This paper generalizes the two-stage kernel learning framework, illustrates its utility for kernel learning and out-of-sample extensions, and proves asymptotic convergence results for the introduced kernel learning model. Algorithmically, we extend target alignment by hyper-kernels in the two-stage kernel learning framework. The associated kernel learning task is formulated as a regression problem in a hyper-reproducing kernel Hilbert space (hyper-RKHS), i.e., learning on the space of kernels itself. To solve this problem, we present two regression models with bivariate forms in this space, namely kernel ridge regression (KRR) and support vector regression (SVR) in the hyper-RKHS. This gives the model considerable flexibility for kernel learning and yields outstanding performance in real-world applications. In particular, our kernel learning framework is general: the learned kernel can be positive definite or indefinite, which lets it adapt to various requirements in kernel learning. Theoretically, we study the convergence behavior of these learning algorithms in the hyper-RKHS and derive their learning rates. Unlike the traditional approximation analysis in an RKHS, our analysis must account for the non-trivial dependence among pairwise samples and for the characterisation of the hyper-RKHS. To the best of our knowledge, this is the first work in learning theory to study the approximation performance of regularized regression problems in a hyper-RKHS.
    Comment: 25 pages, 3 figures
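
    To make "regression in a hyper-RKHS" concrete, here is a minimal NumPy sketch of the KRR variant under assumptions that are not taken from the abstract: target alignment is encoded by regressing on the ideal kernel values y_i * y_j over all n^2 training pairs, and the hyper-kernel is a hypothetical symmetrized product of Gaussian kernels. The names `hyper_kernel` and `fit_hyper_krr` and the parameters `lam` and `gamma` are illustrative, not the paper's. Because the coefficients `alpha` may be negative, the learned kernel is symmetric but not necessarily positive definite, in line with the abstract's remark about indefinite kernels.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """Gaussian kernel matrix between row sets a (m, d) and b (k, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def hyper_kernel(P, Q, gamma=1.0):
    """Hypothetical hyper-kernel on pairs: symmetrized product of base kernels.

    P, Q have shapes (m, 2, d) and (k, 2, d) and hold pairs (x, x').
    Symmetrizing makes every learned kernel symmetric in its two arguments.
    """
    k00 = rbf(P[:, 0], Q[:, 0], gamma) * rbf(P[:, 1], Q[:, 1], gamma)
    k01 = rbf(P[:, 0], Q[:, 1], gamma) * rbf(P[:, 1], Q[:, 0], gamma)
    return 0.5 * (k00 + k01)

def fit_hyper_krr(X, y, lam=1e-3, gamma=1.0):
    """Kernel ridge regression over all n^2 pairs with targets y_i * y_j."""
    n = len(X)
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    pairs = np.stack([X[i.ravel()], X[j.ravel()]], axis=1)  # shape (n^2, 2, d)
    targets = (y[i] * y[j]).ravel()                        # ideal kernel y y^T
    H = hyper_kernel(pairs, pairs, gamma)
    alpha = np.linalg.solve(H + lam * len(pairs) * np.eye(len(pairs)), targets)

    def learned_kernel(A, B):
        """Evaluate the learned kernel k(a, b) for all rows a of A, b of B."""
        ia, ib = np.meshgrid(np.arange(len(A)), np.arange(len(B)), indexing="ij")
        query = np.stack([A[ia.ravel()], B[ib.ravel()]], axis=1)
        return (hyper_kernel(query, pairs, gamma) @ alpha).reshape(len(A), len(B))

    return learned_kernel

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))
y = np.where(X[:, 0] > 0, 1.0, -1.0)   # toy +/-1 labels
k_hat = fit_hyper_krr(X, y)
K = k_hat(X, X)                        # learned 20 x 20 Gram matrix
print(np.linalg.eigvalsh(K).min())     # may be negative: k_hat need not be PSD
```

    The n^2 training pairs make the linear solve cost O(n^6), so this sketch is only usable for tiny n; it shows the structure of the problem rather than an efficient algorithm.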

    Efficient Uncertainty Quantification and Reduction for Over-Parameterized Neural Networks

    Uncertainty quantification (UQ) is important for assessing and enhancing the reliability of machine learning models. In deep learning, uncertainty arises not only from the data but also from the training procedure, which often injects substantial noise and bias. These hinder the attainment of statistical guarantees and, moreover, impose computational challenges on UQ because of the need for repeated network retraining. Building on recent neural tangent kernel theory, we create statistically guaranteed schemes to quantify, and remove, in a principled way the procedural uncertainty of over-parameterized neural networks with very low computational effort. In particular, our approach, based on what we call a procedural-noise-correcting (PNC) predictor, removes the procedural uncertainty by using only one auxiliary network trained on a suitably labeled data set, instead of the many retrained networks employed in deep ensembles. Moreover, by combining our PNC predictor with suitable light-computation resampling methods, we build several approaches that construct asymptotically exact-coverage confidence intervals using as few as four trained networks without additional overhead.
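
    The cancellation idea behind a PNC-style predictor can be illustrated in the kernel (linearized) regime described by neural tangent kernel theory, where training from an initial function f_0 behaves like kernel ridge regression on the residual labels y - f_0(X). The Python sketch below is a toy model under stated assumptions, not the authors' method: a Gaussian kernel stands in for the NTK, random Fourier features stand in for a randomly initialized network, and the auxiliary model shares the main model's initialization and is trained on artificial labels equal to the mean initial output, assumed here to be zero. All function names are hypothetical.

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    """Gaussian kernel; a stand-in for the (fixed) neural tangent kernel."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def random_init_function(seed, d, width=256, scale=3.0):
    """Hypothetical 'network at initialization' f_0: mean-zero random features."""
    r = np.random.default_rng(seed)
    W = scale * r.standard_normal((d, width))
    b = r.uniform(0.0, 2.0 * np.pi, width)
    v = r.standard_normal(width) / np.sqrt(width)
    return lambda x: np.cos(x @ W + b) @ v

def train_linearized(f0, X, labels, Xq, lam=1e-3):
    """Kernel-regime training: f0(Xq) + K(Xq, X)(K + lam*n*I)^{-1}(labels - f0(X))."""
    K = rbf(X, X)
    coef = np.linalg.solve(K + lam * len(X) * np.eye(len(X)), labels - f0(X))
    return f0(Xq) + rbf(Xq, X) @ coef

def pnc_predict(seed, X, y, Xq):
    """Return (main prediction, main minus auxiliary) for one random init.

    The auxiliary model shares f0 and is trained on zero labels, the assumed
    mean output at initialization.
    """
    f0 = random_init_function(seed, X.shape[1])
    main = train_linearized(f0, X, y, Xq)
    aux = train_linearized(f0, X, np.zeros_like(y), Xq)
    return main, main - aux

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, (40, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(40)
Xq = np.linspace(-2, 2, 9)[:, None]
runs = [pnc_predict(s, X, y, Xq) for s in range(5)]
raw_spread = np.std([m for m, _ in runs], axis=0).max()  # varies with the seed
pnc_spread = np.std([p for _, p in runs], axis=0).max()  # round-off level only
print(raw_spread, pnc_spread)
```

    In this toy model the main predictor changes with the random seed, while the difference main - aux equals K(Xq, X)(K + lam*n*I)^{-1} y and no longer depends on f_0; that is the sense in which a single auxiliary model removes procedural noise here. Real networks are only approximately linearized, so in practice the cancellation would be approximate.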