
    On Degrees of Freedom of Projection Estimators with Applications to Multivariate Nonparametric Regression

    In this paper, we consider the nonparametric regression problem with multivariate predictors. We provide a characterization of the degrees of freedom and divergence for estimators of the unknown regression function, which are obtained as outputs of linearly constrained quadratic optimization procedures, namely, minimizers of the least squares criterion with linear constraints and/or quadratic penalties. As special cases of our results, we derive explicit expressions for the degrees of freedom in many nonparametric regression problems, e.g., bounded isotonic regression, multivariate (penalized) convex regression, and additive total variation regularization. Our theory also yields, as special cases, known results on the degrees of freedom of many well-studied estimators in the statistics literature, such as ridge regression, the Lasso and the generalized Lasso. Our results can be readily used to choose the tuning parameter(s) involved in the estimation procedure by minimizing Stein's unbiased risk estimate. As a by-product of our analysis we derive an interesting connection between bounded isotonic regression and isotonic regression on a general partially ordered set, which is of independent interest. Comment: 72 pages, 7 figures, Journal of the American Statistical Association (Theory and Methods), 201
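    The link between degrees of freedom and data-driven tuning can be made concrete for the ridge special case mentioned in the abstract. Below is a minimal sketch, not the paper's code: it assumes Gaussian errors with a known noise level sigma2 and synthetic data (both illustrative assumptions), computes the degrees of freedom as the trace of the linear smoother's hat matrix, and picks the penalty by minimizing Stein's unbiased risk estimate.

```python
# Minimal sketch: SURE-based choice of the ridge penalty (illustrative data).
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma2 = 100, 20, 1.0                       # assumed sizes and known noise variance
X = rng.standard_normal((n, p))
beta = np.concatenate([np.ones(5), np.zeros(p - 5)])
y = X @ beta + np.sqrt(sigma2) * rng.standard_normal(n)

def sure_ridge(lam):
    """SURE for the ridge fit y_hat = X (X'X + lam I)^{-1} X' y."""
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # hat matrix of the linear smoother
    y_hat = H @ y
    df = np.trace(H)                                          # degrees of freedom
    return np.sum((y - y_hat) ** 2) - n * sigma2 + 2 * sigma2 * df

lams = np.logspace(-3, 3, 50)
best = min(lams, key=sure_ridge)                  # penalty minimizing the unbiased risk estimate
print(f"penalty chosen by SURE: {best:.3g}")
```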

    Boosting insights in insurance tariff plans with tree-based machine learning methods

    Pricing actuaries typically operate within the framework of generalized linear models (GLMs). With the upswing of data analytics, our study focuses on machine learning methods to develop full tariff plans built from both the frequency and severity of claims. We adapt the loss functions used in the algorithms so that the specific characteristics of insurance data are carefully incorporated: highly unbalanced count data with excess zeros and varying exposure on the frequency side, combined with scarce but potentially long-tailed data on the severity side. A key requirement is the need for transparent and interpretable pricing models which are easily explainable to all stakeholders. We therefore focus on machine learning with decision trees: starting from simple regression trees, we work towards more advanced ensembles such as random forests and boosted trees. We show how to choose the optimal tuning parameters for these models in an elaborate cross-validation scheme, present visualization tools to obtain insights from the resulting models, and evaluate the economic value of these new modeling approaches. Boosted trees outperform the classical GLMs, allowing the insurer to form profitable portfolios and to guard against potential adverse risk selection.
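    To make the frequency side of such a tariff plan concrete, here is a minimal sketch of a Poisson-loss boosted tree with varying exposure handled as a sample weight, using scikit-learn rather than the authors' implementation; the synthetic portfolio, the features and the hyperparameter grid are illustrative assumptions.

```python
# Minimal sketch: claim-frequency boosting with Poisson loss and exposure weights.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
n = 5_000
X = rng.uniform(size=(n, 3))                      # assumed rating factors (e.g. age, bonus-malus, power)
exposure = rng.uniform(0.1, 1.0, size=n)          # policy duration in years
lam = np.exp(-2.0 + 1.5 * X[:, 0])                # true expected annual frequency (synthetic)
claims = rng.poisson(lam * exposure)              # observed claim counts

freq = claims / exposure                          # target: claims per unit of exposure
model = GridSearchCV(
    HistGradientBoostingRegressor(loss="poisson"),
    param_grid={"max_depth": [2, 3], "learning_rate": [0.05, 0.1]},
    cv=5,
)
model.fit(X, freq, sample_weight=exposure)        # exposure-weighted Poisson deviance
print(model.best_params_)
```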

    Statistical properties of the method of regularization with periodic Gaussian reproducing kernel

    The method of regularization with the Gaussian reproducing kernel is popular in the machine learning literature and successful in many practical applications. In this paper we consider the periodic version of the Gaussian kernel regularization. We show, in the white noise model setting, that in function spaces of very smooth functions, such as the infinite-order Sobolev space and the space of analytic functions, the method under consideration is asymptotically minimax; in finite-order Sobolev spaces, the method is rate optimal, and the efficiency in terms of constant, when compared with the minimax estimator, is reasonably high. The smoothing parameters in the periodic Gaussian regularization can be chosen adaptively without loss of asymptotic efficiency. The results derived in this paper give a partial explanation of the success of the Gaussian reproducing kernel in practice. Simulations are carried out to study the finite sample properties of the periodic Gaussian regularization. Comment: Published by the Institute of Mathematical Statistics (http://www.imstat.org) in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000045
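    As a rough illustration of kernel regularization with a periodic kernel, the sketch below fits kernel ridge regression with the Gaussian RBF kernel wrapped around the unit circle, exp(-2 sin^2(pi (s - t)) / ell^2). This is a practical stand-in, not necessarily the exact periodic Gaussian reproducing kernel analysed in the paper, and the data, bandwidth ell and smoothing parameter lam are illustrative assumptions.

```python
# Minimal sketch: kernel ridge regression with a periodic Gaussian-type kernel.
import numpy as np

rng = np.random.default_rng(2)
n, ell, lam = 200, 0.5, 1e-3                            # assumed sample size, bandwidth, smoothing parameter
s = np.sort(rng.uniform(0, 1, n))                        # inputs on the unit interval (period 1)
y = np.sin(2 * np.pi * s) + 0.3 * np.cos(6 * np.pi * s) + 0.1 * rng.standard_normal(n)

def periodic_gauss(a, b):
    """Kernel matrix k(s, t) = exp(-2 sin^2(pi (s - t)) / ell^2) between point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-2.0 * np.sin(np.pi * d) ** 2 / ell ** 2)

K = periodic_gauss(s, s)
alpha = np.linalg.solve(K + lam * np.eye(n), y)          # regularized representer coefficients
s_new = np.linspace(0, 1, 5)
print(periodic_gauss(s_new, s) @ alpha)                  # predictions at new inputs
```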

    Sparse Estimators and the Oracle Property, or the Return of Hodges' Estimator

    We point out some pitfalls related to the concept of an oracle property as used in Fan and Li (2001, 2002, 2004) which are reminiscent of the well-known pitfalls related to Hodges' estimator. The oracle property is often a consequence of sparsity of an estimator. We show that any estimator satisfying a sparsity property has maximal risk that converges to the supremum of the loss function; in particular, the maximal risk diverges to infinity whenever the loss function is unbounded. For ease of presentation the result is set in the framework of a linear regression model, but it generalizes far beyond that setting. In a Monte Carlo study we also assess the extent of the problem in finite samples for the smoothly clipped absolute deviation (SCAD) estimator introduced in Fan and Li (2001). We find that this estimator can perform rather poorly in finite samples and that its worst-case performance relative to maximum likelihood deteriorates with increasing sample size when the estimator is tuned to sparsity. Comment: 18 pages, 5 figures
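    The Hodges-type phenomenon behind this result can be reproduced in a few lines of Monte Carlo. The sketch below uses a hard-thresholded sample mean as a simple stand-in for a sparsity-tuned estimator (it is not the SCAD estimator studied in the paper) in the model y_i ~ N(theta, 1); the threshold sqrt(log n / n) and the grid of parameter values scanned near the threshold are illustrative assumptions. The scaled worst-case risk of the thresholded estimator grows with n while that of the MLE stays flat.

```python
# Minimal sketch: worst-case scaled risk of a sparse (hard-thresholded) estimator vs the MLE.
import numpy as np

rng = np.random.default_rng(3)
reps = 2_000                                     # Monte Carlo replications

def scaled_risk(estimate, theta, n):
    """Monte Carlo estimate of n * E[(theta_hat - theta)^2] in the N(theta, 1) model."""
    y = theta + rng.standard_normal((reps, n))
    return n * np.mean((estimate(y) - theta) ** 2)

for n in (100, 400, 1600):
    thr = np.sqrt(np.log(n) / n)                 # estimator "tuned to sparsity"

    def mle(y):
        return y.mean(axis=1)

    def sparse(y):                               # hard-thresholded sample mean
        m = y.mean(axis=1)
        return np.where(np.abs(m) > thr, m, 0.0)

    thetas = np.linspace(0.0, 2 * thr, 21)       # unfavourable parameters near the threshold
    worst_sparse = max(scaled_risk(sparse, t, n) for t in thetas)
    worst_mle = max(scaled_risk(mle, t, n) for t in thetas)
    print(f"n={n:5d}  worst scaled risk: sparse={worst_sparse:6.2f}  mle={worst_mle:5.2f}")
```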

    Surprises in High-Dimensional Ridgeless Least Squares Interpolation

    Interpolators -- estimators that achieve zero training error -- have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_2$ norm ("ridgeless") interpolation in high-dimensional least squares regression. We consider two different models for the feature distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$ are obtained by applying a linear transform to a vector of i.i.d. entries, $x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$, $W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d. entries, and $\varphi$ an activation function acting componentwise on $W z_i$). We recover -- in a precise quantitative way -- several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization. Comment: 68 pages; 16 figures. This revision contains non-asymptotic versions of earlier results, and results for general coefficients
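    The double-descent shape of the risk curve is easy to reproduce numerically. The sketch below computes the minimum $\ell_2$ norm least squares fit via the pseudoinverse in the isotropic linear model ($\Sigma = I$) and reports the out-of-sample risk as $p/n$ varies; the dimensions, signal scaling and noise level are illustrative assumptions, and this is a simulation rather than the paper's asymptotic analysis. The risk typically peaks near $p = n$ and can decrease again in the overparametrized regime.

```python
# Minimal sketch: test risk of the min-norm ("ridgeless") least squares fit across p/n.
import numpy as np

rng = np.random.default_rng(4)
n, n_test, sigma = 100, 2_000, 0.5                    # assumed sample sizes and noise level

for p in (20, 50, 90, 100, 110, 200, 400):
    beta = rng.standard_normal(p) / np.sqrt(p)        # signal with fixed total energy
    X = rng.standard_normal((n, p))                   # isotropic features (Sigma = I)
    X_test = rng.standard_normal((n_test, p))
    y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.pinv(X) @ y                  # min-l2-norm solution; interpolates when p >= n
    risk = np.mean((X_test @ beta_hat - X_test @ beta) ** 2)
    print(f"p/n = {p / n:4.1f}  test risk = {risk:8.3f}")
```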