On Degrees of Freedom of Projection Estimators with Applications to Multivariate Nonparametric Regression
In this paper, we consider the nonparametric regression problem with
multivariate predictors. We provide a characterization of the degrees of
freedom and divergence for estimators of the unknown regression function, which
are obtained as outputs of linearly constrained quadratic optimization
procedures, namely, minimizers of the least squares criterion with linear
constraints and/or quadratic penalties. As special cases of our results, we
derive explicit expressions for the degrees of freedom in many nonparametric
regression problems, e.g., bounded isotonic regression, multivariate
(penalized) convex regression, and additive total variation regularization. Our
theory also yields, as special cases, known results on the degrees of freedom
of many well-studied estimators in the statistics literature, such as ridge
regression, Lasso and generalized Lasso. Our results can be readily used to
choose the tuning parameter(s) involved in the estimation procedure by
minimizing Stein's unbiased risk estimate. As a by-product of our analysis,
we derive an interesting connection between bounded isotonic regression and
isotonic regression on a general partially ordered set, which is of independent
interest.
Comment: 72 pages, 7 figures, Journal of the American Statistical Association (Theory and Methods), 201
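To make the SURE-based tuning concrete in the simplest special case covered by this theory, ridge regression, the sketch below computes the degrees of freedom as the trace of the hat matrix and minimizes SURE over a grid of penalties. The data, dimensions, and noise level are invented for illustration; this is not code accompanying the paper.

```python
# Hypothetical illustration: choose the ridge penalty by minimizing Stein's
# unbiased risk estimate (SURE), with df(lambda) = trace of the ridge hat matrix.
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 200, 20, 1.0                      # assumed problem size and noise level
X = rng.standard_normal((n, p))
beta = np.concatenate([np.ones(5), np.zeros(p - 5)])
y = X @ beta + sigma * rng.standard_normal(n)

def sure_ridge(lam):
    # Ridge hat matrix H = X (X'X + lam I)^{-1} X'
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y
    df = np.trace(H)                            # degrees of freedom of the linear smoother
    return resid @ resid + 2 * sigma**2 * df - n * sigma**2

lambdas = np.logspace(-2, 3, 50)
best = min(lambdas, key=sure_ridge)
print(f"lambda minimizing SURE: {best:.3f}")
```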
Boosting insights in insurance tariff plans with tree-based machine learning methods
Pricing actuaries typically operate within the framework of generalized
linear models (GLMs). With the upswing of data analytics, our study focuses
on machine learning methods to develop full tariff plans built from both the
frequency and severity of claims. We adapt the loss functions used in the
algorithms such that the specific characteristics of insurance data are
carefully incorporated: highly unbalanced count data with excess zeros and
varying exposure on the frequency side combined with scarce, but potentially
long-tailed data on the severity side. A key requirement is the need for
transparent and interpretable pricing models which are easily explainable to
all stakeholders. We therefore focus on machine learning with decision trees:
starting from simple regression trees, we work towards more advanced ensembles
such as random forests and boosted trees. We show how to choose the optimal
tuning parameters for these models in an elaborate cross-validation scheme,
present visualization tools to obtain insights from the resulting models, and
evaluate the economic value of these new modeling approaches. Boosted trees
outperform the classical GLMs, allowing the insurer to form profitable
portfolios and to guard against potential adverse risk selection.
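As a rough sketch of the frequency side of such a tariff plan (the rating factors, simulated portfolio, and model settings below are invented, and the paper's own tree-based implementations are more involved), a boosted tree with Poisson loss can handle varying exposure by modelling claim frequency with exposure weights:

```python
# Minimal sketch, not the authors' implementation: frequency modelling with a
# Poisson-loss boosted tree, treating exposure via frequency targets and weights.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 10_000
X = rng.uniform(size=(n, 4))                      # hypothetical rating factors
exposure = rng.uniform(0.1, 1.0, size=n)          # fraction of a policy year
true_rate = np.exp(-2 + 1.5 * X[:, 0])            # expected claims per full year
claims = rng.poisson(true_rate * exposure)        # observed counts, many zeros

X_tr, X_te, cl_tr, cl_te, ex_tr, ex_te = train_test_split(
    X, claims, exposure, random_state=0)

freq_model = HistGradientBoostingRegressor(loss="poisson", max_depth=3)
freq_model.fit(X_tr, cl_tr / ex_tr, sample_weight=ex_tr)   # target = claim frequency

expected_claims = freq_model.predict(X_te) * ex_te         # rescale by exposure
print("mean predicted vs observed claims:", expected_claims.mean(), cl_te.mean())
```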
Statistical properties of the method of regularization with periodic Gaussian reproducing kernel
The method of regularization with the Gaussian reproducing kernel is popular
in the machine learning literature and successful in many practical
applications.
In this paper we consider the periodic version of the Gaussian kernel
regularization.
We show, in the white noise model setting, that in function spaces of very
smooth functions, such as the infinite-order Sobolev space and the space of
analytic functions, the method under consideration is asymptotically minimax;
in finite-order Sobolev spaces, the method is rate optimal, and the efficiency
in terms of constant when compared with the minimax estimator is reasonably
high. The smoothing parameters in the periodic Gaussian regularization can be
chosen adaptively without loss of asymptotic efficiency. The results derived in
this paper give a partial explanation of the success of the
Gaussian reproducing kernel in practice. Simulations are carried out to study
the finite sample properties of the periodic Gaussian regularization.
Comment: Published by the Institute of Mathematical Statistics (http://www.imstat.org) in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000045
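A small numerical illustration of this kind of regularization, using the exponential-sine-squared kernel as a stand-in for a periodic Gaussian kernel (the kernel, data, and smoothing parameters below are assumptions, not the paper's exact construction):

```python
# Kernel ridge regression on [0, 1) with a periodic kernel; illustrative only.
import numpy as np

rng = np.random.default_rng(2)
n, noise, length_scale, lam = 100, 0.2, 0.3, 1e-3
t = np.sort(rng.uniform(size=n))
y = np.sin(2 * np.pi * t) + 0.3 * np.cos(4 * np.pi * t) + noise * rng.standard_normal(n)

def periodic_kernel(s, u):
    # k(s, u) = exp(-2 sin^2(pi (s - u)) / length_scale^2), period 1
    d = np.subtract.outer(s, u)
    return np.exp(-2 * np.sin(np.pi * d) ** 2 / length_scale**2)

K = periodic_kernel(t, t)
alpha = np.linalg.solve(K + n * lam * np.eye(n), y)    # regularized coefficients

t_grid = np.linspace(0, 1, 200)
fit = periodic_kernel(t_grid, t) @ alpha               # estimated regression function
print("max |fit| on grid:", np.abs(fit).max())
```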
Sparse Estimators and the Oracle Property, or the Return of Hodges' Estimator
We point out some pitfalls related to the concept of an oracle property as
used in Fan and Li (2001, 2002, 2004) which are reminiscent of the well-known
pitfalls related to Hodges' estimator. The oracle property is often a
consequence of sparsity of an estimator. We show that any estimator satisfying
a sparsity property has maximal risk that converges to the supremum of the loss
function; in particular, the maximal risk diverges to infinity whenever the
loss function is unbounded. For ease of presentation the result is set in the
framework of a linear regression model, but generalizes far beyond that
setting. In a Monte Carlo study we also assess the extent of the problem in
finite samples for the smoothly clipped absolute deviation (SCAD) estimator
introduced in Fan and Li (2001). We find that this estimator can perform rather
poorly in finite samples and that its worst-case performance relative to
maximum likelihood deteriorates with increasing sample size when the estimator
is tuned to sparsity.
Comment: 18 pages, 5 figures
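A Monte Carlo sketch of the underlying Hodges-type phenomenon, using a hard-thresholded sample mean rather than the SCAD estimator (the threshold n^{-1/4} and all other settings below are illustrative assumptions): the maximal scaled risk of the sparse estimator grows with the sample size.

```python
# Illustrative Monte Carlo: worst-case scaled risk of a Hodges-type estimator
# increases with n, mirroring the pitfall of sparse "oracle" estimators.
import numpy as np

rng = np.random.default_rng(3)
reps = 20_000

def max_scaled_risk(n):
    # Hodges-type estimator: report 0 unless |xbar| exceeds n^{-1/4}.
    worst = 0.0
    for theta in np.linspace(0.0, 2 * n ** -0.25, 40):   # scan near the threshold
        xbar = theta + rng.standard_normal(reps) / np.sqrt(n)
        est = np.where(np.abs(xbar) > n ** -0.25, xbar, 0.0)
        worst = max(worst, n * np.mean((est - theta) ** 2))
    return worst

for n in (100, 1_000, 10_000):
    print(n, round(max_scaled_risk(n), 1))
```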
Surprises in High-Dimensional Ridgeless Least Squares Interpolation
Interpolators -- estimators that achieve zero training error -- have
attracted growing attention in machine learning, mainly because state-of-the-art
neural networks appear to be models of this type. In this paper, we study
minimum $\ell_2$ norm (``ridgeless'') interpolation in high-dimensional least
squares regression. We consider two different models for the feature
distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$
are obtained by applying a linear transform to a vector of i.i.d.\ entries,
$x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model,
where the feature vectors are obtained by passing the input through a random
one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$,
$W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d.\ entries, and $\varphi$ an
activation function acting componentwise on $W z_i$). We recover -- in a
precise quantitative way -- several phenomena that have been observed in
large-scale neural networks and kernel machines, including the "double descent"
behavior of the prediction risk, and the potential benefits of
overparametrization.
Comment: 68 pages; 16 figures. This revision contains non-asymptotic version of earlier results, and results for general coefficient
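A hedged numerical sketch of the double descent behavior for the minimum-norm least squares interpolator, under isotropic Gaussian features rather than the general linear and nonlinear feature models analysed in the paper (all dimensions, noise levels, and repetition counts below are invented):

```python
# Test risk of the minimum-norm ("ridgeless") least squares fit as p/n varies;
# the risk typically peaks near p = n and decreases again when overparametrized.
import numpy as np

rng = np.random.default_rng(4)
n, n_test, sigma, reps = 100, 1_000, 0.5, 20

def test_risk(p):
    risks = []
    for _ in range(reps):
        beta = rng.standard_normal(p) / np.sqrt(p)        # signal of roughly unit norm
        X = rng.standard_normal((n, p))
        y = X @ beta + sigma * rng.standard_normal(n)
        beta_hat = np.linalg.pinv(X) @ y                  # minimum-norm solution
        X_test = rng.standard_normal((n_test, p))
        y_test = X_test @ beta + sigma * rng.standard_normal(n_test)
        risks.append(np.mean((X_test @ beta_hat - y_test) ** 2))
    return np.mean(risks)

for p in (20, 60, 90, 100, 110, 200, 400):
    print(f"p/n = {p / n:.1f}: risk ~ {test_risk(p):.2f}")
```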