Penalized Likelihood and Bayesian Function Selection in Regression Models
Challenging research in various fields has driven a wide range of
methodological advances in variable selection for regression models with
high-dimensional predictors. In comparison, selection of nonlinear functions in
models with additive predictors has been considered only more recently. Several
competing suggestions have been developed at about the same time and often do
not refer to each other. This article provides a state-of-the-art review on
function selection, focusing on penalized likelihood and Bayesian concepts,
relating various approaches to each other in a unified framework. In an
empirical comparison, also including boosting, we evaluate several methods
through applications to simulated and real data, thereby providing some
guidance on their performance in practice.
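As an illustration of how a penalized-likelihood approach can remove a whole function at once, the sketch below implements the proximal operator of a group-lasso penalty λ‖β_j‖₂ on the coefficient vector β_j of one smooth term (for example, its spline coefficients). The group lasso is assumed here as one representative penalty; it is not claimed to be a specific method from the review, and the function name and toy coefficients are hypothetical. A coefficient vector whose norm is at most λ is set exactly to zero, which drops that function from the additive predictor.

```python
import math


def group_soft_threshold(beta, lam):
    """Proximal operator of lam * ||beta||_2 (group-lasso penalty).

    beta: coefficients of one smooth function (e.g. a spline basis).
    Returns a shrunken copy; the whole group becomes zero when ||beta|| <= lam.
    """
    norm = math.sqrt(sum(b * b for b in beta))
    if norm <= lam:
        # The entire function is removed from the additive predictor.
        return [0.0] * len(beta)
    scale = 1.0 - lam / norm  # one common shrinkage factor for the whole group
    return [scale * b for b in beta]


# Two hypothetical smooth terms: a strong one and a weak one.
strong = [3.0, -4.0]  # norm 5
weak = [0.3, 0.4]     # norm 0.5
print(group_soft_threshold(strong, 1.0))  # shrunk by a factor 0.8, kept
print(group_soft_threshold(weak, 1.0))    # removed entirely
```

Because the penalty acts on the norm of the whole vector rather than on individual coefficients, selection happens at the level of functions, not single basis coefficients.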
Empirical stationary correlations for semi-supervised learning on graphs
In semi-supervised learning on graphs, response variables observed at one
node are used to estimate missing values at other nodes. The methods exploit
correlations between nearby nodes in the graph. In this paper we prove that
many such proposals are equivalent to kriging predictors based on a fixed
covariance matrix driven by the link structure of the graph. We then propose a
data-driven estimator of the correlation structure that exploits patterns among
the observed response values. By incorporating even a small fraction of
observed covariation into the predictions, we are able to obtain much improved
prediction on two graph data sets.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org) at http://dx.doi.org/10.1214/09-AOAS293
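The kriging view can be made concrete with a small sketch. It assumes a zero-mean Gaussian field whose precision (inverse covariance) matrix is the graph Laplacian plus a small ridge, Q = L + δI; this is one illustrative link-driven covariance, not necessarily one of the specific proposals the paper analyses, and δ and all function names are hypothetical. For a Gaussian field the kriging predictor Σ_UL Σ_LL⁻¹ y_L at the unobserved nodes U equals the conditional mean −Q_UU⁻¹ Q_UL y_L, which is what the code solves. The paper's data-driven correlation estimator is not implemented here.

```python
def graph_laplacian(n, edges):
    """Unnormalized Laplacian L = D - A of an undirected, unweighted graph."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
        lap[i][i] += 1.0
        lap[j][j] += 1.0
    return lap


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def krige_on_graph(n, edges, observed, delta=0.01):
    """Predict unobserved nodes as the Gaussian conditional mean.

    observed: dict node -> response value; delta: hypothetical ridge on L.
    """
    lap = graph_laplacian(n, edges)
    q = [[lap[i][j] + (delta if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    unobs = [i for i in range(n) if i not in observed]
    q_uu = [[q[u][v] for v in unobs] for u in unobs]
    rhs = [-sum(q[u][node] * y for node, y in observed.items()) for u in unobs]
    return dict(zip(unobs, solve(q_uu, rhs)))


# Path graph 0-1-2-3 with only the two end nodes observed.
print(krige_on_graph(4, [(0, 1), (1, 2), (2, 3)], {0: 0.0, 3: 1.0}))
```

With δ = 0 this prediction reduces to the harmonic-function (label-propagation) interpolant, which on the path above gives 1/3 and 2/3 at the inner nodes; a positive δ shrinks predictions toward the assumed zero mean.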
Fast calibrated additive quantile regression
We propose a novel framework for fitting additive quantile regression models,
which provides well calibrated inference about the conditional quantiles and
fast automatic estimation of the smoothing parameters, for model structures as
diverse as those usable with distributional GAMs, while maintaining equivalent
numerical efficiency and stability. The proposed methods are at once
statistically rigorous and computationally efficient, because they are based on
the general belief-updating framework of Bissiri et al. (2016) for loss-based
inference, but compute by adapting the stable fitting methods of Wood et al.
(2016). We show that the pinball loss is statistically suboptimal relative to a
novel smooth generalisation, which also gives access to fast estimation
methods. Further, we provide a novel calibration method for efficiently
selecting the 'learning rate' balancing the loss with the smoothing priors
during inference, thereby obtaining reliable quantile uncertainty estimates.
Our work was motivated by a probabilistic electricity load forecasting
application, used here to demonstrate the proposed approach. The methods
described here are implemented by the qgam R package, available on the
Comprehensive R Archive Network (CRAN).
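For reference, the pinball (check) loss at quantile level τ is ρ_τ(z) = z(τ − 1{z < 0}), with residual z = y − μ. The sketch below compares it with one smooth, softplus-based approximation, (τ − 1)z + λ log(1 + e^{z/λ}), which tends to the pinball loss as λ → 0. The softplus form, the bandwidth λ and the function names are assumptions for illustration; this is not presented as the exact generalisation proposed in the paper, nor as qgam code.

```python
import math


def pinball(z, tau):
    """Pinball (check) loss for residual z = y - mu at quantile level tau."""
    return z * (tau - (1.0 if z < 0 else 0.0))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def smooth_pinball(z, tau, lam):
    """Smooth approximation (assumed form): (tau - 1) z + lam * softplus(z / lam)."""
    return (tau - 1.0) * z + lam * softplus(z / lam)


tau, lam = 0.9, 0.1
for z in (-2.0, -0.1, 0.0, 0.1, 2.0):
    # Gap to the pinball loss is lam * log(1 + exp(-|z| / lam)), at most lam * log(2).
    print(z, pinball(z, tau), round(smooth_pinball(z, tau, lam), 4))
```

The smooth version has a continuous derivative at z = 0, which is what makes Newton-type fitting and automatic smoothing-parameter selection feasible, while the kink of the pinball loss does not allow that.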
Bankruptcy Prediction of Small and Medium Enterprises Using a Flexible Binary Generalized Extreme Value Model
We introduce a binary regression accounting-based model for bankruptcy
prediction of small and medium enterprises (SMEs). The main advantage of the
model lies in its predictive performance in identifying defaulted SMEs. Another
advantage, which is especially relevant for banks, is that the relationship
between the accounting characteristics of SMEs and the response is not assumed a
priori (e.g., linear, quadratic or cubic) and can be determined from the data.
The proposed approach uses the quantile function of the generalized extreme
value distribution as the link function, as well as smooth functions of
accounting characteristics to flexibly model covariate effects. Therefore, the
usual scoring-model assumptions of a symmetric link function and of linear or
pre-specified covariate-response relationships are relaxed. Out-of-sample and
out-of-time validation on Italian data shows that our proposal outperforms the
commonly used (logistic) scoring model for different default horizons.
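The sketch below shows what a GEV-quantile link does: the predictor η (linear or additive) is mapped to a probability through the GEV distribution function with location 0, scale 1 and shape ξ, π = exp{−(1 + ξη)^(−1/ξ)}, and mapped back through the quantile function. This standard-GEV parameterization, the function names, and which event π refers to (default or non-default) are assumptions for illustration, not the paper's exact specification. Unlike the logit, the curve is asymmetric: π(0) = e^(−1) ≈ 0.368 for every shape, rather than 0.5.

```python
import math


def gev_cdf(eta, xi):
    """Standard GEV distribution function (location 0, scale 1, shape xi).

    Used here as an inverse link: maps a predictor eta to a probability.
    """
    if xi == 0.0:
        return math.exp(-math.exp(-eta))  # Gumbel limit
    t = 1.0 + xi * eta
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_quantile(p, xi):
    """Standard GEV quantile function, i.e. the link g(p) = eta, for 0 < p < 1."""
    if xi == 0.0:
        return -math.log(-math.log(p))
    return ((-math.log(p)) ** (-xi) - 1.0) / xi


for xi in (-0.5, 0.0, 0.5):
    # pi(0) = exp(-1) for every shape; pi(-1) + pi(1) != 1 (no symmetry).
    print(xi, round(gev_cdf(0.0, xi), 4),
          round(gev_cdf(-1.0, xi) + gev_cdf(1.0, xi), 4))
```

In the additive setting described above, η would be a sum of smooth functions of the accounting characteristics; the shape ξ controls how fast the probability approaches 0 and 1 on either side.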
Nonparametric Transient Classification using Adaptive Wavelets
Classifying transients based on multi-band light curves is a challenging but
crucial problem in the era of GAIA and LSST, since the sheer volume of
transients will make spectroscopic classification infeasible. Here we present a
nonparametric classifier that uses the transient's light curve measurements to
predict its class given training data. It implements two novel components: the
first is the use of the BAGIDIS wavelet methodology, a characterization of
functional data using hierarchical wavelet coefficients. The second novelty is
the introduction of a ranked probability classifier on the wavelet coefficients
that handles both the heteroscedasticity of the data and the potential
non-representativity of the training set. The ranked classifier is simple and
quick to implement, while a major advantage of the BAGIDIS wavelets is that they
are translation invariant, so the light curves need not be aligned to extract
features. Further, BAGIDIS is nonparametric, so it can
be used for blind searches for new objects. We demonstrate the effectiveness of
our ranked wavelet classifier against the well-tested Supernova Photometric
Classification Challenge dataset in which the challenge is to correctly
classify light curves as Type Ia or non-Ia supernovae. We train our ranked
probability classifier on the spectroscopically-confirmed subsample (which is
not representative) and show that it gives good results for all supernovae with
observed light curve timespans greater than 100 days (roughly 55% of the
dataset). For such data, we obtain a Ia efficiency of 80.5% and a purity of
82.4%, yielding a highly competitive score of 0.49 whilst implementing a truly
"model-blind" approach to supernova classification. Consequently, this approach
may be particularly suitable for the classification of astronomical transients
in the era of large synoptic sky surveys.
Comment: 14 pages, 8 figures. Published in MNRA
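The quoted score is consistent with the Supernova Photometric Classification Challenge figure of merit, which multiplies the Ia efficiency by a pseudo-purity N_true / (N_true + W·N_false) that penalizes each false Ia with a weight W = 3. Assuming that definition and W = 3, the sketch below recomputes the score from the reported efficiency and purity, using N_false / N_true = 1/purity − 1. It is only an arithmetic check (the function name is hypothetical), not part of the classifier.

```python
def figure_of_merit(efficiency, purity, w=3.0):
    """Efficiency times pseudo-purity N_true / (N_true + w * N_false).

    Uses N_false / N_true = 1 / purity - 1, so raw counts are not needed.
    """
    false_per_true = 1.0 / purity - 1.0
    return efficiency / (1.0 + w * false_per_true)


score = figure_of_merit(0.805, 0.824)
print(round(score, 2))  # 0.49
```

With w = 1 the pseudo-purity equals the ordinary purity and the score reduces to efficiency × purity; the larger weight makes false Ia classifications three times as costly as in that case.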
Regularization for Generalized Additive Mixed Models by Likelihood-Based Boosting
With the emergence of semi- and nonparametric regression, the generalized
linear mixed model has been expanded to account for additive predictors. In the
present paper, an approach to variable selection is proposed that works for
generalized additive mixed models. In contrast to common procedures, it can be
used in high-dimensional settings where many covariates are available and the
form of their influence is unknown. It is constructed as a componentwise
boosting method and hence is able to perform variable selection. The complexity
of the resulting estimator is determined by information criteria. The method is
investigated in simulation studies for binary and Poisson responses and is
illustrated with real data sets.
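Componentwise boosting can be sketched for the simplest case: a Gaussian response, centred covariates, and one linear base learner per covariate. At each step only the covariate whose least-squares fit to the current residuals reduces the residual sum of squares most is updated, by a shrunken step ν, so covariates that are never chosen keep a zero coefficient; this is what yields variable selection. The random effects, smooth base learners, binary and Poisson likelihoods, and information-criterion stopping of the paper are not implemented; the fixed number of steps, ν = 0.1, the function name, and the toy data are assumptions.

```python
def componentwise_boost(xs, y, steps=50, nu=0.1):
    """Componentwise L2 boosting with one linear base learner per covariate.

    xs: list of centred covariate columns; y: centred response.
    Returns the coefficients and the set of covariates ever selected.
    """
    coef = [0.0] * len(xs)
    resid = list(y)
    selected = set()
    for _ in range(steps):
        best_j, best_gain, best_b = None, -1.0, 0.0
        for j, x in enumerate(xs):
            sxx = sum(v * v for v in x)
            sxr = sum(v * r for v, r in zip(x, resid))
            gain = sxr * sxr / sxx  # RSS reduction of a full least-squares step
            if gain > best_gain:
                best_j, best_gain, best_b = j, gain, sxr / sxx
        # Update only the best component, shrunk by nu.
        coef[best_j] += nu * best_b
        resid = [r - nu * best_b * v for r, v in zip(resid, xs[best_j])]
        selected.add(best_j)
    return coef, selected


# Hypothetical toy data: y depends on x0 only; x1 is a nuisance covariate.
n = 20
x0 = [i - 9.5 for i in range(n)]
x1 = [(-1.0) ** i for i in range(n)]
y = [2.0 * v for v in x0]
coef, selected = componentwise_boost([x0, x1], y)
print(selected, [round(c, 3) for c in coef])
```

Here x1 is never selected, so its coefficient stays exactly zero, while the coefficient of x0 approaches 2 as 2(1 − (1 − ν)^m) after m steps. In the paper, the number of steps would instead be chosen by an information criterion.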
- …