28 research outputs found

    Using The Censored Gamma Distribution for Modeling Fractional Response Variables with an Application to Loss Given Default

    Regression models for limited continuous dependent variables having a non-negligible probability of attaining exactly their limits are presented. The models differ in the number of parameters and in their flexibility. Since fractional data are a special case of limited dependent data, the models also apply to variables that are a fraction or a proportion. It is shown how to fit these models, and they are applied to a Loss Given Default dataset from insurance to which they provide a good fit.
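    The censoring mechanism described in the abstract can be sketched as follows, assuming a latent gamma variable shifted left by a parameter ξ and censored at both 0 and 1; the function and parameter names are illustrative and not the paper's notation:

    ```python
    # Hypothetical sketch: latent X ~ Gamma(shape, scale), observed
    # Y = min(max(X - xi, 0), 1), i.e. censored at both limits.
    import numpy as np
    from scipy import stats

    def censored_gamma_loglik(y, shape, scale, xi):
        """Log-likelihood of observations y in [0, 1] under the censored model."""
        dist = stats.gamma(a=shape, scale=scale)
        p0 = dist.cdf(xi)            # P(Y = 0): latent value below the shift
        p1 = dist.sf(1.0 + xi)       # P(Y = 1): latent value above 1 + xi
        ll = np.where(y == 0.0, np.log(p0),
             np.where(y == 1.0, np.log(p1),
                      dist.logpdf(y + xi)))   # continuous part on (0, 1)
        return ll.sum()
    ```

    Observations at the limits contribute the point masses P(Y = 0) and P(Y = 1), interior observations contribute the shifted density, so the two masses and the integral of the density over (0, 1) sum to one.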

    Gaussian Process Boosting

    We introduce a novel way to combine boosting with Gaussian process and mixed effects models. This allows for relaxing, first, the linearity assumption for the mean function in Gaussian process and grouped random effects models in a flexible non-parametric way and, second, the independence assumption made in most boosting algorithms. The former is advantageous for predictive accuracy and for avoiding model misspecifications. The latter is important for more efficient learning of the mean function and for obtaining probabilistic predictions. In addition, we present an extension that scales to large data using a Vecchia approximation for the Gaussian process model relying on novel results for covariance parameter inference. We obtain increased predictive accuracy compared to existing approaches on several simulated and real-world data sets.
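    A toy numpy sketch of the core idea — functional gradient boosting where the gradient is computed under a GP covariance rather than under independence — might look as follows. The kernel, the fixed covariance parameters, and all names are illustrative assumptions; unlike the actual method, the covariance parameters here are not re-estimated during boosting:

    ```python
    import numpy as np

    def rbf_kernel(x, ell=0.5, var=1.0):
        d = x[:, None] - x[None, :]
        return var * np.exp(-0.5 * (d / ell) ** 2)

    def fit_stump(x, r):
        """Least-squares regression stump (one split, two constants)."""
        best_sse, best_rule = np.inf, None
        for s in np.quantile(x, np.linspace(0.1, 0.9, 9)):
            left = x <= s
            if left.all() or not left.any():
                continue
            a, b = r[left].mean(), r[~left].mean()
            sse = ((r - np.where(left, a, b)) ** 2).sum()
            if sse < best_sse:
                best_sse, best_rule = sse, (s, a, b)
        s, a, b = best_rule
        return lambda z: np.where(z <= s, a, b)

    def gp_boost(x, y, n_iter=50, lr=0.1, sigma2=0.1):
        """Boost the mean function under a GP covariance Sigma = K + sigma2*I."""
        Sigma = rbf_kernel(x) + sigma2 * np.eye(len(x))
        F = np.zeros_like(y)
        for _ in range(n_iter):
            grad = np.linalg.solve(Sigma, y - F)   # negative gradient of the GP
            F = F + lr * fit_stump(x, grad)(x)     # negative log-likelihood in F
        return F
    ```

    Each iteration fits a base learner to the gradient of the GP objective 0.5 (y − F)ᵀ Σ⁻¹ (y − F) rather than to raw residuals, which is what "relaxing the independence assumption" amounts to in this sketch.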

    A dynamic nonstationary spatio-temporal model for short term prediction of precipitation

    Precipitation is a complex physical process that varies in space and time. Predictions and interpolations at unobserved times and/or locations help to solve important problems in many areas. In this paper, we present a hierarchical Bayesian model for spatio-temporal data and apply it to obtain short term predictions of rainfall. The model incorporates physical knowledge about the underlying processes that determine rainfall, such as advection, diffusion and convection. It is based on a temporal autoregressive convolution with spatially colored and temporally white innovations. By linking the advection parameter of the convolution kernel to an external wind vector, the model is temporally nonstationary. Further, it allows for nonseparable and anisotropic covariance structures. With the help of the Voronoi tessellation, we construct a natural parametrization, that is, space as well as time resolution consistent, for data lying on irregular grid points. In the application, the statistical model combines forecasts of three other meteorological variables obtained from a numerical weather prediction model with past precipitation observations. The model is then used to predict three-hourly precipitation over 24 hours. It performs better than a separable, stationary and isotropic version, and it performs comparably to a deterministic numerical weather prediction model for precipitation and has the advantage that it quantifies prediction uncertainty. (Published in the Annals of Applied Statistics, http://dx.doi.org/10.1214/12-AOAS564, by the Institute of Mathematical Statistics, http://www.imstat.org/aoas/.)
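    The temporal autoregressive convolution with a wind-linked advection parameter can be sketched on a regular grid as follows. The grid, kernel shape, and wind values are illustrative assumptions, and the innovations below are spatially white for brevity, whereas the paper's innovations are spatially colored:

    ```python
    import numpy as np

    def shifted_gaussian_kernel(size, sd, wind):
        """Convolution kernel whose centre is displaced by the wind vector."""
        ax = np.arange(size) - size // 2
        xx, yy = np.meshgrid(ax, ax, indexing="ij")
        k = np.exp(-0.5 * ((xx - wind[0]) ** 2 + (yy - wind[1]) ** 2) / sd ** 2)
        return k / k.sum()

    def step(field, kernel, innovation_sd=0.1, rng=None):
        """One AR step: convolve the field, then add white-in-time innovations."""
        if rng is None:
            rng = np.random.default_rng()
        m = kernel.shape[0] // 2
        out = np.zeros_like(field)
        for dx in range(-m, m + 1):          # direct 2-D convolution with
            for dy in range(-m, m + 1):      # wrap-around boundaries
                out += kernel[dx + m, dy + m] * np.roll(np.roll(field, dx, 0), dy, 1)
        return out + innovation_sd * rng.standard_normal(field.shape)
    ```

    Because the kernel's centre sits at the wind vector rather than at the origin, each step transports the field downwind while the kernel spread adds diffusion — the advection/diffusion mechanism the abstract describes.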

    Iterative Methods for Vecchia-Laplace Approximations for Latent Gaussian Process Models

    Latent Gaussian process (GP) models are flexible probabilistic non-parametric function models. Vecchia approximations are accurate approximations for GPs to overcome computational bottlenecks for large data, and the Laplace approximation is a fast method with asymptotic convergence guarantees to approximate marginal likelihoods and posterior predictive distributions for non-Gaussian likelihoods. Unfortunately, the computational complexity of combined Vecchia-Laplace approximations grows faster than linearly in the sample size when used in combination with direct solver methods such as the Cholesky decomposition. Computations with Vecchia-Laplace approximations thus become prohibitively slow precisely when the approximations are usually the most accurate, i.e., on large data sets. In this article, we present several iterative methods for inference with Vecchia-Laplace approximations which make computations considerably faster compared to Cholesky-based calculations. We analyze our proposed methods theoretically and in experiments with simulated and real-world data. In particular, we obtain a speed-up of an order of magnitude compared to Cholesky-based inference and a threefold increase in prediction accuracy in terms of the continuous ranked probability score compared to a state-of-the-art method on a large satellite data set. All methods are implemented in a free C++ software library with high-level Python and R packages.
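    The contrast between direct and iterative solvers can be illustrated with the conjugate gradient method, a standard iterative solver for the symmetric positive definite systems that arise here (schematically (W + Σ⁻¹) x = b). This is a generic textbook sketch, not the paper's implementation; the dense toy matrix stands in for the sparse Vecchia-Laplace matrices, for which the matrix-vector products below would be cheap:

    ```python
    import numpy as np

    def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
        """Solve A x = b for symmetric positive definite A given only x -> A x."""
        x = np.zeros_like(b)
        r = b - matvec(x)          # residual
        p = r.copy()               # search direction
        rs = r @ r
        for _ in range(max_iter):
            Ap = matvec(p)
            alpha = rs / (p @ Ap)  # exact line search along p
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs) * p   # A-conjugate update of the direction
            rs = rs_new
        return x
    ```

    The key point is that only matrix-vector products are needed, so the cost per iteration scales with the number of nonzeros of the matrix rather than with a cubic-cost factorization.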

    Joint variable selection of both fixed and random effects for Gaussian process-based spatially varying coefficient models

    Spatially varying coefficient (SVC) models are a class of regression models for spatial data where covariate effects vary over space. If there are several covariates, a natural question is which covariates have a spatially varying effect and which do not. We present a new variable selection approach for Gaussian process-based SVC models. It relies on a penalized maximum likelihood estimation and allows joint variable selection both with respect to fixed effects and Gaussian process random effects. We validate our approach in a simulation study as well as on a real-world data set. In the simulation study, the penalized maximum likelihood estimation correctly identifies zero fixed and random effects, while the penalty-induced bias of non-zero estimates is negligible. In the real data application, our proposed penalized maximum likelihood estimation yields sparser SVC models and achieves a smaller information criterion than classical maximum likelihood estimation. In a cross-validation study on the real data, we show that our proposed penalized maximum likelihood estimation consistently yields the sparsest SVC models while achieving predictive performance similar to other SVC modeling methodologies.
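    The selection mechanism — a sparsity-inducing penalty driving small estimates exactly to zero — can be illustrated with coordinate descent and soft-thresholding on an L1-penalized least-squares stand-in. This is only an analogy under assumed names; the paper penalizes a GP likelihood and additionally shrinks the random-effect (GP variance) parameters:

    ```python
    import numpy as np

    def soft_threshold(z, lam):
        """Shrink z toward zero by lam; values within lam of zero become exactly 0."""
        return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

    def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
        """Minimize 0.5*||y - X b||^2 + n*lam*||b||_1 by cyclic coordinate descent."""
        n, p = X.shape
        beta = np.zeros(p)
        col_sq = (X ** 2).sum(axis=0)
        for _ in range(n_sweeps):
            for j in range(p):
                r_j = y - X @ beta + X[:, j] * beta[j]   # partial residual
                beta[j] = soft_threshold(X[:, j] @ r_j, n * lam) / col_sq[j]
        return beta
    ```

    With a large enough penalty all coefficients are set exactly to zero, and intermediate penalties yield sparse fits with mildly shrunken non-zero estimates — the "negligible penalty-induced bias" behavior the abstract reports for its own estimator.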