802 research outputs found

    Spike-and-Slab Priors for Function Selection in Structured Additive Regression Models

    Structured additive regression provides a general framework for complex Gaussian and non-Gaussian regression models, with predictors comprising arbitrary combinations of nonlinear functions and surfaces, spatial effects, varying coefficients, random effects and further regression terms. The large flexibility of structured additive regression makes function selection a challenging and important task, aiming at (1) selecting the relevant covariates, (2) choosing an appropriate and parsimonious representation of the impact of covariates on the predictor and (3) determining the required interactions. We propose a spike-and-slab prior structure for function selection that allows single coefficients, as well as blocks of coefficients representing specific model terms, to be included or excluded. A novel multiplicative parameter expansion is required to obtain good mixing and convergence properties in a Markov chain Monte Carlo simulation approach and is shown to induce desirable shrinkage properties. In simulation studies and with (real) benchmark classification data, we investigate sensitivity to hyperparameter settings and compare performance to competitors. The flexibility and applicability of our approach are demonstrated in an additive piecewise exponential model with time-varying effects for right-censored survival times of intensive care patients with sepsis. Geoadditive and additive mixed logit model applications are discussed in an extensive appendix.

    Normal-Mixture-of-Inverse-Gamma Priors for Bayesian Regularization and Model Selection in Structured Additive Regression Models

    In regression models with many potential predictors, choosing an appropriate subset of covariates and their interactions at the same time as determining whether linear or more flexible functional forms are required is a challenging and important task. We propose a spike-and-slab prior structure in order to include or exclude single coefficients as well as blocks of coefficients associated with factor variables, random effects or basis expansions of smooth functions. Structured additive models with this prior structure are estimated with Markov chain Monte Carlo using a redundant multiplicative parameter expansion. We discuss shrinkage properties of the novel prior induced by the redundant parameterization, investigate its sensitivity to hyperparameter settings and compare performance of the proposed method in terms of model selection, sparsity recovery, and estimation error for Gaussian, binomial and Poisson responses on real and simulated data sets with that of component-wise boosting and other approaches.

    IMPLEMENTING PLS FOR DISTANCE-BASED REGRESSION: COMPUTATIONAL ISSUES

    Distance-based regression allows for a neat implementation of the Partial Least Squares recurrence. In this paper we address practical issues arising when dealing with moderately large datasets (n ~ 10^4) such as those typical of automobile insurance premium calculations.

    Influence of spatial interpolation methods for climate variables on the simulation of discharge and nitrate fate with SWAT

    For ecohydrological modeling, climate variables are needed on a subbasin basis. Since they usually originate from point measurements, spatial interpolation is required during preprocessing. Different interpolation methods yield data of varying quality, which can strongly influence modeling results. Four interpolation methods were selected for comparison: nearest neighbour, inverse distance, ordinary kriging, and kriging with external drift (Goovaerts, 1997). This study presents three strategies to evaluate the influence of the interpolation method on the modeling results of discharge and nitrate load in the river in a mesoscale river catchment (∼1000 km²) using the Soil and Water Assessment Tool (SWAT, Neitsch et al., 2005) model: I. Automated calibration of the model with a mixed climate data set and consecutive application of the four interpolated data sets. II. Consecutive automated calibration of the model with each of the four climate data sets. III. Random generation of 1000 model parameter sets and consecutive application of the four interpolated climate data sets on each of the 1000 realisations, evaluating the number of realisations above a certain quality criterion threshold. Results show that strategies I and II are not suitable for evaluating the quality of the interpolated data. Strategy III, however, shows a significant influence of the interpolation method on nitrate modeling. A rank order from the simplest to the most sophisticated method is visible, with kriging with external drift (KED) outperforming all others. This behaviour is driven by the temperature variable, which benefits most from more sophisticated methods and at the same time is the main driving force for the nitrate cycle. The missing influence of the interpolation methods on discharge modeling is explained by a much higher measuring network density for precipitation than for all other climate variables.

    Supervised classification and mathematical optimization

    Data Mining techniques often ask for the resolution of optimization problems. Supervised Classification, and, in particular, Support Vector Machines, can be seen as a paradigmatic instance. In this paper, some links between Mathematical Optimization methods and Supervised Classification are emphasized. It is shown that many different areas of Mathematical Optimization play a central role in off-the-shelf Supervised Classification methods. Moreover, Mathematical Optimization turns out to be extremely useful to address important issues in Classification, such as identifying relevant variables, improving the interpretability of classifiers or dealing with vagueness/noise in the data. Ministerio de Ciencia e Innovación; Junta de Andalucía

    Supervised Classification and Mathematical Optimization

    Data Mining techniques often ask for the resolution of optimization problems. Supervised Classification, and, in particular, Support Vector Machines, can be seen as a paradigmatic instance. In this paper, some links between Mathematical Optimization methods and Supervised Classification are emphasized. It is shown that many different areas of Mathematical Optimization play a central role in off-the-shelf Supervised Classification methods. Moreover, Mathematical Optimization turns out to be extremely useful to address important issues in Classification, such as identifying relevant variables, improving the interpretability of classifiers or dealing with vagueness/noise in the data.

    AUTOLOGISTIC MODEL OF SPATIAL PATTERN OF PHYTOPHTHORA EPIDEMIC IN BELL PEPPER: EFFECTS OF SOIL VARIABLES ON DISEASE PRESENCE

    The pathogen Phytophthora capsici causes lesions on the crown, stem, and leaves of bell pepper, and rapidly causes the plant to die. The spatial patterns of disease in an agricultural field contain information about pathogen dispersal mechanisms and can be useful for developing methods of disease control. Soil water content, soil pathogen population density, and disease incidence data were collected on a 20 x 20 grid in two naturally infested commercial bell pepper fields. In one field the initial pattern of disease closely matched the soil water content pattern and disease developed in areas where the pathogen population levels were high. In the other field no such correspondence was obvious from maps of disease and soil water content. The autologistic model is a flexible model for predicting presence or absence of disease based on soil water content and soil pathogen population, while taking spatial correlation into account. In the autologistic model the log odds of disease in a particular quadrat are modeled as a linear combination of disease in neighboring quadrats and the soil variables. Neighboring quadrats can be defined as adjacent quadrats within a row, quadrats in adjacent rows, quadrats two rows away, and so forth. The regression coefficients give estimates of the increase in odds of disease if neighbors within a row or in adjacent rows show disease symptoms; thus we obtain information about the degree of spread in different directions. The coefficients for the soil variables give estimates of the increase in odds of disease as soil water content or pathogen population density increase. In this problem, soil water content is also highly correlated over quadrats. This introduces a kind of collinearity between water content and the disease in neighboring quadrats, making estimation and interpretation of the parameters of the autologistic model more difficult. We discuss fitting and evaluating the autologistic model when the covariates are themselves spatially correlated.
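
    The log-odds structure described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the coefficient keys, the choice of only two neighbor types (within-row and adjacent-row), and the treatment of off-grid neighbors as disease-free are all assumptions made for illustration, and the coefficient values in the usage note are invented.

```python
import math


def autologistic_log_odds(disease, water, pathogen, row, col, coef):
    """Log odds of disease in quadrat (row, col).

    disease  : 2D list of 0/1 disease indicators, one per quadrat
    water    : 2D list of soil water content values
    pathogen : 2D list of soil pathogen population densities
    coef     : dict of regression coefficients (hypothetical key names)
    """
    n_rows, n_cols = len(disease), len(disease[0])

    def status(r, c):
        # Assumption: quadrats outside the grid count as disease-free.
        inside = 0 <= r < n_rows and 0 <= c < n_cols
        return disease[r][c] if inside else 0

    # Number of diseased neighbors within the same row (left and right).
    within_row = status(row, col - 1) + status(row, col + 1)
    # Number of diseased neighbors in the adjacent rows (above and below).
    adjacent_rows = status(row - 1, col) + status(row + 1, col)

    # Linear combination of neighbor disease status and the soil variables.
    return (coef["intercept"]
            + coef["within_row"] * within_row
            + coef["adjacent_rows"] * adjacent_rows
            + coef["water"] * water[row][col]
            + coef["pathogen"] * pathogen[row][col])


def disease_probability(log_odds):
    """Convert log odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))
```

    With made-up coefficients such as `{"intercept": -2.0, "within_row": 0.5, "adjacent_rows": 0.25, "water": 0.1, "pathogen": 0.01}`, a positive `within_row` value means each diseased neighbor in the same row raises the log odds of disease, which is how the abstract reads directional spread from the fitted coefficients.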

    A Practitioner's Guide to Bayesian Inference in Pharmacometrics using Pumas

    This paper provides a comprehensive tutorial for Bayesian practitioners in pharmacometrics using Pumas workflows. We start with a brief motivation of Bayesian inference for pharmacometrics, highlighting limitations in existing software that Pumas addresses. We then describe all the steps of a standard Bayesian workflow for pharmacometrics using code snippets and examples. This includes: model definition, prior selection, sampling from the posterior, prior and posterior simulations and predictions, counter-factual simulations and predictions, convergence diagnostics, visual predictive checks, and finally model comparison with cross-validation. Finally, the background and intuition behind many advanced concepts in Bayesian statistics are explained in simple language. This includes many important ideas and precautions that users need to keep in mind when performing Bayesian analysis. Many of the algorithms, code, and ideas presented in this paper are highly applicable to clinical research and statistical learning at large, but we chose to focus our discussions on pharmacometrics to keep the scope narrow and given the nature of Pumas as software primarily for pharmacometricians.