    Nonlinear association structures in flexible Bayesian additive joint models

    Joint models of longitudinal and survival data have become an important tool for modeling associations between longitudinal biomarkers and event processes. Existing shared random effects models assume that the association between marker and log-hazard is linear, and this assumption usually remains unchecked. We present an extended framework of flexible additive joint models that allows the estimation of nonlinear, covariate-specific associations by making use of Bayesian P-splines. Our joint models are estimated in a Bayesian framework using structured additive predictors for all model components, allowing for great flexibility in the specification of smooth nonlinear, time-varying and random effects terms for the longitudinal submodel, the survival submodel and their association. The ability to capture truly linear and nonlinear associations is assessed in simulations and illustrated on the widely studied biomedical data on the rare fatal liver disease primary biliary cirrhosis. All methods are implemented in the R package bamlss to facilitate the application of this flexible joint model in practice.

    Flexible Generation of E-Learning Exams in R: Moodle Quizzes, OLAT Assessments, and Beyond

    The capabilities of the package exams for automatic generation of (statistical) exams in R are extended by adding support for learning management systems: As in earlier versions of the package, exam generation is still based on separate Sweave files for each exercise, but rather than just producing different types of PDF output files, the package can now render the same exercises into a wide variety of output formats. These include HTML (with various options for displaying mathematical content) and XML specifications for online exams in learning management systems such as Moodle or OLAT. This flexibility is accomplished by a new modular and extensible design of the package that allows for reading all weaved exercises into R and managing associated supplementary files (such as graphics or data files). The manuscript discusses the readily available user interfaces, the design of the underlying infrastructure, and how new functionality can be built on top of the existing tools.

    Modeling House Prices using Multilevel Structured Additive Regression

    This paper analyzes house price data belonging to three hierarchical levels of spatial units. House selling prices with associated individual attributes (the elementary level-1) are grouped within municipalities (level-2), which form districts (level-3), which are themselves nested in counties (level-4). In addition to individual attributes, explanatory covariates with possibly nonlinear effects are available on two of these spatial resolutions. We apply a multilevel version of structured additive regression (STAR) models to regress house prices on individual attributes and locational neighborhood characteristics in a four-level hierarchical model. In multilevel STAR models, the regression coefficients of a particular nonlinear term may themselves obey a regression model with a structured additive predictor. The framework thus allows incorporating nonlinear covariate effects and time trends, smooth spatial effects and complex interactions at every level of the hierarchy of the multilevel model. Moreover, we are able to decompose the spatial heterogeneity effect and investigate its magnitude at different spatial resolutions, allowing for improved predictive quality even in the case of unobserved spatial units. Statistical inference is fully Bayesian and based on highly efficient Markov chain Monte Carlo simulation techniques that take advantage of the hierarchical structure in the data.

    BAMLSS: Bayesian additive models for location, scale and shape (and beyond)

    Bayesian analysis provides a convenient setting for the estimation of complex generalized additive regression models (GAMs). Since computational power has tremendously increased in the past decade, it is now possible to tackle complicated inferential problems, e.g., with Markov chain Monte Carlo simulation, on virtually any modern computer. This is one of the reasons why Bayesian methods have become increasingly popular, leading to a number of highly specialized and optimized estimation engines and with attention shifting from conditional mean models to probabilistic distributional models capturing location, scale, shape (and other aspects) of the response distribution. In order to embed many different approaches suggested in literature and software, a unified modeling architecture for distributional GAMs is established that exploits the general structure of these models and encompasses many different response distributions, estimation techniques (posterior mode or posterior mean), and model terms (fixed, random, smooth, spatial, ...). It is shown that within this framework implementing algorithms for complex regression problems, as well as the integration of already existing software, is relatively straightforward. The usefulness is emphasized with two complex and computationally demanding application case studies: a large daily precipitation climatology based on more than 1.2 million observations from more than 50 meteorological stations, as well as a Cox model for continuous time with space-time interactions on a data set with over five thousand "individuals".

    Comparing Penalized Splines and Fractional Polynomials for Flexible Modelling of the Effects of Continuous Predictor Variables

    P(enalized)-splines and fractional polynomials (FPs) have emerged as powerful smoothing techniques with increasing popularity in several fields of applied research. Both approaches provide considerable flexibility, but only limited comparative evaluations of the performance and properties of the two methods have been conducted to date. We thus performed extensive simulations to compare FPs of degree 2 (FP2) and degree 4 (FP4) and P-splines that used generalized cross validation (GCV) and restricted maximum likelihood (REML) for smoothing parameter selection. We evaluated the ability of P-splines and FPs to recover the true functional form of the association between continuous, binary and survival outcomes and exposure for linear, quadratic and more complex, non-linear functions, using different sample sizes and signal-to-noise ratios. We found that for more curved functions FP2, the current default implementation in standard software, showed considerable bias and consistently higher mean squared error (MSE) compared to spline-based estimators (REML, GCV) and FP4, which performed equally well in most simulation settings. FPs, however, are prone to artefacts due to the specific choice of the origin, while P-splines based on GCV sometimes produce wiggly estimates, in particular for small sample sizes. Finally, we highlight the specific features of the approaches in a real dataset.
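
For readers unfamiliar with fractional polynomials, the following minimal Python sketch shows how FP basis terms are commonly constructed. The abstract does not spell this out, so the candidate power set, the convention that power 0 means log(x), and the treatment of repeated powers are assumptions based on the usual FP definition. The function names are hypothetical and do not come from the paper or from any package.

```python
import math

# Conventional candidate power set for fractional polynomials
# (an assumption; the abstract does not list it).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_power(x, p):
    """Return x**p, with the FP convention that power 0 means log(x)."""
    if x <= 0:
        raise ValueError("fractional polynomials require x > 0")
    return math.log(x) if p == 0 else x ** p


def fp_terms(x, powers):
    """Basis terms of a fractional polynomial, in order of sorted powers.

    Each repetition of a power adds one more factor log(x), so powers
    (2, 2) give x**2 and x**2 * log(x).
    """
    terms = []
    previous = None
    repeat = 0
    for p in sorted(powers):
        repeat = repeat + 1 if p == previous else 0
        terms.append(fp_power(x, p) * math.log(x) ** repeat)
        previous = p
    return terms


def fp_predict(x, intercept, coefs, powers):
    """Evaluate intercept + sum of coefs times FP terms (coefs follow sorted powers)."""
    return intercept + sum(b * t for b, t in zip(coefs, fp_terms(x, powers)))
```

Under these assumptions an FP2 model uses two powers from the candidate set and an FP4 model uses four. The requirement x > 0 is one reason the choice of origin matters for FPs.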

    Why Does It Always Rain on Me? A Spatio-Temporal Analysis of Precipitation in Austria

    It is a popular belief that the weather is bad more frequently on weekends than on other days of the week, and this is often perceived to be associated with an increased chance of rain. In fact, the meteorological literature does report some evidence for such human-induced weekly cycles, although these findings are not undisputed. To contribute to this discussion, a modern data-driven approach using structured additive regression models is applied to a newly available high-quality data set for Austria. The analysis investigates how an ordered response of rain intensities is influenced by a (potential) weekend effect while adjusting for spatio-temporal structure using spatially varying effects of overall level and seasonality patterns. The underlying data are taken from the HOMSTART project, which provides daily precipitation quantities over a period of more than 60 years and a dense net of more than 50 meteorological stations all across Austria.

    Structured Additive Regression Models: An R Interface to BayesX

    Structured additive regression (STAR) models provide a flexible framework for modeling possible nonlinear effects of covariates: They contain the well established frameworks of generalized linear models (GLM) and generalized additive models (GAM) as special cases but also allow a wider class of effects, e.g., for geographical or spatio-temporal data, allowing for specification of complex and realistic models. BayesX is a standalone software package for fitting a general class of STAR models. Based on a comprehensive open-source regression toolbox written in C++, BayesX uses Bayesian inference for estimating STAR models based on Markov chain Monte Carlo (MCMC) simulation techniques, a mixed model representation of STAR models, or stepwise regression techniques combining penalized least squares estimation with model selection. BayesX not only covers models for responses from univariate exponential families, but also models from less-standard regression situations such as models for multi-categorical responses with either ordered or unordered categories, continuous time survival data, or continuous time multi-state models. This paper presents a new fully interactive R interface to BayesX: the R package R2BayesX. With the new package, STAR models can be conveniently specified using R's formula language (with some extended terms), fitted using the BayesX binary, represented in R with objects of suitable classes, and finally printed/summarized/plotted. This makes BayesX much more accessible to users familiar with R and adds extensive graphics capabilities for visualizing fitted STAR models. Furthermore, R2BayesX complements the already impressive capabilities for semiparametric regression in R by a comprehensive toolbox comprising in particular more complex response types and alternative inferential procedures such as simulation-based Bayesian inference.

    Statistical risk analysis for real estate collateral valuation using Bayesian distributional and quantile regression

    The Basel II framework strictly defines the conditions under which financial institutions are authorized to accept real estate as collateral in order to decrease their credit risk. A widely used concept for its valuation is the hedonic approach. It assumes that a property can be characterized by a bundle of covariates that involves both individual attributes of the building itself and locational attributes of the region in which the building is located. Each of these attributes can be assigned an implicit price, summing up to the value of the entire property. With respect to value-at-risk concepts, financial institutions are often interested not only in the expected value but also in different quantiles of the distribution of real estate prices. To meet these requirements, we develop and compare multilevel structured additive regression models based on GAMLSS-type approaches and quantile regression, respectively. Our models involve linear, nonlinear and spatial effects. Nonlinear effects are modeled with P-splines, and spatial effects are represented by Gaussian Markov random fields. Due to the high complexity of the models, statistical inference is fully Bayesian and based on highly efficient Markov chain Monte Carlo simulation techniques.

    Spatio-temporal precipitation climatology over complex terrain using a censored additive regression model

    Flexible spatio-temporal models are widely used to create reliable and accurate estimates for precipitation climatologies. Most models are based on square-root-transformed monthly or annual means, for which a normal distribution seems to be appropriate. This assumption becomes invalid on a daily time scale, as the observations involve large fractions of zero-observations and are limited to non-negative values. We develop a novel spatio-temporal model to estimate the full climatological distribution of precipitation on a daily time scale over complex terrain using a left-censored normal distribution. The results demonstrate that the new method is able to account for the non-normal distribution and the large fraction of zero-observations. The new climatology provides the full climatological distribution on a very high spatial and temporal resolution, and is competitive with, or even outperforms, existing methods, even for arbitrary locations.
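
To illustrate how a left-censored normal distribution handles a large share of zero observations, here is a minimal Python sketch of the corresponding log-likelihood. It is not the authors' implementation. It assumes a single scalar location `mu` and scale `sigma`, whereas the model in the abstract lets the distribution vary over space and time. The function names and the censoring point of 0 are hypothetical choices for illustration.

```python
import math


def normal_logpdf(z):
    """Log-density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_normal_loglik(y, mu, sigma, left=0.0):
    """Log-likelihood of observations y under a normal left-censored at `left`.

    Observations at or below the censoring point contribute the probability
    mass P(latent <= left); all others contribute the usual normal density.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for obs in y:
        if obs <= left:
            total += math.log(normal_cdf((left - mu) / sigma))
        else:
            total += normal_logpdf((obs - mu) / sigma) - math.log(sigma)
    return total
```

In this formulation a dry day is treated as a latent value at or below zero, so a single distribution covers both the occurrence and the amount of precipitation.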

    Bayesian Gaussian distributional regression models for more efficient norm estimation

    A test score on a psychological test is usually expressed as a normed score, representing its position relative to test scores in a reference population. These typically depend on predictor(s) such as age. The test score distribution conditional on predictors is estimated using regression, which may need large normative samples to estimate the relationships between the predictor(s) and the distribution characteristics properly. In this study, we examine to what extent this burden can be alleviated by using prior information in the estimation of new norms with Bayesian Gaussian distributional regression. In a simulation study, we investigate to what extent this norm estimation is more efficient and how robust it is to prior model deviations. We varied the prior type, prior misspecification and sample size. In our simulated conditions, using a fixed effects prior resulted in more efficient norm estimation than a weakly informative prior as long as the prior misspecification was not age dependent. With the proposed method and reasonable prior information, the same norm precision can be achieved with a smaller normative sample, at least in empirical problems similar to our simulated conditions. This may help test developers to achieve cost‐efficient high‐quality norms. The method is illustrated using empirical normative data from the IDS‐2 intelligence test
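
As a rough illustration of how a normed score follows from a Gaussian distributional regression, the Python sketch below converts a raw score into a z-score and a percentile given an age-dependent mean and standard deviation. The linear mean, the log-linear standard deviation, the coefficient values in the test cases and all function names are assumptions made for this example. They are not taken from the study or from the IDS-2 norms.

```python
import math


def conditional_mean(age, b0, b1):
    """Assumed linear model for the mean test score at a given age."""
    return b0 + b1 * age


def conditional_sd(age, g0, g1):
    """Assumed log-linear model for the standard deviation, which keeps it positive."""
    return math.exp(g0 + g1 * age)


def normed_score(raw, age, mean_coefs, log_sd_coefs):
    """Return (z, percentile) of a raw score relative to the age-conditional normal."""
    mu = conditional_mean(age, *mean_coefs)
    sd = conditional_sd(age, *log_sd_coefs)
    z = (raw - mu) / sd
    percentile = 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return z, percentile
```

In a Bayesian fit, the coefficients would be posterior draws or summaries rather than fixed numbers, and prior information about them is what the abstract proposes to exploit.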