
    Stable computational methods for additive binomial models with application to adjusted risk differences

    Risk difference is an important measure of effect size in biostatistics, for both randomised and observational studies. The natural way to adjust risk differences for potential confounders is to use an additive binomial model, which is a binomial generalised linear model with an identity link function. However, implementations of the additive binomial model in commonly used statistical packages can fail to converge to the maximum likelihood estimate (MLE), necessitating the use of approximate methods involving misspecified or inflexible models. A novel computational method is proposed, which retains the additive binomial model but uses the multinomial–Poisson transformation to convert the problem into an equivalent additive Poisson fit. The method allows reliable computation of the MLE and also accommodates semi-parametric monotonic regression functions. The performance of the method is examined in simulations and it is used to analyse two datasets from clinical trials in acute myocardial infarction. Source code for implementing the method in R is provided as supplementary material (see Appendix A). Funding: Australian Research Council.
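    As a rough illustration (not the paper's own implementation), the baseline model that the paper seeks to stabilise can be sketched in R with a standard identity-link binomial GLM; the data frame and column names below (trial_data, events, n, treatment, age) are hypothetical, with treatment assumed to be a 0/1 indicator.

        ## Minimal sketch of the standard identity-link binomial fit whose convergence
        ## problems motivate the paper; all objects below are hypothetical.
        fit_bin <- glm(cbind(events, n - events) ~ treatment + age,
                       family = binomial(link = "identity"),
                       data = trial_data,
                       start = c(0.1, 0, 0))  # starting values are usually needed,
                                              # and the fit may still fail to converge

        ## Under the additive (identity-link) model, the coefficient on 'treatment'
        ## is the adjusted risk difference.
        coef(fit_bin)["treatment"]

    The paper's contribution is to replace this direct fit with an equivalent additive Poisson fit obtained via the multinomial–Poisson transformation, which allows the MLE to be computed reliably.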

    Estimation of adjusted rate differences using additive negative binomial regression

    Rate differences are an important effect measure in biostatistics and provide an alternative perspective to rate ratios. When the data are event counts observed during an exposure period, adjusted rate differences may be estimated using an identity-link Poisson generalised linear model, also known as additive Poisson regression. A problem with this approach is that the assumption of equality of mean and variance rarely holds in real data, which often show overdispersion. An additive negative binomial model is the natural alternative to account for this; however, standard model-fitting methods are often unable to cope with the constrained parameter space arising from the non-negativity restrictions of the additive model. In this paper, we propose a novel solution to this problem using a variant of the ECME algorithm. Our method provides a reliable way to fit an additive negative binomial regression model and also permits flexible generalisations using semi-parametric regression functions. We illustrate the method using a placebo-controlled clinical trial of fenofibrate treatment in patients with type II diabetes, where the outcome is the number of laser therapy courses administered to treat diabetic retinopathy. An R package is available that implements the proposed method. Copyright © 2015 John Wiley & Sons, Ltd.
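    As a rough sketch (not the ECME-based method from the paper), the model class being targeted can be written down in R with MASS::glm.nb and an identity link; the data frame and column names are hypothetical, equal follow-up time is assumed, and, as the abstract notes, this standard fitter frequently fails on the constrained parameter space.

        ## Minimal sketch of an additive (identity-link) negative binomial regression;
        ## all objects below are hypothetical.
        library(MASS)

        fit_nb <- glm.nb(n_courses ~ treatment + baseline_severity,
                         data = trial_data,
                         link = identity)

        ## With an identity link, the coefficient on 'treatment' is an adjusted rate
        ## difference (events per follow-up period) rather than a log rate ratio.
        coef(fit_nb)["treatment"]

    When this fit breaks down because the non-negativity restrictions become active, the ECME-based method described in the abstract is designed to still deliver the MLE.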

    GAMLSS for high-dimensional data – a flexible approach based on boosting

    Generalized additive models for location, scale and shape (GAMLSS) are a popular semi-parametric modelling approach that, in contrast to conventional GAMs, relate not only the expected mean but every distribution parameter (e.g. location, scale and shape) to a set of covariates. Current fitting procedures for GAMLSS are infeasible for high-dimensional data settings and require variable selection based on (potentially problematic) information criteria. The present work describes a boosting algorithm for high-dimensional GAMLSS that was developed to overcome these limitations. Specifically, the new algorithm was designed to allow the simultaneous estimation of predictor effects and variable selection. The proposed algorithm was applied to data of the Munich Rental Guide, which is used by landlords and tenants as a reference for the average rent of a flat depending on its characteristics and spatial features. The net-rent predictions that resulted from the high-dimensional GAMLSS were found to be highly competitive, while covariate-specific prediction intervals showed a major improvement over classical GAMs.
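    A minimal sketch of this kind of boosted GAMLSS fit, assuming the gamboostLSS R package and a hypothetical rental data frame (rent_data with columns rent, area, rooms, year_built, location_score); the call below is illustrative and is not the exact analysis of the Munich Rental Guide.

        ## Component-wise boosting for a Gaussian GAMLSS: both the mean (mu) and the
        ## standard deviation (sigma) receive their own additive predictor.
        library(gamboostLSS)

        fit_lss <- glmboostLSS(rent ~ area + rooms + year_built + location_score,
                               data = rent_data,
                               families = GaussianLSS(),
                               control = boost_control(mstop = 500))

        ## coef() returns one set of coefficients per distribution parameter; covariates
        ## that are never selected by the boosting algorithm do not enter the model,
        ## which is how variable selection happens without information criteria.
        coef(fit_lss)

    In practice the stopping iteration mstop would be tuned, for example by cross-validation, since it controls both the amount of shrinkage and the sparsity of the selected model.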

    A PAUC-based Estimation Technique for Disease Classification and Biomarker Selection

    The partial area under the receiver operating characteristic curve (PAUC) is a well-established performance measure to evaluate biomarker combinations for disease classification. Because the PAUC is defined as the area under the ROC curve within a restricted interval of false positive rates, it enables practitioners to quantify sensitivity rates within pre-specified specificity ranges. This issue is of considerable importance for the development of medical screening tests. Although many authors have highlighted the importance of the PAUC, there exist only a few methods that use the PAUC as an objective function for finding optimal combinations of biomarkers. In this paper, we introduce a boosting method for deriving marker combinations that is explicitly based on the PAUC criterion. The proposed method can be applied in high-dimensional settings where the number of biomarkers exceeds the number of observations. Additionally, the proposed method incorporates a recently proposed variable selection technique (stability selection) that results in sparse prediction rules containing only those biomarkers that make relevant contributions to predicting the outcome of interest. Using both simulated data and real data, we demonstrate that our method performs well with respect to both variable selection and prediction accuracy. Specifically, if the focus is on a limited range of specificity values, the new method results in better predictions than other established techniques for disease classification.
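    As a small illustration of the criterion itself (not of the paper's boosting algorithm), the PAUC over a restricted specificity range can be computed in R with the pROC package; status (a binary disease label), score (a biomarker combination) and marker_data are hypothetical.

        ## Partial AUC of a marker combination over specificities from 0.9 to 1,
        ## i.e. false positive rates below 0.1; data and variable names are hypothetical.
        library(pROC)

        roc_obj <- roc(status ~ score, data = marker_data)

        auc(roc_obj,
            partial.auc = c(0.9, 1),
            partial.auc.focus = "specificity",
            partial.auc.correct = TRUE)  # McClish correction: 0.5 = uninformative marker

    The boosting method in the paper optimises marker combinations directly against this restricted-range criterion rather than the full AUC.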

    Standardization and Control for Confounding in Observational Studies: A Historical Perspective

    Control for confounders in observational studies was generally handled through stratification and standardization until the 1960s. Standardization typically reweights the stratum-specific rates so that exposure categories become comparable. With the development first of loglinear models, and soon afterwards of nonlinear regression techniques (logistic regression, failure-time regression) that the emerging computers could handle, regression modelling became the preferred approach, just as was already the case with multiple regression analysis for continuous outcomes. Since the mid 1990s it has become increasingly obvious that weighting methods are still often useful, and sometimes even necessary. Against this background, we aim to describe the emergence of the modelling approach and the refinement of the weighting approach for confounder control. Comment: Published at http://dx.doi.org/10.1214/13-STS453 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
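    As a concrete, entirely hypothetical example of the reweighting that standardization performs, a direct standardization of stratum-specific rates can be done in a few lines of R; all numbers below are made up for illustration.

        ## Direct standardization: reweight stratum-specific rates in each exposure
        ## group by a common standard population, so the groups become comparable.
        std_weights <- c(young = 0.5, middle = 0.3, old = 0.2)   # standard population shares

        rate_exposed   <- c(young = 0.010, middle = 0.030, old = 0.080)
        rate_unexposed <- c(young = 0.008, middle = 0.020, old = 0.050)

        std_rate_exposed   <- sum(std_weights * rate_exposed)
        std_rate_unexposed <- sum(std_weights * rate_unexposed)

        ## Confounder-adjusted (standardized) rate difference
        std_rate_exposed - std_rate_unexposed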