Global, Parameterwise and Joint Shrinkage Factor Estimation
The predictive value of a statistical model can often be improved by applying shrinkage methods, for example by regularized regression or empirical Bayes approaches. Various types of shrinkage factors can also be estimated after a maximum likelihood fit has been obtained: while global shrinkage modifies all regression coefficients by the same factor, parameterwise shrinkage factors differ between regression coefficients. The latter have been proposed especially in the context of variable selection. For variables that are highly correlated or related in content, such as dummy variables coding a categorical variable or several parameters describing a nonlinear effect, parameterwise shrinkage factors may not be the best choice. For such cases, we extend the existing methodology with so-called 'joint shrinkage factors', a compromise between global and parameterwise shrinkage. Shrinkage factors are often estimated using leave-one-out resampling. We also discuss a computationally simple and much faster approximation to resampling-based shrinkage factor estimation that can be easily obtained in most standard software packages for regression analyses. This alternative may be relevant for simulation studies and other computer-intensive investigations. Furthermore, we provide an R package, shrink, implementing the mentioned shrinkage methods for models fitted by linear, generalized linear, or Cox regression, even if these models involve fractional polynomials or restricted cubic splines to estimate the influence of a continuous variable by a nonlinear function. The approaches and the usage of the package shrink are illustrated by means of two examples.
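As a rough illustration of how these estimators can be obtained in practice, the sketch below applies the shrink package to a simulated logistic regression fit. It is a minimal sketch based on our reading of the package documentation; the argument names (type, method, join) and the requirement to store the design matrix in the fit should be verified against ?shrink.

    # Minimal sketch of global, parameterwise, and joint shrinkage factor
    # estimation with the shrink package; data are simulated and argument
    # names reflect the package documentation as we recall it.
    library(shrink)

    set.seed(1)
    d <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
    d$y <- rbinom(200, 1, plogis(0.8 * d$x1 + 0.4 * d$x2))

    # shrink() needs the design matrix and response stored in the fit object.
    fit <- glm(y ~ x1 + x2 + x3, family = binomial, data = d,
               x = TRUE, y = TRUE)

    shrink(fit, type = "global",        method = "jackknife")  # one common factor
    shrink(fit, type = "parameterwise", method = "jackknife")  # one factor each

    # Joint shrinkage for a group of related coefficients, here x1 and x2,
    # using the fast DFBETA approximation instead of leave-one-out refits.
    shrink(fit, type = "parameterwise", method = "dfbeta",
           join = list(c("x1", "x2")))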
Evaluating methods for Lasso selective inference in biomedical research by a comparative simulation study
Variable selection for regression models plays a key role in the analysis of biomedical data. However, inference after selection is not covered by classical frequentist statistical theory, which assumes a fixed set of covariates in the model. We review two interpretations of inference after selection: the full model view, in which the parameters of interest are those of the full model on all predictors, and then focus on the submodel view, in which the parameters of interest are those of the selected model only. In the context of L1-penalized regression, we compare proposals for submodel inference (selective inference) via confidence intervals available to applied researchers via software packages, using a simulation study inspired by real data commonly seen in biomedical studies. Furthermore, we present an exemplary application of these methods to a publicly available dataset to discuss their practical usability. Our findings indicate that the frequentist properties of selective confidence intervals are generally acceptable, but desired coverage levels are not guaranteed in all scenarios except for the most conservative methods. The choice of inference method potentially has a large impact on the resulting interval estimates, thereby necessitating that the user is acutely aware of the goal of inference in order to interpret and communicate the results. Currently available software packages are not yet very user-friendly or robust, which might affect their use in practice. In summary, we find submodel inference after selection useful for experienced statisticians to assess the importance of individual selected predictors in future applications.
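To make the setting concrete, the sketch below shows one of the publicly available implementations of selective inference for the lasso, the fixedLassoInf function of the selectiveInference R package, on simulated data. The lambda rescaling follows our understanding of the package documentation (glmnet scales its penalty by 1/n, fixedLassoInf does not) and should be double-checked before use.

    # Minimal sketch of selective (post-selection) confidence intervals after
    # lasso selection; data are simulated, and the lambda/n conversion reflects
    # our reading of the selectiveInference documentation.
    library(glmnet)
    library(selectiveInference)

    set.seed(1)
    n <- 100; p <- 10
    x <- matrix(rnorm(n * p), n, p)
    y <- 0.8 * x[, 1] + rnorm(n)

    lambda <- 10                               # penalty on the unscaled objective
    gfit <- glmnet(x, y, standardize = FALSE)
    beta <- coef(gfit, s = lambda / n, exact = TRUE, x = x, y = y)[-1]

    # Confidence intervals for the coefficients of the selected submodel,
    # conditional on the selection event.
    fixedLassoInf(x, y, beta, lambda)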
Weighted Cox Regression Using the R Package coxphw
Cox's regression model for the analysis of survival data relies on the proportional hazards assumption. However, this assumption is often violated in practice, and as a consequence the average relative risk may be under- or overestimated. Weighted estimation of Cox regression is a parsimonious alternative that provides well-interpretable average effects even in the case of non-proportional hazards. We provide the R package coxphw implementing weighted Cox regression. By means of two biomedical examples, appropriate analyses in the presence of non-proportional hazards are exemplified and the advantages of weighted Cox regression are discussed. Moreover, using the package coxphw, time-dependent effects can be conveniently estimated by including interactions of covariates with arbitrary functions of time.
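A minimal sketch of a coxphw call on simulated data follows. The template values ("AHR", "PH") and the formula syntax for time-by-covariate interactions reflect the package documentation as we recall it and should be checked against the package vignette.

    # Minimal sketch of weighted Cox regression with coxphw; data are simulated.
    library(survival)
    library(coxphw)

    set.seed(1)
    d <- data.frame(time   = rexp(200),
                    status = rbinom(200, 1, 0.7),
                    x      = rnorm(200))

    # Weighted estimation targeting an average hazard ratio (AHR), which stays
    # interpretable when the proportional hazards assumption fails.
    fit <- coxphw(Surv(time, status) ~ x, data = d, template = "AHR")
    summary(fit)

    # Time-dependent effect via an interaction with a function of time; the
    # exact syntax is our reading of the vignette, not confirmed by this abstract.
    coxphw(Surv(time, status) ~ x + log(time):x, data = d, template = "PH")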
Population-Attributable Fractions of Modifiable Lifestyle Factors for CKD and Mortality in Individuals With Type 2 Diabetes: A Cohort Study
Background:
We quantified the impact of lifestyle and dietary modifications on chronic kidney disease (CKD) by estimating population-attributable fractions (PAFs).
Study Design:
Observational cohort study.
Setting & Participants:
Middle-aged adults with type 2 diabetes but without severe albuminuria from the Ongoing Telmisartan Alone and in Combination With Ramipril Global Endpoint Trial (ONTARGET; n = 6,916).
Factors:
Modifiable lifestyle/dietary risk factors, such as physical activity, size of social network, alcohol intake, tobacco use, diet, and intake of various food items.
Outcomes:
The primary outcome was CKD, ascertained as moderate to severe albuminuria or a ≥5% annual decline in estimated glomerular filtration rate (eGFR) after 5.5 years. The competing risk of death was considered. The PAF was defined as the proportional reduction in CKD or mortality (within 5.5 years) that would occur if exposure to a risk factor were changed to an optimal level.
Results:
At baseline, median urinary albumin-creatinine ratio and eGFR were 6.6 (IQR, 2.9-25.0) mg/mmol and 71.5 (IQR, 58.1-85.9) mL/min/1.73 m², respectively. After 5.5 years, 704 (32.5%) participants had developed albuminuria, 1,194 (55.2%) had a ≥5% annual eGFR decline, 267 (12.3%) had both, and 1,022 (14.8%) had died. Being physically active every day had PAFs of 5.1% (95% CI, 0.5%-9.6%) for CKD and 12.3% (95% CI, 4.9%-19.1%) for death. Among food items, increasing vegetable intake would have the largest impact on population health. Considering diet, weight, physical activity, tobacco use, and size of social network, exposure to less-than-optimal levels gave PAFs of 13.3% (95% CI, 5.5%-20.9%) for CKD and 37.5% (95% CI, 27.8%-46.7%) for death. For the 17.8 million middle-aged Americans with diabetes, improving one of these lifestyle behaviors to the optimal range could reduce the incidence or progression of CKD after 5.5 years by 274,000 cases and the number of deaths within 5.5 years by 405,000.
Limitations:
Ascertainment of changes in kidney measures does not precisely match the definitions for incidence or progression of CKD.
Conclusions:
Healthy lifestyle and diet are associated with less CKD and lower mortality and may have a substantial impact on population kidney health.
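The PAF definition used above can be illustrated with a back-of-the-envelope calculation. The sketch below uses Levin's classical formula with invented numbers; it illustrates the concept only and is not the competing-risks estimator applied in the study.

    # Hypothetical illustration of a population-attributable fraction (PAF):
    # the proportional reduction in cases if everyone were moved to the optimal
    # exposure level. Numbers are invented, not taken from ONTARGET.
    p_exposed <- 0.40  # prevalence of the suboptimal lifestyle factor
    rr        <- 1.50  # relative risk of CKD, suboptimal vs. optimal level

    # Levin's formula, one classical PAF estimator:
    paf <- p_exposed * (rr - 1) / (1 + p_exposed * (rr - 1))
    paf  # about 0.17, i.e., roughly 17% of cases attributable to the factor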
Retrospective, multicenter analysis comparing conventional with oncoplastic breast conserving surgery: oncological and surgical outcomes in women with high-risk breast cancer from the OPBC-01/iTOP2 study
Introduction:
Recent data suggest that margins ≥2 mm after breast-conserving surgery may improve local control in invasive breast cancer (BC). By allowing large resection volumes, oncoplastic breast-conserving surgery (OBCII; Clough level II/Tübingen 5-6) may achieve better local control than conventional breast-conserving surgery (BCS; Tübingen 1-2) or oncoplastic breast conservation with low resection volumes (OBCI; Clough level I/Tübingen 3-4).
Methods:
Data from consecutive high-risk BC patients treated in 15 centers from the Oncoplastic Breast Consortium (OPBC) network, between January 2010 and December 2013, were retrospectively reviewed.
Results:
A total of 3,177 women were included, 30% of whom were treated with OBC (OBCI n = 663; OBCII n = 297). The BCS/OBCI group had significantly smaller tumors and smaller resection margins compared with OBCII (pT1: 50% vs. 37%, p = 0.002; proportion with margin <1 mm: 17% vs. 6%, p < 0.001). There were significantly more re-excisions due to R1 (“ink on tumor”) in the BCS/OBCI compared with the OBCII group (11% vs. 7%, p = 0.049). Univariate and multivariable regression analysis adjusted for tumor biology, tumor size, radiotherapy, and systemic treatment demonstrated no differences in local, regional, or distant recurrence-free or overall survival between the two groups.
Conclusions:
Large resection volumes in oncoplastic surgery increase the distance from cancer cells to the margin of the specimen and significantly reduce re-excision rates. With OBCII, larger tumors are resected with local, regional, and distant recurrence-free as well as overall survival rates similar to those of BCS/OBCI.
Regression with Highly Correlated Predictors: Variable Omission Is Not the Solution
Regression models have been used for decades to explore and quantify the association between a dependent response and several independent variables in environmental sciences, epidemiology, and public health. However, researchers often encounter situations in which some independent variables exhibit high bivariate correlation or may even be collinear. Improper statistical handling of this situation will almost certainly generate models of little or no practical use and yield misleading interpretations. By means of two example studies, we demonstrate how diagnostic tools for collinearity or near-collinearity may fail to guide the analyst. Instead, the most appropriate way of handling collinearity should be driven by the research question at hand and, in particular, by the distinction between predictive and explanatory aims.
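For readers unfamiliar with the diagnostics in question, the sketch below computes the textbook variance inflation factor on simulated near-collinear data. It illustrates the kind of tool the paper examines; the data are not the paper's examples.

    # Minimal base-R illustration of a standard collinearity diagnostic, the
    # variance inflation factor; data are simulated, not from the paper.
    set.seed(1)
    n  <- 200
    x1 <- rnorm(n)
    x2 <- x1 + rnorm(n, sd = 0.1)   # x2 is nearly collinear with x1
    y  <- x1 + rnorm(n)

    fit <- lm(y ~ x1 + x2)
    summary(fit)                    # inflated standard errors for x1 and x2

    # VIF of x1: 1 / (1 - R^2) from regressing x1 on the remaining predictors.
    r2 <- summary(lm(x1 ~ x2))$r.squared
    1 / (1 - r2)                    # far above common rule-of-thumb cutoffs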
Augmented backward elimination: a pragmatic and purposeful way to develop statistical models
Statistical models are simple mathematical rules derived from empirical data that describe the association between an outcome and several explanatory variables. In a typical modeling situation, statistical analysis often involves a large number of potential explanatory variables, and frequently only partial subject-matter knowledge is available. Therefore, selecting the most suitable variables for a model in an objective and practical manner is usually a non-trivial task. We briefly revisit the purposeful variable selection procedure suggested by Hosmer and Lemeshow, which combines significance and change-in-estimate criteria for variable selection, and critically discuss the change-in-estimate criterion. We show that using a significance-based threshold for the change-in-estimate criterion reduces to a simple significance-based selection of variables, as if the change-in-estimate criterion were not considered at all. Various extensions to the purposeful variable selection procedure are suggested. We propose, for variable selection, backward elimination augmented with a standardized change-in-estimate criterion on the quantity of interest that is usually reported and interpreted in a model. Augmented backward elimination has been implemented in a SAS macro for linear, logistic, and Cox proportional hazards regression. The algorithm and its implementation were evaluated by means of a simulation study. Augmented backward elimination tends to select larger models than backward elimination and approximates the unselected model up to negligible differences in point estimates of the regression coefficients. On average, regression coefficients obtained after applying augmented backward elimination were less biased relative to the coefficients of correctly specified models than after backward elimination. In summary, we propose augmented backward elimination as a reproducible variable selection algorithm that gives the analyst more flexibility in adapting model selection to a specific statistical modeling situation.
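A heavily simplified base-R sketch of the core idea follows: a variable is eliminated only if it is both non-significant and its removal barely changes the remaining coefficients. This is our own illustration, not the authors' SAS macro; in particular, full augmented backward elimination keeps a failing candidate as "passive" and proceeds to the next one, which this sketch omits.

    # Simplified sketch of augmented backward elimination for lm fits:
    # drop the least significant variable only if the relative change in
    # the remaining coefficients stays below tau (our own illustration).
    abe_step <- function(fit, alpha = 0.2, tau = 0.05) {
      repeat {
        ct <- summary(fit)$coefficients
        p  <- ct[-1, ncol(ct), drop = FALSE]   # p-values, intercept excluded
        if (nrow(p) == 0 || max(p) < alpha) break
        cand    <- rownames(p)[which.max(p)]   # least significant candidate
        reduced <- update(fit, as.formula(paste(". ~ . -", cand)))
        kept    <- intersect(names(coef(fit)), names(coef(reduced)))
        # standardized change in estimate for the remaining coefficients
        delta <- abs((coef(reduced)[kept] - coef(fit)[kept]) / coef(fit)[kept])
        if (any(delta > tau, na.rm = TRUE)) break   # change too large: keep it
        fit <- reduced
      }
      fit
    }

    # Usage on simulated data: x3 is noise and is a candidate for removal.
    set.seed(1)
    d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
    d$y <- d$x1 + 0.4 * d$x2 + rnorm(100)
    coef(abe_step(lm(y ~ x1 + x2 + x3, data = d)))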
Selection of variables for multivariable models: opportunities and limitations in quantifying model stability by resampling
Statistical models are often fitted to obtain a concise description of the association of an outcome variable with some covariates. Even if background knowledge is available to guide preselection of covariates, stepwise variable selection is commonly applied to remove irrelevant ones. This practice may introduce additional variability, and the selection is rarely certain. However, these issues are often ignored and model stability is not questioned.
Several resampling-based measures have been proposed to describe model stability, including variable inclusion frequencies (VIFs), model selection frequencies, the relative conditional bias (RCB), and the root mean squared difference ratio (RMSDR). The latter two were recently proposed to assess the bias and variance inflation induced by variable selection. Here, we study the consistency and accuracy of resampling estimates of these measures and the optimal choice of the resampling technique. In particular, we compare subsampling and bootstrapping for assessing the stability of linear, logistic, and Cox models obtained by backward elimination in a simulation study. Moreover, we exemplify the estimation and interpretation of all suggested measures in a study on cardiovascular risk. The VIF and the model selection frequency are only consistently estimated by the subsampling approach. By contrast, the bootstrap is advantageous in terms of bias and precision for estimating the RCB as well as the RMSDR. However, unbiased estimation of the latter quantity requires independence of covariates, which is rarely encountered in practice. Our study stresses the importance of addressing model stability after variable selection and shows how to cope with it.
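The inclusion-frequency part of this toolbox is easy to prototype. The sketch below (our own illustration, not the authors' code) contrasts the two resampling schemes compared in the paper by repeating backward elimination on subsamples or bootstrap replicates and counting how often each covariate survives.

    # Sketch of resampling-based variable inclusion frequencies: refit
    # AIC-based backward elimination on B resampled datasets.
    vif_resample <- function(data, B = 200, subsample = TRUE) {
      n      <- nrow(data)
      vars   <- setdiff(names(data), "y")
      counts <- setNames(numeric(length(vars)), vars)
      for (b in seq_len(B)) {
        db <- if (subsample) {
          data[sample(n, n %/% 2), ]         # subsampling: m-out-of-n, no replacement
        } else {
          data[sample(n, replace = TRUE), ]  # nonparametric bootstrap
        }
        sel <- step(lm(y ~ ., data = db), direction = "backward", trace = 0)
        counts <- counts + (vars %in% names(coef(sel)))
      }
      counts / B                             # inclusion frequency per covariate
    }

    # Usage on simulated data with one strong, one weak, and one null covariate.
    set.seed(1)
    d <- data.frame(x1 = rnorm(150), x2 = rnorm(150), x3 = rnorm(150))
    d$y <- d$x1 + 0.3 * d$x2 + rnorm(150)
    vif_resample(d)                          # subsampling
    vif_resample(d, subsample = FALSE)       # bootstrap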