Variable selection under multiple imputation using the bootstrap in a prognostic study
Background: Missing data is a challenging problem in many prognostic studies. Multiple imputation
(MI) accounts for imputation uncertainty, which allows for adequate statistical testing. We developed
and tested a methodology that combines MI with bootstrapping techniques for studying prognostic
variable selection.
Method: In our prospective cohort study we merged data from three different randomized
controlled trials (RCTs) to assess prognostic variables for chronicity of low back pain. Among the
outcome and prognostic variables, between 0% and 48.1% of the data were missing. We used four
methods to investigate the influence of sampling variation and imputation variation: MI only,
bootstrap only, and two methods that combine MI and bootstrapping. Variables were selected
based on the inclusion frequency of each prognostic variable, i.e. the proportion of times that the
variable appeared in the model. The discriminative and calibrative abilities of the prognostic models
developed by the four methods were assessed at different inclusion levels.
Results: We found that the effect of imputation variation on the inclusion frequency was larger
than the effect of sampling variation. When MI and bootstrapping were combined, inclusion
levels from 0% (full model) to 90% gave bootstrap-corrected c-index values of 0.70 to 0.71 and
slope values of 0.64 to 0.86.
Conclusion: We recommend accounting for both imputation and sampling variation in data sets
with missing values. The new procedure of combining MI with bootstrapping for variable selection
results in multivariable prognostic models with good performance and is therefore attractive to
apply to data sets with missing values.
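The inclusion-frequency rule described in the Method paragraph can be illustrated with a minimal sketch. It assumes that a model has already been fitted on every bootstrap sample and/or imputed data set, and that the outcome is recorded as a boolean matrix with one row per fitted model. The function names, variable names, and example data below are hypothetical and are not taken from the study.

```python
import numpy as np


def inclusion_frequency(selected):
    """Proportion of fitted models in which each variable appears.

    `selected` is a boolean array of shape (n_models, n_variables); row i
    marks the variables retained by the model fitted on the i-th bootstrap
    sample and/or imputed data set.
    """
    selected = np.asarray(selected, dtype=bool)
    return selected.mean(axis=0)


def select_variables(names, selected, threshold):
    """Keep the variables whose inclusion frequency is at least `threshold`.

    A threshold of 0.0 keeps every variable (the full model).
    """
    freq = inclusion_frequency(selected)
    return [name for name, f in zip(names, freq) if f >= threshold]


# Hypothetical selection results from five fitted models (illustration only).
names = ["age", "pain_intensity", "duration", "work_status"]
selected = np.array(
    [
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 0, 0],
        [1, 1, 1, 1],
        [1, 0, 0, 1],
    ],
    dtype=bool,
)

print(inclusion_frequency(selected))
print(select_variables(names, selected, threshold=0.5))
```

Raising the threshold yields sparser models, which is how performance can be compared across inclusion levels.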
The influence of recent social experience and physical environment on courtship and male aggression
Detection of enzyme activity at trace levels: A new perspective for the direct screening of active catalytic antibodies
Detecting Gene-Environment Interactions Using a Combined Case-Only and Case-Control Approach
The conventional method of detecting gene-environment interactions, the case-control analysis, suffers from low statistical power. In contrast, the case-only analysis/design can be powerful in certain scenarios, although violation of the assumption of independence between the genetic and environmental factors can greatly bias the results. As an alternative, Bayes model averaging may be used to combine the case-control and case-only analyses. This approach first frames the case-control and case-only analyses as variations of a log-linear model. The weighting between these 2 models is then a function of the data and prior beliefs on the independence of the 2 potentially interacting factors. In this paper, the authors demonstrate via simulations that when there is no prior information on the independence of the genetic and environmental factors, this approach tends to be more powerful than the case-control analysis. Additionally, when the genetic and environmental factors are not independent in the population, bias is substantially reduced, with a corresponding reduction in type I error in comparison with the case-only analysis. Increased power or increased robustness to violations of the independence assumption may be obtained with more appropriate prior specification. The authors use an example data analysis to demonstrate the advantages of this approach.
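To make the two analyses concrete, the sketch below computes the multiplicative interaction estimate each one yields from 2x2 gene-by-exposure tables, then combines them with a fixed weight. This is only an illustration under stated assumptions: the abstract does not give the weighting formula, so `weight_case_only` is a placeholder for the model weight that the authors derive from the data and the prior. The function names and counts are hypothetical.

```python
import math


def log_or(table):
    """Log odds ratio of a 2x2 table [[a, b], [c, d]], i.e. log(a*d / (b*c))."""
    (a, b), (c, d) = table
    return math.log(a * d / (b * c))


def interaction_estimates(cases, controls):
    """Return (case_control, case_only) log interaction estimates.

    Each table has rows G=1, G=0 and columns E=1, E=0.
    The case-only estimate is the gene-exposure log odds ratio among cases;
    it is valid only if G and E are independent in the source population.
    The case-control estimate subtracts the same quantity among controls.
    """
    case_only = log_or(cases)
    case_control = log_or(cases) - log_or(controls)
    return case_control, case_only


def averaged_estimate(case_control, case_only, weight_case_only):
    """Weighted average of the two estimates on the log scale.

    `weight_case_only` is a placeholder in [0, 1], not the paper's formula.
    """
    return weight_case_only * case_only + (1 - weight_case_only) * case_control


# Hypothetical counts (illustration only).
cases = [[40, 60], [50, 150]]
controls = [[30, 70], [60, 140]]

cc, co = interaction_estimates(cases, controls)
print(cc, co, averaged_estimate(cc, co, weight_case_only=0.5))
```

In these made-up counts the gene and exposure are unassociated among controls, so the two estimates coincide. They diverge when that association is present, which is the situation in which the weighting matters.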