Evaluation of a weighting approach for performing sensitivity analysis after multiple imputation
Abstract
Background
Multiple imputation (MI) is a well-recognised statistical technique for handling missing data. As usually implemented in standard statistical software, MI assumes that data are ‘Missing at random’ (MAR); an assumption that in many settings is implausible. It is not possible to distinguish whether data are MAR or ‘Missing not at random’ (MNAR) using the observed data, so it is desirable to discover the impact of departures from the MAR assumption on the MI results by conducting sensitivity analyses. A weighting approach based on a selection model has been proposed for performing MNAR analyses to assess the robustness of results obtained under standard MI to departures from MAR.
Methods
In this article, we use simulation to evaluate the weighting approach as a method for exploring possible departures from MAR, with missingness in a single variable, where the parameters of interest are the marginal mean (and probability) of a partially observed outcome variable and a measure of association between the outcome and a fully observed exposure. The simulation studies compare the weighting-based MNAR estimates for various numbers of imputations in small and large samples, for moderate to large magnitudes of departure from MAR, where the degree of departure from MAR was assumed known. Further, we evaluated a proposed graphical method, which uses the dataset with missing data, for obtaining a plausible range of values for the parameter that quantifies the magnitude of departure from MAR.
Results
Our simulation studies confirm that the weighting approach outperformed the MAR approach, but it still suffered from bias. In particular, our findings demonstrate that the weighting approach provides biased parameter estimates, even when a large number of imputations is performed. In the examples presented, the graphical approach for selecting a range of values for the possible departures from MAR did not capture the true parameter value of departure used in generating the data.
Conclusions
Overall, the weighting approach is not recommended for sensitivity analyses following MI, and further research is required to develop more appropriate methods to perform such sensitivity analyses.
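As a rough illustration of the weighting idea evaluated in this article, the sketch below reweights estimates obtained from imputations generated under MAR. It assumes one commonly described form of the approach: a logistic selection model in which delta is the log-odds ratio of being observed per unit increase in the incomplete variable, with each imputation weighted in proportion to exp(-delta × sum of its imputed values). The function name, the toy data and the simplified normal imputation step are hypothetical and are not taken from the article.

```python
import math
import random
import statistics


def mnar_weighted_estimate(estimates, imputed_values, delta):
    """Reweight MAR-based estimates toward an assumed MNAR selection model.

    estimates      : M point estimates, one per imputed dataset
    imputed_values : M lists holding the imputed values of the incomplete variable
    delta          : assumed log-odds ratio of being observed per unit increase
                     in the incomplete variable (delta = 0 corresponds to MAR)
    """
    # Unnormalised log-weight for imputation m: -delta * sum of its imputed values
    log_w = [-delta * sum(vals) for vals in imputed_values]
    # Subtract the maximum before exponentiating, for numerical stability
    top = max(log_w)
    raw = [math.exp(lw - top) for lw in log_w]
    total = sum(raw)
    weights = [r / total for r in raw]
    estimate = sum(w * est for w, est in zip(weights, estimates))
    return estimate, weights


# Toy data (hypothetical): larger outcome values are less likely to be observed
random.seed(1)
n, M = 200, 50
y = [random.gauss(0, 1) for _ in range(n)]
observed = [v for v in y if random.random() < 1 / (1 + math.exp(v))]
n_mis = n - len(observed)

# Simplified stand-in for MI under MAR: draw from a normal fitted to respondents
mu, sd = statistics.mean(observed), statistics.stdev(observed)
imputed = [[random.gauss(mu, sd) for _ in range(n_mis)] for _ in range(M)]
estimates = [(sum(observed) + sum(vals)) / n for vals in imputed]

mar_estimate = statistics.mean(estimates)
mnar_estimate, weights = mnar_weighted_estimate(estimates, imputed, delta=-1.0)
print(f"MAR mean: {mar_estimate:.3f}  weighted MNAR mean: {mnar_estimate:.3f}")
print(f"largest single weight: {max(weights):.3f}")
```

Printing the largest weight is a quick way to see how unevenly the imputations contribute to the weighted estimate for a given delta.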
Practical approaches to sensitivity analyses within the multiple imputation framework
© 2016 Dr. Panteha Hayati Rezvan
Background: Missing data commonly occur in medical research, in particular, in longitudinal cohort studies with multiple waves of data collection over long periods of follow-up. A variety of approaches have been developed in the statistical literature in order to provide valid inferences in the presence of missing data. One of the widely used methods for handling missing data is a complete case analysis, in which participants with any missing observations are omitted from the statistical analysis. This results in loss of precision and statistical power, and more importantly, may produce biased estimates when participants with missing data are systematically different from those with observed data.
An alternative statistical approach for dealing with missing data is multiple imputation (MI), a flexible and sophisticated method, which has gained widespread acceptance among researchers in recent years. Many academic journals now emphasise the importance of reporting adequate information regarding missing data and request researchers to perform MI for handling missing data.
Under standard applications, MI assumes that missing data depend only on observed values, in which case the missing data are missing at random (MAR). However, missing data are often considered to be missing not at random (MNAR), as it is more likely that the probability of data being missing depends on the unobserved values. It is not possible to verify whether the missing data are MAR or MNAR; therefore, sensitivity analyses within the MI framework have been proposed for assessing the sensitivity of the findings of interest to plausible departures from MAR.
Two approaches for conducting sensitivity analyses within the MI framework have been proposed: the weighting approach (a selection-model-based method) and the pattern-mixture method. Both methods require specification of one or more sensitivity parameters that capture the degree of departure from MAR and frame the MNAR assumption. However, these sensitivity parameters are often unidentifiable and cannot be estimated from the observed data, since that would require the missing data to be known. The only principled approach to determine plausible values of the sensitivity parameters is to elicit them from content experts. My PhD research focuses on implementation and evaluation of the practical approaches for performing sensitivity analyses within the MI framework to investigate the impact of departures from MAR.
Methods: The literature was reviewed to determine how MI and sensitivity analyses following MI were implemented and reported in a selection of medical journals, the Lancet and the New England Journal of Medicine.
The current practical approaches for performing sensitivity analyses within the MI framework were implemented and evaluated to account for plausible departures from MAR. In particular, a series of extensive simulation experiments was performed to evaluate the weighting approach and pattern-mixture method, where missing data were MNAR in a single variable. The aim was to assess whether the methods provide unbiased estimates in small and large samples as the number of imputations increases for moderate to large magnitudes of departure from MAR, where these magnitudes were assumed to be known. Further, a graphical method proposed in the statistical literature for obtaining a plausible range of values for the parameter that quantifies the degree of departure from MAR (i.e. the sensitivity parameter) for the weighting approach was evaluated in various scenarios using simulation experiments.
The simulation experiments with a single incomplete variable were then extended to a scenario where missing data were MNAR in two variables to evaluate the performance of the pattern-mixture method.
An illustration of the application of the pattern-mixture method for performing sensitivity analyses following MI was provided using data from the Longitudinal Study of Australian Children (LSAC), where the epidemiological question of interest was to estimate the association between exposure to maternal emotional distress at age 4-5 years and total (social, emotional and behavioural) difficulties at age 8-9 years (i.e. SDQ total score). The missing data in this case study were handled using the standard MI procedure under the assumption of MAR followed by sensitivity analyses under MNAR using the pattern-mixture method, where the prior distributions for the sensitivity parameters of interest were obtained from content experts. The experts’ distributions were pooled into a single probability distribution for each of the sensitivity parameters, and then different percentile values of the pooled distributions (i.e. 5th, 25th, 50th, 75th, and 95th percentile) were used as an offset in the pattern-mixture method to investigate the effects of plausible departures from MAR.
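The offset step described above can be sketched as follows, purely as an illustration. The sketch imputes a continuous outcome from a normal distribution fitted to respondents, shifts every imputed value by a fixed offset delta, and combines the per-imputation results with Rubin's rules. The function names, the toy data and the simplified imputation model (which ignores parameter uncertainty and covariates) are assumptions made for this example and do not reproduce the thesis analysis.

```python
import math
import random
import statistics


def rubin_pool(estimates, variances):
    """Combine M point estimates and their variances using Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)           # pooled point estimate
    within = statistics.mean(variances)          # average within-imputation variance
    between = statistics.variance(estimates)     # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


def delta_adjusted_mean(observed, n_missing, delta, m=20, rng=None):
    """Estimate a mean after shifting imputed values by the offset delta."""
    if rng is None:
        rng = random.Random(0)
    n = len(observed) + n_missing
    mu, sd = statistics.mean(observed), statistics.stdev(observed)
    estimates, variances = [], []
    for _ in range(m):
        # Simplified MAR draw from respondents' distribution, then add the
        # offset so that non-respondents differ from respondents by delta
        imputed = [rng.gauss(mu, sd) + delta for _ in range(n_missing)]
        completed = observed + imputed
        estimates.append(statistics.mean(completed))
        variances.append(statistics.variance(completed) / n)
    return rubin_pool(estimates, variances)


# Hypothetical respondents' scores and number of non-respondents
observed = [8.0, 10.0, 12.0, 9.0, 11.0, 7.0, 13.0, 10.0]
n_missing = 4

# delta = 0 reproduces the MAR analysis; larger offsets represent larger departures
results = {
    d: delta_adjusted_mean(observed, n_missing, d, rng=random.Random(42))
    for d in (0.0, 1.0, 2.0)
}
for d, (est, se) in results.items():
    print(f"delta = {d:.1f}: mean = {est:.3f}, SE = {se:.3f}")
```

In an analysis like the one described, the offsets would be the chosen percentiles of the pooled expert distribution rather than the arbitrary values used here.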
Results: The systematic review showed that the application of MI has increased in articles published in two high-ranking medical journals over the 5-year period between 2008 and 2013. Despite the presence of guidelines and recommendations for clear documentation around the reporting of missing data and statistical analyses using MI, the review highlighted the lack of adherence to the available guidelines. Importantly, the review showed that the majority of the articles failed to recognise the importance of conducting sensitivity analyses to explore the effects of departures from MAR after implementing MI.
For the evaluation of the weighting approach, the weighting-based MNAR estimates were compared for various numbers of imputations in small and large samples, and for moderate to large magnitudes of departure from MAR. The simulation results illustrated that the MNAR estimates were biased and did not converge towards the true values of the parameters of interest, even when the number of imputations used was large (up to 1000 imputations), for either the marginal mean (and proportion) or the linear (and logistic) regression coefficient. In addition, examining the graphical method for obtaining a range of values for the plausible departures from MAR showed that the method did not capture the true parameter values used in the data generating mechanisms.
On the other hand, evaluating the pattern-mixture method, in the presence of MNAR missing data in a single variable, illustrated more promising results with unbiased and consistent MNAR estimates across different numbers of imputations and sample sizes using different values for plausible departures from MAR. Further, the findings of the simulation studies with two incomplete variables showed that the pattern-mixture method recovered unbiased estimates of the parameters of interest with known MNAR sensitivity parameters under the independence assumption of the missingness indicators for the two incomplete variables.
For the LSAC case study, the elicited distribution for the average change in SDQ total score at 8-9 years between non-respondents and respondents suggested that children with higher SDQ total difficulties score were more likely to be non-respondents. In addition, the distribution obtained for the shift in the proportion of mothers’ emotional distress at 4-5 years between non-respondents and respondents suggested the tendency for mothers who were emotionally distressed to be non-respondents. According to the sensitivity analysis results, greater percentile values of the pooled distributions for the sensitivity parameters indicated a greater departure from MAR and therefore, resulted in a greater shift in the MNAR parameter estimate. Overall, there was evidence that maternal emotional distress at 4-5 years was associated with higher levels of SDQ total score at 8-9 years using different percentile values of the pooled distributions. Based on the findings, the final conclusion obtained from MI under MAR was relatively insensitive (robust) to the elicited departures proposed by the content experts.
Conclusions: Researchers need to document adequate information regarding missing data as well as detailed descriptions of MI and any sensitivity analyses performed following MI. Detailed reporting enables readers to assess the quality of the study, appropriateness of the application of MI for handling missing data and hence, the validity of the findings.
From the results of the simulation experiments, while the weighting approach is not recommended for performing sensitivity analyses to assess the effects of departure from the MAR assumption, the pattern-mixture method seems to be a practical approach for performing sensitivity analyses within the MI framework under the independence assumption of missingness indicators for missing data in multiple variables. Further investigation of the evaluation and implementation of the pattern-mixture method is required in the presence of multiple incomplete variables with MNAR missing data when this assumption of independence does not hold.
Sensitivity analysis within multiple imputation framework using delta-adjustment: Application to Longitudinal Study of Australian Children
Multiple imputation (MI) is a powerful statistical method for handling missing data. Standard implementations of MI are valid under the unverifiable assumption of missing at random (MAR), which is often implausible in practice. The delta-adjustment method, implemented within the MI framework, can be used to perform sensitivity analyses that assess the impact of departures from the MAR assumption on the final inference. This method requires specification of unknown sensitivity parameter(s), termed delta(s). We illustrate the application of the delta-adjustment method using data from the Longitudinal Study of Australian Children, where the epidemiological question is to estimate the association between exposure to maternal emotional distress at age 4–5 years and total (social, emotional, and behavioural) difficulties at age 8–9 years. We elicited the sensitivity parameters for the outcome and exposure variables from a panel of experts. The elicited quantile judgements from each expert were converted into a suitable parametric probability distribution and combined using the linear pooling method. We then applied MI under MAR followed by sensitivity analyses under missing not at random (MNAR) using the delta-adjustment method. We present results from sensitivity analyses that used different percentile values of the pooled distributions for the delta parameters for the outcome and exposure, and demonstrate that twofold increases in the magnitude of the association between maternal distress and total difficulties are only observed for large departures from MAR.
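To make the pooling step concrete, here is a minimal sketch of linear pooling under stated assumptions: each expert's judgement is represented by a normal distribution, the experts are weighted equally, and the pooled distribution is the weighted mixture of the individual distributions. Percentiles of the pool are found by bisection on the mixture CDF. The expert means and standard deviations below are invented for illustration; the abstract does not state which parametric family or weights were used.

```python
from statistics import NormalDist


def pooled_cdf(x, experts, weights):
    """CDF of the linear pool: a weighted average of the experts' CDFs."""
    return sum(w * e.cdf(x) for e, w in zip(experts, weights))


def pooled_quantile(p, experts, weights, tol=1e-9):
    """Find the p-th quantile of the pooled distribution by bisection."""
    # Bracket wide enough to contain essentially all of the pooled probability
    lo = min(e.mean - 10 * e.stdev for e in experts)
    hi = max(e.mean + 10 * e.stdev for e in experts)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pooled_cdf(mid, experts, weights) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical expert distributions for a delta parameter (mean, sd)
experts = [NormalDist(1.0, 0.5), NormalDist(2.0, 1.0), NormalDist(1.5, 0.8)]
weights = [1 / len(experts)] * len(experts)  # equal weights (an assumption)

# Percentiles that could then serve as candidate delta values
percentiles = {
    p: pooled_quantile(p / 100, experts, weights) for p in (5, 25, 50, 75, 95)
}
for p, value in percentiles.items():
    print(f"{p}th percentile of pooled distribution: {value:.3f}")
```

Each resulting percentile would be used as one fixed delta in a separate sensitivity analysis, so the spread of the pool translates directly into the range of departures from MAR that is explored.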
How to Apply Variable Selection Machine Learning Algorithms With Multiply Imputed Data: A Missing Discussion
Psychological researchers often use standard linear regression to identify relevant predictors of an outcome of interest, but challenges emerge with incomplete data and growing numbers of candidate predictors. Regularization methods like the LASSO can reduce the risk of overfitting, increase model interpretability, and improve prediction in future samples; however, handling missing data when using regularization-based variable selection methods is complicated. Using listwise deletion or an ad hoc imputation strategy to deal with missing data when using regularization methods can lead to loss of precision, substantial bias, and a reduction in predictive ability. In this tutorial, we describe three approaches for fitting a LASSO when using multiple imputation to handle missing data and illustrate how to implement these approaches in practice with an applied example. We discuss implications of each approach and describe additional research that would help solidify recommendations for best practices. (PsycInfo Database Record (c) 2023 APA, all rights reserved)
Assessing alternative imputation strategies for infrequently missing items on multi-item scales
Health-science researchers often measure psychological constructs using multi-item scales and encounter missing items on some participants. Multiple imputation (MI) has emerged as an alternative to ad-hoc methods (e.g., mean substitution) for handling incomplete data on multi-item scales, appealingly reflecting available information while accounting for uncertainty due to missing values in a unified inferential framework. However, MI can be implemented in a variety of ways. When the number of variables to impute gets large, some strategies yield unstable estimates of quantities of interest while others are not technically feasible to implement. These considerations raise pragmatic questions about the extent to which ad-hoc procedures would yield statistical properties that are competitive with theoretically motivated methods. Drawing on an HIV study where depression and anxiety symptoms are measured with multi-item scales, this empirical investigation contrasts ad-hoc methods for handling missing items with various MI implementations that differ as to whether imputation is at the item-level or scale-level and how auxiliary variables are incorporated. While the findings are consistent with previous reports favoring item-level imputation when feasible to implement, we found only subtle differences in statistical properties across procedures, suggesting that weaknesses of ad-hoc procedures may be muted when missing data percentages are modest.
The Russian Revolution: (its beginning, the arrest of the Tsar, prospects): impressions and thoughts of an eyewitness and participant / A. A. Bublikov
Procedure for performing a simulation study for a normally distributed outcome. (DOCX 31 kb)