Testing Missing At Random Using Instrumental Variables
This paper proposes a test for missing at random (MAR). The MAR assumption is shown to be testable given instrumental variables which are independent of response given potential outcomes. A nonparametric testing procedure based on integrated squared distance is proposed. The statistic's asymptotic distribution under the MAR hypothesis is derived. In particular, our results can be applied to testing missing completely at random (MCAR). A Monte Carlo study examines finite sample performance of our test statistic. An empirical illustration analyzes the nonresponse mechanism in labor income questions
Missing not at random in end of life care studies: multiple imputation and sensitivity analysis on data from the ACTION study
Background: Missing data are common in end-of-life care studies, but there is still relatively little exploration of which method is best for dealing with them and, in particular, of whether the missing at random (MAR) assumption is valid or missing not at random (MNAR) mechanisms should be assumed. In this paper we investigated this issue through a sensitivity analysis within the ACTION study, a multicenter cluster randomized controlled trial testing advance care planning in patients with advanced lung or colorectal cancer.
Methods: Multiple imputation procedures under MAR and MNAR assumptions were implemented. Possible violation of the MAR assumption was addressed with reference to variables measuring quality of life and symptoms. The MNAR model assumed that patients with worse health were more likely to have missing questionnaires. It distinguished between single missing items, which were assumed to satisfy the MAR assumption, and values missing because an entire questionnaire was missing, for which an MNAR mechanism was hypothesized. We explored the sensitivity to possible departures from MAR on gender differences in key indicators and on simple correlations.
Results: Up to 39% of follow-up data were missing. Results under MAR reflected that missingness was related to poorer health status. Correlations between variables, although very small, changed according to the imputation method, as well as the differences in scores by gender, indicating a certain sensitivity of the results to the violation of the MAR assumption.
Conclusions: The findings confirmed the importance of undertaking this kind of analysis in end-of-life care studies
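The kind of MAR-versus-MNAR sensitivity analysis described in this abstract is often illustrated with a delta adjustment: values are imputed as if MAR held, then shifted by a fixed amount to represent non-respondents being worse off. The sketch below is only an illustration of that general idea, not the ACTION study's imputation model. The function name, the normal draws, the score scale and the value of `delta` are all assumptions, and a full multiple imputation would also draw the model parameters rather than fix them at the observed mean and standard deviation.

```python
import random
import statistics


def pooled_mean_under_delta(scores, delta, m=20, seed=0):
    """Pooled mean of `scores` after m delta-adjusted imputations.

    `scores` uses None for missing values. Each missing value is drawn
    from a normal distribution fitted to the observed values (the MAR
    part) and then shifted by `delta` (the assumed MNAR departure).
    delta = 0 reproduces the MAR analysis.
    """
    rng = random.Random(seed)
    observed = [s for s in scores if s is not None]
    mu = statistics.mean(observed)
    sd = statistics.stdev(observed)
    estimates = []
    for _ in range(m):
        completed = [
            rng.gauss(mu, sd) + delta if s is None else s
            for s in scores
        ]
        estimates.append(statistics.mean(completed))
    # Point estimates are pooled by averaging across imputations.
    return statistics.mean(estimates)


# Hypothetical quality-of-life scores; None marks a missing questionnaire.
qol = [70.0, 65.0, None, 80.0, None, 75.0]
for delta in (0.0, -5.0, -10.0):
    print(delta, round(pooled_mean_under_delta(qol, delta), 2))
```

Running the analysis over a range of `delta` values shows how far the estimate moves as the assumed departure from MAR grows, which is the question a sensitivity analysis is meant to answer.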
Impact of Violation of the Missing-at-Random Assumption on Full-Information Maximum Likelihood Method in Multidimensional Adaptive Testing
The full-information maximum likelihood (FIML) method makes it possible to estimate and analyze structural equation models (SEM) even when data are partially missing, enabling incomplete data to contribute to model estimation. The cornerstone of FIML is the missing-at-random (MAR) assumption. In (unidimensional) computerized adaptive testing (CAT), unselected items (i.e., responses that are not observed) remain missing at random even though selected items (i.e., responses that are observed) have been associated with a test taker’s latent trait that is being measured. In multidimensional adaptive testing (MAT), however, the missingness in the response data partially depends on the unobserved data because items are selected based on various types of information including the covariance among latent traits. This eventually may lead to violations of MAR. This study aimed to evaluate the potential impact such a violation of MAR in MAT could have on FIML estimation performance. The results showed an increase in estimation errors in item parameter estimation when the MAT response data were used, and differences in the level of the impact depending on how items loaded on multiple latent traits. Accessed 4,728 times on https://pareonline.net from May 19, 2014 to December 31, 2019.
Sensitivity analysis in multiple imputation in effectiveness studies of psychotherapy
The importance of preventing and treating incomplete data in effectiveness studies is nowadays emphasized. However, most of the publications focus on randomized clinical trials (RCT). One flexible technique for statistical inference with missing data is multiple imputation (MI). Since methods such as MI rely on the assumption of missing data being at random (MAR), a sensitivity analysis for testing the robustness against departures from this assumption is required. In this paper we present a sensitivity analysis technique based on posterior predictive checking, which takes into consideration the concept of clinical significance used in the evaluation of intra-individual changes. We demonstrate the possibilities this technique can offer with the example of irregular longitudinal data collected with the Outcome Questionnaire-45 (OQ-45) and the Helping Alliance Questionnaire (HAQ) in a sample of 260 outpatients. The sensitivity analysis can be used to (1) quantify the degree of bias introduced by missing not at random data (MNAR) in a worst reasonable case scenario, (2) compare the performance of different analysis methods for dealing with missing data, or (3) detect the influence of possible violations to the model assumptions (e.g., lack of normality). Moreover, our analysis showed that ratings from the patient’s and therapist’s version of the HAQ could significantly improve the predictive value of the routine outcome monitoring based on the OQ-45. Since analysis dropouts always occur, repeated measurements with the OQ-45 and the HAQ analyzed with MI are useful to improve the accuracy of outcome estimates in quality assurance assessments and non-randomized effectiveness studies in the field of outpatient psychotherapy
Effective techniques for handling incomplete data using decision trees
Decision Trees (DTs) have been recognized as one of the most successful formalisms for knowledge representation and reasoning and are currently applied to a variety of data mining or knowledge discovery applications, particularly for classification problems. There are several efficient methods to learn a DT from data. However, these methods are often limited to the assumption that data are complete.
In this thesis, some contributions to the field of machine learning and statistics that solve the problem of extracting DTs for learning and classification tasks from incomplete databases are presented. The methodology underlying the thesis blends together well-established statistical theories with the most advanced techniques for machine learning and automated reasoning with uncertainty.
The first contribution is a set of extensive simulations that study the impact of missing data on the predictive accuracy of existing DTs that can cope with missing values, when missing values occur in both the training and test sets or in only one of the two. All simulations are performed under missing completely at random, missing at random and informatively missing mechanisms and for different missing data patterns and proportions.
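As a rough illustration of how the three mechanisms named above differ, the following sketch generates missingness indicators for a target value `y` given an always-observed covariate `x`. It is not the thesis's simulation design: the function name, the logistic form of the drop probability and the 30% default rate are assumptions chosen for brevity.

```python
import math
import random


def missing_mask(rows, mechanism, rate=0.3, seed=0):
    """Return one boolean per (x, y) row: True means y is made missing.

    mcar: every y is dropped with the same probability `rate`.
    mar:  the drop probability depends only on the observed x.
    im:   the drop probability depends on the value of y itself
          (informatively missing).
    """
    if not 0.0 <= rate <= 0.5:
        raise ValueError("rate must lie in [0, 0.5]")
    rng = random.Random(seed)
    mask = []
    for x, y in rows:
        if mechanism == "mcar":
            p = rate
        elif mechanism == "mar":
            p = 2 * rate / (1 + math.exp(-x))
        elif mechanism == "im":
            p = 2 * rate / (1 + math.exp(-y))
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        mask.append(rng.random() < p)
    return mask


# Hypothetical data: y depends on x plus noise.
data_rng = random.Random(1)
rows = []
for _ in range(1000):
    x = data_rng.gauss(0, 1)
    rows.append((x, x + data_rng.gauss(0, 1)))

for mechanism in ("mcar", "mar", "im"):
    dropped = sum(missing_mask(rows, mechanism))
    print(mechanism, dropped / len(rows))
```

Under the `im` setting, large values of `y` are dropped more often, so the values that remain observed no longer represent the full distribution. This is the situation in which methods that assume MAR can be expected to degrade.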
The next contribution is a simple, novel, yet effective procedure for training and testing decision trees in the presence of missing data. Original and simple splitting criteria for attribute selection in tree building are put forward. The proposed technique is evaluated and validated in empirical tests over many real-world application domains. In this work, the proposed algorithm matches (and sometimes exceeds) the outstanding accuracy of multiple imputation, especially on datasets containing mixed attributes and purely nominal attributes. The proposed algorithm also greatly improves accuracy for informatively missing (IM) data. Another major advantage of this method over multiple imputation is the substantial saving in computational resources due to its simplicity.
The next contribution is the proposal of three versions of simple probabilistic techniques that could be used for classifying incomplete vectors using decision trees based on complete data. The proposed procedure is superficially similar to that of fractional cases but more effective. The experimental results demonstrate that these approaches can achieve quality comparable to sophisticated algorithms like multiple imputation and therefore are applicable to all kinds of datasets.
Finally, two novel ensemble procedures for handling incomplete training and test data are proposed and discussed. The algorithms combine the two best approaches either with resampling (REMIMIA) or without resampling (EMIMIA) of the training data before growing the decision trees. Experiments are used to evaluate and validate the success of the proposed ensemble methods with respect to individual missing data techniques in the form of empirical tests. EMIMIA attains the highest overall level of prediction accuracy
Uncertainty-driven refinement of tumor-core segmentation using 3D-to-2D networks with label uncertainty
The BraTS dataset contains a mixture of high-grade and low-grade gliomas,
which have a rather different appearance: previous studies have shown that
performance can be improved by separated training on low-grade gliomas (LGGs)
and high-grade gliomas (HGGs), but in practice this information is not
available at test time to decide which model to use. By contrast with HGGs,
LGGs often present no sharp boundary between the tumor core and the surrounding
edema, but rather a gradual reduction of tumor-cell density.
Utilizing our 3D-to-2D fully convolutional architecture, DeepSCAN, which
ranked highly in the 2019 BraTS challenge and was trained using an
uncertainty-aware loss, we separate cases into those with a confidently
segmented core, and those with a vaguely segmented or missing core. Since by
assumption every tumor has a core, we reduce the threshold for classification
of core tissue in those cases where the core, as segmented by the classifier,
is vaguely defined or missing.
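
The threshold-lowering step described above can be sketched as a simple fallback rule. This is a toy illustration over a flat list of per-voxel core probabilities, not the DeepSCAN pipeline: the function name, the two threshold values and the `min_voxels` criterion for calling a core "missing" are all assumptions.

```python
def segment_core(probs, threshold=0.5, fallback=0.3, min_voxels=1):
    """Return a boolean core mask for per-voxel core probabilities.

    If the default threshold yields fewer than `min_voxels` core voxels,
    the core is treated as missing or vaguely defined and the lower
    `fallback` threshold is applied instead.
    """
    mask = [p >= threshold for p in probs]
    if sum(mask) >= min_voxels:
        return mask
    return [p >= fallback for p in probs]


# A confidently segmented core keeps the default threshold.
print(segment_core([0.1, 0.9, 0.35]))  # [False, True, False]
# No voxel reaches 0.5, so the lower threshold is used.
print(segment_core([0.1, 0.4, 0.35]))  # [False, True, True]
```
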
We then predict survival of high-grade glioma patients using a fusion of
linear regression and random forest classification, based on age, number of
distinct tumor components, and number of distinct tumor cores.
We present results on the validation dataset of the Multimodal Brain Tumor
Segmentation Challenge 2020 (segmentation and uncertainty challenge), and on
the testing set, where the method achieved 4th place in Segmentation, 1st place
in uncertainty estimation, and 1st place in Survival prediction.
Comment: Presented (virtually) in the MICCAI Brainles workshop 2020. Accepted
for publication in Brainles proceedings