    Investigating the missing data mechanism in quality of life outcomes: a comparison of approaches

    Background: Missing data is classified as missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR). Knowing the mechanism helps identify the most appropriate analysis. The first aim was to compare different methods for identifying the missing data mechanism and determine whether they gave consistent conclusions. The second was to investigate whether reminder-response data can be used to help identify the missing data mechanism. Methods: Five clinical trial datasets that employed a reminder system at follow-up were used. Some quality of life questionnaires were initially missing, but were later recovered through reminders. Four methods of determining the missing data mechanism were applied. Two response data scenarios were considered: first, immediate data only; second, all observed responses (including reminder responses). Results: In three of five trials the hypothesis tests found evidence against the MCAR assumption. Logistic regression suggested MAR, but was able to use the reminder-collected data to highlight potential MNAR data in two trials. Conclusion: The four methods were consistent in determining the missingness mechanism. One hypothesis test was preferred as it is applicable with intermittent missingness. Some inconsistencies between the two data scenarios were found. Ignoring the reminder data could give a distorted view of the missingness mechanism. Utilising reminder data allowed the possibility of MNAR to be considered. The Chief Scientist Office of the Scottish Government Health Directorate. Research Training Fellowship (CZF/1/31
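
The three mechanisms named in this abstract can be made concrete with a small simulation. The sketch below is only an illustration: the function names, coefficients and sample size are assumptions, the data are synthetic, and a simple difference in means stands in for the hypothesis tests and logistic regression the abstract refers to.

```python
import math
import random
import statistics


def simulate(mechanism, n=4000, seed=0):
    """Simulate baseline scores, follow-up scores and a missingness flag.

    All coefficients are arbitrary choices made for illustration only.
    """
    rng = random.Random(seed)
    baseline, follow_up, missing = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)            # baseline score, always observed
        y = 0.6 * x + rng.gauss(0.0, 0.8)  # follow-up score, may go missing
        if mechanism == "MCAR":
            p = 0.3                                  # unrelated to any data
        elif mechanism == "MAR":
            p = 1 / (1 + math.exp(1.0 - 1.5 * x))    # depends on observed x
        elif mechanism == "MNAR":
            p = 1 / (1 + math.exp(1.0 - 1.5 * y))    # depends on y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        baseline.append(x)
        follow_up.append(y)
        missing.append(rng.random() < p)
    return baseline, follow_up, missing


def baseline_gap(baseline, missing):
    """Mean baseline of non-responders minus mean baseline of responders."""
    lost = [x for x, m in zip(baseline, missing) if m]
    kept = [x for x, m in zip(baseline, missing) if not m]
    return statistics.fmean(lost) - statistics.fmean(kept)


def observed_mean_bias(follow_up, missing):
    """Mean of the observed follow-up values minus the mean of all values."""
    kept = [y for y, m in zip(follow_up, missing) if not m]
    return statistics.fmean(kept) - statistics.fmean(follow_up)


for name in ("MCAR", "MAR", "MNAR"):
    x, y, m = simulate(name)
    print(name, round(baseline_gap(x, m), 3), round(observed_mean_bias(y, m), 3))
```

In this synthetic setting a baseline gap near zero is compatible with MCAR, while a clear gap is evidence against it. The gap is computed from observed data only, so on its own it cannot separate MAR from MNAR; that requires extra information about the missing values, which is the role the abstract gives to the reminder responses.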

    Model-based Clustering with Missing Not At Random Data

    Traditional methods for handling missing values are not designed for clustering, and they rarely apply to the general case, frequent in practice, of Missing Not At Random (MNAR) values. This paper proposes to embed MNAR data directly within model-based clustering algorithms. We introduce a mixture model for different types of data (continuous, count, categorical and mixed) to jointly model the data distribution and the MNAR mechanism. Eight different MNAR models are proposed, which may depend on the underlying (unknown) classes and/or the values of the missing variables themselves. We prove the identifiability of the parameters of both the data distribution and the mechanism, whatever the type of data and the mechanism, and propose an EM or Stochastic EM algorithm to estimate them. The code is available at https://github.com/AudeSportisse/Clustering-MNAR. We also prove that MNAR models for which the missingness depends on the class membership have the nice property that the statistical inference can be carried out on the data matrix concatenated with the mask by considering a MAR mechanism instead. Finally, we perform empirical evaluations for the proposed sub-models on synthetic data and we illustrate the relevance of our method on a medical register, the TraumaBase® dataset
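
The "data matrix concatenated with the mask" mentioned in this abstract can be pictured with a few lines of Python. This is a minimal sketch under stated assumptions: the function name `concat_with_mask` and the use of `None` for missing entries are hypothetical choices, and it is not taken from the authors' repository.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]


def concat_with_mask(rows: Sequence[Row]) -> List[list]:
    """Append the missingness mask to each row of the data matrix.

    Missing entries (None) are left as they are in the data part; the
    mask part holds 1 where the entry is missing and 0 where it is
    observed, so the mask columns themselves are fully observed.
    """
    augmented = []
    for row in rows:
        mask = [1 if value is None else 0 for value in row]
        augmented.append(list(row) + mask)
    return augmented


# Two variables, three individuals, two missing entries (toy values).
data = [
    [1.2, None],
    [None, 0.7],
    [3.4, 2.1],
]
for line in concat_with_mask(data):
    print(line)
```

A clustering procedure that treats the remaining gaps as MAR would then be run on the augmented rows; that estimation step is beyond this sketch.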

    Handling protest responses in contingent valuation surveys

    OBJECTIVES: Protest responses, whereby respondents refuse to state the value they place on the health gain, are commonly encountered in contingent valuation (CV) studies, and they tend to be excluded from analyses. Such an approach will be biased if protesters differ from non-protesters on characteristics that predict their responses. The Heckman selection model has been commonly used to adjust for protesters, but its underlying assumptions may be implausible in this context. We present a multiple imputation (MI) approach to appropriately address protest responses in CV studies, and compare it with the Heckman selection model. METHODS: This study exploits data from the multinational EuroVaQ study, which surveyed respondents' willingness-to-pay (WTP) for a Quality Adjusted Life Year (QALY). Here, our simulation study assesses the relative performance of MI and Heckman selection models across different realistic settings grounded in the EuroVaQ study, including scenarios with different proportions of missing data and non-response mechanisms. We then illustrate the methods in the EuroVaQ study for estimating mean WTP for a QALY gain. RESULTS: We find that MI provides lower bias and mean squared error compared with the Heckman approach across all considered scenarios. The simulations suggest that the Heckman approach can lead to considerable underestimation or overestimation of mean WTP due to violations in the normality assumption, even after log-transforming the WTP responses. The case study illustrates that protesters are associated with a lower mean WTP for a QALY gain compared with non-protesters, but that the results differ according to method for handling protesters. CONCLUSIONS: MI is an appropriate method for addressing protest responses in CV studies
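
After multiple imputation, the estimates from the imputed datasets are usually combined with Rubin's rules. The abstract does not say how the EuroVaQ analysis pooled its results, so the sketch below is a generic illustration: the function name `pool_rubin` is hypothetical and the input numbers are made up, not study results.

```python
import statistics
from typing import NamedTuple, Sequence


class Pooled(NamedTuple):
    estimate: float        # average of the per-imputation estimates
    within: float          # average within-imputation variance
    between: float         # variance of the estimates across imputations
    total_variance: float  # within + (1 + 1/m) * between


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Pooled:
    """Combine m completed-data estimates and their variances."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates, each with a variance")
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)  # sample variance, m - 1 denominator
    total = within + (1 + 1 / m) * between
    return Pooled(q_bar, within, between, total)


# Made-up mean WTP estimates from five imputed datasets, with their variances.
pooled = pool_rubin([10.2, 9.8, 10.5, 10.1, 9.9], [0.40, 0.42, 0.39, 0.41, 0.40])
print(pooled.estimate, pooled.total_variance ** 0.5)
```

The between-imputation term is what carries the extra uncertainty due to the missing responses; analysing a single imputed dataset would leave it out.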

    Robust Lasso-Zero for sparse corruption and model selection with missing covariates

    We propose Robust Lasso-Zero, an extension of the Lasso-Zero methodology [Descloux and Sardy, 2018], initially introduced for sparse linear models, to the sparse corruptions problem. We give theoretical guarantees on the sign recovery of the parameters for a slightly simplified version of the estimator, called Thresholded Justice Pursuit. The use of Robust Lasso-Zero is showcased for variable selection with missing values in the covariates. In addition to not requiring the specification of a model for the covariates, nor estimating their covariance matrix or the noise variance, the method has the great advantage of handling missing-not-at-random values without specifying a parametric model. Numerical experiments and a medical application underline the relevance of Robust Lasso-Zero in such a context with few available competitors. The method is easy to use and implemented in the R library lass0

    Monte Carlo modified profile likelihood in models for clustered data

    The main focus of the analysts who deal with clustered data is usually not on the clustering variables, and hence the group-specific parameters are treated as nuisance. If a fixed effects formulation is preferred and the total number of clusters is large relative to the single-group sizes, classical frequentist techniques relying on the profile likelihood are often misleading. The use of alternative tools, such as modifications to the profile likelihood or integrated likelihoods, for making accurate inference on a parameter of interest can be complicated by the presence of nonstandard modelling and/or sampling assumptions. We show here how to employ Monte Carlo simulation in order to approximate the modified profile likelihood in some of these unconventional frameworks. The proposed solution is widely applicable and is shown to retain the usual properties of the modified profile likelihood. The approach is examined in two instances particularly relevant in applications, i.e. missing-data models and survival models with unspecified censoring distribution. The effectiveness of the proposed solution is validated via simulation studies and two clinical trial applications