Investigating the missing data mechanism in quality of life outcomes: a comparison of approaches
Background: Missing data is classified as missing completely at random (MCAR), missing at
random (MAR) or missing not at random (MNAR). Knowing the mechanism is useful in identifying
the most appropriate analysis. The first aim was to compare different methods for identifying the
missing data mechanism to determine whether they gave consistent conclusions. The second aim was to investigate
whether reminder-response data can be used to help identify the missing data mechanism.
Methods: Five clinical trial datasets that employed a reminder system at follow-up were used.
Some quality of life questionnaires were initially missing, but later recovered through reminders.
Four methods of determining the missing data mechanism were applied. Two response data
scenarios were considered: first, immediate data only; second, all observed responses
(including reminder responses).
Results: In three of five trials the hypothesis tests found evidence against the MCAR assumption.
Logistic regression suggested MAR, but was able to use the reminder-collected data to highlight
potential MNAR data in two trials.
Conclusion: The four methods were consistent in determining the missingness mechanism. One
hypothesis test was preferred as it is applicable with intermittent missingness. Some inconsistencies between the two data scenarios were found. Ignoring the reminder data could potentially give a distorted view of the missingness mechanism. Utilising reminder data allowed the possibility of MNAR to be considered.
Funding: The Chief Scientist Office of the Scottish Government Health Directorate, Research Training Fellowship (CZF/1/31
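The logistic regression approach mentioned in the abstract above can be illustrated with a minimal sketch. It regresses a missingness indicator on an observed baseline score: a slope near zero is compatible with MCAR, while a clear non-zero slope is evidence against MCAR in favour of at least MAR. The data are simulated, and the names fit_logistic and simulate are hypothetical; this is not the authors' code, and a regression on observed covariates alone cannot reveal MNAR.

```python
import math
import random

def fit_logistic(x, y, iters=25):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by Newton-Raphson; return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break
        # Newton step: solve the 2x2 system (information matrix) * step = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def simulate(n, slope, seed):
    """Simulated baseline scores and missingness flags (assumed mechanism)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Missingness probability depends on the baseline score only through `slope`.
    m = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(-1.0 + slope * xi))) else 0
         for xi in x]
    return x, m

x_mcar, m_mcar = simulate(2000, slope=0.0, seed=1)   # missingness ignores x
x_mar, m_mar = simulate(2000, slope=1.0, seed=2)     # missingness depends on x

_, b1_mcar = fit_logistic(x_mcar, m_mcar)
_, b1_mar = fit_logistic(x_mar, m_mar)
print(f"slope under simulated MCAR: {b1_mcar:.2f}")
print(f"slope under simulated MAR:  {b1_mar:.2f}")
```

In a real analysis one would also report a standard error or likelihood ratio test for the slope, and, as the abstract suggests, compare the fit on immediate responses only with the fit that includes reminder responses.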
Model-based Clustering with Missing Not At Random Data
Traditional methods for handling missing values are not designed for
clustering, and they rarely apply to the general case, frequent
in practice, of Missing Not At Random (MNAR) values. This paper proposes to
embed MNAR data directly within model-based clustering algorithms. We introduce
a mixture model for different types of data (continuous, count, categorical and
mixed) to jointly model the data distribution and the MNAR mechanism. Eight
different MNAR models are proposed, which may depend on the underlying
(unknown) classes and/or the values of the missing variables themselves. We
prove the identifiability of the parameters of both the data distribution and
the mechanism, whatever the type of data and the mechanism, and propose an EM
or Stochastic EM algorithm to estimate them. The code is available at
https://github.com/AudeSportisse/Clustering-MNAR. We also prove
that MNAR models for which the missingness depends on the class membership have
the nice property that the statistical inference can be carried out on the data
matrix concatenated with the mask by considering a MAR mechanism instead.
Finally, we perform empirical evaluations for the proposed sub-models on
synthetic data and we illustrate the relevance of our method on a medical
register, the TraumaBase® dataset.
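To make the class-dependent case concrete, here is a small simulation sketch in which the probability of a value being missing depends only on the latent class. It shows why the mask is informative about class membership and what "the data matrix concatenated with the mask" looks like. The function simulate_class_dependent_missingness, the two-class Gaussian setup and all parameter values are illustrative assumptions, not taken from the paper or its repository.

```python
import random

def simulate_class_dependent_missingness(n, miss_prob_by_class,
                                         means=(0.0, 3.0), seed=0):
    """Two-class Gaussian data whose missingness rate depends on the class only."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.randrange(2)                      # latent class (unknown in practice)
        y = [rng.gauss(means[z], 1.0) for _ in range(2)]
        # mask[j] == 1 means variable j is missing for this observation
        mask = [1 if rng.random() < miss_prob_by_class[z] else 0 for _ in y]
        observed = [None if c else v for v, c in zip(y, mask)]
        rows.append((z, observed + mask))         # data concatenated with the mask
    return rows

def missing_rate(rows, k):
    """Share of missing entries among observations of class k."""
    masks = [row[2:] for z, row in rows if z == k]
    return sum(sum(m) for m in masks) / sum(len(m) for m in masks)

rows = simulate_class_dependent_missingness(5000, miss_prob_by_class=(0.1, 0.5))
rate0, rate1 = missing_rate(rows, 0), missing_rate(rows, 1)
print(f"missing rate in class 0: {rate0:.2f}, in class 1: {rate1:.2f}")
```

Because the two classes have different missing rates, the mask columns carry information about the class, which is the intuition behind clustering on the concatenated matrix.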
Handling protest responses in contingent valuation surveys
OBJECTIVES: Protest responses, whereby respondents refuse to state the value they place on the health gain, are commonly encountered in contingent valuation (CV) studies, and they tend to be excluded from analyses. Such an approach will be biased if protesters differ from non-protesters on characteristics that predict their responses. The Heckman selection model has been commonly used to adjust for protesters, but its underlying assumptions may be implausible in this context. We present a multiple imputation (MI) approach to appropriately address protest responses in CV studies, and compare it with the Heckman selection model.
METHODS: This study exploits data from the multinational EuroVaQ study, which surveyed respondents' willingness-to-pay (WTP) for a Quality Adjusted Life Year (QALY). Here, our simulation study assesses the relative performance of MI and Heckman selection models across different realistic settings grounded in the EuroVaQ study, including scenarios with different proportions of missing data and non-response mechanisms. We then illustrate the methods in the EuroVaQ study for estimating mean WTP for a QALY gain.
RESULTS: We find that MI provides lower bias and mean squared error compared with the Heckman approach across all considered scenarios. The simulations suggest that the Heckman approach can lead to considerable underestimation or overestimation of mean WTP due to violations of the normality assumption, even after log-transforming the WTP responses. The case study illustrates that protesters are associated with a lower mean WTP for a QALY gain compared with non-protesters, but that the results differ according to the method for handling protesters.
CONCLUSIONS: MI is an appropriate method for addressing protest responses in CV studies.
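Multiple imputation yields one estimate per imputed dataset, and these are usually combined with Rubin's rules. The sketch below shows only that pooling step for a mean WTP estimate. The function name pool_rubin and the input numbers are hypothetical and are not EuroVaQ results; the abstract does not describe the authors' imputation model.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Hypothetical mean-WTP estimates and squared standard errors from m = 3 imputations.
pooled_estimate, total_variance = pool_rubin([10.0, 12.0, 14.0], [4.0, 4.0, 4.0])
print(pooled_estimate, total_variance)
```

The between-imputation term is what separates MI from single imputation: it carries the uncertainty about the protesters' unobserved values into the final standard error.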
Robust Lasso-Zero for sparse corruption and model selection with missing covariates
We propose Robust Lasso-Zero, an extension of the Lasso-Zero methodology
[Descloux and Sardy, 2018], initially introduced for sparse linear models, to
the sparse corruptions problem. We give theoretical guarantees on the sign
recovery of the parameters for a slightly simplified version of the estimator,
called Thresholded Justice Pursuit. The use of Robust Lasso-Zero is showcased
for variable selection with missing values in the covariates. In addition to
not requiring the specification of a model for the covariates, nor estimating
their covariance matrix or the noise variance, the method has the great
advantage of handling missing not-at random values without specifying a
parametric model. Numerical experiments and a medical application underline the
relevance of Robust Lasso-Zero in such a context with few available
competitors. The method is easy to use and is implemented in the R library lass0.
Monte Carlo modified profile likelihood in models for clustered data
The main focus of the analysts who deal with clustered data is usually not on
the clustering variables, and hence the group-specific parameters are treated
as nuisance. If a fixed effects formulation is preferred and the total number
of clusters is large relative to the single-group sizes, classical frequentist
techniques relying on the profile likelihood are often misleading. The use of
alternative tools, such as modifications to the profile likelihood or
integrated likelihoods, for making accurate inference on a parameter of
interest can be complicated by the presence of nonstandard modelling and/or
sampling assumptions. We show here how to employ Monte Carlo simulation in
order to approximate the modified profile likelihood in some of these
unconventional frameworks. The proposed solution is widely applicable and is
shown to retain the usual properties of the modified profile likelihood. The
approach is examined in two instances particularly relevant in applications,
i.e. missing-data models and survival models with unspecified censoring
distribution. The effectiveness of the proposed solution is validated via
simulation studies and two clinical trial applications.
- …