6 research outputs found
Estimation/Imputation strategies for missing data in survival analysis
We consider the problem of estimation from right-censored data when the censoring indicator may be missing. We compare different estimation/imputation strategies for recovering nuisance functional parameters. More precisely, we propose either a parametric strategy following a standard logistic model or a purely nonparametric regression strategy. We provide theoretical properties and numerical comparisons for these procedures.
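Below is a minimal sketch of the two strategies for the nuisance function p(t) = P(δ = 1 | T = t), each fitted on the complete cases only:

- a logistic model in t;
- a Nadaraya-Watson kernel regression.

The sketch makes several assumptions:

- NumPy is available.
- The observed time is the only covariate.
- The kernel is Gaussian, with a hand-picked bandwidth.
- Indicators are missing completely at random.

`fit_logistic`, `logistic_p` and `kernel_p` are hypothetical helper names, not the authors' code.

```python
import numpy as np

def fit_logistic(t, delta, n_iter=25):
    """Newton-Raphson fit of P(delta = 1 | T = t) = expit(b0 + b1 * t)."""
    X = np.column_stack([np.ones_like(t), t])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (delta - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # X^T W X
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def logistic_p(beta, t):
    """Parametric estimate of p(t)."""
    return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * t)))

def kernel_p(t_obs, delta_obs, t_eval, h):
    """Nonparametric (Nadaraya-Watson, Gaussian kernel) estimate of p(t)."""
    u = (t_eval[:, None] - t_obs[None, :]) / h
    k = np.exp(-0.5 * u ** 2)
    return (k @ delta_obs) / k.sum(axis=1)

# Simulated example (assumed setup): lifetimes rate 1, censoring rate 0.5.
rng = np.random.default_rng(0)
n = 2000
x = rng.exponential(1.0, n)              # numpy's argument is the scale = 1 / rate
c = rng.exponential(2.0, n)
t = np.minimum(x, c)
delta = (x <= c).astype(float)
observed = rng.random(n) > 0.3           # roughly 30% of indicators missing

beta = fit_logistic(t[observed], delta[observed])
p_param = logistic_p(beta, t[~observed])
p_nonpar = kernel_p(t[observed], delta[observed], t[~observed], h=0.3)

# Plug-in completion: keep observed indicators, replace missing ones by p_hat(T).
delta_param = np.where(observed, delta, logistic_p(beta, t))
```

Here the true p(t) is constant at 1/1.5 = 2/3, so both estimates should land near that value. The bandwidth `h` controls the bias/variance trade-off of the nonparametric fit.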
On the study of the Beran estimator for generalized censoring indicators
In the analysis of time-to-event data, it is common to assume that only partial information is at hand. In the presence of right-censored data with covariates, the conditional Kaplan-Meier estimator (also referred to as the Beran estimator) is known to provide a consistent estimate of the conditional survival function of the lifetimes. However, it requires knowing whether each individual is censored or not, and this information might be incomplete or even totally absent in practice. We therefore study the Beran estimator when the censoring indicator is not clearly specified. From this, we derive a new estimator for the conditional survival function and establish its asymptotic normality under mild conditions. We further study the supervised learning problem where the conditional survival function is to be predicted with no censoring indicators. To this aim, we investigate various approaches to estimating the conditional expectation of the censoring indicator. Along with the theoretical results, we illustrate how the estimators work for small samples by means of a simulation study, and show their practical applicability with the analysis of synthetic data and the study of real data for the prognosis of monoclonal gammopathy.
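The Beran estimator with Nadaraya-Watson weights can be sketched as follows. Passing values in [0, 1] in place of the 0/1 indicators gives a plug-in version with a generalized censoring indicator.

This is a sketch under stated assumptions:

- NumPy is available.
- There is one continuous covariate.
- The kernel is Gaussian.
- There are no tied times.

`beran` is a hypothetical name. The oracle E[δ | T, X] used in the demo stands in for whatever estimate of that conditional expectation one would use in practice.

```python
import numpy as np

def beran(t, x, ind, x0, t_grid, h):
    """Beran estimate of S(t | X = x0) on t_grid.

    ind may hold 0/1 censoring indicators or values in [0, 1]
    (e.g. estimates of E[delta | T, X]).
    """
    order = np.argsort(t)
    t, x, ind = t[order], x[order], ind[order]
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)     # unnormalized kernel weights;
                                               # the constant cancels in each ratio
    at_risk = np.cumsum(w[::-1])[::-1]          # sum of w_j over j with T_j >= T_i
    factors = 1.0 - w * ind / at_risk
    surv = np.concatenate([[1.0], np.cumprod(factors)])
    idx = np.searchsorted(t, t_grid, side="right")   # number of T_i <= grid point
    return surv[idx]

# Assumed simulation: hazard 1 + x, so S(t | x) = exp(-(1 + x) t); censoring hazard 0.5.
rng = np.random.default_rng(42)
n = 3000
x = rng.uniform(0.0, 1.0, n)
life = rng.exponential(1.0 / (1.0 + x))
cens = rng.exponential(2.0, n)
t = np.minimum(life, cens)
delta = (life <= cens).astype(float)
p = (1.0 + x) / (1.5 + x)                      # oracle E[delta | T, X] for this model

grid = np.array([0.25, 0.5, 1.0])
s_delta = beran(t, x, delta, 0.5, grid, h=0.1)   # classical Beran estimator
s_p = beran(t, x, p, 0.5, grid, h=0.1)           # indicators replaced by E[delta | T, X]
s_true = np.exp(-1.5 * grid)
```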
Confidence bands for survival functions under semiparametric random censorship models
In medical reports, point estimates and pointwise confidence intervals of parameters are usually displayed. When the parameter is a survival function, however, joining the upper end points of individual interval estimates obtained at several points, and likewise the lower end points, would not produce bands that include the entire survival curve with a given confidence. Simultaneous confidence bands, which allow confidence statements to be valid for the entire survival curve, would be more meaningful.
This dissertation focuses on a novel method of developing one-sample confidence bands for survival functions from right-censored data. The approach is model-based, relying on a parametric model for the conditional expectation of the censoring indicator given the observed minimum. It derives its chief strength from easy access to a good-fitting model among the plethora of choices currently available for binary response data. The substantive methodological contribution is in exploiting an available semiparametric estimator of the survival function for the one-sample case to produce improved simultaneous confidence bands. Unlike the normalized Kaplan-Meier process, the relevant limiting distribution cannot be transformed to a Brownian bridge. A two-stage bootstrap approach that combines the classical bootstrap with the more recent model-based regeneration of censoring indicators is therefore proposed, and a justification of its asymptotic validity is also provided. Several different confidence bands are studied using the proposed approach. Numerical studies, including the robustness of the proposed bands to misspecification, are carried out to check their efficacy. The method is illustrated using two lung cancer data sets.
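The sketch below implements the SRC product-limit estimator (Dikta, 1998):

Ŝ(t) = ∏_{T_(i) ≤ t} (1 − m(T_(i); θ̂)/(n − i + 1)).

It pairs the estimator with one simple reading of the two-stage bootstrap:

1. Resample the observed minima with replacement.
2. Regenerate each indicator as a Bernoulli draw from m(T*; θ̂).
3. Refit the model and recompute the estimator.
4. Take a quantile of the resulting sup-distances.

The following are illustrative assumptions, not the dissertation's exact construction:

- the logistic form of m;
- the unweighted sup-norm band;
- the values of `B` and the grid;
- the helper names.

```python
import numpy as np

def fit_logistic(t, delta, n_iter=25):
    """Newton-Raphson fit of m(t; theta) = expit(theta0 + theta1 * t)."""
    X = np.column_stack([np.ones_like(t), t])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (delta - p))
    return beta

def m(beta, t):
    return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * t)))

def src_survival(t, beta, grid):
    """SRC estimator: prod over T_(i) <= t of (1 - m(T_(i)) / (n - i + 1))."""
    ts = np.sort(t)
    n = len(ts)
    factors = 1.0 - m(beta, ts) / (n - np.arange(n))   # n - i + 1 for i = 1..n
    surv = np.concatenate([[1.0], np.cumprod(factors)])
    return surv[np.searchsorted(ts, grid, side="right")]

def two_stage_band(t, delta, grid, B=500, level=0.95, seed=1):
    rng = np.random.default_rng(seed)
    beta = fit_logistic(t, delta)
    s_hat = src_survival(t, beta, grid)
    n = len(t)
    sups = np.empty(B)
    for b in range(B):
        t_star = rng.choice(t, size=n, replace=True)               # stage 1: classical bootstrap
        d_star = (rng.random(n) < m(beta, t_star)).astype(float)   # stage 2: regenerate indicators
        beta_star = fit_logistic(t_star, d_star)
        sups[b] = np.max(np.abs(src_survival(t_star, beta_star, grid) - s_hat))
    q = np.quantile(sups, level)
    return s_hat, np.clip(s_hat - q, 0.0, 1.0), np.clip(s_hat + q, 0.0, 1.0)

# Assumed example data: exponential lifetimes (rate 1) and censoring (rate 0.5).
rng = np.random.default_rng(3)
n = 300
life = rng.exponential(1.0, n)
cens = rng.exponential(2.0, n)
t = np.minimum(life, cens)
delta = (life <= cens).astype(float)
grid = np.linspace(0.1, 1.5, 30)
s_hat, lower, upper = two_stage_band(t, delta, grid, B=200)
```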
On Modeling Spatial Time-to-Event Data with Missing Censoring Type
Time-to-event data, a common occurrence in medical research, is also pertinent in the ecological context, exemplified by leaf desiccation studies using innovative optical vulnerability techniques. Such data can unveil valuable insights into the influence of various factors on the event of interest. Leveraging both spatial and temporal information, spatial survival modeling can unravel the intricate spatiotemporal dynamics governing event occurrences. Existing spatial survival models often assume the availability of the censoring type for censored cases. Various approaches have been employed to address scenarios where a "subset" of cases lacks a known "censoring indicator" (i.e., whether they are right-censored or uncensored). This uncertainty in the subset pertains to missing information regarding the censoring status. However, our study specifically centers on situations where the missing information extends to "all" censored cases, rendering them devoid of a known censoring "type" indicator (i.e., whether they are right-censored or left-censored).
The genesis of this challenge emerged from leaf hydraulic data, specifically embolism data, where the observation of embolism events is limited to instances when leaf veins transition from water-filled to air-filled during the observation period. Although it is known that all veins eventually embolize when the entire plant dries up, the critical information of whether a censored leaf vein embolized before or after the observation period is absent. In other words, the censoring type indicator is missing.
To address this challenge, we developed a Gibbs sampler for a Bayesian spatial survival model, aiming to recover the missing censoring type indicator. This model incorporates the essential embolism formation mechanism theory, accounting for dynamic patterns observed in the embolism data. The model assumes spatial smoothness between connected leaf veins and incorporates vein thickness information. Our Gibbs sampler effectively infers the missing censoring type indicator, as demonstrated on both simulated and real-world embolism data. In applying our model to real data, we not only confirm patterns aligning with existing phytological literature but also unveil novel insights previously unexplored due to limitations in available statistical tools.
Additionally, our results suggest the potential for building hierarchical models with species-level parameters focusing solely on the temporal component. Overall, our study illustrates that the proposed Gibbs sampler for the spatial survival model successfully addresses the challenge of missing censoring type indicators, offering valuable insights into the underlying spatiotemporal dynamics.
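The data-augmentation idea behind such a sampler can be shown on a deliberately stripped-down, non-spatial toy. The toy has:

- exponential event times with a Gamma prior on the rate;
- an observation window [a, b];
- units with no observed event, whose censoring type is unknown (event before a, or after b).

Each sweep draws the type, then a latent event time, then the rate. None of the spatial smoothness, vein thickness, or embolism-mechanism structure of the actual model is included. The exponential model and all names (`gibbs_censoring_type`, `a`, `b`, the prior values) are assumptions for illustration.

```python
import numpy as np

def gibbs_censoring_type(t_obs, n_cens, a, b, n_iter=2000,
                         alpha0=1.0, beta0=1.0, seed=0):
    """Toy Gibbs sampler with missing censoring type (left vs right).

    t_obs  : event times observed inside the window [a, b]
    n_cens : number of units with no event in [a, b]
    Model  : T ~ Exponential(lam), lam ~ Gamma(alpha0, rate=beta0)
    """
    rng = np.random.default_rng(seed)
    n = len(t_obs) + n_cens
    lam = 1.0
    lam_draws, left_frac = [], []
    for _ in range(n_iter):
        # 1) censoring type given lam: P(event before a | outside window)
        F_a = 1.0 - np.exp(-lam * a)
        S_b = np.exp(-lam * b)
        left = rng.random(n_cens) < F_a / (F_a + S_b)
        # 2) latent event time given type
        u = rng.random(n_cens)
        t_left = -np.log1p(-u * F_a) / lam                   # Exp(lam) truncated to [0, a]
        t_right = b + rng.exponential(1.0 / lam, n_cens)     # memorylessness beyond b
        t_lat = np.where(left, t_left, t_right)
        # 3) rate given complete data: conjugate Gamma update
        total = t_obs.sum() + t_lat.sum()
        lam = rng.gamma(alpha0 + n, 1.0 / (beta0 + total))   # shape, scale
        lam_draws.append(lam)
        left_frac.append(left.mean())
    return np.array(lam_draws), np.array(left_frac)

# Assumed synthetic data: true rate 0.5, window [1, 4].
rng = np.random.default_rng(7)
t_all = rng.exponential(2.0, 2000)
a, b = 1.0, 4.0
in_window = (t_all >= a) & (t_all <= b)
t_obs = t_all[in_window]
n_cens = int((~in_window).sum())
lam_draws, left_frac = gibbs_censoring_type(t_obs, n_cens, a, b)
```

With a true rate of 0.5, about 74% of the censored units embolized before the window opened (F(1) / (F(1) + S(4))). The post-burn-in draws should recover both quantities.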
Analysis of Time-to-event Data, Intermediate Phenotypes, and Sparse Factors in the OPPERA Study
Motivated by the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) project, a large study of temporomandibular disorders (TMD), this dissertation develops statistical methods applicable to three facets of chronic pain. First, we propose a method for parameter estimation in survival models with missing censoring indicators. These arise because conducting multiple invasive examinations for incidence on all participants in large prospective studies is infeasible. We estimate the probability of being an incident case for those lacking a gold-standard examination using logistic regression. Multiple imputations of case status for each missing examination are generated using these estimated probabilities. Imputed and observed data are combined in Cox models to estimate the incidence rate and associations with putative risk factors, and the variance is estimated using multiple imputation. Our method performs as well as or better than competing methods and highlighted new discoveries for OPPERA.
Second, we propose a general method to analyze secondary phenotypes and apply it to the OPPERA baseline case-control study. Traditional case-control genetic association studies examine relationships between case-control status and one or more covariates. Investigators now commonly study additional phenotypes and their association with the original covariates as secondary aims. Assessing these associations is statistically challenging, as participants do not form a random sample from the population of interest. Standard methods may be biased and may lack coverage and power. Using inverse probability weighting, with bootstrapping for standard error estimation, our method performs as well as competitors when they are applicable and provides promising results for outcomes to which other methods do not apply.
Third, we propose a method for sparse factor analysis. Psychometric studies frequently measure numerous variables that may be noisy manifestations of a few underlying constructs. Aims include identifying these latent variables and their relationship to the observed variables, and reducing the data to a few key variables that explain the majority of the variance. While variable reduction methods exist for principal component analysis, none have been proposed to date for factor analysis. Our method retains predictive accuracy for many thresholds in simulations while providing sparse loadings. Competing methods had less predictive accuracy or less sparsity.
Degree: Doctor of Philosophy
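The imputation step can be sketched as follows:

1. Fit a logistic model for case status on those with a gold-standard examination.
2. Draw M completed data sets from the estimated probabilities.
3. Pool the per-imputation estimates with Rubin's rules, where the total variance is W + (1 + 1/M)B.

To keep the block self-contained, a crude log incidence rate stands in for the Cox model used in the study. The rate is cases over person-time, with variance approximated by 1/cases. This substitution, the single covariate, and all names (`mi_log_rate`, `Z`, `person_time`) are assumptions.

```python
import numpy as np

def fit_logistic(Z, y, n_iter=25):
    """Newton-Raphson logistic regression of y on [1, Z]."""
    X = np.column_stack([np.ones(len(Z)), Z])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    return beta

def mi_log_rate(case, examined, Z, person_time, M=20, seed=0):
    """Multiple imputation of unexamined case status, pooled by Rubin's rules."""
    rng = np.random.default_rng(seed)
    beta = fit_logistic(Z[examined], case[examined])        # examined participants only
    X = np.column_stack([np.ones(len(Z)), Z])
    p_miss = 1.0 / (1.0 + np.exp(-X[~examined] @ beta))
    est, var = np.empty(M), np.empty(M)
    for k in range(M):
        filled = case.astype(float).copy()
        filled[~examined] = rng.random(p_miss.size) < p_miss  # Bernoulli draws
        d = filled.sum()
        est[k] = np.log(d / person_time.sum())   # log incidence rate (stand-in for Cox)
        var[k] = 1.0 / d                          # approximate variance of the log rate
    q_bar = est.mean()
    W = var.mean()                                # within-imputation variance
    B = est.var(ddof=1)                           # between-imputation variance
    total_var = W + (1.0 + 1.0 / M) * B
    return q_bar, total_var, W, B

# Assumed synthetic cohort.
rng = np.random.default_rng(11)
n = 1500
Z = rng.normal(size=n)
case_true = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * Z)))).astype(float)
person_time = rng.uniform(1.0, 3.0, n)
examined = rng.random(n) < 0.6
case = np.where(examined, case_true, 0.0)   # unexamined status unknown; 0 is a placeholder
q_bar, total_var, W, B = mi_log_rate(case, examined, Z, person_time)
```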
Multiple imputations and the missing censoring indicator model
Semiparametric random censorship (SRC) models (Dikta, 1998) provide an attractive framework for estimating survival functions when censoring indicators are fully or partially available. When there are missing censoring indicators (MCIs), the SRC approach employs a model-based estimate of the conditional expectation of the censoring indicator given the observed time, where the model parameters are estimated using only the complete cases. The multiple imputations approach, on the other hand, uses this model-based estimate to impute the missing censoring indicators and form several completed data sets. The Kaplan-Meier and SRC estimators based on the several completed data sets are averaged to arrive at the multiple imputations Kaplan-Meier (MIKM) and multiple imputations SRC (MISRC) estimators. The MIKM estimator is asymptotically as efficient as or less efficient than the standard SRC-based estimator that involves no imputations. Here we investigate the performance of the MISRC estimator and prove that it attains the benchmark variance set by the SRC-based estimator. We also present numerical results comparing the performances of the estimators under several misspecified models for the above-mentioned conditional expectation.
Keywords: asymptotic normality; functional delta method; Lindeberg's condition; maximum likelihood; missing at random; model-based resampling
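Here is a sketch of the three estimators compared above. It assumes a logistic model for m(t; θ) = P(δ = 1 | T = t), indicators missing at random, and no tied times.

- **SRC:** plugs the complete-case fit into every factor.
- **MIKM:** averages Kaplan-Meier curves over M imputed data sets.
- **MISRC:** averages refitted SRC curves over the same M imputed data sets.

Refitting θ on each completed data set for MISRC is one reading of "SRC estimators based on the several completed data sets". The helper names are hypothetical.

```python
import numpy as np

def fit_logistic(t, delta, n_iter=25):
    """Newton-Raphson fit of m(t; theta) = expit(theta0 + theta1 * t)."""
    X = np.column_stack([np.ones_like(t), t])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (delta - p))
    return beta

def m(beta, t):
    return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * t)))

def product_limit(t, w, grid):
    """prod over T_(i) <= t of (1 - w_(i) / (n - i + 1)).

    w = delta gives Kaplan-Meier; w = m(T; theta) gives the SRC estimator.
    """
    order = np.argsort(t)
    ts, ws = t[order], w[order]
    n = len(ts)
    surv = np.concatenate([[1.0], np.cumprod(1.0 - ws / (n - np.arange(n)))])
    return surv[np.searchsorted(ts, grid, side="right")]

def mi_estimators(t, delta, observed, grid, M=10, seed=0):
    rng = np.random.default_rng(seed)
    beta_cc = fit_logistic(t[observed], delta[observed])    # complete cases only
    src = product_limit(t, m(beta_cc, t), grid)             # SRC, no imputation
    miss = ~observed
    km_list, src_list = [], []
    for _ in range(M):
        d = delta.astype(float).copy()
        d[miss] = rng.random(miss.sum()) < m(beta_cc, t[miss])  # impute the MCIs
        km_list.append(product_limit(t, d, grid))               # KM on completed data
        beta_k = fit_logistic(t, d)                             # refit on completed data
        src_list.append(product_limit(t, m(beta_k, t), grid))
    return src, np.mean(km_list, axis=0), np.mean(src_list, axis=0)

# Assumed example: exponential lifetimes (rate 1), censoring (rate 0.5).
rng = np.random.default_rng(5)
n = 500
life = rng.exponential(1.0, n)
cens = rng.exponential(2.0, n)
t = np.minimum(life, cens)
observed = rng.random(n) > 0.3
delta = np.where(observed, (life <= cens).astype(float), np.nan)  # NaN marks an MCI
grid = np.linspace(0.1, 1.5, 15)
src, mikm, misrc = mi_estimators(t, delta, observed, grid)
```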