
    Estimation/Imputation strategies for missing data in survival analysis

    We consider the problem of estimation from right-censored data when the censoring indicator may be missing. We compare different estimation/imputation strategies for recovering nuisance functional parameters. More precisely, we propose either a parametric strategy based on a standard logistic model or a purely nonparametric regression strategy. We provide theoretical properties of these procedures and compare them numerically.
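    The nuisance function in this setting is m(z) = P(delta = 1 | Z = z), estimated from the cases whose indicator is observed. The following is a minimal sketch of the two strategies, assuming a one-covariate logistic model (fitted by Newton-Raphson) for the parametric route and a Gaussian-kernel Nadaraya-Watson smoother for the nonparametric route. The function names, the bandwidth h = 0.3, and the simulated exponential design are illustrative assumptions, not the authors' code.

```python
import numpy as np

def fit_logistic(z, delta, n_iter=25):
    """Parametric strategy: logit m(z) = b0 + b1 * z, fitted by Newton-Raphson (maximum likelihood)."""
    X = np.column_stack([np.ones_like(z), z])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (delta - p)                       # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])    # observed information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return lambda t: 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * np.asarray(t, dtype=float))))

def fit_nadaraya_watson(z, delta, h):
    """Nonparametric strategy: Gaussian-kernel weighted average of the observed indicators."""
    def m(t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        k = np.exp(-0.5 * ((t[:, None] - z[None, :]) / h) ** 2)
        return (k @ delta) / k.sum(axis=1)
    return m

# Hypothetical toy design: T ~ Exp(rate 1), C ~ Exp(rate 0.5),
# so P(delta = 1 | Z = z) = 1 / 1.5 = 2/3 for every z.
rng = np.random.default_rng(0)
n = 2000
T = rng.exponential(scale=1.0, size=n)
C = rng.exponential(scale=2.0, size=n)
Z = np.minimum(T, C)
delta = (T <= C).astype(float)
observed = rng.random(n) > 0.3          # about 30% of indicators missing completely at random

# Both estimators use the complete cases only.
m_par = fit_logistic(Z[observed], delta[observed])
m_np = fit_nadaraya_watson(Z[observed], delta[observed], h=0.3)

# Single regression imputation: replace each missing indicator by its estimated conditional mean.
delta_imp = delta.copy()
delta_imp[~observed] = m_par(Z[~observed])
```

    The parametric fit is efficient when the logistic form is right; the kernel smoother avoids that assumption at the cost of choosing a bandwidth.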

    On the study of the Beran estimator for generalized censoring indicators

    In the analysis of time-to-event data, it is common that only partial information is at hand. For right-censored data with covariates, the conditional Kaplan-Meier estimator (also referred to as the Beran estimator) is known to provide a consistent estimate of the conditional survival function of the lifetimes. However, it requires knowing whether each individual is censored or not, and in practice this information may be incomplete or entirely absent. We therefore study the Beran estimator when the censoring indicator is not clearly specified. From this, we derive a new estimator of the conditional survival function and establish its asymptotic normality under mild conditions. We further study the supervised learning problem in which the conditional survival function must be predicted without censoring indicators. To this end, we investigate various approaches to estimating the conditional expectation of the censoring indicator. Alongside the theoretical results, we illustrate the small-sample behavior of the estimators through a simulation study and show their practical applicability with an analysis of synthetic data and a study of real data on the prognosis of monoclonal gammopathy.
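    To make the idea concrete, here is a minimal sketch of the Beran product-limit estimator in which the 0/1 indicators may be replaced by values in [0, 1], such as estimates of E[delta | Z, X]. The plug-in form used here, a product of (1 - delta_i W_i(x) / (1 - sum_{j<i} W_j(x))), reduces to the usual Beran estimator for 0/1 indicators; it is an assumption for illustration and may differ from the exact estimator proposed in the paper. The Gaussian kernel, bandwidth, and simulated design are also hypothetical.

```python
import numpy as np

def beran(t, x, Z, X, delta, h):
    """Beran (conditional Kaplan-Meier) estimate of S(t | x).

    delta may hold 0/1 indicators or 'generalized' indicators in [0, 1]
    (e.g. estimates of E[delta | Z, X]); with 0/1 values this is the
    usual Beran product-limit estimator. Assumes no ties in Z.
    """
    w = np.exp(-0.5 * ((X - x) / h) ** 2)
    w = w / w.sum()                                   # Nadaraya-Watson weights W_i(x)
    order = np.argsort(Z, kind="stable")
    Zs, ws, ds = Z[order], w[order], delta[order]
    at_risk = 1.0 - np.concatenate(([0.0], np.cumsum(ws)[:-1]))   # 1 - sum_{j<i} W_j(x)
    safe = np.where(at_risk > 0, at_risk, 1.0)
    factors = np.where(at_risk > 0, 1.0 - ds * ws / safe, 1.0)
    factors = np.clip(factors, 0.0, 1.0)              # guard against floating-point round-off
    return float(np.prod(factors[Zs <= t]))

# Hypothetical design: T | X ~ Exp(rate 1 + X), C ~ Exp(rate 0.5), X ~ Uniform(0, 1),
# so S(t | x) = exp(-(1 + x) t) and E[delta | Z, X] = (1 + X) / (1.5 + X).
rng = np.random.default_rng(1)
n = 3000
X = rng.uniform(0.0, 1.0, n)
T = rng.exponential(1.0 / (1.0 + X))
C = rng.exponential(2.0, n)
Z = np.minimum(T, C)
delta = (T <= C).astype(float)
p_true = (1.0 + X) / (1.5 + X)

s_01 = beran(0.5, 0.5, Z, X, delta, h=0.1)     # classical indicators
s_gen = beran(0.5, 0.5, Z, X, p_true, h=0.1)   # indicators replaced by E[delta | Z, X]
```

    In practice p_true is unknown and would be replaced by one of the estimated conditional expectations the abstract mentions; the true value is used here only so the sketch can be checked against exp(-0.75).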

    Confidence bands for survival functions under semiparametric random censorship models

    Medical reports usually display point estimates and pointwise confidence intervals of parameters. When the parameter is a survival function, however, joining the upper endpoints of interval estimates obtained at several points, and likewise the lower endpoints, does not produce a band that contains the entire survival curve with a given confidence. Simultaneous confidence bands, which make confidence statements valid for the entire survival curve, would be more meaningful. This dissertation focuses on a novel method for constructing one-sample confidence bands for survival functions from right-censored data. The approach is model-based, relying on a parametric model for the conditional expectation of the censoring indicator given the observed minimum, and derives its chief strength from easy access to a well-fitting model among the many choices currently available for binary response data. The main methodological contribution is to exploit an available semiparametric estimator of the survival function in the one-sample case to produce improved simultaneous confidence bands. Because the relevant limiting distribution, unlike that of the normalized Kaplan-Meier process, cannot be transformed into a Brownian bridge, a two-stage bootstrap approach is proposed that combines the classical bootstrap with the more recent model-based regeneration of censoring indicators, and its asymptotic validity is justified. Several different confidence bands are studied using the proposed approach. Numerical studies, including the robustness of the proposed bands to model misspecification, are carried out to check their efficacy. The method is illustrated using two lung cancer data sets.
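    A minimal sketch of the ingredients, assuming a logistic model m(z; theta) for the conditional expectation of the censoring indicator and the product-limit form of Dikta's semiparametric estimator, S(t) = prod over Z_(i) <= t of (1 - m(Z_(i); theta_hat) / (n - i + 1)). The two stages are rendered as (1) resampling Z with replacement and (2) regenerating each delta* as Bernoulli(m(Z*; theta_hat)). The unweighted sup-norm band is a simple Kolmogorov-type choice made for illustration; the dissertation studies several bands, and its exact resampling scheme may differ.

```python
import numpy as np

def fit_logit(z, d, n_iter=25):
    # Maximum likelihood for logit m(z; theta) = theta0 + theta1 * z via Newton-Raphson.
    X = np.column_stack([np.ones_like(z), z])
    theta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        theta = theta + np.linalg.solve(X.T @ (X * (p * (1 - p))[:, None]), X.T @ (d - p))
    return theta

def m(z, theta):
    return 1.0 / (1.0 + np.exp(-(theta[0] + theta[1] * z)))

def src_survival(grid, z, theta):
    # SRC estimator: product over Z_(i) <= t of (1 - m(Z_(i)) / (n - i + 1)).
    zs = np.sort(z)
    n = len(zs)
    surv = np.cumprod(1.0 - m(zs, theta) / (n - np.arange(n)))   # n - i + 1 for i = 1..n
    idx = np.searchsorted(zs, grid, side="right")               # number of Z_(i) <= t
    return np.where(idx > 0, surv[np.maximum(idx - 1, 0)], 1.0)

def bootstrap_band(z, d, grid, B=200, level=0.95, seed=0):
    rng = np.random.default_rng(seed)
    theta = fit_logit(z, d)
    s_hat = src_survival(grid, z, theta)
    sups = np.empty(B)
    for b in range(B):
        zb = rng.choice(z, size=len(z), replace=True)                # stage 1: classical resampling
        db = (rng.random(len(z)) < m(zb, theta)).astype(float)      # stage 2: regenerate indicators
        sb = src_survival(grid, zb, fit_logit(zb, db))
        sups[b] = np.max(np.abs(sb - s_hat))
    c = np.quantile(sups, level)
    return s_hat, np.clip(s_hat - c, 0.0, 1.0), np.clip(s_hat + c, 0.0, 1.0)

# Hypothetical data: T ~ Exp(rate 1), C ~ Exp(rate 0.5), so S(t) = exp(-t).
rng = np.random.default_rng(2)
T = rng.exponential(1.0, 300)
C = rng.exponential(2.0, 300)
Z, delta = np.minimum(T, C), (T <= C).astype(float)
grid = np.array([0.25, 0.5, 1.0])
s_hat, lower, upper = bootstrap_band(Z, delta, grid, B=100)
```

    Regenerating the indicators from the fitted model, rather than resampling (Z, delta) pairs, is what makes the resampling "model-based"; it keeps the bootstrap consistent with the semiparametric structure the estimator exploits.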

    Analysis of Time-to-event Data, Intermediate Phenotypes, and Sparse Factors in the OPPERA Study

    Motivated by the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) project, a large study of temporomandibular disorders (TMD), this dissertation develops statistical methods applicable to three facets of chronic pain. First, we propose a method for parameter estimation in survival models with missing censoring indicators. These arise because conducting multiple invasive examinations for incidence on all participants in large prospective studies is infeasible. We use logistic regression to estimate the probability of being an incident case for participants lacking a gold-standard examination. Multiple imputations of case status for each missing examination are generated from these estimated probabilities. Imputed and observed data are combined in Cox models to estimate the incidence rate and associations with putative risk factors, and the variance is estimated using multiple imputation. Our method performs as well as or better than competing methods and highlighted new discoveries for OPPERA. Second, we propose a general method to analyze secondary phenotypes and apply it to the OPPERA baseline case-control study. Traditional case-control genetic association studies examine relationships between case-control status and one or more covariates. Investigators now commonly study additional phenotypes and their association with the original covariates as secondary aims. Assessing these associations is statistically challenging because participants do not form a random sample from the population of interest; standard methods may be biased and lack adequate coverage and power. Using inverse probability weighting, with bootstrapping for standard error estimation, our method performs as well as competitors when they are applicable and provides promising results for outcomes to which other methods do not apply. Third, we propose a method for sparse factor analysis. Psychometric studies frequently measure numerous variables that may be noisy manifestations of a few underlying constructs. The aims include identifying these latent variables and their relationship to the observed variables, and reducing the data to a few key variables that explain most of the variance. While variable reduction methods exist for principal component analysis, none had been proposed for factor analysis. In simulations, our method retains predictive accuracy across many thresholds while providing sparse loadings. Competing methods had lower predictive accuracy or less sparsity.
    Doctor of Philosophy
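    The imputation step for missing case status can be sketched as follows. Each missing indicator is drawn as Bernoulli(p_hat) from the logistic-regression probabilities, each completed data set is analyzed, and the results are pooled with Rubin's rules: the pooled estimate is the mean of the M estimates, and the total variance is the mean within-imputation variance plus (1 + 1/M) times the between-imputation variance. To keep the sketch self-contained it substitutes a crude log incidence rate (events per person-time, with approximate variance 1/events) for the Cox model; the function names and toy inputs are hypothetical.

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Pool M completed-data analyses with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    M = len(q)
    q_bar = q.mean()                          # pooled point estimate
    u_bar = u.mean()                          # within-imputation variance
    b = q.var(ddof=1)                         # between-imputation variance
    return q_bar, u_bar + (1.0 + 1.0 / M) * b

def mi_log_incidence_rate(time, case, p_case, M=20, seed=0):
    """case: 1.0/0.0 where examined, np.nan where the gold-standard exam is missing.
    p_case: estimated P(case) for each participant (e.g. from logistic regression).
    Assumes every completed data set contains at least one event.
    """
    rng = np.random.default_rng(seed)
    missing = np.isnan(case)
    ests, variances = [], []
    for _ in range(M):
        c = case.copy()
        c[missing] = rng.random(missing.sum()) < p_case[missing]   # Bernoulli imputation
        events = c.sum()
        ests.append(np.log(events / time.sum()))   # log(events / person-time)
        variances.append(1.0 / events)             # approximate variance of the log rate
    return rubin_combine(ests, variances)

# Illustrative call on a tiny hypothetical cohort.
est, var = mi_log_incidence_rate(
    time=np.array([2.0, 3.0, 1.5, 4.0, 2.5]),
    case=np.array([1.0, 0.0, np.nan, np.nan, 1.0]),
    p_case=np.array([0.9, 0.1, 0.6, 0.2, 0.8]),
    M=10,
)
```

    The between-imputation term is what carries the extra uncertainty from not knowing the missing case statuses; a single imputation would drop it and understate the variance.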

    Multiple imputations and the missing censoring indicator model

    Semiparametric random censorship (SRC) models (Dikta, 1998) provide an attractive framework for estimating survival functions when censoring indicators are fully or partially available. When there are missing censoring indicators (MCIs), the SRC approach employs a model-based estimate of the conditional expectation of the censoring indicator given the observed time, with the model parameters estimated from the complete cases only. The multiple imputations approach, on the other hand, uses this model-based estimate to impute the missing censoring indicators and form several completed data sets. The Kaplan-Meier and SRC estimators computed on these completed data sets are averaged to obtain the multiple imputations Kaplan-Meier (MIKM) and multiple imputations SRC (MISRC) estimators. While the MIKM estimator is asymptotically no more efficient than the standard SRC-based estimator, which involves no imputations, here we investigate the performance of the MISRC estimator and prove that it attains the benchmark variance set by the SRC-based estimator. We also present numerical results comparing the estimators under several misspecified models for the above-mentioned conditional expectation.
    Keywords: asymptotic normality; functional delta method; Lindeberg's condition; maximum likelihood; missing at random; model-based resampling
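    A minimal sketch of the MIKM construction described above: each missing indicator is imputed as Bernoulli(m_hat(Z)), a Kaplan-Meier curve is computed on each completed data set, and the curves are averaged. The MISRC estimator would follow the same loop with the SRC estimator in place of Kaplan-Meier. Here m_hat is assumed to be an already-fitted model for the conditional expectation of the censoring indicator; the function names and the no-ties simplification are illustrative assumptions.

```python
import numpy as np

def kaplan_meier(grid, z, d):
    """Kaplan-Meier survival at each grid point; assumes no tied observation times."""
    order = np.argsort(z, kind="stable")
    zs, ds = z[order], d[order]
    n = len(zs)
    surv = np.cumprod(1.0 - ds / (n - np.arange(n)))   # at-risk count n - i + 1 for i = 1..n
    idx = np.searchsorted(zs, grid, side="right")      # number of Z_(i) <= t
    return np.where(idx > 0, surv[np.maximum(idx - 1, 0)], 1.0)

def mikm(grid, z, d, observed, m_hat, M=10, seed=0):
    """Multiple imputations Kaplan-Meier: average KM over M completed data sets."""
    rng = np.random.default_rng(seed)
    miss = ~observed
    curves = []
    for _ in range(M):
        dd = d.astype(float)                                   # astype returns a copy
        dd[miss] = (rng.random(miss.sum()) < m_hat(z[miss])).astype(float)
        curves.append(kaplan_meier(grid, z, dd))
    return np.mean(curves, axis=0)
```

    For example, with times 1, 2, 3, 4, indicators 1, 0, 1, 1, and grid points 0.5, 1.5, 2.5, 3.5, 4.5, kaplan_meier returns 1.0, 0.75, 0.75, 0.375, 0.0.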