161 research outputs found

    Marginal and Conditional Distribution Estimation from Double-Sampled Semi-Competing Risks Data

    Informative dropout is a vexing problem for any biomedical study. Most existing statistical methods attempt to correct estimation bias related to this phenomenon by specifying unverifiable assumptions about the dropout mechanism. We consider a cohort study in Africa that uses an outreach programme to ascertain the vital status for dropout subjects. These data can be used to identify a number of relevant distributions. However, as only a subset of dropout subjects were followed, vital status ascertainment was incomplete. We use semi-competing risk methods as our analysis framework to address this specific case where the terminal event is incompletely ascertained and consider various procedures for estimating the marginal distribution of dropout and the marginal and conditional distributions of survival. We also consider model selection and estimation efficiency in our setting. Performance of the proposed methods is demonstrated via simulations, asymptotic study and analysis of the study data.

    Nonparametric inference for Markov processes with missing absorbing state

    This study examines nonparametric estimations of a transition probability matrix of a nonhomogeneous Markov process with a finite state space and a partially observed absorbing state. We impose a missing-at-random assumption and propose a computationally efficient nonparametric maximum pseudolikelihood estimator (NPMPLE). The estimator depends on a parametric model that is used to estimate the probability of each absorbing state for the missing observations based, potentially, on auxiliary data. For the latter model, we propose a formal goodness-of-fit test based on a residual process. Using modern empirical process theory, we show that the estimator is uniformly consistent and converges weakly to a tight mean-zero Gaussian random field. We also provide a methodology for constructing simultaneous confidence bands. Simulation studies show that the NPMPLE works well with small sample sizes and that it is robust against some degree of misspecification of the parametric model for the missing absorbing states. The method is illustrated using HIV data from sub-Saharan Africa to estimate the transition probabilities of death and disengagement from HIV care.

    Semiparametric regression and risk prediction with competing risks data under missing cause of failure

    The cause of failure in cohort studies that involve competing risks is frequently incompletely observed. To address this, several methods have been proposed for the semiparametric proportional cause-specific hazards model under a missing at random assumption. However, these proposals provide inference for the regression coefficients only, and do not consider the infinite dimensional parameters, such as the covariate-specific cumulative incidence function. Nevertheless, the latter quantity is essential for risk prediction in modern medicine. In this paper we propose a unified framework for inference about both the regression coefficients of the proportional cause-specific hazards model and the covariate-specific cumulative incidence functions under missing at random cause of failure. Our approach is based on a novel computationally efficient maximum pseudo-partial-likelihood estimation method for the semiparametric proportional cause-specific hazards model. Using modern empirical process theory we derive the asymptotic properties of the proposed estimators for the regression coefficients and the covariate-specific cumulative incidence functions, and provide methodology for constructing simultaneous confidence bands for the latter. Simulation studies show that our estimators perform well even in the presence of a large fraction of missing causes of failure, and that the regression coefficient estimator can be substantially more efficient compared to the previously proposed augmented inverse probability weighting estimator. The method is applied using data from an HIV cohort study and a bladder cancer clinical trial.

    Semiparametric Competing Risks Regression Under Interval Censoring Using the R Package intccr

    Background and objective: Competing risk data are frequently interval-censored in real-world applications, that is, the exact event time is not precisely observed but is only known to lie between two time points such as clinic visits. This type of data requires special handling because the actual event times are unknown. To deal with this problem we have developed easy-to-use open-source statistical software. Methods: An approach to perform semiparametric regression analysis of the cumulative incidence function with interval-censored competing risks data is the sieve maximum likelihood method based on B-splines. An important feature of this approach is that it does not impose restrictive parametric assumptions. Also, this methodology provides semiparametrically efficient estimates. Implementation of this methodology can be easily performed using our new R package intccr. Results: The R package intccr performs semiparametric regression analysis of the cumulative incidence function based on interval-censored competing risks data. It supports a large class of models including the proportional odds and the Fine-Gray proportional subdistribution hazards model as special cases. It also provides the estimated cumulative incidence functions for a particular combination of covariate values. The package also provides some data management functionality to handle data sets which are in a long format involving multiple lines of data per subject. Conclusions: The R package intccr provides convenient and flexible software for the analysis of the cumulative incidence function based on interval-censored competing risks data.

    Semiparametric regression on cumulative incidence function with interval-censored competing risks data

    Many biomedical and clinical studies with time-to-event outcomes involve competing risks data. These data are frequently subject to interval censoring. This means that the failure time is not precisely observed but is only known to lie between two observation times such as clinical visits in a cohort study. Not taking into account the interval censoring may result in biased estimation of the cause-specific cumulative incidence function, an important quantity in the competing risks framework, used for evaluating interventions in populations, for studying the prognosis of various diseases, and for prediction and implementation science purposes. In this work, we consider the class of semiparametric generalized odds rate transformation models in the context of sieve maximum likelihood estimation based on B-splines. This large class of models includes both the proportional odds and the proportional subdistribution hazard models (i.e., the Fine-Gray model) as special cases. The estimator for the regression parameter is shown to be consistent, asymptotically normal and semiparametrically efficient. Simulation studies suggest that the method performs well even with small sample sizes. As an illustration, we use the proposed method to analyze data from HIV-infected individuals obtained from a large cohort study in sub-Saharan Africa. We also provide the R function ciregic that implements the proposed method and present an illustrative example.

    Choosing profile double-sampling designs for survival estimation with application to PEPFAR evaluation

    Most studies that follow subjects over time are challenged by having some subjects who drop out. Double sampling is a design that selects and devotes resources to intensively pursue and find a subset of these dropouts, then uses data obtained from these to adjust naïve estimates, which are potentially biased by the dropout. Existing methods to estimate survival from double sampling assume a random sample. In limited-resource settings, however, generating accurate estimates using a minimum of resources is important. We propose using double-sampling designs that oversample certain profiles of dropouts as more efficient alternatives to random designs. First, we develop a framework to estimate the survival function under these profile double-sampling designs. We then derive the precision of these designs as a function of the rule for selecting different profiles, in order to identify more efficient designs. We illustrate using data from the United States President's Emergency Plan for AIDS Relief-funded HIV care and treatment program in western Kenya. Our results show why and how more efficient designs should oversample patients with shorter dropout times. Further, our work suggests generalizable practice for more efficient double-sampling designs, which can help maximize efficiency in resource-limited settings.

    Semiparametric regression on cumulative incidence function with interval-censored competing risks data and missing event types

    Competing risk data are frequently interval-censored, that is, the exact event time is not observed but only known to lie between two examination time points such as clinic visits. In addition to interval censoring, another common complication is that the event type is missing for some study participants. In this article, we propose an augmented inverse probability weighted sieve maximum likelihood estimator for the analysis of interval-censored competing risk data in the presence of missing event types. The estimator imposes weaker than usual missing at random assumptions by allowing for the inclusion of auxiliary variables that are potentially associated with the probability of missingness. The proposed estimator is shown to be doubly robust, in the sense that it is consistent even if either the model for the probability of missingness or the model for the probability of the event type is misspecified. Extensive Monte Carlo simulation studies show good performance of the proposed method even under a large amount of missing event types. The method is illustrated using data from an HIV cohort study in sub-Saharan Africa, where a significant portion of event types is missing. The proposed method can be readily implemented using the new function ciregic_aipw in the R package intccr.

    Niche Modeling of Dengue Fever Using Remotely Sensed Environmental Factors and Boosted Regression Trees

    Dengue fever (DF), a vector-borne flavivirus, is endemic to the tropical countries of the world with nearly 400 million people becoming infected each year and roughly one-third of the world’s population living in areas of risk. The main vector for DF is the Aedes aegypti mosquito, which is also the vector of yellow fever, chikungunya, and Zika viruses. To gain an understanding of the spatial aspects that can affect the epidemiological processes across the disease’s geographical range, and the spatial interactions involved, we created and compared Bernoulli and Poisson family Boosted Regression Tree (BRT) models to quantify the overall annual risk of DF incidence by municipality, using the Magdalena River watershed of Colombia as a study site during the time period between 2012 and 2014. A wide range of environmental conditions makes this site ideal to develop models that, with minor adjustments, could be applied in many other geographical areas. Our results show that these BRT methods can be successfully used to identify areas at risk and present great potential for implementation in surveillance programs.

    Bayesian estimation of SARS-CoV-2 prevalence in Indiana by random testing

    From 25 to 29 April 2020, the state of Indiana undertook testing of 3,658 randomly chosen state residents for the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus, the agent causing COVID-19 disease. This was the first statewide randomized study of COVID-19 testing in the United States. Both PCR and serological tests were administered to all study participants. This paper describes statistical methods used to address nonresponse among various demographic groups and to adjust for testing errors to reduce bias in the estimates of the overall disease prevalence in Indiana. These adjustments were implemented through Bayesian methods, which incorporated all available information on disease prevalence and test performance, along with external data obtained from the census of the Indiana statewide population. Both adjustments appeared to have significant impact on the unadjusted estimates, mainly due to upweighting data in study participants of non-White races and Hispanic ethnicity and anticipated false-positive and false-negative test results among both the PCR and antibody tests utilized in the study.

    Improving estimates of children living with HIV from the Spectrum AIDS Impact Model

    Objective: Estimated numbers of children living with HIV determine programmatic and treatment needs. We explain the changes made to the UNAIDS estimates between 2015 and 2016, and describe the challenges around these estimates. Methods: Estimates of children newly infected, living with HIV, and dying of AIDS are developed by country teams using Spectrum software. Spectrum files are available for 160 countries, which represent 98% of the global population. In 2016, the methods were updated to reflect the latest evidence on mother-to-child HIV transmission and improved assumptions on the age at which children initiate antiretroviral therapy. We report updated results using the 2016 model and validate these estimates against mother-to-child transmission rates and HIV prevalence from population-based surveys for the survey year. Results: The revised 2016 model estimates 27% fewer children living with HIV in 2014 than the 2015 model, primarily due to changes in the probability of mother-to-child transmission among women with incident HIV during pregnancy. The revised estimates were consistent with population-based surveys of HIV transmission and HIV prevalence among children aged 5–9 years, but were lower than surveys among children aged 10–14 years. Conclusions: The revised 2016 model is an improvement on previous models. Paediatric HIV models will continue to evolve as further improvements are made to the assumptions. Commodities forecasting and programme planning rely on these estimates, and increasing accuracy will be critical to enable effective scale-up and optimal use of resources. Efforts are needed to improve empirical measures of HIV prevalence, incidence, and mortality among children.