
    Adjusting for informative cluster size in pseudo-value based regression approaches with clustered time to event data

    Informative cluster size (ICS) arises in situations with clustered data where a latent relationship exists between the number of participants in a cluster and the outcome measures. Although this phenomenon has been sporadically reported in the statistical literature for nearly two decades, further exploration is needed in certain statistical methodologies to avoid potentially misleading inferences. For inference about population quantities without covariates, inverse cluster size reweightings are often employed to adjust for ICS. Further, to study the effect of covariates on disease progression described by a multistate model, the pseudo-value regression technique has gained popularity in time-to-event data analysis. We seek to answer the question: "How to apply pseudo-value regression to clustered time-to-event data when cluster size is informative?" ICS adjustment by the reweighting method can be performed in two steps: estimation of the marginal functions of the multistate model, and fitting of the estimating equations based on pseudo-value responses. This leads to four possible strategies. We present theoretical arguments and thorough simulation experiments to ascertain the correct strategy for adjusting for ICS. A further extension of our methodology is implemented to include informativeness induced by the intra-cluster group size. We demonstrate the methods in two real-world applications: (i) to determine predictors of tooth survival in a periodontal study, and (ii) to identify indicators of ambulatory recovery in spinal cord injury patients who participated in locomotor-training rehabilitation. Comment: 22 pages, 4 figures, 4 tables
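
The two building blocks named in the abstract above, inverse cluster size reweighting and pseudo-values, can be illustrated with a minimal sketch. This is not the authors' method: it uses a plain sample mean instead of a multistate or survival estimator, the function names `inverse_cluster_size_mean` and `jackknife_pseudo_values` are hypothetical, and the toy data are invented for illustration only.

```python
import numpy as np

def inverse_cluster_size_mean(values, cluster_ids):
    """Mean in which each observation is weighted by 1 / (size of its cluster),
    so that every cluster contributes equally regardless of its size."""
    values = np.asarray(values, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    # inverse[k] is the index of observation k's cluster; counts holds cluster sizes
    _, inverse, counts = np.unique(cluster_ids, return_inverse=True, return_counts=True)
    weights = 1.0 / counts[inverse]
    return float(np.sum(weights * values) / np.sum(weights))

def jackknife_pseudo_values(estimator, data):
    """Leave-one-out pseudo-values: n * theta_hat - (n - 1) * theta_hat_without_i."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    full = estimator(data)
    return np.array([n * full - (n - 1) * estimator(np.delete(data, i)) for i in range(n)])

# Toy data (assumed): a large cluster "a" with good outcomes, a small cluster "b" with a poor one
outcomes = [1, 1, 1, 0]
clusters = ["a", "a", "a", "b"]

unweighted = float(np.mean(outcomes))                        # dominated by the large cluster
reweighted = inverse_cluster_size_mean(outcomes, clusters)   # each cluster counts once
pseudo = jackknife_pseudo_values(np.mean, outcomes)          # for the mean, equals the raw data
```

In this toy case the unweighted mean is pulled toward the larger cluster, while the reweighted mean treats the two clusters equally. In pseudo-value regression the estimator would typically be a marginal survival or state-occupation probability rather than a mean, and the pseudo-values would then serve as responses in estimating equations.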

    Monte Carlo modified profile likelihood in models for clustered data

    The main focus of analysts who deal with clustered data is usually not on the clustering variables, and hence the group-specific parameters are treated as nuisance parameters. If a fixed effects formulation is preferred and the total number of clusters is large relative to the single-group sizes, classical frequentist techniques relying on the profile likelihood are often misleading. The use of alternative tools, such as modifications to the profile likelihood or integrated likelihoods, for making accurate inference on a parameter of interest can be complicated by the presence of nonstandard modelling and/or sampling assumptions. We show here how to employ Monte Carlo simulation in order to approximate the modified profile likelihood in some of these unconventional frameworks. The proposed solution is widely applicable and is shown to retain the usual properties of the modified profile likelihood. The approach is examined in two instances particularly relevant in applications, i.e. missing-data models and survival models with unspecified censoring distribution. The effectiveness of the proposed solution is validated via simulation studies and two clinical trial applications.

    Review of methods for handling confounding by cluster and informative cluster size in clustered data.

    Clustered data are common in medical research. Typically, one is interested in a regression model for the association between an outcome and covariates. Two complications that can arise when analysing clustered data are informative cluster size (ICS) and confounding by cluster (CBC). ICS and CBC mean that the outcome of a member given its covariates is associated with, respectively, the number of members in the cluster and the covariate values of other members in the cluster. Standard generalised linear mixed models for cluster-specific inference and standard generalised estimating equations for population-average inference assume, in general, the absence of ICS and CBC. Modifications of these approaches have been proposed to account for CBC or ICS. This article is a review of these methods. We express their assumptions in a common format, thus providing greater clarity about the assumptions that methods proposed for handling CBC make about ICS and vice versa, and about when different methods can be used in practice. We report relative efficiencies of methods where available, describe how methods are related, identify a previously unreported equivalence between two key methods, and propose some simple additional methods. Unnecessarily using a method that allows for ICS/CBC has an efficiency cost when ICS and CBC are absent. We review tools for identifying ICS/CBC. A strategy for analysis when CBC and ICS are suspected is demonstrated by examining the association between socio-economic deprivation and preterm neonatal death in Scotland.

    Frailty Probit Models for Clustered Interval-Censored Failure Time Data

    Survival analysis is an important branch of statistics that deals with time-to-event data or survival data. An important feature of such data is that the survival time of interest is usually not completely known but is censored due to the design of the study or an early dropout. In this dissertation we focus on studying clustered interval-censored data, a special type of survival data. Interval-censored data arise in many epidemiological, social science, and medical studies, in which subjects are examined at periodic follow-up visits. The survival (or failure) time of interest is never exactly observed but is known to fall within an interval formed by two examination times between which the status of the event of interest changed. Clustered interval-censored data add a further complication: the failure times within the same cluster are not independent. Chapter 1 of this dissertation provides a detailed description of interval-censored data with several real data examples and reviews existing regression models and approaches for clustered interval-censored data. Chapter 2 proposes a novel frailty Probit model for analyzing clustered interval-censored data. The proposed model has several appealing properties: (1) the marginal covariate effects are proportional to the conditional effects and (2) the intra-cluster association can be quantified in terms of several nonparametric association measures in closed form. The proposed Bayesian estimation approach is easy to implement because all parameters and latent variables have their full conditionals in standard form. The approach has excellent performance in estimating the regression parameters and the baseline survival function and is also robust to misspecification of the frailty distribution. Chapter 3 extends the frailty Probit model in Chapter 2 to allow modeling both clustered and independent data through the adoption of a mixture distribution for the frailty. The proposed approach provides tests of the existence of intra-cluster association for each cluster via Bayes factors and can identify clusters with strong (weak) correlation. Two different prior structures are considered in our approach, and both lead to good estimation and testing results. Chapter 4 studies a joint modeling of clustered interval-censored failure times and the sizes of the clusters. The cluster size is modeled as an ordinal response using a parametric Probit model, and a separate frailty semiparametric Probit model is used to model the clustered failure times. The two submodels are connected through a shared random effect. The performance of the proposed model is evaluated through a simulation study.

    Estimation of accelerated failure time models with random effects

    Correlated survival data with possible censoring are frequently encountered in survival analysis. This includes multicenter studies where subjects are clustered by clinical or other environmental factors that influence expected survival time, studies where times to several different events are monitored on each subject, and studies using groups of genetically related subjects. To analyze such data, we propose accelerated failure time (AFT) models based on lognormal frailties. AFT models provide a linear relationship between the log of the failure time and covariates that affect the expected time to failure by contracting or expanding the time scale. These models account for within-cluster association by incorporating random effects with dependence structures that may be functions of unknown covariance parameters. They can be applied to right-, left- or interval-censored survival data. To estimate model parameters, we consider an approximate maximum likelihood estimation procedure derived from the Laplace approximation. This avoids the use of computationally intensive methods needed to evaluate the exact log-likelihood, such as MCMC methods or numerical integration, that are not feasible for large data sets. Asymptotic properties of the proposed estimators are established and small sample performance is evaluated through several simulation studies. The fixed effects parameters are estimated well with little absolute bias. Asymptotic formulas tend to underestimate the standard errors for small cluster sizes. Reliable estimates depend on both the number of clusters and cluster size. The methodology is used to analyze data taken from the Minnesota Breast Cancer Family Resource to examine age-at-onset of breast cancer for women in 426 families.
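
The log-linear structure described in the abstract above can be written out as a generic AFT model with random effects. The notation here is an assumption chosen for illustration (subject j in cluster i, fixed-effect covariates x, random-effect design z, covariance parameters theta); the abstract itself does not give the authors' exact specification.

```latex
\log T_{ij} \;=\; \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
              \;+\; \mathbf{z}_{ij}^{\top}\mathbf{b}_i
              \;+\; \sigma\,\varepsilon_{ij},
\qquad
\mathbf{b}_i \sim N\!\bigl(\mathbf{0},\, \Sigma(\boldsymbol{\theta})\bigr)
```

Under this sketch, the term exp(z'b) acts multiplicatively on the failure time and is lognormally distributed, which is one common reading of "lognormal frailties"; the error term and the dependence structure encoded in the covariance matrix would need to be taken from the paper itself.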

    Bayesian Semi- and Non-parametric Analysis for Spatially Correlated Survival Data

    Flexible incorporation of both geographical patterning and risk effects in cancer survival models is becoming increasingly important, due in part to the recent availability of large cancer registries. The analysis of spatial survival data is challenged by the presence of spatial dependence and censoring of survival times. Accurately modeling the risk factors and geographical pattern that explain the differences in survival is of particular interest. Within this dissertation, the first chapter reviews commonly used baseline priors, semiparametric and nonparametric Bayesian survival models and recent approaches for accommodating spatial dependence, both conditional and marginal. The last three chapters contribute three flexible survival models: (1) a proportional hazards model with areal-level covariate-adjusted frailties with application to county-level breast cancer survival data, (2) a marginal Bayesian nonparametric model for time to disease arrival of threatened amphibian populations, and (3) a generalized accelerated failure time model with spatial intrinsic conditionally autoregressive frailties with application to county-level prostate cancer data. An R package spBayesSurv is developed to examine all the proposed models along with some traditional spatial survival models.

    Semiparametric Methods for Survival Data with Clustering, Outcome-Dependent Sampling, Dependent Censoring, and External Time-Dependent Covariate.

    In this dissertation, we focus on the development of semiparametric methods for estimating proportional hazards models in the presence of non-standard data structures, namely clustering, outcome-dependent sampling, dependent censoring and external time-dependent covariates. In the first chapter, we propose methods based on estimating equations for case-cohort designs with clustered failure time data. We assume a marginal hazards model with a common baseline hazard and common regression coefficients across all clusters. Compared to their closest competitors in the literature, the proposed methods feature more tractable asymptotic derivations, variance estimation with reduced computational burden, and potentially increased efficiency. We apply these methods to the study of mortality among Canadian dialysis patients. In the second chapter, we propose methods for dealing with failure time data in the setting where the probability of sampling subjects depends on the outcome (e.g., death, survival) and where subjects are censored in a manner that depends on the failure rate. We employ a novel double-inverse-weighting scheme which combines weights arising from the probability of remaining uncensored and from the probability of being sampled. The proposed methods are applied to study the wait-list mortality among patients with end-stage liver disease. The third chapter is motivated by the challenges of fitting complex models to data from the smaller countries participating in the Dialysis Outcomes and Practice Patterns Study (DOPPS). We perform a comprehensive investigation of the association between the day-of-week-specific death rates and the dialysis schedule in the U.S., several European countries and Japan. Three Cox models are considered in which 'day of the week', 'day of dialysis schedule', or 'days since last dialysis' serves as a time-dependent covariate. The models are compared and contrasted, with special attention given to the setting where the sample size is small.
    Ph.D., Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/91398/1/huizh_1.pd