3 research outputs found

    Adjusting for informative cluster size in pseudo-value based regression approaches with clustered time to event data

    Informative cluster size (ICS) arises in situations with clustered data where a latent relationship exists between the number of participants in a cluster and the outcome measures. Although this phenomenon has been sporadically reported in the statistical literature for nearly two decades, further exploration is needed in certain statistical methodologies to avoid potentially misleading inferences. For inference about population quantities without covariates, inverse cluster size reweightings are often employed to adjust for ICS. Further, to study the effect of covariates on disease progression described by a multistate model, the pseudo-value regression technique has gained popularity in time-to-event data analysis. We seek to answer the question: "How should pseudo-value regression be applied to clustered time-to-event data when cluster size is informative?" ICS adjustment by the reweighting method can be performed in two steps: estimation of marginal functions of the multistate model, and fitting the estimating equations based on pseudo-value responses, leading to four possible strategies. We present theoretical arguments and thorough simulation experiments to ascertain the correct strategy for adjusting for ICS. A further extension of our methodology is implemented to include informativeness induced by the intra-cluster group size. We demonstrate the methods in two real-world applications: (i) to determine predictors of tooth survival in a periodontal study, and (ii) to identify indicators of ambulatory recovery in spinal cord injury patients who participated in locomotor-training rehabilitation.
    Comment: 22 pages, 4 figures, 4 tables
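
    The two building blocks named in this abstract, inverse cluster size reweighting and jackknife pseudo-values, can be illustrated with a minimal sketch. It is not the authors' implementation: it uses uncensored toy data and a plain mean rather than a multistate marginal estimator, and the function names (icw_mean, jackknife_pseudo_values) are hypothetical. It does not indicate which of the four strategies the paper finds correct.

```python
import numpy as np

def icw_mean(y, cluster):
    """Inverse-cluster-size weighted mean: each cluster contributes equally,
    regardless of how many members it has (assumed toy estimator)."""
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)
    # counts[inverse] gives, for every observation, the size of its cluster
    _, inverse, counts = np.unique(cluster, return_inverse=True, return_counts=True)
    weights = 1.0 / counts[inverse]
    return float(np.sum(weights * y) / len(counts))

def jackknife_pseudo_values(estimator, y):
    """Leave-one-out pseudo-values: n * theta_hat - (n - 1) * theta_hat_(-i)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    full = estimator(y)
    leave_one_out = np.array([estimator(np.delete(y, i)) for i in range(n)])
    return n * full - (n - 1) * leave_one_out

# Toy data: a large cluster "A" with outcome 1 and a small cluster "B" with outcome 0,
# so the cluster size is related to the outcome.
y = [1, 1, 1, 1, 0]
cluster = ["A", "A", "A", "A", "B"]
print("naive mean:", float(np.mean(y)))
print("inverse cluster size weighted mean:", icw_mean(y, cluster))
print("pseudo-values of the plain mean:", jackknife_pseudo_values(np.mean, [2.0, 4.0, 9.0]))
```

    In the toy data the naive mean is dominated by the large cluster, while the weighted mean treats both clusters equally. For the plain mean without censoring, the pseudo-values reduce to the observations themselves; with censored data they would be computed from a marginal estimator instead and then used as responses in estimating equations.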

    Pseudo-value regression of clustered multistate current status data with informative cluster sizes

    Multistate current status (CS) data present a more severe form of censoring due to the single observation of study participants transitioning through a sequence of well-defined disease states at random inspection times. Moreover, these data may be clustered within specified groups, and informativeness of the cluster sizes may arise due to the existing latent relationship between the transition outcomes and the cluster sizes. Failure to adjust for this informativeness may lead to biased inference. Motivated by a clinical study of periodontal disease (PD), we propose an extension of the pseudo-value approach to estimate covariate effects on the state occupation probabilities (SOP) for these clustered multistate CS data with informative cluster or subcluster sizes. In our approach, the proposed pseudo-value technique initially computes marginal estimators of the SOP utilizing nonparametric regression. Next, the estimating equations based on the corresponding pseudo-values are reweighted by functions of the cluster sizes to adjust for informativeness. We perform a variety of simulation studies to study the properties of our pseudo-value regression based on the nonparametric marginal estimators under different scenarios of informativeness. For illustration, the method is applied to the motivating PD dataset, which encapsulates the complex data-generating mechanism.
    Comment: 19 pages, 5 figures, 5 tables

    Dynamic Sampling Versions of Popular SPC Charts for Big Data Analysis

    The statistical process control (SPC) chart is an effective tool for the analysis, interpretation, and visualization of data from sequential processes. Commonly used SPC charts such as the Shewhart, CUSUM and EWMA charts are widely implemented in detecting distributional shifts in various processes. With recent scientific and technological advancements, massive amounts of data continue to be generated by production, medical, agricultural and many other industrial processes. Conventional SPC charts have significant drawbacks in monitoring such processes, specifically when the velocity of the data flow is greater than the run time of the monitoring procedure. In the literature, dynamic sampling control charts (Li and Qiu, 2014) are becoming popular due to their ability to adaptively control the next sampling time of the monitoring process. In this thesis, we incorporate similar ideas into conventional SPC charts for the real-time monitoring of big data processes. Traditional SPC charts are designed to give a warning signal at a particular time point if a process reading plots beyond its control limit(s). This approach does not provide ample information about the likelihood of a potential shift in the process. We implement existing methods of designing control charts with p-values, which give information about the performance of the current observations and, potentially, of observations in the near future. The control chart gives a signal for a mean shift if the p-value is less than some pre-specified significance level. We utilize the computed p-values of the charting statistic in designing variable sampling schemes, specifically dynamic sampling schemes in which the sampling interval is an increasing function of the p-value. The resulting control charts have variable sampling intervals, and hence skip several observations. Thus, their computing times are much shorter than those of traditional charts.
    This thesis provides guidance on how to incorporate dynamic sampling schemes for monitoring big data streams in other types of SPC charts. We perform extensive simulation studies to compare the performance of the dynamic sampling control charts with conventional control charts. Our results show that the dynamic sampling versions of three commonly used SPC charts can monitor big data streams efficiently.
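
    The p-value based dynamic sampling idea described in this abstract can be sketched for a Shewhart-type chart as follows. This is an illustrative assumption, not the scheme from the thesis or from Li and Qiu (2014): the in-control distribution is taken to be a known normal, and the linear interval function next_interval with its d_max parameter is hypothetical.

```python
import math

def two_sided_p_value(x, mu0=0.0, sigma0=1.0):
    """p-value of one reading under an assumed in-control N(mu0, sigma0^2) model."""
    z = abs(x - mu0) / sigma0
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))

def next_interval(p, d_max=10):
    """Hypothetical sampling interval, increasing in p: a large p-value
    (little evidence of a shift) lets the chart skip more observations."""
    return 1 + int((d_max - 1) * p)

def monitor(stream, alpha=0.0027, d_max=10):
    """Return (signal index or None, list of indices actually inspected)."""
    t = 0
    inspected = []
    while t < len(stream):
        p = two_sided_p_value(stream[t])
        inspected.append(t)
        if p < alpha:  # signal a mean shift
            return t, inspected
        t += next_interval(p, d_max)  # skip ahead adaptively
    return None, inspected

# Toy stream: in control for 5 readings, then a large upward mean shift.
stream = [0.0] * 5 + [5.0] * 20
signal_at, inspected = monitor(stream)
print("signal at index:", signal_at)
print("indices inspected:", inspected)
```

    Only the inspected indices incur computation, which is the source of the speed-up the abstract describes. The price is a possible delay in detection, since a shift that occurs during a skipped stretch is only seen at the next inspected reading.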