Adjusting for informative cluster size in pseudo-value based regression approaches with clustered time to event data
Informative cluster size (ICS) arises in situations with clustered data where
a latent relationship exists between the number of participants in a cluster
and the outcome measures. Although this phenomenon has been sporadically
reported in the statistical literature for nearly two decades, some statistical
methodologies still need further exploration to avoid potentially misleading
inferences. For inference about population quantities without covariates,
reweighting by the inverse cluster size is often employed to adjust for
ICS. Further, to study the effect of covariates on disease progression
described by a multistate model, the pseudo-value regression technique has
gained popularity in time-to-event data analysis. We seek to answer the
question: "How should pseudo-value regression be applied to clustered
time-to-event data when cluster size is informative?" ICS adjustment by
reweighting can be performed at either of two steps, namely estimation of the
marginal functions of the multistate model and fitting of the estimating
equations based on the pseudo-value responses; reweighting or not at each step
leads to four possible strategies. We present theoretical arguments and
thorough simulation experiments to ascertain the correct strategy for adjusting
for ICS. A further extension of our methodology is implemented to include
informativeness induced by the intra-cluster group size. We demonstrate the
methods in two real-world applications: (i) to determine predictors of tooth
survival in a periodontal study, and (ii) to identify indicators of ambulatory
recovery in spinal cord injury patients who participated in locomotor-training
rehabilitation.
Comment: 22 pages, 4 figures, 4 tables
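To make the "four possible strategies" concrete, here is a minimal sketch, not the authors' implementation, of pseudo-value regression for the survival probability S(t) at one time point in a two-state (alive/dead) model. Inverse cluster size weights 1/(cluster size) can be switched on or off at the marginal-estimation step (`weight_marginal`) and at the estimating-equation step (`weight_ee`), giving the 2 × 2 strategies. Assumptions: all function names are hypothetical; the weighted marginal estimator is a weighted Kaplan-Meier curve; pseudo-values use the ordinary jackknife formula θ_i = nθ̂ − (n − 1)θ̂^(−i) even when θ̂ is weighted; and the estimating equations use an identity link solved by weighted least squares (a complementary log-log or logit link is more common in practice). Cluster-robust (sandwich) standard errors are omitted.

```python
from collections import Counter


def weighted_km(times, events, weights, t):
    """Weighted Kaplan-Meier estimate of S(t) = P(T > t)."""
    surv = 1.0
    event_times = sorted({x for x, d in zip(times, events) if d == 1 and x <= t})
    for u in event_times:
        at_risk = sum(w for x, w in zip(times, weights) if x >= u)
        deaths = sum(w for x, d, w in zip(times, events, weights) if x == u and d == 1)
        surv *= 1.0 - deaths / at_risk
    return surv


def jackknife_pseudo_values(times, events, weights, t):
    """theta_i = n * theta_hat - (n - 1) * theta_hat computed without subject i."""
    n = len(times)
    full = weighted_km(times, events, weights, t)
    pvs = []
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        loo = weighted_km([times[j] for j in keep], [events[j] for j in keep],
                          [weights[j] for j in keep], t)
        pvs.append(n * full - (n - 1) * loo)
    return pvs


def solve(a, b):
    """Solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def weighted_least_squares(x, y, w):
    """Solve the estimating equations sum_i w_i x_i (y_i - x_i' beta) = 0."""
    p = len(x[0])
    xtwx = [[sum(wi * xi[a] * xi[b] for xi, wi in zip(x, w)) for b in range(p)]
            for a in range(p)]
    xtwy = [sum(wi * xi[a] * yi for xi, yi, wi in zip(x, y, w)) for a in range(p)]
    return solve(xtwx, xtwy)


def pseudo_value_regression(times, events, clusters, covariates, t,
                            weight_marginal, weight_ee):
    """Return [intercept, slopes...] for E[pseudo-value | x] = x' beta (identity link)."""
    sizes = Counter(clusters)
    ics = [1.0 / sizes[c] for c in clusters]  # inverse cluster size weights
    ones = [1.0] * len(times)
    pvs = jackknife_pseudo_values(times, events, ics if weight_marginal else ones, t)
    x = [[1.0] + list(z) for z in covariates]  # prepend intercept column
    return weighted_least_squares(x, pvs, ics if weight_ee else ones)


if __name__ == "__main__":
    # Hypothetical toy data: 8 teeth nested in 3 subjects (clusters), one binary covariate.
    times = [2.0, 3.5, 1.0, 4.0, 2.5, 5.0, 3.0, 6.0]
    events = [1, 0, 1, 1, 1, 0, 1, 1]
    clusters = ["a", "a", "a", "b", "b", "c", "c", "c"]
    covariates = [(0,), (1,), (0,), (1,), (0,), (1,), (1,), (0,)]
    for wm in (False, True):
        for we in (False, True):
            print(wm, we, pseudo_value_regression(times, events, clusters, covariates, 3.0, wm, we))
```

When all clusters have the same size, the weights are constant and the four strategies coincide; they can differ only when cluster sizes vary, which is exactly the setting the abstract addresses.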
Pseudo-value regression of clustered multistate current status data with informative cluster sizes
Multistate current status (CS) data present a more severe form of censoring
than right censoring, because each study participant, while transitioning
through a sequence of well-defined disease states, is observed only once, at a
random inspection time. Moreover,
these data may be clustered within specified groups, and the cluster sizes may
be informative because of a latent relationship between the transition outcomes
and the cluster sizes. Failure to adjust for this informativeness may lead to
biased inference. Motivated by a clinical study
of periodontal disease (PD), we propose an extension of the pseudo-value
approach to estimate covariate effects on the state occupation probabilities
(SOP) for these clustered multistate CS data with informative cluster or
subcluster sizes. The proposed pseudo-value technique first computes marginal
estimators of the SOP using nonparametric regression. Next, the estimating
equations based on the corresponding pseudo-values are reweighted by functions
of the cluster sizes to adjust for informativeness. We perform a variety of
simulation studies to examine the
properties of our pseudo-value regression based on the nonparametric marginal
estimators under different scenarios of informativeness. For illustration, the
method is applied to the motivating PD dataset, which exhibits this complex
data-generating mechanism.
Comment: 19 pages, 5 figures, 5 tables
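The two steps described above (nonparametric marginal SOP estimates, then reweighted estimating equations on the pseudo-values) can be illustrated with a minimal sketch under assumptions that are ours, not the paper's. A Nadaraya-Watson kernel estimator (Gaussian kernel, fixed bandwidth `h`) stands in for the nonparametric SOP estimator, computed from each participant's single inspection time and the state observed then; pseudo-values use the ordinary jackknife formula. For subcluster informativeness, each unit gets the hypothetical weight 1/(G_i × n_ig), where G_i is the number of subclusters in cluster i and n_ig the size of the unit's subcluster, so every cluster carries total weight 1. With one binary covariate and an identity link, the weighted estimating equations are solved by weighted group means. All names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_sop(insp, states, k, t, h):
    """Kernel estimate of P(state at time t == k) from current status data:
    participant i is seen once, at inspection time insp[i], in state states[i]."""
    kw = [gaussian_kernel((c - t) / h) for c in insp]
    return sum(w for w, s in zip(kw, states) if s == k) / sum(kw)


def sop_pseudo_values(insp, states, k, t, h):
    """Jackknife pseudo-values of the kernel SOP estimate."""
    n = len(insp)
    full = nw_sop(insp, states, k, t, h)
    pvs = []
    for i in range(n):
        loo = nw_sop(insp[:i] + insp[i + 1:], states[:i] + states[i + 1:], k, t, h)
        pvs.append(n * full - (n - 1) * loo)
    return pvs


def subcluster_weights(clusters, subclusters):
    """Hypothetical weight 1 / (G_i * n_ig); each cluster's weights sum to 1."""
    sub_size = Counter(zip(clusters, subclusters))
    n_sub = Counter(c for c, _ in sub_size)  # distinct subclusters per cluster
    return [1.0 / (n_sub[c] * sub_size[(c, g)]) for c, g in zip(clusters, subclusters)]


def fit_binary_covariate(pvs, z, w):
    """Solve sum_i w_i (1, z_i)' (pv_i - b0 - b1 z_i) = 0 for binary z:
    b0 is the weighted mean in group z=0, b1 the difference of weighted means."""
    num, den = defaultdict(float), defaultdict(float)
    for pv, zi, wi in zip(pvs, z, w):
        num[zi] += wi * pv
        den[zi] += wi
    mean0, mean1 = num[0] / den[0], num[1] / den[1]
    return mean0, mean1 - mean0


if __name__ == "__main__":
    # Hypothetical toy data: teeth within tooth types (subclusters) within subjects (clusters).
    insp = [1.0, 1.5, 2.0, 2.2, 3.0, 3.1, 4.0, 4.5]
    states = [0, 0, 1, 0, 1, 2, 1, 2]  # disease state observed at inspection
    clusters = ["a", "a", "a", "b", "b", "c", "c", "c"]
    subclusters = ["m", "m", "p", "m", "p", "m", "p", "p"]
    z = [0, 1, 0, 1, 0, 1, 1, 0]
    pvs = sop_pseudo_values(insp, states, k=1, t=2.5, h=1.0)
    print(fit_binary_covariate(pvs, z, subcluster_weights(clusters, subclusters)))
```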
Dynamic Sampling Versions of Popular SPC Charts for Big Data Analysis
The statistical process control (SPC) chart is an effective tool for the analysis, interpretation, and visualization of data from sequential processes. Commonly used SPC charts such as the Shewhart, CUSUM and EWMA charts are widely used to detect distributional shifts in various processes. With recent scientific and technological advancements, massive amounts of data continue to be generated by production, medical, agricultural and many other processes. Conventional SPC charts have significant drawbacks in monitoring such processes, particularly when new observations arrive faster than the monitoring procedure can process them. In the literature, dynamic sampling control charts (Li and Qiu, 2014) have become popular because they adaptively determine the next sampling time of the monitoring process. In this thesis, we incorporate similar ideas into conventional SPC charts for the real-time monitoring of big data processes.
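For reference, the three charts named above can be summarized by their standard charting statistics. The sketch below computes them for a stream with known in-control mean `mu0` and standard deviation `sigma`; the parameter values (`lam = 0.1`, reference value `k = 0.5`) are common textbook choices assumed for illustration, control limits are left out, and none of this is code from the thesis.

```python
def shewhart(xs, mu0=0.0, sigma=1.0):
    """Standardized individual readings Z_n; a signal is given when |Z_n| > L (often L = 3)."""
    return [(x - mu0) / sigma for x in xs]


def cusum_upper(xs, mu0=0.0, sigma=1.0, k=0.5):
    """Upper CUSUM C_n = max(0, C_{n-1} + Z_n - k), C_0 = 0; targets upward mean shifts."""
    c, out = 0.0, []
    for x in xs:
        c = max(0.0, c + (x - mu0) / sigma - k)
        out.append(c)
    return out


def ewma(xs, mu0=0.0, sigma=1.0, lam=0.1):
    """EWMA E_n = lam * Z_n + (1 - lam) * E_{n-1}, E_0 = 0."""
    e, out = 0.0, []
    for x in xs:
        e = lam * (x - mu0) / sigma + (1.0 - lam) * e
        out.append(e)
    return out


if __name__ == "__main__":
    data = [0.2, -0.5, 0.1, 1.8, 2.1, 1.9, 2.4]  # hypothetical readings with an upward shift
    print(shewhart(data))
    print(cusum_upper(data))
    print(ewma(data))
```

The Shewhart statistic uses only the current reading, while CUSUM and EWMA accumulate past information, which is why the latter two are usually preferred for small persistent shifts.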
Traditional SPC charts are designed to give a warning signal at a particular time point if a process reading plots beyond its control limit(s). This approach does not provide much information about the likelihood of a potential shift in the process. We implement existing methods of designing control charts with p-values, which give information about the performance of the current observations and, potentially, of observations in the near future. The control chart signals a mean shift if the p-value is less than a pre-specified significance level. We use the computed p-values of the charting statistic to design variable sampling schemes, specifically dynamic sampling schemes in which the sampling interval is an increasing function of the p-value. The resulting control charts have variable sampling intervals and hence skip some observations, so their computing times are much shorter than those of traditional charts.
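The p-value chart and the dynamic sampling rule can be sketched for an EWMA chart under assumptions stated plainly here: in-control readings are i.i.d. N(mu0, sigma^2); the p-value uses the steady-state in-control distribution of the EWMA statistic, N(0, lam/(2 − lam)); and `interval(p) = 1 + floor((d_max − 1) p)` (capped at `d_max`) is a hypothetical nondecreasing step function of the p-value, not the function used by Li and Qiu (2014) or in the thesis. Only sampled readings update the chart, so a large p-value (little evidence of a shift) lets the monitor skip up to `d_max − 1` readings.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ewma_p_value(e, lam):
    """Two-sided p-value of E under its steady-state in-control law N(0, lam / (2 - lam))."""
    sd = math.sqrt(lam / (2.0 - lam))
    return 2.0 * (1.0 - normal_cdf(abs(e) / sd))


def interval(p, d_max=5):
    """Hypothetical sampling interval, nondecreasing in p: 1 for small p, up to d_max."""
    return 1 + min(d_max - 1, int((d_max - 1) * p))


def dynamic_ewma(stream, mu0=0.0, sigma=1.0, lam=0.1, alpha=0.005, d_max=5):
    """Monitor `stream`; return (index of signal or None, number of readings inspected)."""
    e, i, inspected = 0.0, 0, 0
    while i < len(stream):
        e = lam * (stream[i] - mu0) / sigma + (1.0 - lam) * e
        inspected += 1
        p = ewma_p_value(e, lam)
        if p < alpha:  # signal: p-value below the significance level
            return i, inspected
        i += interval(p, d_max)  # skip readings while the process looks in control
    return None, inspected


if __name__ == "__main__":
    import random
    rng = random.Random(1)
    # Hypothetical stream: in control for 2000 readings, then a mean shift of 1.5.
    stream = [rng.gauss(0, 1) for _ in range(2000)] + [rng.gauss(1.5, 1) for _ in range(2000)]
    print(dynamic_ewma(stream))
```

The count of inspected readings, compared with the stream length, gives a rough sense of the computing time saved by skipping observations.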
This thesis provides guidance on how to incorporate dynamic sampling schemes into other types of SPC charts for monitoring big data streams. We perform extensive simulation studies to compare the performance of the dynamic sampling control charts with that of conventional control charts. Our results show that the dynamic sampling versions of three commonly used SPC charts can monitor big data streams efficiently.
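One way to extend dynamic sampling to other charts is to observe that the scheme needs only two ingredients from a chart: an update rule for its charting statistic and a p-value for the current statistic. The sketch below assumes, for illustration, that the p-value is approximated by Monte Carlo from a simulated steady-state in-control distribution of the statistic (obtained after a burn-in), which lets one wrapper drive, for example, an upper CUSUM whose p-value has no simple closed form. The function names, burn-in length, simulation size and interval rule are all hypothetical choices, not the thesis's procedure.

```python
import bisect
import random


def cusum_update(c, x, k=0.5):
    """Upper CUSUM update for standardized readings."""
    return max(0.0, c + x - k)


def simulate_ic_statistics(update, n_rep=2000, burn_in=200, seed=0):
    """Sorted steady-state in-control values of a chart statistic, by simulation."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_rep):
        s = 0.0
        for _ in range(burn_in):
            s = update(s, rng.gauss(0.0, 1.0))
        values.append(s)
    return sorted(values)


def mc_p_value(stat, ic_sorted):
    """Upper-tail Monte Carlo p-value: share of in-control draws >= stat, plus-one corrected."""
    n_ge = len(ic_sorted) - bisect.bisect_left(ic_sorted, stat)
    return (n_ge + 1) / (len(ic_sorted) + 1)


def dynamic_monitor(stream, update, ic_sorted, alpha=0.005, d_max=5):
    """Generic dynamic sampling for any chart given its update rule and IC distribution."""
    s, i, inspected = 0.0, 0, 0
    while i < len(stream):
        s = update(s, stream[i])
        inspected += 1
        p = mc_p_value(s, ic_sorted)
        if p < alpha:
            return i, inspected
        i += 1 + min(d_max - 1, int((d_max - 1) * p))  # hypothetical interval rule
    return None, inspected


if __name__ == "__main__":
    ic = simulate_ic_statistics(cusum_update)
    rng = random.Random(42)
    # Hypothetical standardized stream with a mean shift of 1.0 after 3000 readings.
    stream = [rng.gauss(0, 1) for _ in range(3000)] + [rng.gauss(1.0, 1) for _ in range(3000)]
    print(dynamic_monitor(stream, cusum_update, ic))
```

Passing a different `update` (for example `lambda e, x: 0.1 * x + 0.9 * e` for a one-sided EWMA) reuses the same sampling logic; a two-sided chart would need a two-sided p-value instead of the upper-tail one used here.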