Most of the available methods for longitudinal data analysis are designed and
validated for the situation where the number of subjects is large and the
number of observations per subject is relatively small. Motivated by the
Naturalistic Teenage Driving Study (NTDS), which represents the exact opposite
situation, we examine standard approaches and propose new methodology for the marginal
analysis of longitudinal count data comprising a small number of very long sequences.
We consider standard methods based on generalized estimating equations, under
working independence or an appropriate correlation structure, and find them
unsatisfactory for dealing with time-dependent covariates when the counts are
low. For this situation, we explore a within-cluster resampling (WCR) approach
that involves repeated analyses of random subsamples with a final analysis that
synthesizes results across subsamples. This leads to a novel WCR method that
operates on separated blocks within subjects and performs better than all
of the previously considered methods. The methods are applied to the NTDS data
and evaluated in simulation experiments mimicking the NTDS.

Comment: Published at http://dx.doi.org/10.1214/11-AOAS507 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
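The within-cluster resampling idea described in the abstract (repeatedly analyse random subsamples, then synthesize across them) can be illustrated with a minimal sketch. It assumes the basic scheme that draws one observation per subject, a Poisson log-linear working model, and the usual WCR synthesis rule: average the per-subsample estimates, and estimate their variance as the mean within-subsample covariance minus the between-subsample covariance. It is not the block-based WCR method proposed in the paper, whose details are not given here. The function names `fit_poisson` and `within_cluster_resampling` are hypothetical.

```python
import numpy as np


def fit_poisson(X, y, n_iter=50, tol=1e-10):
    """Hypothetical helper: fit a Poisson log-linear model by Newton-Raphson.

    Returns the coefficient vector and its model-based covariance matrix
    (inverse Fisher information).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)                  # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])          # Fisher information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    info = X.T @ (X * mu[:, None])
    return beta, np.linalg.inv(info)


def within_cluster_resampling(X, y, cluster, n_resamples=1000, seed=0):
    """Hypothetical sketch of basic WCR: one observation drawn per subject.

    Each resample is an independent-data problem, so an ordinary Poisson
    fit is valid for it; results are then combined across resamples.
    """
    rng = np.random.default_rng(seed)
    members = [np.flatnonzero(cluster == c) for c in np.unique(cluster)]
    estimates, covariances = [], []
    for _ in range(n_resamples):
        # Draw a single row index from each subject's observations.
        idx = np.array([rng.choice(m) for m in members])
        b, v = fit_poisson(X[idx], y[idx])
        estimates.append(b)
        covariances.append(v)
    estimates = np.array(estimates)
    beta_wcr = estimates.mean(axis=0)
    # Mean within-resample covariance minus between-resample covariance.
    between = np.cov(estimates, rowvar=False, ddof=1)
    cov_wcr = np.mean(covariances, axis=0) - between
    return beta_wcr, cov_wcr
```

Because each resample contains only one observation per subject, the number of rows per fit equals the number of subjects. With few subjects and low counts, individual fits can be unstable and the variance difference is not guaranteed to be positive definite. This fragility is one motivation for methods that use more of each long sequence.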