We present statistical methods for big data arising from online analytical
processing, where large amounts of data arrive in streams and require fast
analysis without storing or re-accessing the historical data. In particular, we
develop iterative estimation algorithms and statistical inference procedures for
linear models and estimating equations that update as new data arrive. These
algorithms are computationally efficient, minimally storage-intensive, and
allow for possible rank deficiencies in the subset design matrices due to
rare-event covariates. Within the linear model setting, the proposed
online-updating framework leads to predictive residual tests that can be used
to assess the goodness-of-fit of the hypothesized model. We also propose a new
online-updating estimator under the estimating equation setting. Theoretical
properties of the goodness-of-fit tests and proposed estimators are examined in
detail. In simulation studies and real data applications, our estimator
compares favorably with competing approaches under the estimating equation
setting.

Comment: Submitted to Technometrics
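
To make the online-updating idea concrete for the linear model case, the following is a minimal sketch and not the authors' algorithm. It assumes that each incoming block contributes only its cross-products X'X and X'y to running totals, so that no historical rows are stored, and that a Moore-Penrose pseudo-inverse is an acceptable way to solve the accumulated normal equations. The class name `OnlineLinearModel`, the block size, and the simulated data (including the rare-event indicator column) are all hypothetical choices made for illustration.

```python
import numpy as np


class OnlineLinearModel:
    """Running least-squares fit that keeps only p x p and p x 1 summaries."""

    def __init__(self, p):
        self.xtx = np.zeros((p, p))  # accumulated X'X over all blocks seen
        self.xty = np.zeros(p)       # accumulated X'y over all blocks seen
        self.n = 0                   # number of rows seen so far

    def update(self, X, y):
        """Fold one block of data into the summaries; the block can then be discarded."""
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.xtx += X.T @ X
        self.xty += X.T @ y
        self.n += X.shape[0]

    def coef(self):
        """Current estimate; the pseudo-inverse tolerates a singular X'X (assumption)."""
        return np.linalg.pinv(self.xtx) @ self.xty


# Simulated stream (hypothetical): intercept, one continuous covariate, and a
# rare-event indicator that is nonzero in only two of the 300 rows.
rng = np.random.default_rng(0)
n, block = 300, 50
x1 = rng.normal(size=n)
x2 = np.zeros(n)
x2[[5, 250]] = 1.0
X_all = np.column_stack([np.ones(n), x1, x2])
y_all = X_all @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.1, size=n)

model = OnlineLinearModel(p=3)
for start in range(0, n, block):
    # Most blocks have an all-zero indicator column, so their own design
    # matrices are rank deficient; only the cross-products are accumulated.
    model.update(X_all[start:start + block], y_all[start:start + block])

beta_stream = model.coef()
beta_full = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
```

Under these assumptions, `beta_stream` should agree with the full-data fit `beta_full` up to floating-point error, even though four of the six blocks could not be fitted on their own. Storing only the cross-products keeps memory at order p squared regardless of how many rows have arrived.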