Optimal Sub-sampling with Influence Functions
Sub-sampling is a common and often effective method to deal with the
computational challenges of large datasets. However, for most statistical
models, there is no well-motivated approach for drawing a non-uniform
subsample. We show that the concept of an asymptotically linear estimator and
the associated influence function leads to optimal sampling procedures for a
wide class of popular models. Furthermore, for linear regression models, which
have well-studied procedures for non-uniform sub-sampling, we show that our optimal
influence-function-based method outperforms previous approaches. We empirically
show the improved performance of our method on real datasets.
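
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one way influence-function-based sub-sampling could look for ordinary least squares. It assumes (these are assumptions, not details from the abstract) that the sampling probabilities are proportional to the Euclidean norm of each point's influence function, that the influence function is evaluated at a pilot estimate from a small uniform subsample, and that the final fit reweights the sampled points by inverse probability. The names `influence_subsample_ols` and `pilot_size` are hypothetical.

```python
import numpy as np


def influence_subsample_ols(X, y, m, pilot_size=200, rng=None):
    """Sketch: sub-sample m points for OLS with influence-based probabilities.

    Assumption: probabilities are proportional to the norm of the OLS
    influence function psi_i = H^{-1} x_i (y_i - x_i^T beta), evaluated at
    a pilot estimate. Returns (beta, sampled indices, probabilities).
    """
    rng = np.random.default_rng(rng)
    n, _ = X.shape

    # Pilot fit on a small uniform subsample.
    pilot_idx = rng.choice(n, size=min(pilot_size, n), replace=False)
    Xp, yp = X[pilot_idx], y[pilot_idx]
    beta0, *_ = np.linalg.lstsq(Xp, yp, rcond=None)

    # Pilot estimate of the (scaled) Hessian E[x x^T].
    H = Xp.T @ Xp / len(pilot_idx)

    # Influence function of every point at the pilot estimate: shape (n, d).
    resid = y - X @ beta0
    psi = np.linalg.solve(H, (X * resid[:, None]).T).T

    # Sampling probabilities proportional to the influence norm.
    norms = np.linalg.norm(psi, axis=1)
    total = norms.sum()
    probs = norms / total if total > 0 else np.full(n, 1.0 / n)

    # Draw the subsample with replacement, then fit weighted least squares
    # with inverse-probability weights so the estimating equation stays unbiased.
    idx = rng.choice(n, size=m, replace=True, p=probs)
    sqrt_w = np.sqrt(1.0 / (m * probs[idx]))
    beta, *_ = np.linalg.lstsq(X[idx] * sqrt_w[:, None], y[idx] * sqrt_w, rcond=None)
    return beta, idx, probs
```

Note that this sketch still makes one full pass over the data to compute residuals; the saving is in the final fit, which uses only the `m` sampled rows.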