Better subset regression
To find efficient screening methods for high dimensional linear regression
models, this paper studies the relationship between model fitting and screening
performance. Under a sparsity assumption, we show that a subset that includes
the true submodel always yields a smaller residual sum of squares (i.e., has
better model fitting) than any subset that does not, in a general asymptotic setting.
This indicates that, for screening important variables, we could follow a
"better fitting, better screening" rule, i.e., pick a "better" subset that has
better model fitting. To seek such a better subset, we consider the
optimization problem associated with best subset regression. An EM algorithm,
called orthogonalizing subset screening, and its accelerated version are
proposed for searching for the best subset. Although the two algorithms cannot
guarantee that a subset they yield is the best, their monotonicity property
ensures that the subset fits better than the initial subsets generated by
popular screening methods, and thus the subset can have better screening
performance asymptotically. Simulation results show that our methods are very
competitive in high dimensional variable screening even for finite sample
sizes.
Comment: 24 pages, 1 figure
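The monotonicity idea described in the abstract above can be illustrated with a small sketch. This is not the paper's orthogonalizing subset screening algorithm, whose details are not given here; it is a generic iterative hard-thresholding search over subsets of size k, used as a stand-in because it shares the property that the residual sum of squares never increases from the initial subset. The function names (`rss`, `hard_threshold`, `subset_search`), the simulated data, and the use of marginal regression coefficients as the initial screening step are all assumptions made for illustration.

```python
import random

def rss(X, y, beta):
    """Residual sum of squares ||y - X beta||^2."""
    return sum((yi - sum(x * b for x, b in zip(row, beta))) ** 2
               for row, yi in zip(X, y))

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda j: -abs(v[j]))[:k])
    return [v[j] if j in keep else 0.0 for j in range(len(v))]

def subset_search(X, y, k, beta0, n_iter=200):
    """Illustrative monotone search over size-k subsets (hypothetical helper)."""
    p = len(X[0])
    # d is the squared Frobenius norm of X, an upper bound on the largest
    # eigenvalue of X'X; this bound is what makes each step non-increasing in RSS.
    d = sum(x * x for row in X for x in row)
    beta = hard_threshold(beta0, k)
    history = [rss(X, y, beta)]
    for _ in range(n_iter):
        resid = [yi - sum(x * b for x, b in zip(row, beta))
                 for row, yi in zip(X, y)]
        grad = [sum(row[j] * r for row, r in zip(X, resid)) for j in range(p)]
        beta = hard_threshold([b + g / d for b, g in zip(beta, grad)], k)
        history.append(rss(X, y, beta))
    return beta, history

# Simulated sparse linear model (all values are assumptions for the demo).
random.seed(0)
n, p, k = 60, 12, 3
beta_true = [3.0, -2.0, 1.5] + [0.0] * (p - 3)
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(x * b for x, b in zip(row, beta_true)) + random.gauss(0.0, 1.0)
     for row in X]

# Initial subset from marginal regression coefficients, a simple screening step.
beta0 = [sum(row[j] * yi for row, yi in zip(X, y)) / sum(row[j] ** 2 for row in X)
         for j in range(p)]

beta, history = subset_search(X, y, k, beta0)
selected = [j for j, b in enumerate(beta) if b != 0.0]
```

Here `history` records the residual sum of squares at every iteration, so the "better fitting" claim can be checked directly by comparing `history[-1]` with `history[0]`. The sketch makes no claim about recovering the true submodel on any particular data set.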
Unweighted estimation based on optimal sample under measurement constraints
Subsampling is a practical way to handle massive data by selecting the
more informative data points. However, when responses are expensive to measure,
developing efficient subsampling schemes is challenging, and an optimal
sampling approach under measurement constraints was developed to meet this
challenge. This method uses the inverses of optimal sampling probabilities to
reweight the objective function, which assigns smaller weights to the more
important data points. As a result, the estimation efficiency of the resulting
estimator leaves room for improvement. In this paper, we propose an unweighted estimating
procedure based on optimal subsamples to obtain a more efficient estimator. We
obtain the unconditional asymptotic distribution of the estimator via
martingale techniques without conditioning on the pilot estimate, a setting that has
received less attention in the existing subsampling literature. Both asymptotic
results and numerical results show that the unweighted estimator is more
efficient in parameter estimation.
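The contrast between the inverse-probability-weighted and unweighted estimators can be made concrete with a toy sketch. Everything below is an assumption for illustration: a no-intercept linear model stands in for whatever model the paper treats, the sampling probabilities are simply proportional to |x| rather than the paper's optimal probabilities, and the variable names are hypothetical. The one feature kept from the abstract is that the probabilities depend on covariates only, as they must when responses are unmeasured at sampling time.

```python
import random

random.seed(1)
N, r, beta_true = 5000, 500, 2.0  # full size, subsample size, true slope (assumed)

# Covariates are available for every data point.
x = [random.gauss(0.0, 1.0) for _ in range(N)]
# Responses are simulated for all points here, but only the sampled ones are
# used below, mimicking a setting where measuring a response is expensive.
y = [beta_true * xi + random.gauss(0.0, 1.0) for xi in x]

# Hypothetical sampling probabilities that depend on covariates only.
total = sum(abs(xi) for xi in x)
pi = [abs(xi) / total for xi in x]

# Draw the subsample with replacement according to pi.
idx = random.choices(range(N), weights=pi, k=r)
xs = [x[i] for i in idx]
ys = [y[i] for i in idx]
ws = [1.0 / pi[i] for i in idx]  # inverse-probability weights

# Weighted estimator: minimizes sum_i w_i * (y_i - b * x_i)^2.
beta_weighted = (sum(w * a * b for w, a, b in zip(ws, xs, ys))
                 / sum(w * a * a for w, a in zip(ws, xs)))

# Unweighted estimator: ordinary least squares on the subsample.
beta_unweighted = sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)
```

Because the probabilities here do not depend on the responses, both estimators target the same slope in this toy model. The weighted version gives the smallest weights to the points sampled with the highest probability, which is the effect the abstract describes. Comparing their efficiency would require repeating the experiment many times and comparing mean squared errors.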