The era of big data has witnessed an increasing availability of multiple data
sources for statistical analyses. We consider estimation of causal effects
combining big main data with unmeasured confounders and smaller validation data
with supplementary information on these confounders. Under the unconfoundedness
assumption with completely observed confounders, the smaller validation data
allow for constructing consistent estimators for causal effects, but the big
main data can only give error-prone estimators in general. However, by
leveraging the information in the big main data in a principled way, we can
improve the efficiency of the initial estimators based solely on the validation
data while preserving their consistency. Our framework applies to asymptotically
normal estimators, including the commonly used regression
imputation, weighting, and matching estimators, and does not require a correct
specification of the model relating the unmeasured confounders to the observed
variables. We also propose appropriate bootstrap procedures, which make our
method straightforward to implement using software routines for existing
estimators.