We propose a new method to estimate causal effects from nonexperimental data.
Each pair of sample units is first associated with a stochastic 'treatment' (the
differences in factors between the units) and an effect (the resulting outcome
difference). We then propose that all such pairs can be combined to provide
more accurate estimates of causal effects in observational data, given a
statistical model connecting combinatorial properties of treatments to the
accuracy and unbiasedness of their effects. The article introduces one such
model and a Bayesian approach to combine the O(n^2) pairwise observations
typically available in nonexperimental data. This also leads to an
interpretation of nonexperimental datasets as incomplete, or noisy, versions of
ideal factorial experimental designs.
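
As an illustration of the pairwise construction described above, the following
minimal sketch builds one 'treatment' (factor difference) and one effect
(outcome difference) per unordered pair of units. It is not the article's
model or its Bayesian combination step; the function name
pairwise_observations and the toy arrays X and y are hypothetical, and NumPy
is assumed.

```python
import numpy as np

def pairwise_observations(X, y):
    """Hypothetical helper: one (treatment, effect) per unordered pair of units.

    X is an (n, k) array of factors and y an (n,) array of outcomes.
    Returns n*(n-1)/2 rows, i.e. O(n^2) pairwise observations.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Indices of all unordered pairs (i, j) with i < j.
    i, j = np.triu_indices(len(y), k=1)
    treatments = X[i] - X[j]  # differences in factors between the two units
    effects = y[i] - y[j]     # resulting outcome differences
    return treatments, effects

# Toy data (made up for illustration): three units, two binary factors.
X = [[1, 0], [0, 0], [1, 1]]
y = [3, 1, 4]
treatments, effects = pairwise_observations(X, y)
print(treatments.shape)  # (3, 2): three pairs from three units
print(effects)           # [ 2. -1. -3.]
```

With n units the construction yields n(n-1)/2 pairs, which is how thousands of
individuals become millions of pairwise observations.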
This approach to causal effect estimation has several advantages: (1) it
expands the number of observations, converting thousands of individuals into
millions of observational treatments; (2) starting with treatments closest to
the experimental ideal, it identifies noncausal variables that can be ignored
in the future, making estimation easier in each subsequent iteration while
departing minimally from experiment-like conditions; (3) it recovers individual
causal effects in heterogeneous populations. We evaluate the method in
simulations and on the National Supported Work (NSW) program, an intensively
studied program whose effects are known from randomized field experiments. We
demonstrate that the proposed approach recovers causal effects in common NSW
samples, as well as in arbitrary subpopulations and an order-of-magnitude
larger supersample with the entire national program data, outperforming
statistical, econometric, and machine learning estimators in all cases.