We investigate the finite-sample performance of causal machine learning
estimators for heterogeneous causal effects at different aggregation levels. We
employ an Empirical Monte Carlo Study that relies on arguably realistic data
generation processes (DGPs) based on actual data. We consider 24 different
DGPs, eleven different causal machine learning estimators, and three
aggregation levels of the estimated effects. In the main DGPs, we allow for
selection into treatment based on a rich set of observable covariates. We
provide evidence that the estimators can be categorized into three groups. The
first group performs consistently well across all DGPs and aggregation levels.
These estimators use multiple steps to account for both the selection into
treatment and the outcome process. The second group shows competitive
performance only for particular DGPs. The third group is clearly outperformed
by the other estimators.