Machine Learning for Causal Inference on Observational Data

Abstract

The established scientific way to make claims about cause and effect is to perform a Randomized Controlled Trial (RCT). However, although RCTs are the best way to determine causal effects, performing such rigorous scientific experiments is most often either impossible or unethical. The outcome of an RCT is usually the Average Treatment Effect (ATE), which ideally provides proof of an effect in the studied population that, hopefully, extends to other individuals. In contrast, it is far more common to find observational data, in which treatment assignment may be heavily unbalanced, or the covariates of the patients may come from completely different distributions. Nevertheless, the ultimate goal of causal inference is to find the specific Individual Treatment Effect (ITE) for each patient. Identifying the ITE has always been an important topic in the field of causality, especially within the machine learning community. Such predictions have applications in medicine, but they can also be used extensively in financial investments, advertisement placement, recommender systems for retail, the social sciences, and beyond. Some machine learning algorithms, thanks to their ability to learn complex non-linear relationships, have been used to detect and predict treatment policies: given the particular features of an individual (patient), the algorithm determines whether or not to apply the treatment. In this thesis, the ITE is predicted on a benchmark semi-synthetic dataset that has been made unbalanced. Assuming strong ignorability, alternative machine learning techniques that had not been tested in past publications are applied to predict the ITE from observational data. The results obtained are compared with state-of-the-art outcomes; some of the algorithms applied in this work performed similarly to more complex, custom-designed methods.
In addition, a full review of the recent literature on machine learning applied to causal inference has been carried out.
