Bayesian network structure learning in the presence of latent variables

Abstract

A causal Bayesian Network (BN) is a probabilistic graphical model that captures causal or conditional relationships between variables and enables causal reasoning under uncertainty. Causal reasoning via graphical representation in turn enables interpretability and transparency in decision-making, which makes causal BNs suitable for modelling critical real-world problems that require explainability, such as those in healthcare, environmental sciences, government policy and economics. Learning accurate causal structure from data is a notoriously difficult task, and the difficulty increases with any imperfections present in the input data. For example, real data tend not to capture all the variables needed for causal representation; these missing variables are referred to as hidden or latent variables. If some of the latent variables are latent confounders (i.e., missing common causes), they confound their effect variables, leading to spurious relationships in the learnt structure that could be misinterpreted as causal relationships. While the relevant literature includes structure learning algorithms that are capable of learning causal structure from data with latent variables, accurate structural discovery from real data remains an open problem. This thesis studies structure learning algorithms that recover graphical structure from data, and primarily focuses on the problem of latent variables. It investigates new solutions, including structure learning algorithms that learn from both observational and interventional data, approaches for density estimation that can be used to recover the underlying distribution of possible latent confounders, and techniques for hyperparameter optimisation of structure learning algorithms.
The thesis explores this set of new approaches by applying them to a range of synthetic and real datasets of varying size, dimensionality, and data noise, and concludes by highlighting open problems and directions for future research.
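
As an illustration of the latent confounding problem described in the abstract, the following minimal sketch simulates a common cause Z of two variables X and Y that have no direct causal link. It is not taken from the thesis: the linear-Gaussian model, the coefficients, and the helper names (`simulate`, `pearson`, `partial_corr`) are assumptions chosen for illustration only. When Z is unobserved, X and Y appear strongly correlated; once Z is conditioned on, the dependence should largely vanish.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate(n=20000, seed=0):
    """Hypothetical model: Z -> X and Z -> Y, with no edge between X and Y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]          # confounder (latent in practice)
    x = [2.0 * zi + rng.gauss(0, 1) for zi in z]     # effect of Z only
    y = [-1.5 * zi + rng.gauss(0, 1) for zi in z]    # effect of Z only
    return z, x, y


z, x, y = simulate()
marginal = pearson(x, y)             # what a learner sees when Z is missing
conditional = partial_corr(x, y, z)  # what it would see if Z were observed
print(f"corr(X, Y)     = {marginal:.3f}")
print(f"corr(X, Y | Z) = {conditional:.3f}")
```

A structure learning algorithm that sees only X and Y would have grounds to place an edge between them, which is the kind of spurious relationship the abstract refers to.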
