The key challenge underlying machine learning is generalisation to new data.
This work studies generalisation for datasets consisting of related tasks that
may differ in causal mechanisms. For example, observational medical data for
complex diseases suffers from heterogeneity in causal mechanisms of disease
across patients, creating challenges for machine learning algorithms that need
to generalise to new patients outside of the training dataset. Common
approaches for learning supervised models with heterogeneous datasets include
learning a global model for the entire dataset, learning local models for each
task's data, or utilising hierarchical, meta-learning and multi-task learning
approaches to learn how to generalise from data pooled across multiple tasks.
In this paper, we propose causal similarity-based hierarchical Bayesian models
to improve generalisation to new tasks by learning how to pool data from
training tasks with similar causal mechanisms. We apply this general modelling
principle to Bayesian neural networks and compare a variety of methods for
estimating causal task similarity (for both known and unknown causal models).
We demonstrate the benefits of our approach and its applicability to real-world
problems through a range of experiments on simulated and real data.