Bayesian Counterfactual Mean Embeddings and Off-Policy Evaluation
The counterfactual distribution models the effect of the treatment in the
untreated group. While most work focuses on the expected value of the
treatment effect, one may be interested in the whole counterfactual
distribution or other quantities associated with it. Building on the framework of
Bayesian conditional mean embeddings, we propose a Bayesian approach for
modeling the counterfactual distribution, which leads to quantifying the
epistemic uncertainty about the distribution. The framework naturally extends
to the setting where one observes multiple treatment effects (e.g. an
intermediate effect after an interim period, and an ultimate treatment effect
which is of main interest) and allows for additionally modelling uncertainty
about the relationship between these effects. To this end, we present three
novel Bayesian methods to estimate the expectation of the ultimate treatment
effect when only noisy samples of the dependence between intermediate and
ultimate effects are provided. These methods differ in the source of
uncertainty considered and allow for combining two sources of data. Moreover, we generalize
these ideas to the off-policy evaluation framework, which can be seen as an
extension of the counterfactual estimation problem. We empirically explore the
calibration of the algorithms in two different experimental settings which
require data fusion, and illustrate the value of considering the uncertainty
stemming from the two sources of data.
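As a rough illustration of the point estimate underlying this line of work, a (non-Bayesian) counterfactual mean embedding can be computed with kernel ridge regression: regress the feature map of control-group outcomes on control-group covariates, then average the learned conditional embedding over the treated covariates. This is a generic sketch, not the paper's Bayesian method; all names and parameters below are hypothetical.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    # Gaussian kernel matrix between the rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def counterfactual_mean_weights(X_ctrl, X_treat, lam=1e-2, gamma=1.0):
    # Weights beta such that the empirical counterfactual mean embedding
    # (control outcome distribution, evaluated under treated covariates)
    # is sum_i beta_i * k(y_i, .), with (y_i) the control-group outcomes.
    n = len(X_ctrl)
    K = rbf(X_ctrl, X_ctrl, gamma)          # n x n Gram matrix on control covariates
    K_ct = rbf(X_ctrl, X_treat, gamma)      # cross-kernel to treated covariates
    return np.linalg.solve(K + n * lam * np.eye(n), K_ct.mean(axis=1))
```

A Bayesian treatment, as in the paper, would place a posterior over this embedding rather than return a single weight vector.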
Explaining the Uncertain: Stochastic Shapley Values for Gaussian Process Models
We present a novel approach for explaining Gaussian processes (GPs) that can
utilize the full analytical covariance structure present in GPs. Our method is
based on the popular solution concept of Shapley values extended to stochastic
cooperative games, resulting in explanations that are random variables. The GP
explanations generated using our approach satisfy similar favorable axioms to
standard Shapley values and possess a tractable covariance function across
features and data observations. This covariance allows for quantifying
explanation uncertainties and studying the statistical dependencies between
explanations. We further extend our framework to the problem of predictive
explanation, and propose a Shapley prior over the explanation function to
predict Shapley values for new data based on previously computed ones. Our
extensive illustrations demonstrate the effectiveness of the proposed approach.
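For intuition, standard (deterministic) Shapley values for a coalition value function can be computed exactly by enumerating subsets; the stochastic extension in the paper replaces the scalar value function with a random one whose covariance the GP supplies. The generic sketch below is not the paper's algorithm, and is only feasible for a handful of features.

```python
import itertools
import math
import numpy as np

def shapley_values(value, n_features):
    # Exact Shapley values for a value function `value` mapping a
    # frozenset of feature indices to a scalar. Exponential cost in
    # n_features, so only viable for small toy games.
    phi = np.zeros(n_features)
    players = range(n_features)
    for i in players:
        others = [j for j in players if j != i]
        for S in map(frozenset, itertools.chain.from_iterable(
                itertools.combinations(others, r) for r in range(n_features))):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            w = (math.factorial(len(S)) * math.factorial(n_features - len(S) - 1)
                 / math.factorial(n_features))
            phi[i] += w * (value(S | {i}) - value(S))
    return phi
```

For an additive game, each feature's Shapley value recovers exactly its individual contribution, which is the efficiency/linearity behaviour the paper's stochastic variant also satisfies in distribution.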
Dual Instrumental Method for Confounded Kernelized Bandits
The contextual bandit problem is a theoretically justified framework with
wide applications in various fields. While previous studies of this problem
usually require independence between the noise and the contexts, our work considers a
more sensible setting where the noise becomes a latent confounder that affects
both contexts and rewards. Such a confounded setting is more realistic and
could extend to a broader range of applications. However, an unaddressed
confounder biases the reward-function estimate and thus leads to
large regret. To deal with the challenges posed by the confounder, we apply
the dual instrumental variable regression, which can correctly identify the
true reward function. We prove that the convergence rate of this method is
near-optimal in two types of widely used reproducing kernel Hilbert spaces.
Therefore, we can design computationally efficient and regret-optimal
algorithms based on the theoretical guarantees for confounded bandit problems.
The numerical results illustrate the efficacy of our proposed algorithms in the
confounded bandit setting.
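The two-stage idea can be illustrated with a linear, finite-dimensional analogue of instrumental-variable regression. This is a hedged toy sketch: the paper works with a dual formulation in reproducing kernel Hilbert spaces, which this version does not capture, and all names below are illustrative.

```python
import numpy as np

def two_stage_ridge(Z, X, y, lam1=1e-2, lam2=1e-2):
    # Linear two-stage least squares with ridge penalties.
    # Stage 1: predict the confounded context X from the instrument Z.
    W1 = np.linalg.solve(Z.T @ Z + lam1 * np.eye(Z.shape[1]), Z.T @ X)
    X_hat = Z @ W1
    # Stage 2: regress the reward y on the de-confounded prediction X_hat.
    return np.linalg.solve(X_hat.T @ X_hat + lam2 * np.eye(X.shape[1]),
                           X_hat.T @ y)
```

On simulated data where a latent confounder drives both context and reward, ordinary least squares is biased while the two-stage estimate recovers the true reward coefficient, which mirrors why the confounded bandit needs the instrumental correction.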
Bayesian Perspectives on Conditional Kernel Mean Embeddings: Hyperparameter Learning and Probabilistic Inference
This thesis presents the narrative of a particular journey towards discovering and developing Bayesian perspectives on conditional kernel mean embeddings. It is motivated by the need to learn flexible and richer representations of conditional distributions for probabilistic inference in various contexts. While conditional kernel mean embeddings are able to achieve such representations, it is unclear how their hyperparameters can be learned for probabilistic inference in various settings. These hyperparameters govern the space of possible representations and critically influence the degree of inference accuracy. At its core, this thesis argues that Bayesian perspectives lead to principled ways of formulating frameworks that provide a holistic treatment of modeling, learning, and inference.
The story begins by emulating required properties of Bayesian frameworks via learning theoretic bounds. This is carried through the lens of a probabilistic multiclass setting, resulting in the multiclass conditional embedding framework. Through establishing convergence to multiclass probabilities and deriving learning theoretic and Rademacher complexity bounds, the framework arrives at an expected risk bound whose realizations exhibit desirable properties for hyperparameter learning, such as the ever-crucial balance between data-fit error and model complexity, emulating marginal likelihoods. The probabilistic nature of this bound enables batch learning for scalability, and the generality of the model allows various model architectures to be used and learned end-to-end.
The narrative unfolds into forming approximate Bayesian inference frameworks directly for the likelihood-free Bayesian inference problem, leading to the kernel embedding likelihood-free inference framework. The core motivation is the natural suitability of conditional kernel mean embeddings for forming surrogate probabilistic models. By leveraging the structure of the likelihood-free Bayesian inference problem, surrogate models for both hyperparameter learning and posterior inference are developed.
Finally, the journey concludes with a Bayesian regression framework that aligns the learning and inference with both the problem and the model. This begins with a careful formulation of the conditional mean and the novel deconditional mean problem, thereby revealing deconditional mean embeddings as core elements of the wider kernel mean embedding framework. They can further be established as a nonparametric Bayes' rule with applications to Bayesian inference. Crucially, by introducing the task transformed regression problem, they can be extended to the novel task transformed Gaussian processes as their Bayesian form, whose marginal likelihood can be used to learn hyperparameters in various forms and contexts.
These perspectives and frameworks developed in this thesis shed light on creative ways in which conditional kernel mean embeddings can be learned and applied in existing problem domains, and further inspire elegant solutions in novel problem settings.
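As a concrete anchor for the central object of the thesis, the empirical conditional mean embedding reduces to kernel-ridge-regression weights: expectations of any function of Y given X = x become weighted sums over the training outcomes. The sketch below is a standard textbook estimator, not the thesis's Bayesian extension, and its names and parameters are illustrative.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    # Gaussian kernel matrix between the rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def cme_weights(X, x_query, lam=1e-3, gamma=1.0):
    # Weights alpha(x) of the empirical conditional mean embedding
    # mu_{Y|X=x} = sum_i alpha_i(x) k_Y(y_i, .), so that
    # E[g(Y) | X = x] is approximated by alpha(x)^T (g(y_1), ..., g(y_n)).
    n = len(X)
    K = rbf(X, X, gamma)
    return np.linalg.solve(K + n * lam * np.eye(n), rbf(X, x_query, gamma))
```

Hyperparameters such as `lam` and `gamma` are exactly the quantities whose principled learning the thesis addresses via Bayesian perspectives.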