133 research outputs found
Generalization Bounds Derived IPM-Based Regularization for Domain Adaptation
Domain adaptation has received much attention as a major
form of transfer learning. One issue that must be considered in
domain adaptation is the gap between the source domain and the
target domain. To improve the generalization ability
of domain adaptation methods, we propose a framework
for domain adaptation that combines source and target data
with a new regularizer that takes generalization bounds
into account. This regularization term uses an integral
probability metric (IPM) as the distance between the
source domain and the target domain, and thus bounds
the test error of an existing predictor. Since the
computation of the IPM involves only the two distributions,
this regularization term is independent of the specific
classifier. For popular learning models, the resulting
empirical risk minimization can be expressed as a general
convex optimization problem and thus solved effectively
by existing tools. Empirical studies on synthetic data for
regression and real-world data for classification show the
effectiveness of this method.
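The abstract does not say which IPM is used, but the maximum mean discrepancy (MMD) is a standard instance that, as described, depends only on samples from the two domains and not on any classifier. A minimal sketch, assuming an RBF kernel and a hypothetical regularization weight `alpha`:

```python
import numpy as np

def mmd2(X, Y, gamma=1.0):
    """Squared maximum mean discrepancy with an RBF kernel: a
    classifier-independent instance of an IPM between two samples."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 2))
target = rng.normal(0.5, 1.0, size=(200, 2))  # shifted domain

# An IPM-regularized objective would add alpha * mmd2(source, target)
# to the empirical risk of the predictor being trained.
print(mmd2(source, target))  # positive when the domains differ
```

Because the penalty is computed from the two samples alone, it can be bolted onto any convex empirical-risk objective, which matches the classifier-independence claim in the abstract.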
A survey on domain adaptation theory: learning bounds and theoretical guarantees
All well-known machine learning algorithms, spanning both supervised and
semi-supervised learning, work well only under a common assumption: the
training and test data follow the same distribution. When the distribution changes, most
statistical models must be reconstructed from newly collected data, which for
some applications can be costly or impossible to obtain. Therefore, it has
become necessary to develop approaches that reduce the need and the effort to
obtain new labeled samples by exploiting data that are available in related
areas, and using these further across similar fields. This has given rise to a
new machine learning framework known as transfer learning: a learning setting
inspired by the capability of a human being to extrapolate knowledge across
tasks to learn more efficiently. Despite the large number of different
transfer learning scenarios, the main objective of this survey is to provide
an overview of the state-of-the-art theoretical results in a specific, and
arguably the most popular, sub-field of transfer learning called domain
adaptation. In this sub-field, the data distribution is assumed to change
between the training and the test data, while the learning task remains the
same. We provide the first up-to-date description of existing results on the
domain adaptation problem, covering learning bounds based on different
statistical learning frameworks.
Generalization Bounds and Representation Learning for Estimation of Potential Outcomes and Causal Effects
Practitioners in diverse fields such as healthcare, economics and education
are eager to apply machine learning to improve decision making. The cost and
impracticality of performing experiments, together with a recent monumental
increase in electronic record keeping, have brought attention to the problem
of evaluating decisions based on non-experimental observational data. This is
the setting of
this work. In particular, we study estimation of individual-level causal
effects, such as a single patient's response to alternative medication, from
recorded contexts, decisions and outcomes. We give generalization bounds on the
error in estimated effects based on distance measures between groups receiving
different treatments, allowing for sample re-weighting. We provide conditions
under which our bound is tight and show how it relates to results for
unsupervised domain adaptation. Led by our theoretical results, we devise
representation learning algorithms that minimize our bound, by regularizing the
representation's induced treatment group distance, and encourage sharing of
information between treatment groups. We extend these algorithms to
simultaneously learn a weighted representation to further reduce treatment
group distances. Finally, an experimental evaluation on real and synthetic data
shows the value of our proposed representation architecture and regularization
scheme.
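The core objective described above, factual prediction error plus a penalty on the induced distance between treatment groups in representation space, can be sketched in a few lines. This is a simplified stand-in, not the paper's algorithm: the representation is a hypothetical linear map `W`, the outcome head is a toy sum, and the group distance is a plain mean difference rather than a full IPM:

```python
import numpy as np

def balanced_loss(W, X, t, y, alpha=1.0):
    """Factual prediction error plus a penalty on the distance between
    treated (t == 1) and control (t == 0) groups in the learned
    representation phi = X @ W. A toy stand-in for bound minimization."""
    phi = X @ W
    pred = phi.sum(axis=1)                 # toy linear outcome head
    factual = ((pred - y) ** 2).mean()
    gap = phi[t == 1].mean(axis=0) - phi[t == 0].mean(axis=0)
    return factual + alpha * (gap ** 2).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
t = (rng.random(100) < 0.5).astype(int)    # observed treatment
y = X.sum(axis=1) + t                      # toy factual outcome
W = np.eye(3)
loss = balanced_loss(W, X, t, y, alpha=1.0)
```

Minimizing over `W` trades off factual accuracy against treatment-group balance, which is the mechanism the bound in the abstract motivates; the weighted variant would additionally learn per-sample weights inside the group means.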
Hedging Complexity in Generalization via a Parametric Distributionally Robust Optimization Framework
Empirical risk minimization (ERM) and distributionally robust optimization
(DRO) are popular approaches for solving stochastic optimization problems that
appear in operations management and machine learning. Existing generalization
error bounds for these methods depend on either the complexity of the cost
function or dimension of the random perturbations. Consequently, the
performance of these methods can be poor for high-dimensional problems with
complex objective functions. We propose a simple approach in which the
distribution of random perturbations is approximated using a parametric family
of distributions. This mitigates both sources of complexity; however, it
introduces a model misspecification error. We show that this new source of
error can be controlled by suitable DRO formulations. Our proposed parametric
DRO approach yields significantly improved generalization bounds over existing
ERM, DRO, and parametric ERM methods for a wide variety of settings. Our method
is particularly effective under distribution shifts and works broadly in
contextual optimization. We also illustrate the superior performance of our
approach on both synthetic and real-data portfolio optimization and regression
tasks.
Comment: Preliminary version appeared in AISTATS 202
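The abstract's recipe, approximate the perturbation distribution with a parametric family, then hedge the misspecification error with a DRO formulation, can be illustrated with a deliberately small toy: a quadratic cost, a Gaussian family, and a finite ambiguity set of candidate means around the fitted one. All names and the ambiguity set are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def parametric_dro_cost(decision, mus, sigma=1.0, n_mc=2000, seed=0):
    """Worst-case expected cost over a small parametric ambiguity set
    {N(mu, sigma^2) : mu in mus}, approximated by Monte Carlo.
    A toy stand-in for the paper's parametric DRO formulation."""
    rng = np.random.default_rng(seed)
    worst = -np.inf
    for mu in mus:
        z = rng.normal(mu, sigma, n_mc)
        cost = np.mean((decision - z) ** 2)  # simple quadratic cost
        worst = max(worst, cost)
    return worst

# Hedge against misspecification of a fitted mean of 0.0:
mus = [-0.5, 0.0, 0.5]
decisions = np.linspace(-2.0, 2.0, 41)
best = min(decisions, key=lambda d: parametric_dro_cost(d, mus))
```

The worst-case objective only ever sees samples from the low-dimensional parametric family, which is how the approach sidesteps both the cost-function complexity and the raw dimension of the perturbations.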
Variational Counterfactual Prediction under Runtime Domain Corruption
To date, various neural methods have been proposed for causal effect
estimation based on observational data, where a default assumption is the same
distribution and availability of variables at both training and inference
(i.e., runtime) stages. However, distribution shift (i.e., domain shift) could
happen during runtime, and bigger challenges arise from the impaired
accessibility of variables. This is commonly caused by increasing privacy and
ethical concerns, which can render arbitrary variables unavailable across the
entire runtime data and make imputation impractical. We term the co-occurrence of domain
shift and inaccessible variables runtime domain corruption, which seriously
impairs the generalizability of a trained counterfactual predictor. To counter
runtime domain corruption, we subsume counterfactual prediction under the
notion of domain adaptation. Specifically, we upper-bound the error w.r.t. the
target domain (i.e., runtime covariates) by the sum of source domain error and
inter-domain distribution distance. In addition, we build an adversarially
unified variational causal effect model, named VEGAN, with a novel two-stage
adversarial domain adaptation scheme to reduce the latent distribution
disparity between treated and control groups first, and between training and
runtime variables afterwards. We demonstrate that VEGAN outperforms other
state-of-the-art baselines on individual-level treatment effect estimation in
the presence of runtime domain corruption on benchmark datasets.
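The bound invoked here, target error controlled by source error plus an inter-domain distance, is the classical domain adaptation decomposition. In its standard form (hedged, since the abstract does not spell out the exact statement):

```latex
\epsilon_T(h) \;\le\; \epsilon_S(h) \;+\; d\bigl(\mathcal{D}_S, \mathcal{D}_T\bigr) \;+\; \lambda,
```

where $\epsilon_S(h)$ and $\epsilon_T(h)$ are the source and target risks of a hypothesis $h$, $d(\cdot,\cdot)$ is a distribution distance between the two domains, and $\lambda$ is the error of an ideal joint hypothesis. The two-stage adversarial scheme described above targets the distance term twice: first between treated and control groups, then between training and runtime variables.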
Entropic Optimal Transport in Machine Learning: applications to distributional regression, barycentric estimation and probability matching
Regularised optimal transport theory has been gaining increasing interest in machine learning as a versatile tool to handle and compare probability measures. Entropy-based regularisations, known as Sinkhorn divergences, have proved successful in a wide range of applications: as a metric for clustering and barycenter estimation, as a tool to transfer information in domain adaptation, and as a fitting loss for generative models, to name a few. Given this success, it is crucial to investigate the statistical and optimization properties of such models. These aspects are instrumental to design new and principled paradigms that contribute to further advance the field. Nonetheless, questions on asymptotic guarantees of the estimators based on Entropic Optimal Transport have received less attention. In this thesis we target such questions, focusing on three major settings where Entropic Optimal Transport has been used: learning histograms in supervised frameworks, barycenter estimation and probability matching. We present the first consistent estimator for learning with the Sinkhorn loss in supervised settings, with explicit excess risk bounds. We propose a novel algorithm for Sinkhorn barycenters that handles arbitrary probability distributions with provable global convergence guarantees. Finally, we address generative models with the Sinkhorn divergence as loss function: we analyse the role of the latent distribution and the generator from a modelling and statistical perspective. We propose a method that learns the latent distribution and the generator jointly, and we characterize the generalization properties of such an estimator. Overall, the tools developed in this work contribute to the understanding of the theoretical properties of Entropic Optimal Transport and their versatility in machine learning.
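The entropic regularisation at the heart of this thesis makes the optimal transport problem solvable by Sinkhorn's alternating scaling iterations. A minimal sketch on two small histograms (a textbook implementation, not code from the thesis):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iter=200):
    """Entropy-regularised optimal transport between histograms a and b
    with cost matrix C, via Sinkhorn's alternating scaling iterations."""
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)         # match column marginals
        u = a / (K @ v)           # match row marginals
    P = u[:, None] * K * v[None, :]   # transport plan
    return P, (P * C).sum()           # plan and transport cost

a = np.array([0.5, 0.5])
b = np.array([0.5, 0.5])
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])
P, cost = sinkhorn(a, b, C)
```

Each iteration is just two matrix-vector products, which is why Sinkhorn divergences scale to the clustering, barycenter, and generative-modelling applications listed above; as `eps` shrinks, the plan approaches the unregularised optimal transport plan.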
Optimal Transport for Treatment Effect Estimation
Estimating conditional average treatment effect from observational data is
highly challenging due to the existence of treatment selection bias. Prevalent
methods mitigate this issue by aligning distributions of different treatment
groups in the latent space. However, there are two critical problems that these
methods fail to address: (1) mini-batch sampling effects (MSE), which causes
misalignment in non-ideal mini-batches with outcome imbalance and outliers; (2)
unobserved confounder effects (UCE), which results in inaccurate discrepancy
calculation due to the neglect of unobserved confounders. To tackle these
problems, we propose a principled approach named Entire Space CounterFactual
Regression (ESCFR), which is a new take on optimal transport in the context of
causality. Specifically, based on the framework of stochastic optimal
transport, we propose a relaxed mass-preserving regularizer to address the MSE
issue and design a proximal factual outcome regularizer to handle the UCE
issue. Extensive experiments demonstrate that our proposed ESCFR can
successfully tackle the treatment selection bias and achieve significantly
better performance than state-of-the-art methods.
Comment: Accepted as a NeurIPS 2023 Poster
Adaptive-Step Graph Meta-Learner for Few-Shot Graph Classification
Graph classification aims to extract accurate information from
graph-structured data for classification and is becoming increasingly
important in the graph learning community. Although Graph Neural Networks
(GNNs) have been successfully applied to graph classification tasks, most of
them overlook the scarcity of labeled graph data in many applications. For
example, in bioinformatics, obtaining protein graph labels usually requires
laborious experiments. Recently, few-shot learning has been explored to
alleviate this problem, given only a few labeled graph samples of the test
classes. The shared
sub-structures between training classes and test classes are essential in
few-shot graph classification. Existing methods assume that the test classes
belong to the same set of super-classes clustered from the training classes.
However, according to our observations, the label spaces of training classes
and test classes usually do not overlap in real-world scenarios. As a result,
existing methods do not capture the local structures of unseen test classes
well. To overcome this limitation, in this paper we propose a direct method
to capture the sub-structures with a well-initialized meta-learner within a
few adaptation steps. More specifically, (1) we propose a novel framework
consisting of a graph meta-learner, which uses GNN-based modules for fast
adaptation on graph data, and a step controller for the robustness and
generalization of the meta-learner; (2) we provide a quantitative analysis of
the framework and give a graph-dependent upper bound on the generalization
error based on our framework; and (3) extensive experiments on real-world
datasets demonstrate that our framework achieves state-of-the-art results on
several few-shot graph classification tasks compared to baselines.
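The "few adaptation steps from a well-initialized meta-learner, governed by a step controller" pattern can be sketched without any graph machinery. The sketch below uses a toy quadratic task, and its controller simply stops when the loss improvement falls below a threshold: a stand-in assumption, since the paper's controller is itself learned:

```python
import numpy as np

def adapt(theta0, grad_fn, loss_fn, lr=0.1, max_steps=20, tol=1e-3):
    """A few gradient steps from a meta-learned initialisation theta0,
    with a simple step controller that stops once the per-step loss
    improvement drops below tol (stand-in for a learned controller)."""
    theta = np.asarray(theta0, dtype=float)
    prev = loss_fn(theta)
    for step in range(1, max_steps + 1):
        theta = theta - lr * grad_fn(theta)
        cur = loss_fn(theta)
        if prev - cur < tol:
            break                 # controller: adaptation has converged
        prev = cur
    return theta, step

# Toy task: minimise ||theta - target||^2 from a shared initialisation.
target = np.array([1.0, -2.0])
loss = lambda th: ((th - target) ** 2).sum()
grad = lambda th: 2.0 * (th - target)
theta, steps = adapt(np.zeros(2), grad, loss)
```

In the meta-learning setting, `theta0` would be optimised across training tasks so that this inner loop converges in very few steps on unseen test classes; the generalization bound in the abstract is stated in terms of exactly such adaptation dynamics.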
Cost-Effective Incentive Allocation via Structured Counterfactual Inference
We address a practical problem ubiquitous in modern marketing campaigns, in
which a central agent tries to learn a policy for allocating strategic
financial incentives to customers and observes only bandit feedback. In
contrast to traditional policy optimization frameworks, we take into account
the additional reward structure and budget constraints common in this setting,
and develop a new two-step method for solving this constrained counterfactual
policy optimization problem. Our method first casts the reward estimation
problem as a domain adaptation problem with supplementary structure, and then
uses the resulting estimators to optimize the policy under constraints. We
also establish theoretical error bounds for our estimation procedure and
empirically show that the approach leads to significant improvements on both
synthetic and real datasets.
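The second step, turning estimated rewards into a budget-constrained allocation, can be illustrated with a greedy knapsack-style heuristic: treat customers in decreasing order of estimated uplift per unit cost until the budget runs out. This is an illustrative baseline, not the paper's optimizer, and all inputs are hypothetical:

```python
def allocate(uplift, cost, budget):
    """Greedy budgeted allocation: incentivise customers in decreasing
    order of estimated uplift per unit cost until the budget is spent.
    A simple heuristic stand-in for a constrained policy optimizer."""
    order = sorted(range(len(uplift)),
                   key=lambda i: uplift[i] / cost[i], reverse=True)
    chosen, spent = [], 0.0
    for i in order:
        if uplift[i] <= 0:
            break                      # never pay for negative uplift
        if spent + cost[i] <= budget:
            chosen.append(i)
            spent += cost[i]
    return chosen

picked = allocate(uplift=[3.0, 1.0, 2.5, -0.5],
                  cost=[2.0, 1.0, 1.0, 1.0],
                  budget=3.0)
```

The quality of any such allocation hinges entirely on the uplift estimates, which is why the paper's first step frames reward estimation as a structured domain adaptation problem with its own error bounds.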
Algorithm-Dependent Bounds for Representation Learning of Multi-Source Domain Adaptation
We use information-theoretic tools to derive a novel analysis of Multi-source
Domain Adaptation (MDA) from the representation learning perspective.
Concretely, we study joint distribution alignment for supervised MDA with few
target labels and unsupervised MDA with pseudo labels, where the latter is
relatively hard and less commonly studied. We further provide
algorithm-dependent generalization bounds for these two settings, where the
generalization is characterized by the mutual information between the
parameters and the data. Then we propose a novel deep MDA algorithm, implicitly
addressing the target shift through joint alignment. Finally, the mutual
information bounds are extended to this algorithm providing a non-vacuous
gradient-norm estimation. The proposed algorithm has comparable performance to
the state-of-the-art on target-shifted MDA benchmark with improved memory
efficiency
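The phrase "generalization is characterized by the mutual information between the parameters and the data" refers to information-theoretic bounds of the following standard shape (hedged: this is the classical form for a $\sigma$-sub-Gaussian loss, not necessarily the paper's exact statement):

```latex
\bigl|\mathbb{E}\,[\mathrm{gen}(W, S)]\bigr| \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W; S)},
```

where $W$ denotes the learned parameters, $S$ the $n$-sample training set, $\mathrm{gen}(W,S)$ the gap between population and empirical risk, and $I(W;S)$ their mutual information. Bounds of this type are algorithm-dependent because $I(W;S)$ is determined by the training procedure itself, which is what allows them to be specialised to the proposed deep MDA algorithm.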