183 research outputs found
Deep transfer learning for partial differential equations under conditional shift with DeepONet
Traditional machine learning algorithms are designed to learn in isolation,
i.e. address single tasks. The core idea of transfer learning (TL) is that
knowledge gained in learning to perform one task (source) can be leveraged to
improve learning performance in a related, but different, task (target). TL
leverages and transfers previously acquired knowledge to address the expense of
data acquisition and labeling, potential computational power limitations, and
the dataset distribution mismatches. Although significant progress has been
made in the fields of image processing, speech recognition, and natural
language processing (for classification and regression) for TL, little work has
been done in the field of scientific machine learning for functional regression
and uncertainty quantification in partial differential equations. In this work,
we propose a novel TL framework for task-specific learning under conditional
shift with a deep operator network (DeepONet). Inspired by the conditional
embedding operator theory, we measure the statistical distance between the
source domain and the target feature domain by embedding conditional
distributions onto a reproducing kernel Hilbert space. Task-specific operator
learning is accomplished by fine-tuning task-specific layers of the target
DeepONet using a hybrid loss function that allows for the matching of
individual target samples while also preserving the global properties of the
conditional distribution of target data. We demonstrate the advantages of our
approach for various TL scenarios involving nonlinear PDEs under conditional
shift. Our results include geometry domain adaptation and show that the
proposed TL framework enables fast and efficient multi-task operator learning,
despite significant differences between the source and target domains. Comment: 19 pages, 3 figures
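As a rough illustration of measuring a statistical distance by embedding distributions in a reproducing kernel Hilbert space, the sketch below estimates the squared maximum mean discrepancy (MMD) between two samples. It is a simpler, swapped-in technique: it compares marginal distributions, not the conditional embeddings the abstract describes. The Gaussian kernel, the bandwidth, and the synthetic samples are all assumptions made for this example.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Synthetic one-dimensional samples (assumed data, for illustration only).
rng = random.Random(0)
source = [[rng.gauss(0.0, 1.0)] for _ in range(200)]
target_same = [[rng.gauss(0.0, 1.0)] for _ in range(200)]
target_shifted = [[rng.gauss(2.0, 1.0)] for _ in range(200)]

mmd_same = mmd_squared(source, target_same)
mmd_shifted = mmd_squared(source, target_shifted)
```

Here `mmd_shifted` should come out clearly larger than `mmd_same`, since only the shifted target differs in distribution from the source.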
Method for Enabling Causal Inference in Relational Domains
The analysis of data from complex systems is quickly becoming a fundamental aspect of modern business, government, and science. The field of causal learning is concerned with developing a set of statistical methods that allow practitioners to make inferences about unseen interventions. This field has seen significant advances in recent years. However, the vast majority of this work assumes that data instances are independent, whereas many systems are best described in terms of interconnected instances, i.e. relational systems. This discrepancy prevents causal inference techniques from being reliably applied in many real-world settings. In this thesis, I will present three contributions to the field of causal inference that seek to enable the analysis of relational systems. First, I will present theory for consistently testing statistical dependence in relational domains. I then show how the significance of this test can be measured in practice using a novel bootstrap method for structured domains. Second, I show that statistical dependence in relational domains is inherently asymmetric, implying a simple test of causal direction from observational data. This test requires no assumptions on either the marginal distributions of variables or the functional form of dependence. Third, I describe relational causal adjustment, a procedure to identify the effects of arbitrary interventions from observational relational data via an extension of Pearl's backdoor criterion. A series of evaluations on synthetic domains shows the estimates obtained by relational causal adjustment are close to those obtained from explicit experimentation.
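For orientation, the classical backdoor adjustment that the thesis extends can be sketched on independent, discrete data as P(y | do(x)) = sum over z of P(y | x, z) P(z). The example below is not the relational procedure from the thesis. The counts are hypothetical toy numbers, and the variable roles (confounder `z`, treatment `x`, outcome `y`) are assumptions made for illustration.

```python
# Hypothetical toy counts: (z, x, y) -> number of units with those values.
counts = {
    (0, 1, 1): 90, (0, 1, 0): 10,
    (0, 0, 1): 320, (0, 0, 0): 80,
    (1, 1, 1): 200, (1, 1, 0): 200,
    (1, 0, 1): 40, (1, 0, 0): 60,
}


def naive_rate(counts, x):
    """Observed P(y=1 | x), ignoring the confounder z."""
    num = sum(n for (_, xi, y), n in counts.items() if xi == x and y == 1)
    den = sum(n for (_, xi, _), n in counts.items() if xi == x)
    return num / den


def backdoor_adjusted_rate(counts, x):
    """P(y=1 | do(x)) = sum_z P(y=1 | x, z) * P(z), assuming z blocks all backdoor paths."""
    total = sum(counts.values())
    result = 0.0
    for z in {zi for (zi, _, _) in counts}:
        n_z = sum(n for (zi, _, _), n in counts.items() if zi == z)
        n_zx = sum(n for (zi, xi, _), n in counts.items() if zi == z and xi == x)
        n_zxy = sum(n for (zi, xi, y), n in counts.items()
                    if zi == z and xi == x and y == 1)
        result += (n_zxy / n_zx) * (n_z / total)
    return result


naive_treated = naive_rate(counts, 1)                  # 290 / 500
naive_control = naive_rate(counts, 0)                  # 360 / 500
adjusted_treated = backdoor_adjusted_rate(counts, 1)   # 0.5 * 0.9 + 0.5 * 0.5
adjusted_control = backdoor_adjusted_rate(counts, 0)   # 0.5 * 0.8 + 0.5 * 0.4
```

In these toy counts the naive comparison favours the control group, while the adjusted estimates favour treatment, which is the kind of reversal that adjustment is meant to correct.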
Causal Modeling with Stationary Diffusions
We develop a novel approach towards causal inference. Rather than structural
equations over a causal graph, we learn stochastic differential equations
(SDEs) whose stationary densities model a system's behavior under
interventions. These stationary diffusion models do not require the formalism
of causal graphs, let alone the common assumption of acyclicity. We show that
in several cases, they generalize to unseen interventions on their variables,
often better than classical approaches. Our inference method is based on a new
theoretical result that expresses a stationarity condition on the diffusion's
generator in a reproducing kernel Hilbert space. The resulting kernel deviation
from stationarity (KDS) is an objective function of independent interest.
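To make the notion of a stationary density of an SDE concrete, the sketch below simulates a one-dimensional Ornstein-Uhlenbeck process with the Euler-Maruyama scheme and compares empirical moments to the known stationary values (mean `mu`, variance `sigma**2 / (2 * theta)`). This is not the paper's inference method and does not compute the KDS. Treating an intervention as a shift of `mu` is a hypothetical simplification, and all parameter values are assumptions.

```python
import math
import random


def simulate_ou(theta, mu, sigma, dt=0.01, steps=100_000, seed=0):
    """Euler-Maruyama simulation of dx = -theta * (x - mu) dt + sigma dW."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    x = mu  # start at the stationary mean to avoid a long burn-in
    samples = []
    for _ in range(steps):
        x += -theta * (x - mu) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


def mean_and_variance(samples):
    """Empirical mean and (population) variance of a list of floats."""
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, variance


# Observational regime versus a hypothetical intervention that shifts the drift target.
obs_mean, obs_var = mean_and_variance(simulate_ou(theta=1.0, mu=0.0, sigma=1.0))
int_mean, int_var = mean_and_variance(simulate_ou(theta=1.0, mu=2.0, sigma=1.0, seed=1))
```

With these settings the empirical variance should be close to 0.5 in both regimes, while the mean should track the chosen `mu`.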
Probabilistic learning and computation in brains and machines
Humans and animals are able to solve a wide variety of perceptual, decision making and motor tasks with great flexibility. Moreover, behavioural evidence shows that this flexibility extends to situations where accuracy requires the correct treatment of uncertainty induced by noise and ambiguity in the available sensory information as well as noise internal to the brain. It has been suggested that this adequate handling of uncertainty is based on a learned internal model, e.g. in the case of perception, a generative model of sensory observations. Learning latent variable models and performing inference in them is a key challenge for both biological and artificial learning systems. Here, we introduce a new approach to learning in hierarchical latent variable models called the Distributed Distributional Code Helmholtz Machine (DDC-HM), which emphasises flexibility and accuracy in the inferential process. The approximate posterior over unobserved variables is represented implicitly as a set of expectations, corresponding to mean parameters of an exponential family distribution. To train the generative and recognition models we develop an extended wake-sleep algorithm inspired by the original Helmholtz Machine. As a result, the DDC-HM is able to learn hierarchical latent models without having to propagate gradients across different stochastic layers, making our approach biologically appealing. In the second part of the thesis, we review existing proposals for neural representations of uncertainty with a focus on representational and computational flexibility as well as experimental support. Finally, we consider inference and learning in dynamical environment models using Distributed Distributional Codes to represent both the stochastic latent transition model and the inferred posterior distributions. We show that this model makes it possible to generalise successor representations to biologically more realistic, partially observed settings.
Contributions in functional data analysis and functional-analytic statistics
Functional data analysis is the study of statistical algorithms which are applied in the scenario when the observed data is a collection of functions. Since this type of data is becoming cheaper and easier to collect, there is an increased need to develop statistical tools to handle such data. The first part of this thesis focuses on deriving distances between distributions over function spaces and applying these to two-sample testing, goodness-of-fit testing and sample quality assessment. This presents a wide range of contributions since currently there exist either very few or no methods at all to tackle these problems for functional data. The second part of this thesis adopts the functional-analytic perspective to two statistical algorithms. This is a perspective where functions are viewed as living in specific function spaces and the tool box of functional analysis is applied to identify and prove properties of the algorithms. The two algorithms are variational Gaussian processes, used widely throughout machine learning for function modelling with large observation data sets, and functional statistical depth, used widely as a means to evaluate outliers and perform testing for functional data sets. The results presented contribute a taxonomy of the variational Gaussian process methodology and multiple new results in the theory of functional depth including the open problem of providing a depth which characterises distributions on function spaces. Open Access
Hypothesis testing and causal inference with heterogeneous medical data
Learning from data which associations hold and are likely to hold in the future is a fundamental part of scientific discovery. With increasingly heterogeneous data collection practices, exemplified by passively collected electronic health records or high-dimensional genetic data with only few observed samples, biases and spurious correlations are prevalent. These are called spurious because they do not contribute to the effect being studied. In this context, the modelling assumptions of existing statistical tests and causal inference methods are often found inadequate and their practical utility diminished even though these models are increasingly used as decision-support tools in practice. This thesis investigates how modern computational techniques may broaden the fields of hypothesis testing and causal inference to handle the subtleties of large heterogeneous data sets, as well as simultaneously improve the robustness and theoretical understanding of machine learning algorithms using insights from causality and statistics.
The first part of this thesis is concerned with hypothesis testing. We develop a framework for hypothesis testing on set-valued data, a representation that faithfully describes many real-world phenomena including patient biomarker trajectories in the hospital. Using similar techniques, we next develop a two-sample test for making inference on selection-biased data, in the sense that not all individuals are equally likely to be included in the study, a fact that biases tests if not accounted for and if the desideratum is to obtain conclusions that are generally applicable. We conclude this section with an investigation of conditional independence in high-dimensional data, such as found in gene expression data, and propose a test using generative adversarial networks. The second part of this thesis is concerned with causal inference and discovery, with a special focus on the influence of unobserved confounders that distort the observed associations between variables and yet may not be ruled out or adjusted for using data alone. We start by demonstrating that unobserved confounders may substantially bias the generalization performance of machine learning algorithms trained with conventional learning paradigms such as empirical risk minimization. Acknowledging this spurious effect, we develop a new learning principle inspired by causal insights that provably generalizes to test data sampled from a larger set of distributions different from the training distribution. In the last chapter we consider the influence of unobserved confounders for causal discovery. We show that, under some assumptions on the type and influence of unobserved confounding, one may develop provably consistent causal discovery algorithms, formulated as a solution to a continuous optimization program.
GraphiT: Encoding Graph Structure in Transformers
We show that viewing graphs as sets of node features and incorporating
structural and positional information into a transformer architecture can
outperform representations learned with classical graph neural networks
(GNNs). Our model, GraphiT, encodes such information by (i) leveraging relative
positional encoding strategies in self-attention scores based on positive
definite kernels on graphs, and (ii) enumerating and encoding local
sub-structures such as paths of short length. We thoroughly evaluate these two
ideas on many classification and regression tasks, demonstrating the
effectiveness of each of them independently, as well as their combination. In
addition to performing well on standard benchmarks, our model also admits
natural visualization mechanisms for interpreting graph motifs explaining the
predictions, making it a potentially strong candidate for scientific
applications where interpretation is important. Code available at
https://github.com/inria-thoth/GraphiT
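One way to picture relative positional encoding through a positive definite graph kernel is to multiply unnormalized self-attention scores by kernel values before normalizing each row. The sketch below does this under several assumptions: a p-step random walk kernel `(I - beta * L)**steps` on the normalized Laplacian, identity query/key/value projections, and a hypothetical four-node path graph with made-up features. It is not taken from the GraphiT code base and does not use its API.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def random_walk_kernel(adjacency, beta=0.5, steps=2):
    """(I - beta * L)^steps, with L = I - D^{-1/2} A D^{-1/2}; assumes no isolated nodes."""
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    base = [[(1.0 - beta if i == j else 0.0)
             + beta * adjacency[i][j] / math.sqrt(degree[i] * degree[j])
             for j in range(n)] for i in range(n)]
    kernel = base
    for _ in range(steps - 1):
        kernel = matmul(kernel, base)
    return kernel


def kernel_attention(features, kernel):
    """Self-attention whose exponentiated scores are modulated by kernel values."""
    n, dim = len(features), len(features[0])
    weights, outputs = [], []
    for i in range(n):
        scores = [math.exp(sum(a * b for a, b in zip(features[i], features[j]))
                           / math.sqrt(dim)) * kernel[i][j] for j in range(n)]
        total = sum(scores)
        row = [s / total for s in scores]  # normalize so each row sums to 1
        weights.append(row)
        outputs.append([sum(row[j] * features[j][c] for j in range(n))
                        for c in range(dim)])
    return weights, outputs


# Hypothetical path graph 0 - 1 - 2 - 3 with two-dimensional node features.
adjacency = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
kernel = random_walk_kernel(adjacency)
weights, outputs = kernel_attention(features, kernel)
```

With `steps=2`, nodes more than two hops apart receive zero kernel value, so node 0 places no attention weight on node 3 regardless of feature similarity.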