Kernel-based independence tests for causal structure learning on functional data
Measurements of systems taken along a continuous functional dimension, such
as time or space, are ubiquitous in many fields, from the physical and
biological sciences to economics and engineering. Such measurements can be
viewed as realisations of an underlying smooth process sampled over the
continuum. However, traditional methods for independence testing and causal
learning are not directly applicable to such data, as they do not take into
account the dependence along the functional dimension. By using specifically
designed kernels, we introduce statistical tests for bivariate, joint, and
conditional independence for functional variables. Our method not only extends
the Hilbert-Schmidt independence criterion (HSIC) and its d-variate version
(d-HSIC) to functional data, but also allows us to introduce a test for
conditional independence: we define a novel statistic for the conditional
permutation test (CPT) based on the Hilbert-Schmidt conditional independence
criterion (HSCIC), with the regularisation strength optimised through an
evaluation rejection rate. Empirical evaluation of the size and power of these
tests on synthetic functional data shows good performance, and we then
illustrate their application to several constraint- and regression-based causal
structure learning problems, including both synthetic examples and real
socio-economic data.
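To make the bivariate test concrete, here is a minimal sketch of an HSIC permutation test on functional data. It assumes every curve is observed on a common, equally spaced grid, so that a Gaussian kernel on the sampled values approximates a kernel on the curves themselves. The Gaussian kernel, the median-heuristic bandwidth and the names `gaussian_gram` and `hsic_perm_test` are illustrative assumptions. They are not the specifically designed kernels or the implementation described in the abstract.

```python
import numpy as np


def gaussian_gram(X):
    """Gaussian-kernel Gram matrix between rows of X (one discretised curve per row).

    The bandwidth is set with the median heuristic on pairwise squared distances.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    med = np.median(d2[np.triu_indices_from(d2, k=1)])
    return np.exp(-d2 / (med if med > 0 else 1.0))


def hsic_perm_test(X, Y, n_perm=500, seed=0):
    """Biased HSIC statistic with a permutation p-value for H0: X and Y independent."""
    n = X.shape[0]
    K, L = gaussian_gram(X), gaussian_gram(Y)
    H = np.eye(n) - 1.0 / n            # centring matrix I - (1/n) 11^T
    Kc = H @ K @ H
    stat = np.sum(Kc * L) / n ** 2     # equals trace(K H L H) / n^2
    rng = np.random.default_rng(seed)
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(n)      # permuting Y's samples breaks any dependence
        null[i] = np.sum(Kc * L[np.ix_(perm, perm)]) / n ** 2
    p_value = (1 + np.sum(null >= stat)) / (1 + n_perm)
    return stat, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 50)          # common evaluation grid
    a = rng.uniform(-1.0, 1.0, size=100)   # latent amplitude per sample
    X = a[:, None] * np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((100, 50))
    Y = (a ** 2)[:, None] * np.cos(2 * np.pi * t) + 0.1 * rng.standard_normal((100, 50))
    stat, p = hsic_perm_test(X, Y)
    print(f"HSIC = {stat:.4f}, p = {p:.3f}")  # nonlinear dependence: expect a small p-value
```

Precomputing the centred matrix `Kc` once means each permutation only re-indexes `L`. This keeps the null distribution cheap to simulate.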
Detecting and quantifying causal associations in large nonlinear time series datasets
Identifying causal relationships and quantifying their strength from observational time series data are key problems in disciplines dealing with complex dynamical systems such as the Earth system or the human body. Data-driven causal inference in such systems is challenging since datasets are often high dimensional and nonlinear with limited sample sizes. Here, we introduce a novel method that flexibly combines linear or nonlinear conditional independence tests with a causal discovery algorithm to estimate causal networks from large-scale time series datasets. We validate the method on time series of well-understood physical mechanisms in the climate system and the human heart and using large-scale synthetic datasets mimicking the typical properties of real-world data. The experiments demonstrate that our method outperforms state-of-the-art techniques in detection power, which opens up entirely new possibilities to discover and quantify causal networks from time series across a range of research fields.
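One way to picture a conditional independence test combined with a causal discovery algorithm is a simplified sketch. A linear partial-correlation test (Fisher z-transform) is plugged into a PC-style pruning of the lagged parents of one target variable. This is an assumed, stripped-down illustration of the general idea, not the authors' method. The function names, the maximum lag `tau_max` and the significance level `alpha` are hypothetical.

```python
import math
import numpy as np


def partial_corr_test(x, y, Z):
    """Linear CI test: partial correlation of x and y given the columns of Z.

    Returns (r, p_value), with a two-sided p-value from the Fisher z-transform.
    """
    n = len(x)
    A = np.column_stack([np.ones(n), Z])  # intercept plus conditioning variables
    rx = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]
    ry = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    r = float(np.dot(rx, ry) / np.sqrt(np.dot(rx, rx) * np.dot(ry, ry)))
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - Z.shape[1] - 3)
    return r, math.erfc(abs(z) / math.sqrt(2))


def lagged_parents(data, target, tau_max=2, alpha=0.01):
    """Simplified PC-style selection of lagged parents (j, tau) of one target variable."""
    T, N = data.shape
    y = data[tau_max:, target]

    def lagged(j, tau):
        # values of variable j at time t - tau, aligned with y (time t)
        return data[tau_max - tau:T - tau, j]

    parents = [(j, tau) for j in range(N) for tau in range(1, tau_max + 1)]
    level = 0
    while level < len(parents):
        strength = {}
        for c in parents:
            # condition on the `level` strongest other candidates from the previous round
            others = [o for o in parents if o != c][:level]
            Z = np.column_stack([lagged(*o) for o in others]) if others else np.empty((len(y), 0))
            r, p = partial_corr_test(lagged(*c), y, Z)
            if p <= alpha:
                strength[c] = abs(r)
        # drop non-significant candidates and sort the survivors by strength
        parents = sorted(strength, key=strength.get, reverse=True)
        level += 1
    return parents


if __name__ == "__main__":
    # Toy system: X0 -> X1 at lag 1, X1 -> X2 at lag 2, each autocorrelated
    rng = np.random.default_rng(1)
    T = 1000
    d = np.zeros((T, 3))
    e = rng.standard_normal((T, 3))
    for t in range(2, T):
        d[t, 0] = 0.6 * d[t - 1, 0] + e[t, 0]
        d[t, 1] = 0.5 * d[t - 1, 1] + 0.8 * d[t - 1, 0] + e[t, 1]
        d[t, 2] = 0.5 * d[t - 1, 2] + 0.7 * d[t - 2, 1] + e[t, 2]
    print(lagged_parents(d, target=2))  # should include (2, 1) and (1, 2)
```

Replacing `partial_corr_test` with a nonlinear test (for example a kernel-based one) leaves the pruning loop unchanged. That interchangeability is the sense in which the test and the discovery algorithm can be combined flexibly.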
A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches into what we refer to as sample-based,
feature-based and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around mapping, projecting
and representing features such that a source classifier performs well on the
target domain. Inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research.
Comment: 20 pages, 5 figures
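As an illustration of the sample-based category, the following sketch estimates importance weights w(x) = p_T(x)/p_S(x) for the source samples using a probabilistic source-versus-target classifier. It then fits an importance-weighted logistic regression on the labelled source data. The logistic domain classifier, the plain gradient-descent solver and all function names are assumptions made for this example, not methods taken from the review.

```python
import numpy as np


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


def fit_logreg(X, y, sample_weight=None, lr=0.1, n_iter=3000, l2=1e-3):
    """(Weighted) L2-regularised logistic regression fitted by batch gradient descent."""
    n, d = X.shape
    w = np.ones(n) if sample_weight is None else np.asarray(sample_weight, float)
    w = w / w.mean()  # rescale weights so the step size does not depend on their scale
    beta, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        g = w * (sigmoid(X @ beta + b) - y)  # weighted gradient of the log-loss w.r.t. the logit
        beta -= lr * (X.T @ g / n + l2 * beta)
        b -= lr * g.mean()
    return beta, b


def importance_weights(Xs, Xt):
    """Estimate w(x) = p_T(x) / p_S(x) at the source points with a source-vs-target classifier."""
    X = np.vstack([Xs, Xt])
    dom = np.r_[np.zeros(len(Xs)), np.ones(len(Xt))]  # 0 = source, 1 = target
    beta, b = fit_logreg(X, dom)
    pt = sigmoid(Xs @ beta + b)  # estimated P(target | x)
    return (len(Xs) / len(Xt)) * pt / (1.0 - pt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Xs = rng.normal(0.0, 1.0, size=(500, 1))  # labelled source inputs
    Xt = rng.normal(1.5, 1.0, size=(500, 1))  # unlabelled target inputs (covariate shift)
    ys = (Xs[:, 0] + 0.5 * rng.standard_normal(500) > 0.5).astype(float)
    w = importance_weights(Xs, Xt)
    beta, b = fit_logreg(Xs, ys, sample_weight=w)  # source classifier reweighted towards the target
    print(w.min(), w.max(), beta, b)
```

The weight formula follows from Bayes' rule: P(target | x) / P(source | x) = (n_T p_T(x)) / (n_S p_S(x)). Multiplying by n_S / n_T therefore recovers the density ratio.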