Causal Discovery from Subsampled Time Series Data by Constraint Optimization
This paper focuses on causal structure estimation from time series data in which measurements are obtained at a coarser timescale than the causal timescale of the underlying system. Previous work has shown that such subsampling can lead to significant errors about the system’s causal structure if not properly taken into account. In this paper, we first consider the search for the system timescale causal structures that correspond to a given measurement timescale structure. We provide a constraint satisfaction procedure whose computational performance is several orders of magnitude better than previous approaches. We then consider finite-sample data as input, and propose the first constraint optimization approach for recovering the system timescale causal structure. This algorithm optimally recovers from possible conflicts due to statistical errors. More generally, these advances allow for a robust and non-parametric estimation of system timescale causal structures from subsampled time series data.
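The effect of subsampling described in this abstract can be illustrated with a small sketch. It assumes a linear first-order autoregressive system, which is only an illustrative simplification: the paper's approach is non-parametric and constraint-based, and this is not its algorithm. The matrix `A`, the variable names, and the helper functions are all hypothetical. For such a system, observing every second time step replaces the transition matrix `A` with `A @ A`, so the measured structure can differ sharply from the true one.

```python
import numpy as np

# Hypothetical 3-variable chain X -> Y -> Z at the system timescale.
# Convention (an assumption for this sketch): A[i, j] is the effect of
# variable j at time t on variable i at time t + 1.
A = np.array([
    [0.0, 0.0, 0.0],
    [0.9, 0.0, 0.0],  # X -> Y
    [0.0, 0.7, 0.0],  # Y -> Z
])

def subsampled_transition(A, rate):
    """Lag-1 transition matrix seen when only every `rate`-th step is measured."""
    return np.linalg.matrix_power(A, rate)

def edges(M, names=("X", "Y", "Z"), tol=1e-12):
    """List (cause, effect) pairs for the nonzero entries of M."""
    n = M.shape[0]
    return sorted(
        (names[j], names[i])
        for i in range(n)
        for j in range(n)
        if abs(M[i, j]) > tol
    )

system_edges = edges(A)
measured_edges = edges(subsampled_transition(A, 2))

print("system timescale:     ", system_edges)
print("measurement timescale:", measured_edges)
```

In this toy case both true edges disappear at the measurement timescale and an edge from X to Z appears instead, which is the kind of error the abstract warns about.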
Causal discovery in a complex industrial system: A time series benchmark
Causal discovery outputs a causal structure, represented by a graph, from
observed data. For time series data, a variety of methods exists; however, it
is difficult to evaluate these on real data as realistic use cases very rarely
come with a known causal graph to which output can be compared. In this paper,
we present a dataset from an industrial subsystem at the European Spallation
Source along with its causal graph which has been constructed from expert
knowledge. This provides a testbed for causal discovery from time series
observations of complex systems, and we believe this can help inform the
development of causal discovery methodology.
Comment: 18 pages, 9 figures, 1 table
Causal structure learning from time series: Large regression coefficients may predict causal links better in practice than small p-values
In this article, we describe the algorithms for causal structure learning
from time series data that won the Causality 4 Climate competition at the
Conference on Neural Information Processing Systems 2019 (NeurIPS). We examine
how our combination of established ideas achieves competitive performance on
semi-realistic and realistic time series data exhibiting common challenges in
real-world Earth sciences data. In particular, we discuss a) a rationale for
leveraging linear methods to identify causal links in non-linear systems, b) a
simulation-backed explanation as to why large regression coefficients may
predict causal links better in practice than small p-values and thus why
normalising the data may sometimes hinder causal structure learning.
For benchmark usage, we detail the algorithms here and provide
implementations at https://github.com/sweichwald/tidybench . We propose the
presented competition-proven methods for baseline benchmark comparisons to
guide the development of novel algorithms for structure learning from time
series.
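As a rough illustration of scoring candidate links by regression coefficient magnitude, the sketch below fits a lag-1 least-squares regression to simulated data and ranks cross-variable links by absolute coefficient. It is not the tidybench implementation and does not reproduce the authors' algorithms; the simulated system `A`, the sample size, and the scoring rule are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: A[i, j] is the effect of variable j at
# time t - 1 on variable i at time t. Only one cross link: 0 -> 1.
A = np.array([
    [0.5, 0.0, 0.0],
    [0.8, 0.5, 0.0],
    [0.0, 0.0, 0.5],
])

# Simulate a stable linear system with unit-variance Gaussian noise.
T = 2000
X = np.zeros((T, 3))
for t in range(1, T):
    X[t] = A @ X[t - 1] + rng.standard_normal(3)

# Regress the present on the past: present ≈ past @ B, so B ≈ A.T.
past, present = X[:-1], X[1:]
B, *_ = np.linalg.lstsq(past, present, rcond=None)

# scores[i, j] scores the link j -> i by absolute coefficient size.
scores = np.abs(B.T)
np.fill_diagonal(scores, 0.0)  # ignore self-links when ranking

top = np.unravel_index(np.argmax(scores), scores.shape)
top_link = (int(top[1]), int(top[0]))  # (cause, effect)
print("highest-scoring link (cause, effect):", top_link)
```

Note that the data are deliberately left unnormalised here, in line with the abstract's remark that normalising may sometimes hinder structure learning; this sketch does not test that claim.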
Causal Discovery from Temporal Data: An Overview and New Perspectives
Temporal data, representing chronological observations of complex systems,
has always been a typical data structure that can be widely generated by many
domains, such as industry, medicine and finance. Analyzing this type of data is
extremely valuable for various applications. Thus, different temporal data
analysis tasks, e.g., classification, clustering and prediction, have been
proposed in the past decades. Among them, causal discovery, learning the causal
relations from temporal data, is considered an interesting yet critical task
and has attracted much research attention. Existing causal discovery works can
be divided into two highly correlated categories according to whether the
temporal data is calibrated, i.e., multivariate time series causal discovery, and
event sequence causal discovery. However, most previous surveys are only
focused on the time series causal discovery and ignore the second category. In
this paper, we specify the correlation between the two categories and provide a
systematic overview of existing solutions. Furthermore, we provide public
datasets, evaluation metrics and new perspectives for temporal data causal
discovery.
Comment: 52 pages, 6 figures
CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods
Causal relationships are commonly examined in manufacturing processes to
support fault investigations, perform interventions, and make strategic
decisions. Industry 4.0 has made available an increasing amount of data that
enable data-driven Causal Discovery (CD). Considering the growing number of
recently proposed CD methods, it is necessary to introduce strict benchmarking
procedures on publicly available datasets since they represent the foundation
for a fair comparison and validation of different methods. This work introduces
two novel public datasets for CD in continuous manufacturing processes. The
first dataset employs the well-known Tennessee Eastman simulator for fault
detection and process control. The second dataset is extracted from an
ultra-processed food manufacturing plant, and it includes a description of the
plant, as well as multiple ground truths. These datasets are used to propose a
benchmarking procedure based on different metrics and evaluated on a wide
selection of CD algorithms. This work allows testing CD methods in realistic
conditions enabling the selection of the most suitable method for specific
target applications. The datasets are available at the following link:
https://github.com/giovanniMen
Comment: Supplementary Materials at:
https://github.com/giovanniMen/CPCaD-Benc