Estimating causal networks in biosphere–atmosphere interaction with the PCMCI approach
Local meteorological conditions and biospheric activity are tightly coupled. Understanding these links is an essential prerequisite for predicting the Earth system under climate change conditions. However, many empirical studies on the interaction between the biosphere and the atmosphere are based on correlative approaches that are not able to deduce causal paths, and only very few studies apply causal discovery methods. Here, we use a recently proposed causal graph discovery algorithm, which aims to reconstruct the causal dependency structure underlying a set of time series. We explore the potential of this method to infer temporal dependencies in biosphere-atmosphere interactions. Specifically, we address the following questions: How do periodicity and heteroscedasticity influence causal detection rates, i.e., the detection of existing and non-existing links? How consistent are results for noise-contaminated data? Do results exhibit an increased information content that justifies the use of this causal-inference method? We explore the first question using artificial time series with well-known dependencies that mimic real-world biosphere-atmosphere interactions. The two remaining questions are addressed jointly in two case studies utilizing observational data. Firstly, we analyse three replicated eddy covariance datasets from a Mediterranean ecosystem at half-hourly time resolution, allowing us to understand the impact of measurement uncertainties. Secondly, we analyse global NDVI time series (GIMMS 3g) along with gridded climate data to study large-scale climatic drivers of vegetation greenness. Overall, the results confirm the capacity of the causal discovery method to extract time-lagged linear dependencies under realistic settings. Violations of the method's assumptions increase the likelihood of detecting false links. Nevertheless, we consistently identify interaction patterns in observational data.
Our findings suggest that estimating a directed biosphere-atmosphere network at the ecosystem level can offer novel possibilities to unravel complex multi-directional interactions. Unlike classical correlative approaches, our findings are constrained to a small set of meaningful relations, which can provide powerful insights for the evaluation of terrestrial ecosystem models.
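PCMCI tests each candidate time-lagged link with a conditional independence test, and for linear dependencies a common choice is partial correlation. The Python sketch below is a minimal illustration of one such test, not the PCMCI algorithm itself. It simulates a hypothetical `driver` series that affects a hypothetical `response` series at lag 1, then compares lagged partial correlations in both directions, conditioning only on the target's own lag-1 value. The series names and coefficients are assumptions. PCMCI additionally selects conditioning sets in a PC-style stage and applies significance tests, which the Tigramite package implements.

```python
import numpy as np


def partial_corr(a, b, cond):
    """Correlation of a and b after linearly regressing both on the columns in cond."""
    design = np.column_stack([np.ones(len(a)), *cond])
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return float(np.corrcoef(res_a, res_b)[0, 1])


# Hypothetical system: driver(t-1) -> response(t); no link from response back to driver.
rng = np.random.default_rng(0)
n = 2000
driver = np.zeros(n)
response = np.zeros(n)
for t in range(1, n):
    driver[t] = 0.7 * driver[t - 1] + rng.standard_normal()
    response[t] = 0.5 * response[t - 1] + 0.5 * driver[t - 1] + rng.standard_normal()

# Test driver(t-1) -> response(t), conditioning on response(t-1).
forward = partial_corr(driver[:-1], response[1:], [response[:-1]])
# Test response(t-1) -> driver(t), conditioning on driver(t-1).
backward = partial_corr(response[:-1], driver[1:], [driver[:-1]])
print(f"driver -> response: {forward:.2f}, response -> driver: {backward:.2f}")
```

For these coefficients the forward value should be clearly non-zero (roughly 0.5), and the backward value should be close to zero. A real analysis would convert each value into a p-value before declaring a link.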
Non-Parametric Causality Detection: An Application to Social Media and Financial Data
According to behavioral finance, stock market returns are influenced by
emotional, social and psychological factors. Several recent works support this
theory by providing evidence of correlation between stock market prices and
collective sentiment indexes measured using social media data. However, a pure
correlation analysis is not sufficient to prove that stock market returns are
influenced by such emotional factors since both stock market prices and
collective sentiment may be driven by a third unmeasured factor. Controlling
for factors that could influence the study by applying multivariate regression
models is challenging given the complexity of stock market data. False
assumptions about the linearity or non-linearity of the model and inaccuracies
in model specification may result in misleading conclusions.
In this work, we propose a novel framework for causal inference that does not
require any assumption about the statistical relationships among the variables
of the study and can effectively control a large number of factors. We apply
our method in order to estimate the causal impact that information posted in
social media may have on stock market returns of four big companies. Our
results indicate that social media data not only correlate with stock market
returns but also influence them.
Comment: Physica A: Statistical Mechanics and its Applications 201
Distinguishing cause from effect using observational data: methods and benchmarks
The discovery of causal relationships from purely observational data is a
fundamental problem in science. The most elementary form of such a causal
discovery problem is to decide whether X causes Y or, alternatively, Y causes
X, given joint observations of two variables X, Y. An example is to decide
whether altitude causes temperature, or vice versa, given only joint
measurements of both variables. Even under the simplifying assumptions of no
confounding, no feedback loops, and no selection bias, such bivariate causal
discovery problems are challenging. Nevertheless, several approaches for
addressing those problems have been proposed in recent years. We review two
families of such methods: Additive Noise Methods (ANM) and Information
Geometric Causal Inference (IGCI). We present the benchmark CauseEffectPairs
that consists of data for 100 different cause-effect pairs selected from 37
datasets from various domains (e.g., meteorology, biology, medicine,
engineering, economy, etc.) and motivate our decisions regarding the "ground
truth" causal directions of all pairs. We evaluate the performance of several
bivariate causal discovery methods on these real-world benchmark data and in
addition on artificially simulated data. Our empirical results on real-world
data indicate that certain methods are indeed able to distinguish cause from
effect using only purely observational data, although more benchmark data would
be needed to obtain statistically significant conclusions. One of the
best-performing methods overall is the additive-noise method originally proposed by
Hoyer et al. (2009), which obtains an accuracy of 63 ± 10% and an AUC of
0.74 ± 0.05 on the real-world benchmark. As the main theoretical contribution of
this work, we prove the consistency of that method.
Comment: 101 pages, second revision submitted to Journal of Machine Learning
Research
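The additive-noise idea can be sketched as follows: fit effect = f(cause) + noise in both directions and prefer the direction whose residuals look independent of the input. The Python sketch below illustrates this under stated assumptions and is not the benchmarked implementation. It swaps in polynomial regression for the nonparametric regression typically used. It measures dependence with a biased HSIC estimate that uses Gaussian kernels with a median-heuristic bandwidth. It also runs on simulated data (y = x^3 + uniform noise) rather than on CauseEffectPairs. The helper names `rbf_gram`, `hsic` and `anm_score` are hypothetical.

```python
import numpy as np


def rbf_gram(v):
    """Gaussian-kernel Gram matrix of a standardized 1-D sample (median-heuristic bandwidth)."""
    v = (v - v.mean()) / v.std()
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2.0 * np.median(d2[d2 > 0])))


def hsic(a, b):
    """Biased empirical HSIC; larger values indicate stronger dependence."""
    n = len(a)
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(rbf_gram(a) @ h @ rbf_gram(b) @ h)) / (n - 1) ** 2


def anm_score(cause, effect, degree=5):
    """Regress effect on cause (polynomial stand-in for f) and score residual dependence."""
    c = (cause - cause.mean()) / cause.std()
    residual = effect - np.polyval(np.polyfit(c, effect, degree), c)
    return hsic(cause, residual)


rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 300)
y = x**3 + rng.uniform(-1, 1, 300)  # ground truth: x -> y

score_xy = anm_score(x, y)  # residual dependence if x causes y
score_yx = anm_score(y, x)  # residual dependence if y causes x
print("inferred:", "x -> y" if score_xy < score_yx else "y -> x")
```

Here the backward residuals are strongly heteroscedastic (their spread shrinks as |y| grows), so the sketch should prefer x -> y. Like the reviewed methods, it cannot decide when f is linear and the noise is Gaussian, because the two directions are then indistinguishable.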
Understanding confounding effects in linguistic coordination: an information-theoretic approach
We suggest an information-theoretic approach for measuring stylistic
coordination in dialogues. The proposed measure has a simple predictive
interpretation and can account for various confounding factors through proper
conditioning. We revisit some of the previous studies that reported strong
signatures of stylistic accommodation, and find that a significant part of the
observed coordination can be attributed to a simple confounding effect: length
coordination. Specifically, longer utterances tend to be followed by longer
responses, which gives rise to spurious correlations in the other stylistic
features. We propose a test to distinguish length correlations due to
contextual factors (topic of conversation, user verbosity, etc.) from those due
to turn-by-turn coordination. We also suggest a test to identify whether stylistic
coordination persists even after accounting for length coordination and
contextual factors.
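The conditioning argument can be made concrete with plug-in entropy estimates. The Python sketch below uses hypothetical toy data and is not the paper's measure or corpus. In the toy data, reply length tracks utterance length, and each speaker's use of a binary stylistic marker depends only on the length of their own turn. Lengths are reduced to a short/long flag, and all probabilities are assumptions. The marginal mutual information between the two markers comes out clearly positive. Once both lengths are conditioned on, the mutual information is close to zero, which is the pattern the abstract attributes to length coordination.

```python
import numpy as np


def entropy(*cols):
    """Plug-in Shannon entropy (bits) of the joint distribution of discrete columns."""
    joint = np.column_stack(cols)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def mutual_info(a, b):
    return entropy(a) + entropy(b) - entropy(a, b)


def cond_mutual_info(a, b, *z):
    """I(a; b | z) = H(a, z) + H(b, z) - H(a, b, z) - H(z)."""
    return entropy(a, *z) + entropy(b, *z) - entropy(a, b, *z) - entropy(*z)


rng = np.random.default_rng(1)
n = 5000
long_a = rng.random(n) < 0.5                                # first turn: short or long
long_b = np.where(rng.random(n) < 0.9, long_a, ~long_a)     # reply usually matches length
# Each marker depends only on the length of its own turn.
marker_a = (rng.random(n) < np.where(long_a, 0.8, 0.2)).astype(int)
marker_b = (rng.random(n) < np.where(long_b, 0.8, 0.2)).astype(int)
long_a, long_b = long_a.astype(int), long_b.astype(int)

mi = mutual_info(marker_a, marker_b)
cmi = cond_mutual_info(marker_a, marker_b, long_a, long_b)
print(f"I(marker_a; marker_b) = {mi:.3f} bits, conditioned on lengths = {cmi:.3f} bits")
```

Plug-in estimates are biased upward for small samples. In practice, the conditional value would therefore be compared against a shuffled or analytical null rather than against exactly zero.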
Quantifying information transfer and mediation along causal pathways in complex systems
Measures of information transfer have become a popular approach to analyze
interactions in complex systems such as the Earth or the human brain from
measured time series. Recent work has focused on causal definitions of
information transfer excluding effects of common drivers and indirect
influences. While the former clearly constitutes a spurious causality, the aim
of the present article is to develop measures quantifying different notions of
the strength of information transfer along indirect causal paths, based on
first reconstructing the multivariate causal network (Tigramite
approach). Another class of novel measures quantifies to what extent different
intermediate processes on causal paths contribute to an interaction mechanism
to determine pathways of causal information transfer. A rigorous mathematical
framework allows for a clear information-theoretic interpretation that can also
be related to the underlying dynamics as proven for certain classes of
processes. Generally, however, estimates of information transfer remain hard to
interpret for nonlinearly intertwined complex systems. But if experiments or
mathematical models are not available, measuring pathways of information
transfer within the causal dependency structure allows at least for an
abstraction of the dynamics. The measures are illustrated on a climatological
example to disentangle pathways of atmospheric flow over Europe.
Comment: 20 pages, 6 figures
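The idea of information flowing along an indirect path can be illustrated with Gaussian (conditional) mutual information. For jointly Gaussian variables, this quantity reduces to log-determinants of covariance matrices. The Python sketch below is not the article's set of measures. It assumes a hypothetical linear chain x -> y -> z with unit lags. It shows that x two steps back carries information about z now, and that this information essentially vanishes once the intermediate y one step back is conditioned on, i.e., the dependence is mediated by y. The helper names and coefficients are illustrative assumptions.

```python
import numpy as np


def logdet_cov(*cols):
    """Log-determinant of the sample covariance of the given columns (0 for no columns)."""
    if not cols:
        return 0.0
    cov = np.atleast_2d(np.cov(np.column_stack(cols), rowvar=False))
    return float(np.linalg.slogdet(cov)[1])


def gaussian_cmi(a, b, *z):
    """I(a; b | z) in nats under a joint-Gaussian assumption:
    0.5 * [log|S_az| + log|S_bz| - log|S_abz| - log|S_z|]."""
    return 0.5 * (logdet_cov(a, *z) + logdet_cov(b, *z)
                  - logdet_cov(a, b, *z) - logdet_cov(*z))


# Hypothetical chain with unit lags: x(t-1) -> y(t), y(t-1) -> z(t).
rng = np.random.default_rng(2)
n = 5000
e = rng.standard_normal((3, n))
x, y, z = np.zeros(n), np.zeros(n), np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + e[0, t]
    y[t] = 0.7 * x[t - 1] + e[1, t]
    z[t] = 0.7 * y[t - 1] + e[2, t]

x_lag2, y_lag1, z_now = x[:-2], y[1:-1], z[2:]
total = gaussian_cmi(x_lag2, z_now)              # I(x(t-2); z(t))
remaining = gaussian_cmi(x_lag2, z_now, y_lag1)  # I(x(t-2); z(t) | y(t-1))
print(f"total: {total:.3f} nats, given intermediate y: {remaining:.4f} nats")
```

Comparing the two numbers gives a crude sense of how much of the lag-2 dependence runs through y. The article instead defines its measures on a causal network that is reconstructed first (the Tigramite approach).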
Philosophy and the practice of Bayesian statistics
A substantial school in the philosophy of science identifies Bayesian
inference with inductive inference and even rationality as such, and seems to
be strengthened by the rise and practical success of Bayesian statistics. We
argue that the most successful forms of Bayesian statistics do not actually
support that particular philosophy but rather accord much better with
sophisticated forms of hypothetico-deductivism. We examine the actual role
played by prior distributions in Bayesian models, and the crucial aspects of
model checking and model revision, which fall outside the scope of Bayesian
confirmation theory. We draw on the literature on the consistency of Bayesian
updating and also on our experience of applied work in social science.
Clarity about these matters should benefit not just philosophy of science,
but also statistical practice. At best, the inductivist view has encouraged
researchers to fit and compare models without checking them; at worst,
theorists have actively discouraged practitioners from performing model
checking because it does not fit into their framework.
Comment: 36 pages, 5 figures. v2: Fixed typo in caption of figure 1. v3:
Further typo fixes. v4: Revised in response to referee
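Model checking of the kind the authors emphasize is often done with posterior predictive checks. These simulate replicated data from the fitted model and ask whether a chosen test statistic of the observed data looks typical. The Python sketch below is a minimal illustration under assumed choices and is not taken from the paper. It uses a Poisson model with a conjugate Gamma(1, 1) prior, and simulated overdispersed counts stand in for real data. The check on the mean passes. The check on the variance flags that the Poisson model cannot reproduce the spread of the data. The function name `ppc_pvalue` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical observed counts: negative binomial with mean 5 and variance 30 (overdispersed).
y = rng.negative_binomial(1, 1 / 6, size=100)

# Poisson likelihood with a Gamma(shape=1, rate=1) prior gives a Gamma posterior for the rate.
post_shape = 1 + y.sum()
post_rate = 1 + len(y)


def ppc_pvalue(stat, draws=2000):
    """Fraction of posterior predictive replicates whose statistic is >= the observed one."""
    observed = stat(y)
    exceed = 0
    for _ in range(draws):
        lam = rng.gamma(post_shape, 1 / post_rate)  # numpy's gamma takes scale = 1 / rate
        y_rep = rng.poisson(lam, size=len(y))
        exceed += stat(y_rep) >= observed
    return exceed / draws


p_mean = ppc_pvalue(np.mean)
p_var = ppc_pvalue(np.var)
print(f"p(mean) = {p_mean:.2f}, p(variance) = {p_var:.3f}")
```

A small p-value for the variance is a signal to revise the model, for example toward an overdispersed likelihood. This is the kind of step the abstract says falls outside Bayesian confirmation theory.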