396 research outputs found
Distinguishing cause from effect using observational data: methods and benchmarks
The discovery of causal relationships from purely observational data is a
fundamental problem in science. The most elementary form of such a causal
discovery problem is to decide whether X causes Y or, alternatively, Y causes
X, given joint observations of two variables X, Y. An example is to decide
whether altitude causes temperature, or vice versa, given only joint
measurements of both variables. Even under the simplifying assumptions of no
confounding, no feedback loops, and no selection bias, such bivariate causal
discovery problems are challenging. Nevertheless, several approaches for
addressing those problems have been proposed in recent years. We review two
families of such methods: Additive Noise Methods (ANM) and Information
Geometric Causal Inference (IGCI). We present the benchmark CauseEffectPairs
that consists of data for 100 different cause-effect pairs selected from 37
datasets from various domains (e.g., meteorology, biology, medicine,
engineering, economy, etc.) and motivate our decisions regarding the "ground
truth" causal directions of all pairs. We evaluate the performance of several
bivariate causal discovery methods on these real-world benchmark data and in
addition on artificially simulated data. Our empirical results on real-world
data indicate that certain methods are indeed able to distinguish cause from
effect using only purely observational data, although more benchmark data would
be needed to obtain statistically significant conclusions. One of the best
performing methods overall is the additive-noise method originally proposed by
Hoyer et al. (2009), which obtains an accuracy of 63 ± 10 % and an AUC of
0.74 ± 0.05 on the real-world benchmark. As the main theoretical contribution of
this work we prove the consistency of that method.
Comment: 101 pages, second revision submitted to Journal of Machine Learning Research
A guided hybrid genetic algorithm for feature selection with expensive cost functions
We present a guided hybrid genetic algorithm for feature selection which is tailored to minimize the number of cost function evaluations. Guided variable elimination is used to make the stochastic backward search of the genetic algorithm much more efficient. Guiding means that a promising feature set is selected from a population and suggestions (for example by a trained Random Forest) are made which variable could be removed. It uses implicit diversity management and is able to return multiple optimal solutions if present, which might be important for interpreting the results. It uses a dynamic cost function that avoids prescribing an expected upper limit of performance or the number of features of the optimal solution. We illustrate the performance of the algorithm on artificial data, and show that the algorithm provides accurate results and is very efficient in minimizing the number of cost function evaluations.
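The guiding idea, proposing which variable to drop from a promising feature set while keeping cost function evaluations to a minimum, can be sketched as a single greedy step. This is not the authors' algorithm: it omits the population, crossover, mutation, diversity management, and the dynamic cost function. The importance scores stand in for the suggestions of a trained model, and `guided_elimination_step`, `toy_cost`, and the cache are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, FrozenSet

def guided_elimination_step(
    feature_set: FrozenSet[int],
    importances: Dict[int, float],
    cost: Callable[[FrozenSet[int]], float],
    cache: Dict[FrozenSet[int], float],
) -> FrozenSet[int]:
    """Try dropping the least important feature; keep the move if the cost does not get worse."""

    def cached_cost(fs: FrozenSet[int]) -> float:
        # Each distinct feature set is evaluated at most once.
        if fs not in cache:
            cache[fs] = cost(fs)
        return cache[fs]

    if len(feature_set) <= 1:
        return feature_set
    candidate = min(feature_set, key=lambda f: importances[f])
    reduced = feature_set - {candidate}
    if cached_cost(reduced) <= cached_cost(feature_set):
        return reduced
    return feature_set

# Hypothetical toy problem: features 0 and 1 are informative, 2 and 3 are noise.
def toy_cost(fs: FrozenSet[int]) -> float:
    missing = len({0, 1} - fs)
    return missing + 0.01 * len(fs)

importances = {0: 0.5, 1: 0.4, 2: 0.06, 3: 0.04}  # assumed scores
cache: Dict[FrozenSet[int], float] = {}
current = frozenset(range(4))
while True:
    nxt = guided_elimination_step(current, importances, toy_cost, cache)
    if nxt == current:
        break
    current = nxt
print(sorted(current), "cost evaluations:", len(cache))
```

Caching evaluated feature sets is one simple way to avoid paying for the same expensive evaluation twice; the abstract does not say how the authors achieve this.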
The effect of univariate bias adjustment on multivariate hazard estimates
Bias adjustment is often a necessity in estimating climate
impacts because impact models usually rely on unbiased climate information, a requirement
that climate model outputs rarely fulfil. Most currently used statistical bias-adjustment methods
adjust each climate variable separately, even though impacts usually depend on multiple
potentially dependent variables. Human heat stress, for instance, depends on temperature
and relative humidity, two variables that are often strongly correlated. Whether
univariate bias-adjustment methods effectively improve estimates of impacts that depend
on multiple drivers is largely unknown, and the lack of long-term impact data prevents a
direct comparison between model outputs and observations for many climate-related
impacts. Here we use two hazard indicators, heat stress and a simple fire risk indicator,
as proxies for more sophisticated impact models. We show that univariate bias-adjustment
methods such as univariate quantile mapping often cannot effectively reduce biases in
multivariate hazard estimates. In some cases, they even increase biases. These cases
typically occur (i) when hazards depend equally strongly on more than one climatic
driver, (ii) when models exhibit biases in the dependence structure of drivers and
(iii) when univariate biases are relatively small. Using a perfect model approach, we
further quantify the uncertainty in bias-adjusted hazard indicators due to internal
variability and show how imperfect bias adjustment can amplify this uncertainty. Both
issues can be addressed successfully with a statistical bias adjustment that corrects the
multivariate dependence structure in addition to the marginal distributions of the
climate drivers. Our results suggest that currently many modeled climate impacts are
associated with uncertainties related to the choice of bias adjustment. We conclude that
in cases where impacts depend on multiple dependent climate variables these uncertainties
can be reduced using statistical bias-adjustment approaches that correct the variables'
multivariate dependence structure.
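Univariate quantile mapping, the method named in this abstract, can be sketched as an empirical transfer function between model and observed quantiles. The sketch below is one simple variant under stated assumptions: the function name, the number of quantile nodes, and the synthetic temperatures are hypothetical, and values outside the reference range are clamped to the end nodes rather than extrapolated.

```python
import numpy as np

def quantile_map(model_ref, obs_ref, model_values, n_quantiles=101):
    """Map model values onto the observed distribution via empirical quantiles."""
    probs = np.linspace(0.0, 1.0, n_quantiles)
    model_q = np.quantile(model_ref, probs)  # model quantiles in the reference period
    obs_q = np.quantile(obs_ref, probs)      # observed quantiles in the reference period
    # Piecewise-linear transfer function; np.interp clamps outside the reference range.
    return np.interp(model_values, model_q, obs_q)

# Hypothetical data: the "model" is a scaled and shifted copy of the observations.
rng = np.random.default_rng(0)
obs_temp = rng.normal(25.0, 4.0, 1000)
model_temp = 1.5 * obs_temp + 3.0
corrected = quantile_map(model_temp, obs_temp, model_temp)
print(abs(corrected.mean() - obs_temp.mean()))
```

Because the transfer function is monotone, it leaves the ranks of each variable unchanged. Applying it to temperature and humidity separately therefore keeps the model's rank dependence between them, which is consistent with the abstract's point that univariate adjustment does not correct the dependence structure.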
Testing whether linear equations are causal: A free probability theory approach
We propose a method that infers whether linear relations between two
high-dimensional variables X and Y are due to a causal influence from X to Y or
from Y to X. The earlier proposed so-called Trace Method is extended to the
regime where the dimension of the observed variables exceeds the sample size.
Based on previous work, we postulate conditions that characterize a causal
relation between X and Y. Moreover, we describe a statistical test and argue
that both causal directions are typically rejected if there is a common cause.
A full theoretical analysis is presented for the deterministic case but our
approach seems to be valid for the noisy case, too, for which we additionally
present an approach based on a sparsity constraint. The discussed method yields
promising results for both simulated and real-world data.
A submonthly database for detecting changes in vegetation-atmosphere coupling
Land-atmosphere coupling and changes in coupling regimes are important for making precise future climate predictions and understanding vegetation-climate feedbacks. Here we introduce the Vegetation-Atmosphere Coupling (VAC) index which identifies regions and times of concurrent strong anomalies in temperature and photosynthetic activity. The different classes of the index determine whether a location is currently in an energy-limited or water-limited regime, and its high temporal resolution allows investigating how these regimes change over time at the regional scale. We show that the VAC index helps to distinguish different evaporative regimes. It can therefore provide indirect information about the local soil moisture state. We further demonstrate how the index can be used to understand processes leading to and occurring during extreme climate events, using the 2010 heat wave in Russia and the 2010 Amazon drought as examples.
Bioeconomic fiction between narrative dynamics and a fixed imaginary: evidence from India and Germany
Bioeconomic ideas and visions have received increasing attention from scientists and policy makers to address socioecological challenges. However, the role of imagined futures in the design of bioeconomic innovations and transitions has hitherto been widely neglected. In this study, we therefore explore the role of imaginaries of the future to understand how they shape bioeconomic innovations and transitions. We thereby build on insights from economic sociology and compare two distinct case studies from Germany and India. Based on our results, we inductively develop an analytic model that describes the co-constitution of imaginaries, fictional expectations, narratives, and innovation dynamics. Our results show that narrative dynamics are caused by irritations in the political and discursive landscape; these irritations prompt economic actors to stabilize, adapt, or reject their own bioeconomic conceptions, while the underlying imaginary of a technological fix remains fixed. We discuss this reductionist imaginary and instead plead for an imaginary of a socioecological fix that reintertwines technologies with their underlying societal, cultural, and ecological factors. We conclude that this will support sustainability scholars and policy makers in remaining vigilant against premature mental and institutional lock-ins that could lead to a colonization of the future with severe negative implications for society's ability to mitigate and adapt to global environmental change in the future
On the links between sub-seasonal clustering of extreme precipitation and high discharge in Switzerland and Europe
River discharge is impacted by the sub-seasonal (weekly to monthly) temporal structure of precipitation. One example is the successive occurrence of extreme precipitation events over sub-seasonal timescales, referred to as temporal clustering. Its potential effects on discharge have received little attention. Here, we address this topic by analysing discharge observations following extreme precipitation events either clustered in time or occurring in isolation. We rely on two sets of precipitation and discharge data, one centred on Switzerland and the other over Europe. We identify “clustered” extreme precipitation events based on the previous occurrence of another extreme precipitation within a given time window. We find that clustered events are generally followed by a more prolonged discharge response with a larger amplitude. The probability of exceeding the 95th discharge percentile in 5 d following an extreme precipitation event is in particular up to twice as high for situations where another extreme precipitation event occurred in the preceding week compared to isolated extreme precipitation events. The influence of temporal clustering on discharge decreases as the clustering window increases; beyond 6–8 weeks the difference in discharge response with non-clustered events is negligible. Catchment area, streamflow regime and precipitation magnitude also modulate the response. The impact of clustering is generally smaller in snow-dominated and large catchments. Additionally, particularly persistent periods of high discharge tend to occur in conjunction with temporal clusters of precipitation extremes
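The identification of "clustered" events described above can be sketched as follows: mark days whose precipitation exceeds a high percentile, then flag an event as clustered if another event occurred within the preceding window. This is a simplified reading of the abstract. The 99th-percentile default, the 7-day window, and the function names are assumptions, and no declustering of consecutive wet days is applied.

```python
import numpy as np

def extreme_event_days(precip, percentile=99.0):
    """Indices of days whose precipitation exceeds the given percentile of the series."""
    precip = np.asarray(precip, dtype=float)
    threshold = np.percentile(precip, percentile)
    return np.flatnonzero(precip > threshold)

def flag_clustered(event_days, window_days=7):
    """True for each event preceded by another event within `window_days` days."""
    event_days = np.sort(np.asarray(event_days))
    clustered = np.zeros(event_days.size, dtype=bool)
    # After sorting, the nearest earlier event is the previous entry.
    clustered[1:] = np.diff(event_days) <= window_days
    return clustered

# Hypothetical event days (day indices within a daily series).
days = [3, 5, 30, 60, 64]
print(flag_clustered(days, window_days=7).tolist())
```

The clustered and isolated subsets can then be compared, for example by the fraction of events followed by discharge above its 95th percentile within the next few days.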
Insights into the drivers and spatio-temporal trends of extreme Mediterranean wildfires with statistical deep-learning
Extreme wildfires are a significant cause of human death and biodiversity
destruction within countries that encompass the Mediterranean Basin. Recent
worrying trends in wildfire activity (i.e., occurrence and spread) suggest that
wildfires are likely to be highly impacted by climate change. In order to
facilitate appropriate risk mitigation, we must identify the main drivers of
extreme wildfires and assess their spatio-temporal trends, with a view to
understanding the impacts of global warming on fire activity. We analyse the
monthly burnt area due to wildfires over a region encompassing most of Europe
and the Mediterranean Basin from 2001 to 2020, and identify high fire activity
during this period in Algeria, Italy and Portugal. We build an extreme quantile
regression model with a high-dimensional predictor set describing
meteorological conditions, land cover usage, and orography. To model the
complex relationships between the predictor variables and wildfires, we use a
hybrid statistical deep-learning framework that can disentangle the effects of
vapour-pressure deficit (VPD), air temperature, and drought on wildfire
activity. Our results highlight that whilst VPD, air temperature, and drought
significantly affect wildfire occurrence, only VPD affects wildfire spread. To
gain insights into the effect of climate trends on wildfires in the near
future, we focus on August 2001 and perturb temperature according to its
observed trends (median over Europe: +0.04 K per year). We find that, on average
over Europe, these trends lead to a relative increase of 17.1% and 1.6% in
the expected frequency and severity, respectively, of wildfires in August 2001,
with spatially non-uniform changes in both aspects