396 research outputs found

    Distinguishing cause from effect using observational data: methods and benchmarks

    The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y. An example is to decide whether altitude causes temperature, or vice versa, given only joint measurements of both variables. Even under the simplifying assumptions of no confounding, no feedback loops, and no selection bias, such bivariate causal discovery problems are challenging. Nevertheless, several approaches for addressing those problems have been proposed in recent years. We review two families of such methods: Additive Noise Methods (ANM) and Information Geometric Causal Inference (IGCI). We present the benchmark CauseEffectPairs that consists of data for 100 different cause-effect pairs selected from 37 datasets from various domains (e.g., meteorology, biology, medicine, engineering, economy) and motivate our decisions regarding the "ground truth" causal directions of all pairs. We evaluate the performance of several bivariate causal discovery methods on these real-world benchmark data and in addition on artificially simulated data. Our empirical results on real-world data indicate that certain methods are indeed able to distinguish cause from effect using only purely observational data, although more benchmark data would be needed to obtain statistically significant conclusions. One of the best performing methods overall is the additive-noise method originally proposed by Hoyer et al. (2009), which obtains an accuracy of 63 ± 10% and an AUC of 0.74 ± 0.05 on the real-world benchmark. As the main theoretical contribution of this work we prove the consistency of that method. Comment: 101 pages, second revision submitted to Journal of Machine Learning Research.
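
To make the bivariate setting concrete, here is a minimal sketch of the slope-based IGCI estimator with a uniform reference measure, one member of the IGCI family named in the abstract. It is not the benchmark code of the paper: the function names `igci_slope_score` and `infer_direction` and the toy data are assumptions chosen for illustration, and the sketch assumes a nearly deterministic, invertible relation between the two variables.

```python
import math


def _rescale(values):
    """Rescale a sample linearly to [0, 1] (uniform reference measure)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("sample must contain at least two distinct values")
    return [(v - lo) / (hi - lo) for v in values]


def igci_slope_score(cause, effect):
    """Mean log absolute slope of `effect` against `cause`, after rescaling.

    Lower scores favour the hypothesis that `cause` causes `effect`.
    """
    pairs = sorted(zip(_rescale(cause), _rescale(effect)))
    total, count = 0.0, 0
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx != 0 and dy != 0:  # skip ties, which have no finite log slope
            total += math.log(abs(dy / dx))
            count += 1
    if count == 0:
        raise ValueError("no usable neighbouring pairs")
    return total / count


def infer_direction(x, y):
    """Compare both directions and return the one with the lower score."""
    c_xy = igci_slope_score(x, y)
    c_yx = igci_slope_score(y, x)
    if c_xy < c_yx:
        return "X->Y"
    if c_yx < c_xy:
        return "Y->X"
    return "undecided"


# Hypothetical toy data: x on a regular grid, y a deterministic function of x.
x = [i / 100 for i in range(101)]
y = [v ** 3 for v in x]
print(infer_direction(x, y))
```

On real, noisy pairs such as those in CauseEffectPairs, the score difference can be small, so a decision threshold or a confidence ranking would be a natural extension of this sketch.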

    A guided hybrid genetic algorithm for feature selection with expensive cost functions

    We present a guided hybrid genetic algorithm for feature selection which is tailored to minimize the number of cost function evaluations. Guided variable elimination is used to make the stochastic backward search of the genetic algorithm much more efficient. Guiding means that a promising feature set is selected from a population and suggestions (for example by a trained Random Forest) are made as to which variable could be removed. The algorithm uses implicit diversity management and is able to return multiple optimal solutions if present, which might be important for interpreting the results. It uses a dynamic cost function that avoids prescribing an expected upper limit of performance or the number of features of the optimal solution. We illustrate the performance of the algorithm on artificial data, and show that the algorithm provides accurate results and is very efficient in minimizing the number of cost function evaluations.

    The effect of univariate bias adjustment on multivariate hazard estimates

    Bias adjustment is often a necessity in estimating climate impacts because impact models usually rely on unbiased climate information, a requirement that climate model outputs rarely fulfil. Most currently used statistical bias-adjustment methods adjust each climate variable separately, even though impacts usually depend on multiple potentially dependent variables. Human heat stress, for instance, depends on temperature and relative humidity, two variables that are often strongly correlated. Whether univariate bias-adjustment methods effectively improve estimates of impacts that depend on multiple drivers is largely unknown, and the lack of long-term impact data prevents a direct comparison between model outputs and observations for many climate-related impacts. Here we use two hazard indicators, heat stress and a simple fire risk indicator, as proxies for more sophisticated impact models. We show that univariate bias-adjustment methods such as univariate quantile mapping often cannot effectively reduce biases in multivariate hazard estimates. In some cases, they even increase biases. These cases typically occur (i) when hazards depend equally strongly on more than one climatic driver, (ii) when models exhibit biases in the dependence structure of drivers and (iii) when univariate biases are relatively small. Using a perfect model approach, we further quantify the uncertainty in bias-adjusted hazard indicators due to internal variability and show how imperfect bias adjustment can amplify this uncertainty. Both issues can be addressed successfully with a statistical bias adjustment that corrects the multivariate dependence structure in addition to the marginal distributions of the climate drivers. Our results suggest that currently many modeled climate impacts are associated with uncertainties related to the choice of bias adjustment. We conclude that in cases where impacts depend on multiple dependent climate variables these uncertainties can be reduced using statistical bias-adjustment approaches that correct the variables' multivariate dependence structure.
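
One way to see why a univariate adjustment leaves the dependence structure untouched is a small sketch of empirical quantile mapping. This is an illustrative implementation under stated assumptions, not the method used in the study: the helper names `make_quantile_mapper` and `ranks` and the temperature values are hypothetical. Because the mapping is monotone in each variable, it preserves the ranks of the model series, so any bias in the rank dependence between two separately adjusted variables is carried over unchanged.

```python
import bisect


def make_quantile_mapper(model_hist, obs_hist):
    """Return a function that maps model values onto the observed distribution.

    A value is replaced by the observed value at the same empirical quantile.
    """
    model_sorted = sorted(model_hist)
    obs_sorted = sorted(obs_hist)
    n_model, n_obs = len(model_sorted), len(obs_sorted)
    if n_model == 0 or n_obs == 0:
        raise ValueError("both samples must be non-empty")

    def adjust(value):
        # Number of model values <= value, i.e. its empirical rank.
        rank = bisect.bisect_right(model_sorted, value)
        # Matching position in the observations (integer ceiling division).
        idx = -(-rank * n_obs // n_model) - 1
        return obs_sorted[min(n_obs - 1, max(0, idx))]

    return adjust


def ranks(values):
    """Rank of each element (0 = smallest); assumes distinct values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


# Hypothetical daily temperatures (degrees Celsius): model output and observations.
model_temp = [30.1, 25.4, 28.0, 33.2, 27.5]
obs_temp = [24.0, 29.5, 26.1, 31.0, 27.2]

adjust_temp = make_quantile_mapper(model_temp, obs_temp)
adjusted_temp = [adjust_temp(v) for v in model_temp]

print(adjusted_temp)
print(ranks(adjusted_temp) == ranks(model_temp))
```

The adjusted series reproduces the observed marginal distribution, yet its ordering in time is exactly that of the model, which is the property a multivariate adjustment would have to change.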

    Testing whether linear equations are causal: A free probability theory approach

    We propose a method that infers whether linear relations between two high-dimensional variables X and Y are due to a causal influence from X to Y or from Y to X. The earlier proposed so-called Trace Method is extended to the regime where the dimension of the observed variables exceeds the sample size. Based on previous work, we postulate conditions that characterize a causal relation between X and Y. Moreover, we describe a statistical test and argue that both causal directions are typically rejected if there is a common cause. A full theoretical analysis is presented for the deterministic case, but our approach also seems to be valid for the noisy case, for which we additionally present an approach based on a sparsity constraint. The discussed method yields promising results for both simulated and real-world data.

    A submonthly database for detecting changes in vegetation-atmosphere coupling

    Land-atmosphere coupling and changes in coupling regimes are important for making precise future climate predictions and understanding vegetation-climate feedbacks. Here we introduce the Vegetation-Atmosphere Coupling (VAC) index, which identifies regions and times of concurrent strong anomalies in temperature and photosynthetic activity. The different classes of the index determine whether a location is currently in an energy-limited or water-limited regime, and its high temporal resolution allows investigation of how these regimes change over time at the regional scale. We show that the VAC index helps to distinguish different evaporative regimes. It can therefore provide indirect information about the local soil moisture state. We further demonstrate how the index can be used to understand processes leading to and occurring during extreme climate events, using the 2010 heat wave in Russia and the 2010 Amazon drought as examples.

    Bioeconomic fiction between narrative dynamics and a fixed imaginary: evidence from India and Germany

    Bioeconomic ideas and visions have received increasing attention from scientists and policy makers to address socioecological challenges. However, the role of imagined futures in the design of bioeconomic innovations and transitions has hitherto been widely neglected. In this study, we therefore explore the role of imaginaries of the future to understand how they shape bioeconomic innovations and transitions. We thereby build on insights from economic sociology and compare two distinct case studies from Germany and India. Based on our results, we inductively develop an analytic model that describes the co-constitution of imaginaries, fictional expectations, narratives, and innovation dynamics. Our results show that narrative dynamics are caused by irritations in the political and discursive landscape; these irritations prompt economic actors to stabilize, adapt, or reject their own bioeconomic conceptions, while the underlying imaginary of a technological fix remains fixed. We discuss this reductionist imaginary and instead plead for an imaginary of a socioecological fix that reintertwines technologies with their underlying societal, cultural, and ecological factors. We conclude that this will support sustainability scholars and policy makers in remaining vigilant against premature mental and institutional lock-ins that could lead to a colonization of the future with severe negative implications for society's ability to mitigate and adapt to global environmental change in the future.

    On the links between sub-seasonal clustering of extreme precipitation and high discharge in Switzerland and Europe

    River discharge is impacted by the sub-seasonal (weekly to monthly) temporal structure of precipitation. One example is the successive occurrence of extreme precipitation events over sub-seasonal timescales, referred to as temporal clustering. Its potential effects on discharge have received little attention. Here, we address this topic by analysing discharge observations following extreme precipitation events either clustered in time or occurring in isolation. We rely on two sets of precipitation and discharge data, one centred on Switzerland and the other over Europe. We identify “clustered” extreme precipitation events based on the previous occurrence of another extreme precipitation event within a given time window. We find that clustered events are generally followed by a more prolonged discharge response with a larger amplitude. In particular, the probability of exceeding the 95th discharge percentile in the 5 days following an extreme precipitation event is up to twice as high when another extreme precipitation event occurred in the preceding week than for isolated extreme precipitation events. The influence of temporal clustering on discharge decreases as the clustering window increases; beyond 6–8 weeks the difference in discharge response from non-clustered events is negligible. Catchment area, streamflow regime and precipitation magnitude also modulate the response. The impact of clustering is generally smaller in snow-dominated and large catchments. Additionally, particularly persistent periods of high discharge tend to occur in conjunction with temporal clusters of precipitation extremes.
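
The identification rule described in the abstract (an extreme event counts as clustered if another extreme event occurred within a preceding time window) can be sketched as follows. This is a simplified illustration, not the procedure of the study: the function name `classify_extreme_events`, the fixed threshold, and the toy series are assumptions, and every day above the threshold is treated as a separate event with no declustering of consecutive wet days.

```python
def classify_extreme_events(precip, threshold, window_days):
    """Label each extreme day as 'clustered' or 'isolated'.

    precip      : daily precipitation amounts, one value per day
    threshold   : amount above which a day counts as extreme (assumed given)
    window_days : look-back window; an extreme day is 'clustered' if the
                  previous extreme day lies at most this many days earlier
    """
    labels = {}
    previous_extreme_day = None
    for day, amount in enumerate(precip):
        if amount <= threshold:
            continue
        is_clustered = (
            previous_extreme_day is not None
            and day - previous_extreme_day <= window_days
        )
        labels[day] = "clustered" if is_clustered else "isolated"
        previous_extreme_day = day
    return labels


# Hypothetical 21-day series (mm per day) with extremes on days 1, 5 and 20.
daily_precip = [0.0] * 21
daily_precip[1] = 42.0
daily_precip[5] = 38.0
daily_precip[20] = 51.0

event_labels = classify_extreme_events(daily_precip, threshold=35.0, window_days=7)
print(event_labels)
```

Repeating the classification with a longer `window_days` is one way to explore how the distinction between clustered and isolated events depends on the clustering window.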

    Insights into the drivers and spatio-temporal trends of extreme Mediterranean wildfires with statistical deep-learning

    Extreme wildfires are a significant cause of human death and biodiversity destruction within countries that encompass the Mediterranean Basin. Recent worrying trends in wildfire activity (i.e., occurrence and spread) suggest that wildfires are likely to be highly impacted by climate change. In order to facilitate appropriate risk mitigation, we must identify the main drivers of extreme wildfires and assess their spatio-temporal trends, with a view to understanding the impacts of global warming on fire activity. We analyse the monthly burnt area due to wildfires over a region encompassing most of Europe and the Mediterranean Basin from 2001 to 2020, and identify high fire activity during this period in Algeria, Italy and Portugal. We build an extreme quantile regression model with a high-dimensional predictor set describing meteorological conditions, land cover usage, and orography. To model the complex relationships between the predictor variables and wildfires, we use a hybrid statistical deep-learning framework that can disentangle the effects of vapour-pressure deficit (VPD), air temperature, and drought on wildfire activity. Our results highlight that whilst VPD, air temperature, and drought significantly affect wildfire occurrence, only VPD affects wildfire spread. To gain insights into the effect of climate trends on wildfires in the near future, we focus on August 2001 and perturb temperature according to its observed trends (median over Europe: +0.04 K per year). We find that, on average over Europe, these trends lead to a relative increase of 17.1% and 1.6% in the expected frequency and severity, respectively, of wildfires in August 2001, with spatially non-uniform changes in both aspects.