
    Distinguishing cause from effect using observational data: methods and benchmarks

    The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y. An example is to decide whether altitude causes temperature, or vice versa, given only joint measurements of both variables. Even under the simplifying assumptions of no confounding, no feedback loops, and no selection bias, such bivariate causal discovery problems are challenging. Nevertheless, several approaches for addressing those problems have been proposed in recent years. We review two families of such methods: Additive Noise Methods (ANM) and Information Geometric Causal Inference (IGCI). We present the CauseEffectPairs benchmark, which consists of data for 100 different cause-effect pairs selected from 37 datasets from various domains (meteorology, biology, medicine, engineering, economics, and others), and motivate our decisions regarding the "ground truth" causal directions of all pairs. We evaluate the performance of several bivariate causal discovery methods on these real-world benchmark data and, in addition, on artificially simulated data. Our empirical results on real-world data indicate that certain methods are indeed able to distinguish cause from effect using only purely observational data, although more benchmark data would be needed to obtain statistically significant conclusions. One of the best performing methods overall is the additive-noise method originally proposed by Hoyer et al. (2009), which obtains an accuracy of 63 ± 10% and an AUC of 0.74 ± 0.05 on the real-world benchmark. As the main theoretical contribution of this work we prove the consistency of that method.
    Comment: 101 pages, second revision submitted to the Journal of Machine Learning Research.
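    The additive-noise idea behind the best-performing method can be sketched briefly: regress Y on X with a flexible nonlinear model, check how independent the residuals are of X, repeat with the roles of X and Y swapped, and prefer the direction whose residuals look more independent of the putative cause. The Python below is a minimal sketch under assumed choices (kernel ridge regression and a biased HSIC statistic stand in for the regression and independence measures actually used); it is not the implementation benchmarked in the paper.

```python
# Minimal sketch of bivariate additive-noise causal discovery (illustrative, not the
# authors' code): fit effect = f(cause) + noise in both directions and keep the
# direction whose residuals are more independent of the putative cause.
import numpy as np
from sklearn.kernel_ridge import KernelRidge


def hsic(a, b, sigma=1.0):
    """Biased HSIC statistic with Gaussian kernels (no significance test)."""
    a, b = a.reshape(-1, 1), b.reshape(-1, 1)
    n = len(a)
    K = np.exp(-(a - a.T) ** 2 / (2 * sigma ** 2))
    L = np.exp(-(b - b.T) ** 2 / (2 * sigma ** 2))
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def residual_dependence(cause, effect):
    """Regress effect on cause and measure how dependent the residuals are on cause."""
    model = KernelRidge(kernel="rbf", alpha=1e-2).fit(cause.reshape(-1, 1), effect)
    residuals = effect - model.predict(cause.reshape(-1, 1))
    return hsic(cause, residuals)


def anm_direction(x, y):
    """Return 'X->Y' or 'Y->X', whichever direction leaves more independent residuals."""
    return "X->Y" if residual_dependence(x, y) < residual_dependence(y, x) else "Y->X"


# Toy usage: X causes Y through a nonlinearity with additive noise, so the forward
# direction should typically score as the more independent one.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = np.tanh(x) + 0.2 * rng.normal(size=300)
print(anm_direction(x, y))
```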

    A multivariate semiparametric Bayesian spatial modeling framework for hurricane surface wind fields

    Storm surge, the onshore rush of sea water caused by the high winds and low pressure associated with a hurricane, can compound the effects of inland flooding caused by rainfall, leading to loss of property and loss of life for residents of coastal areas. Numerical ocean models are essential for creating storm surge forecasts for coastal areas. These models are driven primarily by the surface wind forcings. Currently, the gridded wind fields used by ocean models are specified by deterministic formulas that are based on the central pressure and location of the storm center. While these equations incorporate important physical knowledge about the structure of hurricane surface wind fields, they cannot always capture the asymmetric and dynamic nature of a hurricane. A new Bayesian multivariate spatial statistical modeling framework is introduced that combines data with physical knowledge about the wind fields to improve the estimation of the wind vectors. Many spatial models assume the data follow a Gaussian distribution. However, this may be overly restrictive for wind field data, which often display erratic behavior, such as sudden changes in time or space. In this paper we develop a semiparametric multivariate spatial model for these data. Our model builds on the stick-breaking prior, which is frequently used in Bayesian modeling to capture uncertainty in the parametric form of an outcome. The stick-breaking prior is extended to the spatial setting by assigning each location a different, unknown distribution, and smoothing the distributions in space with a series of kernel functions. This semiparametric spatial model is shown to improve prediction compared to usual Bayesian Kriging methods for the wind field of Hurricane Ivan.
    Comment: Published at http://dx.doi.org/10.1214/07-AOAS108 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
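    As a rough illustration of the construction described above (the notation below is assumed for this sketch, not quoted from the paper), a kernel-smoothed stick-breaking prior gives every spatial location s its own discrete mixing distribution, with weights that decay away from spatial knots so that nearby locations receive similar distributions:

```latex
% Kernel-smoothed stick-breaking prior (illustrative notation):
% each location s carries its own discrete mixing distribution F_s.
F_s(\cdot) = \sum_{i=1}^{\infty} p_i(s)\,\delta_{\theta_i}(\cdot), \qquad
p_i(s) = w_i(s)\prod_{j<i}\bigl(1 - w_j(s)\bigr), \qquad
w_i(s) = V_i\,K(s,\psi_i), \quad V_i \sim \mathrm{Beta}(1,\nu).
```

    Here the V_i are stick-breaking proportions, the theta_i are atoms drawn from a base measure, the psi_i are spatial knot locations, and K is a kernel that decreases with the distance between s and psi_i, which is what smooths the location-specific distributions in space.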

    Locating and quantifying gas emission sources using remotely obtained concentration data

    We describe a method for detecting, locating and quantifying sources of gas emissions to the atmosphere using remotely obtained gas concentration data; the method is applicable to gases of environmental concern. We demonstrate its performance using methane data collected from aircraft. Atmospheric point concentration measurements are modelled as the sum of a spatially and temporally smooth atmospheric background concentration, augmented by concentrations due to local sources. We model source emission rates with a Gaussian mixture model and use a Markov random field to represent the atmospheric background concentration component of the measurements. A Gaussian plume atmospheric eddy dispersion model represents gas dispersion between sources and measurement locations. Initial point estimates of background concentrations and source emission rates are obtained using mixed L2-L1 optimisation over a discretised grid of potential source locations. Subsequent reversible jump Markov chain Monte Carlo inference provides estimated values and uncertainties for the number, emission rates and locations of sources, unconstrained by a grid. Source area, atmospheric background concentrations and other model parameters are also estimated. We investigate the performance of the approach first on a synthetic problem, then apply the method to real data collected from an aircraft flying first over a 1600 km^2 area containing two landfills, then over a 225 km^2 area containing a gas flare stack.
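    The grid-based initialisation can be viewed as a sparse linear inverse problem: a Gaussian plume model gives the concentration that each candidate grid cell would contribute per unit emission rate, and an L1-penalised least-squares fit selects a small number of emitting cells. The Python below is an illustrative sketch with assumed constants, a fixed wind direction, and a background treated as known; in the paper the background is modelled with a Markov random field and the sources are subsequently refined off-grid by reversible jump MCMC.

```python
# Illustrative sketch of the grid-based initialisation step (not the authors' code):
# build a Gaussian-plume coupling matrix from candidate cells to measurement points,
# then fit emission rates with a least-squares misfit plus an L1 sparsity penalty.
import numpy as np
from sklearn.linear_model import Lasso


def plume_coupling(dx, dy, u=5.0):
    """Ground-level Gaussian plume factor per unit emission rate (illustrative).

    dx: downwind distance (m), dy: crosswind distance (m); dispersion widths grow
    linearly with downwind distance, a crude stability assumption.
    """
    if dx <= 0:
        return 0.0                      # no contribution upwind of the source
    sigma_y, sigma_z = 0.08 * dx, 0.06 * dx
    return np.exp(-dy**2 / (2 * sigma_y**2)) / (np.pi * u * sigma_y * sigma_z)


rng = np.random.default_rng(1)
grid = [(200.0 * i, 200.0 * j) for i in range(10) for j in range(10)]      # candidate source cells
obs = [(rng.uniform(0, 2000), rng.uniform(0, 2000)) for _ in range(150)]   # measurement locations

# Coupling matrix A[m, k]: concentration at measurement m per unit emission from cell k,
# assuming the wind blows along the +x axis.
A = np.array([[plume_coupling(ox - gx, oy - gy) for gx, gy in grid] for ox, oy in obs])

true_rates = np.zeros(len(grid))
true_rates[[23, 77]] = [300.0, 120.0]          # two synthetic sources (arbitrary units)
background = 1.9                               # treated as known here; the paper estimates it jointly
y = background + A @ true_rates + 1e-4 * rng.normal(size=len(obs))   # synthetic measurements

# Mixed L2-L1 fit: squared misfit plus an L1 penalty that encourages few active cells.
# The penalty weight is set for this synthetic scale; in practice it would be tuned.
fit = Lasso(alpha=1e-8, positive=True, max_iter=200_000).fit(A, y - background)
print(np.argsort(fit.coef_)[-5:])              # grid cells with the largest estimated rates
```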

    A simple state-based prognostic model for filter clogging

    In today's maintenance planning, fuel filters are replaced or cleaned on a regular basis. Monitoring and implementing prognostics on filtration systems have the potential to avoid costs and increase safety. Prognostics is a fundamental technology within Integrated Vehicle Health Management (IVHM). Prognostic models fall into three major categories: 1) physics-based models, 2) data-driven models, and 3) experience-based models. One of the challenges in studying the progression of filter clogging is that the natural clogging failure cannot be observed within practical time constraints. This paper presents a simple way to collect data for a filter clogging failure, along with a simple state-based prognostic with duration information (SSPD) method that aims to detect and forecast clogging of a filter in a laboratory-based fuel rig system. The progression of the clogging failure is induced artificially: the degradation level is divided into several groups, each group is defined as a state in the failure progression, and data are then collected for each of these artificially created states. The SSPD method consists of three steps: clustering, clustering evaluation, and remaining useful life (RUL) estimation. Prognosis results show that the SSPD method is able to predict the RUL of the clogging filter accurately.
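    The three SSPD steps can be sketched as follows (the details here are assumptions for illustration, not the paper's implementation): cluster the degradation signal into states, choose the number of states with a cluster-validity score, and read the RUL as the time left in the current state plus the durations of the states still to come.

```python
# Minimal sketch of a state-based prognostic with duration information: cluster a
# degradation signal into ordered states, keep a duration per state, and estimate RUL
# as the time left in the current state plus the durations of the remaining states.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic "filter differential pressure" run-to-failure signal (rising with noise).
t = np.arange(600)
signal = 0.5 + 0.004 * t + 0.05 * rng.normal(size=t.size)
X = signal.reshape(-1, 1)

# 1) Clustering and 2) clustering evaluation: pick the state count by silhouette score.
scores = {k: silhouette_score(X, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X))
          for k in range(2, 7)}
k_best = max(scores, key=scores.get)
labels = KMeans(n_clusters=k_best, n_init=10, random_state=0).fit_predict(X)

# Order clusters by mean signal level so state 0 is "healthy" and the last is "clogged".
order = np.argsort([signal[labels == c].mean() for c in range(k_best)])
state = np.argsort(order)[labels]                                   # ordered state per sample
durations = np.array([(state == s).sum() for s in range(k_best)])   # samples spent in each state


# 3) RUL estimation at time index i, in samples.
def estimate_rul(i):
    s = state[i]
    elapsed_in_state = i - np.argmax(state == s)    # first index at which state s appears
    return max(durations[s] - elapsed_in_state, 0) + durations[s + 1:].sum()


print(k_best, estimate_rul(150))   # chosen number of states and the RUL estimate at index 150
```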