Distinguishing cause from effect using observational data: methods and benchmarks
The discovery of causal relationships from purely observational data is a
fundamental problem in science. The most elementary form of such a causal
discovery problem is to decide whether X causes Y or, alternatively, Y causes
X, given joint observations of two variables X, Y. An example is to decide
whether altitude causes temperature, or vice versa, given only joint
measurements of both variables. Even under the simplifying assumptions of no
confounding, no feedback loops, and no selection bias, such bivariate causal
discovery problems are challenging. Nevertheless, several approaches for
addressing those problems have been proposed in recent years. We review two
families of such methods: Additive Noise Methods (ANM) and Information
Geometric Causal Inference (IGCI). We present the benchmark CauseEffectPairs
that consists of data for 100 different cause-effect pairs selected from 37
datasets from various domains (e.g., meteorology, biology, medicine,
engineering, economy, etc.) and motivate our decisions regarding the "ground
truth" causal directions of all pairs. We evaluate the performance of several
bivariate causal discovery methods on these real-world benchmark data and in
addition on artificially simulated data. Our empirical results on real-world
data indicate that certain methods are indeed able to distinguish cause from
effect using only purely observational data, although more benchmark data would
be needed to obtain statistically significant conclusions. One of the best
performing methods overall is the additive-noise method originally proposed by
Hoyer et al. (2009), which obtains an accuracy of 63 ± 10% and an AUC of
0.74 ± 0.05 on the real-world benchmark. As the main theoretical contribution of
this work we prove the consistency of that method.
Comment: 101 pages, second revision submitted to Journal of Machine Learning
Researc
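
To make the additive-noise idea in the abstract above concrete, here is a minimal sketch: regress each variable on the other and prefer the direction whose residuals look less dependent on the input. The polynomial regression, the biased HSIC statistic with a median-heuristic bandwidth, the function names, and the synthetic data are all assumptions made for illustration; they are not the estimators or settings evaluated in the paper.

```python
import numpy as np

def _gram(v):
    """Gaussian kernel matrix on standardized values (median-heuristic bandwidth)."""
    v = (v - v.mean()) / v.std()
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / np.median(d2[d2 > 0]))

def hsic(a, b):
    """Biased empirical HSIC statistic; larger values suggest stronger dependence."""
    n = len(a)
    H = np.eye(n) - 1.0 / n  # centering matrix I - (1/n) * ones
    return np.trace(_gram(a) @ H @ _gram(b) @ H) / n**2

def anm_score(cause, effect, degree=3):
    """Fit effect = f(cause) + residual, then score dependence of residual on cause."""
    coeffs = np.polyfit(cause, effect, degree)  # assumed regression model
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)

def infer_direction(x, y):
    """Prefer the direction whose residuals appear more independent of the input."""
    return "X->Y" if anm_score(x, y) < anm_score(y, x) else "Y->X"

# Synthetic example (hypothetical data): y is a nonlinear function of x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 300)
y = x**3 + rng.normal(0, 1, 300)
print(infer_direction(x, y))
```

A real implementation would typically use a more flexible regressor and a calibrated independence test rather than a raw comparison of two statistics.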
A multivariate semiparametric Bayesian spatial modeling framework for hurricane surface wind fields
Storm surge, the onshore rush of sea water caused by the high winds and low
pressure associated with a hurricane, can compound the effects of inland
flooding caused by rainfall, leading to loss of property and loss of life for
residents of coastal areas. Numerical ocean models are essential for creating
storm surge forecasts for coastal areas. These models are driven primarily by
the surface wind forcings. Currently, the gridded wind fields used by ocean
models are specified by deterministic formulas that are based on the central
pressure and location of the storm center. While these equations incorporate
important physical knowledge about the structure of hurricane surface wind
fields, they cannot always capture the asymmetric and dynamic nature of a
hurricane. A new Bayesian multivariate spatial statistical modeling framework
is introduced combining data with physical knowledge about the wind fields to
improve the estimation of the wind vectors. Many spatial models assume the data
follow a Gaussian distribution. However, this may be overly restrictive for
wind field data, which often display erratic behavior, such as sudden changes
in time or space. In this paper we develop a semiparametric multivariate
spatial model for these data. Our model builds on the stick-breaking prior,
which is frequently used in Bayesian modeling to capture uncertainty in the
parametric form of an outcome. The stick-breaking prior is extended to the
spatial setting by assigning each location a different, unknown distribution,
and smoothing the distributions in space with a series of kernel functions.
This semiparametric spatial model is shown to improve prediction compared to
usual Bayesian Kriging methods for the wind field of Hurricane Ivan.
Comment: Published at http://dx.doi.org/10.1214/07-AOAS108 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
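
As a rough illustration of the stick-breaking prior mentioned in the abstract above, the sketch below draws mixture weights from a truncated stick-breaking construction. It shows only the basic, non-spatial construction; the kernel-smoothed spatial extension described in the abstract is not reproduced. The function name, the Beta(1, alpha) choice, and the truncation level are assumptions for illustration.

```python
import numpy as np

def stick_breaking_weights(alpha, num_components, rng):
    """Draw mixture weights from a truncated stick-breaking construction."""
    # Each v_k is the fraction of the remaining stick broken off at step k.
    v = rng.beta(1.0, alpha, size=num_components)
    v[-1] = 1.0  # truncation: the last component takes whatever stick is left
    # Length of stick remaining before each break: prod_{j<k} (1 - v_j).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(1)
weights = stick_breaking_weights(alpha=2.0, num_components=10, rng=rng)
print(weights.round(3), weights.sum())
```

Smaller values of `alpha` concentrate the mass on the first few components, while larger values spread it across more of them.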
Locating and quantifying gas emission sources using remotely obtained concentration data
We describe a method for detecting, locating and quantifying sources of gas
emissions to the atmosphere using remotely obtained gas concentration data; the
method is applicable to gases of environmental concern. We demonstrate its
performance using methane data collected from aircraft. Atmospheric point
concentration measurements are modelled as the sum of a spatially and
temporally smooth atmospheric background concentration, augmented by
concentrations due to local sources. We model source emission rates with a
Gaussian mixture model and use a Markov random field to represent the
atmospheric background concentration component of the measurements. A Gaussian
plume atmospheric eddy dispersion model represents gas dispersion between
sources and measurement locations. Initial point estimates of background
concentrations and source emission rates are obtained using mixed L2-L1
optimisation over a discretised grid of potential source locations. Subsequent
reversible jump Markov chain Monte Carlo inference provides estimated values
and uncertainties for the number, emission rates and locations of sources
unconstrained by a grid. Source area, atmospheric background concentrations and
other model parameters are also estimated. We investigate the performance of
the approach first using a synthetic problem, then apply the method to real
data collected from an aircraft flying over a 1600 km^2 area containing two
landfills, then a 225 km^2 area containing a gas flare stack
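
The Gaussian plume component described in the abstract above can be sketched with the textbook steady-state plume formula with ground reflection. The power-law dispersion coefficients (`a_y`, `b_y`, `a_z`, `b_z`), the function name, and the example values are hypothetical placeholders; the abstract does not state the parameterisation the authors use.

```python
import numpy as np

def gaussian_plume(q, u, x, y, z, h, a_y=0.08, b_y=0.9, a_z=0.06, b_z=0.85):
    """Concentration at (x, y, z) from a point source at height h.

    q: emission rate, u: wind speed along +x, x: downwind distance,
    y: crosswind offset, z: receptor height. The sigma power laws are
    assumed for illustration only.
    """
    x = np.asarray(x, dtype=float)
    downwind = x > 0
    xs = np.where(downwind, x, 1.0)  # dummy value upwind, masked out below
    sigma_y = a_y * xs**b_y
    sigma_z = a_z * xs**b_z
    crosswind = np.exp(-y**2 / (2 * sigma_y**2))
    # Second term mirrors the source below the ground to model reflection.
    vertical = (np.exp(-(z - h)**2 / (2 * sigma_z**2))
                + np.exp(-(z + h)**2 / (2 * sigma_z**2)))
    conc = q / (2 * np.pi * u * sigma_y * sigma_z) * crosswind * vertical
    return np.where(downwind, conc, 0.0)  # no contribution upwind of the source

# Hypothetical example: receptor 500 m downwind, 20 m off the plume axis.
print(float(gaussian_plume(q=1.0, u=3.0, x=500.0, y=20.0, z=2.0, h=2.0)))
```

Because the predicted concentration is linear in the emission rate `q`, measurements can be written as a linear combination of source rates plus background, which is the kind of structure an optimisation over a grid of candidate source locations can exploit.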
A simple state-based prognostic model for filter clogging
In today's maintenance planning, fuel filters are replaced or cleaned on a regular basis. Monitoring filtration systems and applying prognostics to them has the potential to avoid costs and increase safety. Prognostics is a fundamental technology within Integrated Vehicle Health Management (IVHM). Prognostic models fall into three major categories: 1) physics-based models, 2) data-driven models, and 3) experience-based models. One challenge in studying the progression of filter clogging is that natural clogging failures cannot be observed within practical time constraints. This paper presents a simple solution for collecting data on a clogging filter failure. It also presents a simple state-based prognostic with duration information (SSPD) method that aims to detect and forecast filter clogging in a laboratory-based fuel rig system. The progression of the clogging failure is created artificially: the degradation level is divided into several groups, each defined as a state in the failure progression, and data are then collected for each of these artificially created states. The SSPD method consists of three steps: clustering, clustering evaluation, and remaining useful life (RUL) estimation. Prognosis results show that the SSPD method is able to predict the RUL of the clogging filter accurately.
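
One plausible reading of the RUL estimation step above is that the remaining life equals the expected time left in the current degradation state plus the expected durations of all later states. The sketch below implements only that simplified reading; the function name, the use of mean durations, and the example numbers are assumptions, since the abstract does not give the actual SSPD formulas.

```python
def estimate_rul(mean_durations, current_state, time_in_state):
    """Estimate remaining useful life from per-state mean durations.

    mean_durations: expected time spent in each degradation state, ordered
    from healthy to failed (assumed to come from the clustering step).
    current_state: index of the state the filter is currently in.
    time_in_state: time already spent in the current state.
    """
    # Expected time left in the current state, never negative.
    remaining_here = max(mean_durations[current_state] - time_in_state, 0.0)
    # Add the expected durations of every later state.
    return remaining_here + sum(mean_durations[current_state + 1:])

# Hypothetical durations (hours) for three clogging states.
print(estimate_rul([10.0, 6.0, 4.0], current_state=1, time_in_state=2.0))  # 8.0
```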