Bayesian spatial analysis of demographic survey data: an application to contraceptive use at first sexual intercourse
In this paper we analyze the spatial patterns of the risk of unprotected sexual intercourse among Italian women at first sexual intercourse. We rely on geo-referenced survey data from the Italian Fertility and Family Survey (FFS) and use a Bayesian approach with weakly informative prior distributions. Our analyses are based on a logistic regression model with a multilevel structure, in which the spatial pattern is captured by an intrinsic Gaussian conditional autoregressive (CAR) error component. The complexity of such a model is best handled within a Bayesian framework, and statistical inference is carried out by Markov chain Monte Carlo simulation. In contrast with previous analyses based on multilevel models, our approach avoids the restrictive assumption that area effects are independent. The model allows us to borrow strength from neighboring areas in order to obtain estimates for areas whose own sample sizes may be inadequate. We show that substantial geographical variation exists within Italy (Southern Italy has higher risks of unprotected first-time sexual intercourse) and that the spatial pattern is stable across birth cohorts. The findings are robust to the specification of the prior distribution. We argue that spatial analysis can give useful insights into unmet reproductive health needs.
(KEYWORDS: spatial statistical demography, contraceptive use, hierarchical Bayesian modeling, Markov chain Monte Carlo, multilevel statistical models, Italy, FFS)
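As a rough illustration of how an intrinsic CAR component "borrows strength", the sketch below builds the intrinsic CAR precision matrix from a hypothetical area adjacency matrix and computes the full conditional of one area effect, whose mean is simply the average of its neighbors' effects. It assumes binary adjacency and a single precision parameter `tau`; the function names, the toy four-area chain, and the generic logistic link are illustrative assumptions, not the authors' code or their exact model specification.

```python
import numpy as np

def icar_precision(adj, tau=1.0):
    """Precision matrix tau * (D - W) of an intrinsic CAR prior.

    adj is a symmetric 0/1 adjacency matrix W; D holds the neighbor counts.
    Every row sums to zero, so the prior is improper (intrinsic).
    """
    W = np.asarray(adj, dtype=float)
    D = np.diag(W.sum(axis=1))
    return tau * (D - W)

def icar_log_prior(phi, adj, tau=1.0):
    """Unnormalized log density: -tau/2 * sum over neighbor pairs of (phi_i - phi_j)^2."""
    Q = icar_precision(adj, tau)
    return -0.5 * phi @ Q @ phi

def icar_full_conditional(phi, adj, i, tau=1.0):
    """Mean and variance of phi_i given all other area effects.

    The mean is the average of the neighbors' effects, so an area with few
    respondents is pulled toward its neighbors ("borrowing strength").
    """
    W = np.asarray(adj, dtype=float)
    n_i = W[i].sum()
    mean = W[i] @ phi / n_i
    return mean, 1.0 / (tau * n_i)

def prob_unprotected(eta_fixed, phi, area):
    """Generic logistic link (assumed form): logit(p) = fixed-effects predictor + spatial effect."""
    return 1.0 / (1.0 + np.exp(-(eta_fixed + phi[area])))

# Hypothetical example: four areas in a chain 0 - 1 - 2 - 3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
phi = np.array([1.0, 2.0, 3.0, 4.0])
print(icar_log_prior(phi, adj))  # -1.5
mean, var = icar_full_conditional(phi, adj, 1)
print(mean, var)  # 2.0 0.5
```

Area 1 has two neighbors with effects 1 and 3, so its conditional mean is their average, and its conditional variance shrinks as the number of neighbors grows.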
Combining spatial information sources while accounting for systematic errors in proxies
Environmental research increasingly uses high-dimensional remote sensing and
numerical model output to help fill space-time gaps between traditional
observations. Such output is often a noisy proxy for the process of interest.
Thus one needs to separate and assess the signal and noise (often called
discrepancy) in the proxy given complicated spatio-temporal dependencies. Here
I extend a popular two-likelihood hierarchical model using a more flexible
representation for the discrepancy. I employ the little-used Markov random
field approximation to a thin plate spline, which can capture small-scale
discrepancy in a computationally efficient manner while better modeling smooth
processes than standard conditional autoregressive models. The increased
flexibility reduces identifiability, but the lack of identifiability is
inherent in the scientific context. I model particulate matter air pollution
using satellite aerosol and atmospheric model output proxies. The estimated
discrepancies occur at a variety of spatial scales, with small-scale
discrepancy particularly important. The examples indicate little predictive
improvement over modeling the observations alone. Similarly, in simulations
with an informative proxy, the presence of discrepancy and resulting
identifiability issues prevent improvement in prediction. The results highlight
but do not resolve the critical question of how best to use proxy information
while minimizing the potential for proxy-induced error.
Comment: 5 figures, 2 tables
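The contrast between a standard CAR model and a Markov random field approximation to a thin plate spline shows up in their structure (precision) matrices. The sketch below assumes a regular lattice with rook neighbors, builds the first-order CAR matrix D - W, and squares it to obtain a second-order, discrete biharmonic operator; interior rows reproduce the 13-point stencil commonly associated with this approximation. The helper names are hypothetical, and the boundary handling is a deliberate simplification: published constructions correct the boundary rows, which this sketch does not, so it should not be read as the author's implementation.

```python
import numpy as np

def grid_adjacency(nrow, ncol):
    """0/1 adjacency matrix of a regular nrow x ncol lattice (rook neighbors)."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if r + 1 < nrow:  # neighbor in the next row
                j = (r + 1) * ncol + c
                W[i, j] = W[j, i] = 1.0
            if c + 1 < ncol:  # neighbor in the next column
                j = r * ncol + c + 1
                W[i, j] = W[j, i] = 1.0
    return W

def car_precision(W):
    """First-order (standard CAR-type) structure matrix D - W."""
    return np.diag(W.sum(axis=1)) - W

def tps_mrf_precision(W):
    """Second-order structure matrix (D - W)^2, a discrete biharmonic operator.

    Away from the boundary this gives the 13-point stencil:
    20 (center), -8 (rook neighbors), 2 (diagonals), 1 (two steps away).
    Boundary rows are NOT corrected in this sketch.
    """
    L = car_precision(W)
    return L @ L

nrow = ncol = 7
W = grid_adjacency(nrow, ncol)
Q1, Q2 = car_precision(W), tps_mrf_precision(W)
c = 3 * ncol + 3  # index of the center cell
print(Q1[c, c], Q1[c, c + 1])  # 4.0 -1.0
print(Q2[c, c], Q2[c, c + 1], Q2[c, c + ncol + 1], Q2[c, c + 2])  # 20.0 -8.0 2.0 1.0
```

The first-order matrix penalizes differences between neighbors, while the squared operator penalizes local curvature, which is why the second-order field represents smooth surfaces better and still stays sparse.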
Distinguishing cause from effect using observational data: methods and benchmarks
The discovery of causal relationships from purely observational data is a
fundamental problem in science. The most elementary form of such a causal
discovery problem is to decide whether X causes Y or, alternatively, Y causes
X, given joint observations of two variables X, Y. An example is to decide
whether altitude causes temperature, or vice versa, given only joint
measurements of both variables. Even under the simplifying assumptions of no
confounding, no feedback loops, and no selection bias, such bivariate causal
discovery problems are challenging. Nevertheless, several approaches for
addressing those problems have been proposed in recent years. We review two
families of such methods: Additive Noise Methods (ANM) and Information
Geometric Causal Inference (IGCI). We present the benchmark CauseEffectPairs
that consists of data for 100 different cause-effect pairs selected from 37
datasets from various domains (e.g., meteorology, biology, medicine,
engineering, economy, etc.) and motivate our decisions regarding the "ground
truth" causal directions of all pairs. We evaluate the performance of several
bivariate causal discovery methods on these real-world benchmark data and in
addition on artificially simulated data. Our empirical results on real-world
data indicate that certain methods are indeed able to distinguish cause from
effect using purely observational data, although more benchmark data would
be needed to obtain statistically significant conclusions. One of the best
performing methods overall is the additive-noise method originally proposed by
Hoyer et al. (2009), which obtains an accuracy of 63 ± 10% and an AUC of
0.74 ± 0.05 on the real-world benchmark. As the main theoretical contribution of
this work we prove the consistency of that method.
Comment: 101 pages, second revision submitted to Journal of Machine Learning
Research
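The additive-noise idea can be sketched in a few lines: regress each variable on the other and prefer the direction whose residuals look independent of the putative cause, measured here with the HSIC statistic. This is a minimal sketch under stated assumptions: a cubic polynomial stands in for the flexible nonparametric regression that practical ANM implementations typically use, the kernel bandwidth uses a simple median heuristic, and no significance test is performed. The function names and the simulated pair are hypothetical and are not taken from the CauseEffectPairs benchmark.

```python
import numpy as np

def _gaussian_gram(v):
    """Gaussian-kernel Gram matrix on standardized data, median-distance bandwidth."""
    v = (v - v.mean()) / v.std()
    d2 = (v[:, None] - v[None, :]) ** 2
    bandwidth = np.median(d2[d2 > 0])
    return np.exp(-d2 / bandwidth)

def hsic(a, b):
    """Biased HSIC statistic; values near 0 suggest that a and b are independent."""
    n = len(a)
    K, L = _gaussian_gram(a), _gaussian_gram(b)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def anm_direction(x, y, degree=3):
    """Fit both additive noise models; prefer the one with more independent residuals."""
    # Model X -> Y: Y = f(X) + N, with a polynomial standing in for f.
    res_xy = y - np.polyval(np.polyfit(x, y, degree), x)
    # Model Y -> X: X = g(Y) + N.
    res_yx = x - np.polyval(np.polyfit(y, x, degree), y)
    score_xy, score_yx = hsic(x, res_xy), hsic(y, res_yx)
    return ("X->Y" if score_xy < score_yx else "Y->X"), score_xy, score_yx

# Simulated pair in which X causes Y through a nonlinear function plus uniform noise.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, 300)
y = x ** 3 + rng.uniform(-1.0, 1.0, 300)
direction, s_xy, s_yx = anm_direction(x, y)
print(direction, s_xy, s_yx)
```

Swapping the arguments simply exchanges the two fits, so the verdict reverses. The simulated pair is nonlinear with non-Gaussian noise on purpose: in the linear-Gaussian case both directions admit an additive noise model equally well, and the method cannot identify the direction.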