1,651 research outputs found

    Bayesian spatial analysis of demographic survey data

    In this paper we analyze the spatial patterns of the risk of unprotected sexual intercourse for Italian women during their initial experience with sexual intercourse. We rely on geo-referenced survey data from the Italian Fertility and Family Survey, and we use a Bayesian approach relying on weakly informative prior distributions. Our analyses are based on a logistic regression model with a multilevel structure. The spatial pattern uses an intrinsic Gaussian conditional autoregressive (CAR) error component. The complexity of such a model is best handled within a Bayesian framework, and statistical inference is carried out using Markov Chain Monte Carlo simulation. In contrast with previous analyses based on multilevel models, our approach avoids the restrictive assumption of independence between area effects. This model allows us to borrow strength from neighbors in order to obtain estimates for areas that may, on their own, have inadequate sample sizes. We show that substantial geographical variation exists within Italy (Southern Italy has higher risks of unprotected first-time sexual intercourse). The findings are robust with respect to the specification of the prior distribution. We argue that spatial analysis can give useful insights on unmet reproductive health needs.
    Keywords: contraceptive use, FFS, hierarchical Bayesian modeling, Italy, Monte Carlo Markov Chain, multilevel statistical models, spatial statistical demography

    Bayesian spatial analysis of demographic survey data: an application to contraceptive use at first sexual intercourse

    In this paper we analyze the spatial patterns of the risk of unprotected sexual intercourse for Italian women during their initial experience with sexual intercourse. We rely on geo-referenced survey data from the Italian Fertility and Family Survey, and we use a Bayesian approach relying on weakly informative prior distributions. Our analyses are based on a logistic regression model with a multilevel structure. The spatial pattern uses an intrinsic Gaussian conditional autoregressive (CAR) error component. The complexity of such a model is best handled within a Bayesian framework, and statistical inference is carried out using Markov Chain Monte Carlo simulation. In contrast with previous analyses based on multilevel models, our approach avoids the restrictive assumption of independence between area effects. This model allows us to borrow strength from neighbors in order to obtain estimates for areas that may, on their own, have inadequate sample sizes. We show that substantial geographical variation exists within Italy (Southern Italy has higher risks of unprotected first-time sexual intercourse), and that the spatial pattern is stable across birth cohorts. The findings are robust with respect to the specification of the prior distribution. We argue that spatial analysis can give useful insights on unmet reproductive health needs. (KEYWORDS: spatial statistical demography, contraceptive use, hierarchical Bayesian modeling, Monte Carlo Markov Chain, multilevel statistical models, Italy, FFS)
    Italy, contraceptive usage
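    The intrinsic CAR component mentioned in the abstract can be illustrated with a small sketch. It is not the authors' code: the four-area map, the precision parameter `tau`, and the function names are assumptions made for illustration only. Under the usual intrinsic CAR formulation with a binary, symmetric adjacency matrix W, the prior precision is tau * (D - W), where D holds the neighbor counts, and each area effect is centered on the mean of its neighbors' effects, which is the sense in which estimates "borrow strength from neighbors".

```python
def icar_precision(adjacency, tau=1.0):
    """Precision matrix tau * (D - W) of an intrinsic CAR prior.

    `adjacency` is assumed to be a binary, symmetric matrix (list of lists)
    with a zero diagonal; `tau` is a hypothetical precision parameter.
    """
    n = len(adjacency)
    for i in range(n):
        for j in range(n):
            if adjacency[i][j] != adjacency[j][i]:
                raise ValueError("adjacency matrix must be symmetric")
    precision = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal entry: tau times the number of neighbors of area i.
                precision[i][j] = tau * sum(adjacency[i])
            else:
                precision[i][j] = -tau * adjacency[i][j]
    return precision


def icar_conditional(phi, adjacency, i, tau=1.0):
    """Conditional mean and variance of area effect i given all the others."""
    neighbours = [j for j, w in enumerate(adjacency[i]) if w]
    if not neighbours:
        raise ValueError("area has no neighbors; the conditional is undefined")
    mean = sum(phi[j] for j in neighbours) / len(neighbours)
    variance = 1.0 / (tau * len(neighbours))
    return mean, variance


# Hypothetical map: four areas arranged in a line, 0 - 1 - 2 - 3.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
Q = icar_precision(W)
phi = [0.2, -0.1, 0.4, 0.0]  # made-up area effects
mean_1, var_1 = icar_conditional(phi, W, 1)
```

    In this toy map, area 1 is pulled toward the average of areas 0 and 2, and its conditional variance shrinks as the number of neighbors grows. Every row of `Q` sums to zero, which is why the intrinsic CAR prior is improper and is typically combined with a sum-to-zero constraint or a separate intercept.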

    Combining spatial information sources while accounting for systematic errors in proxies

    Environmental research increasingly uses high-dimensional remote sensing and numerical model output to help fill space-time gaps between traditional observations. Such output is often a noisy proxy for the process of interest. Thus one needs to separate and assess the signal and noise (often called discrepancy) in the proxy given complicated spatio-temporal dependencies. Here I extend a popular two-likelihood hierarchical model using a more flexible representation for the discrepancy. I employ the little-used Markov random field approximation to a thin plate spline, which can capture small-scale discrepancy in a computationally efficient manner while better modeling smooth processes than standard conditional auto-regressive models. The increased flexibility reduces identifiability, but the lack of identifiability is inherent in the scientific context. I model particulate matter air pollution using satellite aerosol and atmospheric model output proxies. The estimated discrepancies occur at a variety of spatial scales, with small-scale discrepancy particularly important. The examples indicate little predictive improvement over modeling the observations alone. Similarly, in simulations with an informative proxy, the presence of discrepancy and resulting identifiability issues prevent improvement in prediction. The results highlight but do not resolve the critical question of how best to use proxy information while minimizing the potential for proxy-induced error.
    Comment: 5 figures, 2 tables

    Statistical modelling of categorical data under ontic and epistemic imprecision


    Distinguishing cause from effect using observational data: methods and benchmarks

    The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y. An example is to decide whether altitude causes temperature, or vice versa, given only joint measurements of both variables. Even under the simplifying assumptions of no confounding, no feedback loops, and no selection bias, such bivariate causal discovery problems are challenging. Nevertheless, several approaches for addressing those problems have been proposed in recent years. We review two families of such methods: Additive Noise Methods (ANM) and Information Geometric Causal Inference (IGCI). We present the benchmark CauseEffectPairs that consists of data for 100 different cause-effect pairs selected from 37 datasets from various domains (e.g., meteorology, biology, medicine, engineering, economy) and motivate our decisions regarding the "ground truth" causal directions of all pairs. We evaluate the performance of several bivariate causal discovery methods on these real-world benchmark data and in addition on artificially simulated data. Our empirical results on real-world data indicate that certain methods are indeed able to distinguish cause from effect using only purely observational data, although more benchmark data would be needed to obtain statistically significant conclusions. One of the best performing methods overall is the additive-noise method originally proposed by Hoyer et al. (2009), which obtains an accuracy of 63 ± 10% and an AUC of 0.74 ± 0.05 on the real-world benchmark. As the main theoretical contribution of this work we prove the consistency of that method.
    Comment: 101 pages, second revision submitted to Journal of Machine Learning Research
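    To make the bivariate decision problem concrete, here is a minimal sketch of one commonly described IGCI variant, the slope-based estimator with a uniform reference measure. It is an illustration under stated assumptions, not the paper's benchmark code: the function names and the toy data are hypothetical, and both variables are rescaled to [0, 1] before scoring. The score for "X causes Y" is the average of log |Δy / Δx| over pairs that are adjacent after sorting by x, and the direction with the smaller score is preferred.

```python
import math


def _rescale(values):
    """Map values linearly onto [0, 1] (uniform reference measure)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("variable must not be constant")
    return [(v - lo) / (hi - lo) for v in values]


def igci_slope_score(cause, effect):
    """Slope-based IGCI score for the hypothesis cause -> effect."""
    pairs = sorted(zip(_rescale(cause), _rescale(effect)))
    logs = []
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        dx, dy = x1 - x0, y1 - y0
        # Skip ties, for which the log-slope is undefined.
        if dx != 0 and dy != 0:
            logs.append(math.log(abs(dy / dx)))
    if not logs:
        raise ValueError("no usable adjacent pairs")
    return sum(logs) / len(logs)


def infer_direction(x, y):
    """Return 'X->Y', 'Y->X', or 'undecided' by comparing the two scores."""
    c_xy = igci_slope_score(x, y)
    c_yx = igci_slope_score(y, x)
    if c_xy < c_yx:
        return "X->Y"
    if c_yx < c_xy:
        return "Y->X"
    return "undecided"


# Toy, noise-free data (an assumption): x on a regular grid, y = x ** 3.
x = [i / 200 for i in range(201)]
y = [v ** 3 for v in x]
c_xy = igci_slope_score(x, y)
c_yx = igci_slope_score(y, x)
direction = infer_direction(x, y)
```

    On this toy pair the forward score is negative and the backward score is positive, so the sketch prefers X->Y. Real benchmark pairs are noisy, and the abstract itself reports only moderate accuracy on real-world data, so a score difference from a sketch like this should be read as weak evidence rather than a verdict.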