26 research outputs found

    Bayesian Modeling of Presence-only Data

    This thesis develops models and methods for the statistical analysis of presence-only data. Besides constructing new models, the emphasis is on the theoretical characteristics of the new models and on Bayesian prediction. Markov chain Monte Carlo algorithms are developed for the new presence-only data models in order to simulate the posterior distribution of the unknowns and the predictive distribution of the variable of interest. The new methods are applied to simulated data. An application in ecological science has been a driving force behind the work.

    Bayesian logistic regression for presence-only data

    Presence-only data arise in situations in which a censoring mechanism acts on a binary response, so that the response can be observed only for one outcome, usually denoting the presence of an attribute of interest. A typical example is the recording of species presence in ecological surveys. In this work, a Bayesian approach to the analysis of presence-only data based on a two-level scheme is presented. A probability law and a case-control design are combined to handle the double source of uncertainty: one due to censoring and the other due to sampling. Through the use of a stratified sampling design with non-overlapping strata, a new formulation of the logistic model for presence-only data is proposed. In particular, logistic regression with a linear predictor is considered. Estimation is carried out with a new Markov chain Monte Carlo algorithm with data augmentation, which does not require a priori knowledge of the population prevalence. The performance of the new algorithm is validated by means of extensive simulation experiments using three scenarios and comparison with optimal benchmarks. An application to data from the literature is reported in order to discuss the model's behaviour in real-world situations, together with the results of an original study on termite occurrence data.

    Bayesian Modeling and MCMC Computation in Linear Logistic Regression for Presence-only Data

    Presence-only data arise in situations in which, given a censoring mechanism, a binary response can be observed only for one outcome, usually called presence. In this work we present a Bayesian approach to the problem of presence-only data based on a two-level scheme. A probability law and a case-control design are combined to handle the double source of uncertainty: one due to the censoring and one due to the sampling. We propose a new formalization of the logistic model with presence-only data that allows further insight into inferential issues related to the model. We concentrate on the case of linear logistic regression and, in order to make inference on the parameters of interest, we present a Markov chain Monte Carlo algorithm with data augmentation that does not require a priori knowledge of the population prevalence. A simulation study of 24,000 simulated datasets covering different scenarios is presented, comparing our proposal to optimal benchmarks. Comment: Affiliations: Fabio Divino, Division of Physics, Computer Science and Mathematics, University of Molise; Giovanna Jona Lasinio and Natalia Golini, Department of Statistical Sciences, University of Rome "La Sapienza"; Antti Penttinen, Department of Mathematics and Statistics, University of Jyväskylä.

    Functional zoning of biodiversity profiles

    Spatial mapping of biodiversity is crucial to investigate spatial variations in natural communities. Several indices have been proposed in the literature to represent biodiversity as a single statistic. However, these indices only provide information on individual dimensions of biodiversity, thus failing to grasp its complexity comprehensively. Consequently, relying solely on these single indices can lead to misleading conclusions about the actual state of biodiversity. In this work, we focus on biodiversity profiles, which provide a more flexible framework to express biodiversity through non-negative and convex curves, which can be analyzed by means of functional data analysis. By treating the whole curves as single entities, we propose to achieve a functional zoning of the region of interest by means of a penalized model-based clustering procedure. This provides a spatial clustering of the biodiversity profiles, which is useful for policy-makers both for conserving and managing natural resources and revealing patterns of interest. Our approach is discussed through the analysis of Harvard Forest Data, which provides information on the spatial distribution of woody stems within a plot of the Harvard Forest

    Agrimonia: a dataset on livestock, meteorology and air quality in the Lombardy region, Italy

    The air in the Lombardy region, Italy, is among the most polluted in Europe because of limited air circulation and high emission levels. There is broad scientific consensus that the agricultural sector has a significant impact on air quality. To support studies quantifying the role of the agricultural and livestock sectors in Lombardy's air quality, this paper presents a harmonised dataset containing daily values of air quality, weather, emissions, livestock, and land and soil use for the Lombardy region in the years 2016–2021. The daily scale is obtained by averaging hourly data and interpolating other variables. Specifically, the pollutant data come from the European Environmental Agency and the Lombardy Regional Environment Protection Agency, weather and emissions data from the European Copernicus programme, livestock data from the Italian zootechnical registry, and land and soil use data from the CORINE Land Cover project. The resulting dataset is designed to be used as is by those using air quality data for research.

    Spatiotemporal modelling of PM2.5 concentrations in Lombardy (Italy) -- A comparative study

    This study presents a comparative analysis of three predictive models with an increasing degree of flexibility: hidden dynamic geostatistical models (HDGM), generalised additive mixed models (GAMM), and the random forest spatiotemporal kriging models (RFSTK). These models are evaluated for their effectiveness in predicting PM2.5 concentrations in Lombardy (North Italy) from 2016 to 2020. Despite differing methodologies, all models demonstrate proficient capture of spatiotemporal patterns within air pollution data with similar out-of-sample performance. Furthermore, the study delves into station-specific analyses, revealing variable model performance contingent on localised conditions. Model interpretation, facilitated by parametric coefficient analysis and partial dependence plots, unveils consistent associations between predictor variables and PM2.5 concentrations. Despite nuanced variations in modelling spatiotemporal correlations, all models effectively accounted for the underlying dependence. In summary, this study underscores the efficacy of conventional techniques in modelling correlated spatiotemporal data, concurrently highlighting the complementary potential of Machine Learning and classical statistical approaches.