
26 research outputs found

How can we explain Random Forests in a spatial framework?

Author: Golini Natalia
Publication venue: Pearson
Publication date: 01/01/2023

Institutional Research Information System University of Turin

Bayesian Modeling of Presence-only Data

Author: Golini Natalia
Publication date: 01/01/2012

This thesis develops models and methods for the statistical analysis of presence-only data. Besides constructing new models, the emphasis is on the theoretical characteristics of the new models and on Bayesian prediction. Markov chain Monte Carlo algorithms are developed for the new presence-only data models in order to simulate the posterior distribution of the unknowns and the predictive distribution of the variable of interest. The new methods are applied to simulated data. An application in ecological science has been a driving force behind the work.

Pubblicazioni Aperte Digitali Interateneo Sapienza

Archivio della ricerca- Università di Roma La Sapienza

Institutional Research Information System University of Turin

Bayesian logistic regression for presence-only data

Author: Antti Penttinen
Fabio Divino
Giovanna Jona Lasinio
Natalia Golini
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Presence-only data arise in situations in which a censoring mechanism acts on a binary response that can be only partially observed, with respect to one outcome, usually denoting the presence of an attribute of interest. A typical example is the recording of species presence in ecological surveys. In this work a Bayesian approach to the analysis of presence-only data based on a two-level scheme is presented. A probability law and a case-control design are combined to handle the double source of uncertainty: one due to censoring and the other due to sampling. Through the use of a stratified sampling design with non-overlapping strata, a new formulation of the logistic model for presence-only data is proposed. In particular, logistic regression with a linear predictor is considered. Estimation is carried out with a new Markov Chain Monte Carlo algorithm with data augmentation, which does not require a priori knowledge of the population prevalence. The performance of the new algorithm is validated by means of extensive simulation experiments using three scenarios and comparison with optimal benchmarks. An application to data from the literature is reported in order to discuss the model behaviour in real-world situations, together with the results of an original study on termite occurrence data.
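To make the data structure described in this abstract concrete, the following is a minimal simulation sketch of presence-only sampling. It is not the authors' algorithm: the function name, the single Gaussian covariate, the coefficient values and the sample sizes are all hypothetical choices made for illustration. It only shows the two sources of uncertainty the abstract names: censoring (background units have an unknown response) and sampling (only a subset of presences is recorded).

```python
import math
import random


def simulate_presence_only(n_pop=1000, n_presence=50, n_background=200,
                           beta0=-1.0, beta1=1.5, seed=0):
    """Simulate a presence-only sample from a logistic population (illustrative)."""
    rng = random.Random(seed)

    # Population: one covariate x and a latent binary response y
    # generated from a linear logistic model (hypothetical coefficients).
    population = []
    for _ in range(n_pop):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        y = 1 if rng.random() < p else 0
        population.append((x, y))

    # Observed presences: sampled only among units with y = 1.
    presences = [unit for unit in population if unit[1] == 1]
    observed = rng.sample(presences, min(n_presence, len(presences)))

    # Background sample: drawn from the whole population, response censored.
    background = [(x, None) for x, _ in rng.sample(population, n_background)]
    return observed, background


observed, background = simulate_presence_only()
```

In this sketch the analyst would see `observed` (all with response 1) and `background` (covariates only); the latent responses of the background units are what a data-augmentation scheme would have to impute.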

Università degli Studi del Molise: IRIS

Archivio della ricerca- Università di Roma La Sapienza

Bayesian Modeling and MCMC Computation in Linear Logistic Regression for Presence-only Data

Author: Divino Fabio
Golini Natalia
Lasinio Giovanna Jona
Penttinen Antti
Publication date: 06/05/2013

Presence-only data arise in situations in which, given a censoring mechanism, a binary response can be observed only with respect to one outcome, usually called presence. In this work we present a Bayesian approach to the problem of presence-only data based on a two-level scheme. A probability law and a case-control design are combined to handle the double source of uncertainty: one due to the censoring and one due to the sampling. We propose a new formalization for the logistic model with presence-only data that allows further insight into inferential issues related to the model. We concentrate on the case of linear logistic regression and, in order to make inference on the parameters of interest, we present a Markov Chain Monte Carlo algorithm with data augmentation that does not require a priori knowledge of the population prevalence. A simulation study concerning 24,000 simulated datasets related to different scenarios is presented, comparing our proposal to optimal benchmarks.

Comment: Affiliations: Fabio Divino, Division of Physics, Computer Science and Mathematics, University of Molise; Giovanna Jona Lasinio and Natalia Golini, Department of Statistical Sciences, University of Rome "La Sapienza"; Antti Penttinen, Department of Mathematics and Statistics, University of Jyväskylä.

arXiv.org e-Print Archive

CiteSeerX

Big data and Official Statistics: some evidences

Author: Gianpiero Bianchi
Golini Natalia
Paolo Righi
Publication venue: Pearson
Publication date: 01/01/2022

Institutional Research Information System University of Turin

Quality issues when using Big Data in Official Statistics

Author: Giulio Barcaroli
Golini Natalia
Paolo Righi
Publication venue: Università degli Studi di Firenze
Publication date: 01/01/2017

Institutional Research Information System University of Turin

Quality evaluation of experimental statistics produced by making use of Big Data

Author: Giulio Barcaroli
Golini Natalia
Paolo Righi
Publication venue: Indiana University Press (Project Muse)
Publication date: 01/01/2018

Institutional Research Information System University of Turin

Functional zoning of biodiversity profiles

Author: Golini Natalia
Ignaccolo Rosaria
Ippoliti Luigi
Pronello Nicola
Publication date: 26/09/2023

Spatial mapping of biodiversity is crucial to investigate spatial variations in natural communities. Several indices have been proposed in the literature to represent biodiversity as a single statistic. However, these indices only provide information on individual dimensions of biodiversity, thus failing to grasp its complexity comprehensively. Consequently, relying solely on these single indices can lead to misleading conclusions about the actual state of biodiversity. In this work, we focus on biodiversity profiles, which provide a more flexible framework to express biodiversity through non-negative and convex curves, which can be analyzed by means of functional data analysis. By treating the whole curves as single entities, we propose to achieve a functional zoning of the region of interest by means of a penalized model-based clustering procedure. This provides a spatial clustering of the biodiversity profiles, which is useful for policy-makers both for conserving and managing natural resources and revealing patterns of interest. Our approach is discussed through the analysis of Harvard Forest Data, which provides information on the spatial distribution of woody stems within a plot of the Harvard Forest
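As an illustration of what a biodiversity profile looks like, the sketch below evaluates one well-known profile family, the Patil and Taillie profile, on a grid of values of its parameter beta. The abstract does not state which profile family the authors use, so this choice is an assumption, and the abundance vector is made up. For this family the curve is non-negative, decreasing and convex in beta, with species richness minus one at beta = -1, the Shannon index in the limit beta = 0, and the Simpson index at beta = 1.

```python
import math


def diversity_profile(abundances, betas):
    """Evaluate the Patil-Taillie diversity profile on a grid of beta values."""
    total = sum(abundances)
    # Relative abundances of the species actually present.
    props = [a / total for a in abundances if a > 0]

    profile = []
    for beta in betas:
        if abs(beta) < 1e-12:
            # Limit for beta -> 0: the Shannon index.
            value = -sum(p * math.log(p) for p in props)
        else:
            value = (1.0 - sum(p ** (beta + 1.0) for p in props)) / beta
        profile.append(value)
    return profile


# Hypothetical community of four species with made-up abundances.
betas = [-1.0, -0.5, 0.0, 0.5, 1.0]
profile = diversity_profile([50, 30, 15, 5], betas)
```

Each sampled site would yield one such curve; treating the whole curve as a single functional observation is what allows the clustering described in the abstract, rather than comparing sites on a single index.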

arXiv.org e-Print Archive

Agrimonia: a dataset on livestock, meteorology and air quality in the Lombardy region, Italy

Author: Cameletti Michela
Fassò Alessandro
Finazzi Francesco
Golini Natalia
Ignaccolo Rosaria
Maranzano Paolo
Moro Alessandro Fusta
Otto Philipp
Rodeschini Jacopo
Shaboviq Qendrim
Publication venue: London : Nature Publ. Group
Publication date: 01/01/2023

The air in the Lombardy region, Italy, is one of the most polluted in Europe because of limited air circulation and high emission levels. There is a large scientific consensus that the agricultural sector has a significant impact on air quality. To support studies quantifying the role of the agricultural and livestock sectors on the Lombardy air quality, this paper presents a harmonised dataset containing daily values of air quality, weather, emissions, livestock, and land and soil use in the years 2016–2021, for the Lombardy region. The daily scale is obtained by averaging hourly data and interpolating other variables. Specifically, the pollutant data come from the European Environmental Agency and the Lombardy Regional Environment Protection Agency, weather and emissions data from the European Copernicus programme, livestock data from the Italian zootechnical registry, and land and soil use data from the CORINE Land Cover project. The resulting dataset is designed to be used as is by those using air quality data for research.
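The hourly-to-daily aggregation mentioned in this abstract can be sketched as follows. This is only an illustration of plain daily averaging: the function name, the handling of missing readings and the sample values are hypothetical, and the dataset's actual processing pipeline may differ.

```python
from collections import defaultdict
from datetime import datetime


def daily_means(hourly_records):
    """Average (timestamp, value) hourly records into one mean per calendar day."""
    buckets = defaultdict(list)
    for timestamp, value in hourly_records:
        if value is not None:  # skip missing hourly readings (an assumption)
            buckets[timestamp.date()].append(value)
    return {day: sum(values) / len(values)
            for day, values in sorted(buckets.items())}


# Made-up hourly pollutant readings for two days.
hourly = [
    (datetime(2016, 1, 1, 0), 40.0),
    (datetime(2016, 1, 1, 1), 44.0),
    (datetime(2016, 1, 1, 2), None),
    (datetime(2016, 1, 2, 0), 30.0),
]
daily = daily_means(hourly)
```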

Institutionelles Repositorium der Leibniz Universität Hannover

Spatiotemporal modelling of PM$_{2.5}$ concentrations in Lombardy (Italy) -- A comparative study

Author: Cameletti Michela
Fassò Alessandro
Finazzi Francesco
Golini Natalia
Ignaccolo Rosaria
Maranzano Paolo
Moro Alessandro Fusta
Otto Philipp
Rodeschini Jacopo
Shaboviq Qendrim
Publication date: 13/09/2023

This study presents a comparative analysis of three predictive models with an increasing degree of flexibility: hidden dynamic geostatistical models (HDGM), generalised additive mixed models (GAMM), and the random forest spatiotemporal kriging models (RFSTK). These models are evaluated for their effectiveness in predicting PM$_{2.5}$ concentrations in Lombardy (North Italy) from 2016 to 2020. Despite differing methodologies, all models demonstrate proficient capture of spatiotemporal patterns within air pollution data with similar out-of-sample performance. Furthermore, the study delves into station-specific analyses, revealing variable model performance contingent on localised conditions. Model interpretation, facilitated by parametric coefficient analysis and partial dependence plots, unveils consistent associations between predictor variables and PM$_{2.5}$ concentrations. Despite nuanced variations in modelling spatiotemporal correlations, all models effectively accounted for the underlying dependence. In summary, this study underscores the efficacy of conventional techniques in modelling correlated spatiotemporal data, concurrently highlighting the complementary potential of Machine Learning and classical statistical approaches.

arXiv.org e-Print Archive