19 research outputs found

    First CLADAG data mining prize: data mining for longitudinal data with different marketing campaigns

    The CLAssification and Data Analysis Group (CLADAG) of the Italian Statistical Society recently organised a competition, the 'Young Researcher Data Mining Prize', sponsored by the SAS Institute. This paper was the winning entry; in it we detail our approach to the proposed problem and our results. The main methods used are linear regression, mixture models, Bayesian autoregressive models and Bayesian dynamic models.

    Dynamic multiscale spatiotemporal models for Poisson data

    We propose a new class of dynamic multiscale models for Poisson spatiotemporal processes. Specifically, we use a multiscale spatial Poisson factorization to decompose the Poisson process at each time point into spatiotemporal multiscale coefficients. We then connect these spatiotemporal multiscale coefficients through time with a novel Dirichlet evolution. Further, we propose a simulation-based full Bayesian posterior analysis. In particular, we develop filtering equations for updating of information forward in time and smoothing equations for integration of information backward in time, and use these equations to develop a forward filter backward sampler for the spatiotemporal multiscale coefficients. Because the multiscale coefficients are conditionally independent a posteriori, our full Bayesian posterior analysis is scalable, computationally efficient, and highly parallelizable. Moreover, the Dirichlet evolution of each spatiotemporal multiscale coefficient is parametrized by a discount factor that encodes the relevance of the temporal evolution of the spatiotemporal multiscale coefficient. Therefore, the analysis of discount factors provides a powerful way to identify regions with distinctive spatiotemporal dynamics. Finally, we illustrate the usefulness of our multiscale spatiotemporal Poisson methodology with two applications. The first application examines mortality ratios in the state of Missouri, and the second application considers tornado reports in the American Midwest.
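
    The multiscale Poisson factorization described in this abstract rests on a standard identity: independent Poisson counts in a set of child regions are equivalent to a Poisson count for their total together with a multinomial split of that total among the children. The Python sketch below checks one level of that decomposition numerically. The function names and the example counts and means are illustrative assumptions, not taken from the paper, whose models apply such splits recursively across resolution levels and add a temporal evolution.

```python
import math

def log_poisson(y, mu):
    """Log pmf of a Poisson(mu) count y."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def log_multinomial(counts, probs):
    """Log pmf of a multinomial split of sum(counts) with cell probabilities probs."""
    out = math.lgamma(sum(counts) + 1)
    for c, p in zip(counts, probs):
        out += c * math.log(p) - math.lgamma(c + 1)
    return out

def direct_loglik(counts, means):
    """Log-likelihood of independent Poisson counts at the fine level."""
    return sum(log_poisson(y, mu) for y, mu in zip(counts, means))

def multiscale_loglik(counts, means):
    """Same log-likelihood, written as Poisson(total) times multinomial(split)."""
    total_mean = sum(means)
    probs = [mu / total_mean for mu in means]
    return log_poisson(sum(counts), total_mean) + log_multinomial(counts, probs)

# Hypothetical counts and means for four child regions of one parent region.
counts = [3, 0, 7, 2]
means = [2.5, 0.8, 6.1, 1.9]
print(direct_loglik(counts, means), multiscale_loglik(counts, means))
```

    The two printed values agree up to floating-point error, which is what allows the coarse total and the within-parent proportions to be modelled separately.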

    Space-time calibration of wind speed forecasts from regional climate models

    Numerical weather predictions (NWP) are systematically subject to errors due to the deterministic solutions used by numerical models to simulate the atmosphere. Statistical postprocessing techniques are widely used nowadays for NWP calibration. However, time-varying bias is usually not accommodated by such models, and their calibration performance is sensitive to the temporal window used for training. This paper proposes space-time models that extend the main statistical postprocessing approaches to calibrate NWP model outputs. Trans-Gaussian random fields are considered to account for meteorological variables with asymmetric behavior. Data augmentation is used to account for censoring in the response variable. The benefits of the proposed extensions are illustrated through the calibration of hourly 10 m wind speed forecasts in Southeastern Brazil coming from the Eta model. Comment: 43 pages, 13 figures.

    A decision support system for addressing food security in the United Kingdom

    This paper presents an integrating decision support system (IDSS) for food security in the United Kingdom. In ever-larger dynamic systems, such as the food system, it is increasingly difficult for decision makers (DMs) to effectively account for all the variables within the system that may influence the outcomes of interest under enactments of various candidate policies. Each of the influencing variables is itself likely to be a dynamic subsystem with expert domains supported by sophisticated probabilistic models. Recent increases in food poverty in the United Kingdom have raised questions about the main drivers of food insecurity, how these may be changing over time, and how evidence can be used in evaluating policy for decision support. In this context, an IDSS is proposed for household food security to allow DMs to compare several candidate policies which may affect the outcome of food insecurity at the household level.

    Bayesian cross-validation of geostatistical models

    The problem of validating or criticizing models for georeferenced data is challenging inasmuch as conclusions may be sensitive to the partition of data into training and validation cases. This is an obvious issue related to the basic validation scheme, which selects a subset of the data to leave out of estimation and to make predictions with an assumed model. In this setup, only a few out-of-sample locations are usually selected to validate the model. On the other hand, the cross-validation approach, which considers several possible configurations of data divided into training and validation observations, is an appealing alternative, but it could be computationally demanding as the estimation of parameters usually requires computationally intensive methods. The purpose of this work is to use cross-validation techniques to choose between competing models and to assess the goodness of fit of spatial models in different regions of the spatial domain. We consider the sampling design for selecting the training and validation sets by assigning a probability distribution to the possible data partitions. To deal with the computational burden of cross-validation, we estimate discrepancy functions in a computationally efficient manner based on the importance weighting of posterior samples. Furthermore, we propose a stratified cross-validation scheme to take into account spatial heterogeneity, reducing the total variance of estimated predictive discrepancy measures. We also illustrate the advantages of our proposal with simulated examples of homogeneous and inhomogeneous spatial processes and with an application to a rainfall dataset in Rio de Janeiro.
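
    The idea of importance weighting posterior samples can be illustrated in the simplest setting, a single held-out observation. Given draws from the full-data posterior, weights proportional to 1 / p(y_i | theta) turn them into an estimate of the leave-one-out predictive density without refitting the model. The Python sketch below uses a toy Gaussian model with known variance and a flat prior, where the exact answer is available for comparison. The model, the function names and the simulated data are assumptions made for illustration; this is not the authors' implementation, and their scheme covers general partitions of spatial data.

```python
import numpy as np

def log_mean_exp(a):
    """Numerically stable log(mean(exp(a)))."""
    m = np.max(a)
    return m + np.log(np.mean(np.exp(a - m)))

def is_loo_log_pred(loglik_i):
    """Importance-weighted estimate of log p(y_i | y_-i).

    loglik_i holds log p(y_i | theta_s) for draws theta_s from the
    full-data posterior; the weights are proportional to 1 / p(y_i | theta_s).
    """
    return -log_mean_exp(-loglik_i)

def normal_logpdf(y, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (np.log(2 * np.pi * var) + (y - mean) ** 2 / var)

rng = np.random.default_rng(0)
sigma2 = 1.0                                   # known variance (assumed)
y = rng.normal(2.0, np.sqrt(sigma2), size=30)  # simulated toy data
n = y.size

# Draws from the full-data posterior of the mean under a flat prior.
mu_draws = rng.normal(y.mean(), np.sqrt(sigma2 / n), size=20000)

i = 0  # index of the held-out observation
estimate = is_loo_log_pred(normal_logpdf(y[i], mu_draws, sigma2))

# Exact leave-one-out log predictive density for this toy model.
y_rest = np.delete(y, i)
exact = normal_logpdf(y[i], y_rest.mean(), sigma2 * (1 + 1 / (n - 1)))
print(estimate, exact)
```

    The estimate reuses one set of posterior draws for every held-out case, which is where the computational saving comes from; its accuracy degrades when the held-out data are highly influential, since the weights then become heavy-tailed.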

    BayesMortalityPlus: A package in R for Bayesian graduation of mortality modelling

    The BayesMortalityPlus package provides a framework for modelling and predicting mortality data. The package includes tools for the construction of life tables based on Heligman-Pollard laws, and also on dynamic linear smoothers. Flexibility is available in terms of modelling so that the response variable may be modelled as Poisson, Binomial or Gaussian. If temporal data are available, the package provides a Bayesian implementation of the well-known Lee-Carter model that allows for estimation, projection of mortality over time, and assessment of uncertainty of any linear or nonlinear function of parameters, such as life expectancy. Illustrations are considered to show the capability of the proposed package to model mortality data.
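
    For orientation, the Lee-Carter model writes the log mortality rate at age x and year t as log m[x, t] = a[x] + b[x] * k[t]. The Python sketch below shows the classical least-squares fit via a singular value decomposition, with the usual constraints sum(b) = 1 and sum(k) = 0. It is not the package's API and not its Bayesian implementation; the function name and the synthetic, noise-free rates are assumptions for illustration only.

```python
import numpy as np

def fit_lee_carter(log_m):
    """Classical SVD fit of log m[x, t] = a[x] + b[x] * k[t].

    log_m is an (ages x years) array of log death rates.
    Returns (a, b, k) normalised so that sum(b) = 1 and sum(k) = 0.
    """
    a = log_m.mean(axis=1)                 # age profile: average over years
    centered = log_m - a[:, None]
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scale = u[:, 0].sum()                  # rescaling also fixes the sign
    b = u[:, 0] / scale
    k = s[0] * vt[0] * scale
    return a, b, k

# Synthetic, noise-free example (hypothetical values).
a_true = np.array([-6.0, -5.0, -3.5, -2.0])
b_true = np.array([0.4, 0.3, 0.2, 0.1])
k_true = np.array([4.0, 2.0, 0.0, -2.0, -4.0])
log_m = a_true[:, None] + b_true[:, None] * k_true[None, :]

a_hat, b_hat, k_hat = fit_lee_carter(log_m)
print(a_hat, b_hat, k_hat)
```

    A Bayesian treatment such as the one the abstract describes replaces this point estimate with a posterior distribution over a, b and k, which is what makes uncertainty statements about derived quantities like life expectancy possible.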

    Prevalence of non-communicable diseases in Brazilian children: follow-up at school age of two Brazilian birth cohorts of the 1990's

    Background: Few cohort studies have been conducted in low and middle-income countries to investigate non-communicable diseases among school-aged children. This article aims to describe the methodology of two birth cohorts, started in 1994 in Ribeirão Preto (RP), a more developed city, and in 1997/98 in São Luís (SL), a less developed town. Methods: Prevalences of some non-communicable diseases during the first follow-up of these cohorts were estimated and compared. Data on singleton live births were obtained at birth (2858 in RP and 2443 in SL). The follow-up at school age was conducted in RP in 2004/05, when the children were 9-11 years old, and in SL in 2005/06, when the children were 7-9 years old. Follow-up rates were 68.7% in RP (790 included) and 72.7% in SL (673 participants). The groups of low (<2500 g) and high (≥ 4250 g) birthweight were oversampled and estimates were corrected by weighting. Results: In the more developed city there was a higher percentage of non-nutritive sucking habits (69.1% vs 47.9%), lifetime bottle use (89.6% vs 68.3%), higher prevalence of primary headache in the last 15 days (27.9% vs 13.0%), higher positive skin tests for allergens (44.3% vs 25.3%) and higher prevalence of overweight (18.2% vs 3.6%), obesity (9.5% vs 1.8%) and hypertension (10.9% vs 4.6%). In the less developed city there was a larger percentage of children with below average cognitive function (28.9% vs 12.2%), mental health problems (47.4% vs 38.4%), depression (21.6% vs 6.0%) and underweight (5.8% vs 3.6%). There was no difference in the prevalence of bruxism, recurrent abdominal pain, asthma and bronchial hyperresponsiveness between cities. Conclusions: Some non-communicable diseases were highly prevalent, especially in the more developed city. Some high rates suggest that the burden of non-communicable diseases will be high in the future, especially mental health problems.