
    Hierarchical spatial models for predicting tree species assemblages across large domains

    Spatially explicit data layers of tree species assemblages, referred to as forest types or forest type groups, are a key component in large-scale assessments of forest sustainability, biodiversity, timber biomass, carbon sinks and forest health monitoring. This paper explores the utility of coupling georeferenced national forest inventory (NFI) data with readily available and spatially complete environmental predictor variables through spatially-varying multinomial logistic regression models to predict forest type groups across large forested landscapes. These models exploit underlying spatial associations within the NFI plot array and the spatially-varying impact of predictor variables to improve the accuracy of forest type group predictions. The richness of these models incurs onerous computational burdens, and we discuss dimension-reducing spatial processes that retain the richness in modeling. We illustrate using NFI data from Michigan, USA, where we provide a comprehensive analysis of this large study area and demonstrate improved prediction with associated measures of uncertainty. Comment: Published at http://dx.doi.org/10.1214/09-AOAS250 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Predictive modeling techniques with application to the Cerulean Warbler (Dendroica cerulea) in the Appalachian Mountains Bird Conservation Region

    Many statistical approaches have been used for developing predictive models for wildlife presence/absence and abundance, each with varying levels of accuracy and complexity. As concerns for declining species intensify and anthropogenic impacts on habitats increase, the ability to quickly quantify and map species distributions and abundances over large regions will become increasingly important. To date, there is no set of best practices for modeling specific wildlife groups. My primary objectives with this thesis were to (1) compare modeling techniques for ease of use and accuracy, and (2) compare resolutions of species occurrence data and their effect on model accuracy. For the first objective, I compared two modeling techniques that range from moderately quick and simplistic (decision trees) to conceptually and computationally complex (hierarchical spatial models). I used North American Breeding Bird Survey (NABBS) counts with a suite of explanatory variables to predict presence and abundance of cerulean warblers (Dendroica cerulea) in the Appalachian Mountains Bird Conservation Region (BCR28). Of the decision tree methods, cerulean warbler occurrence was most accurately described by presence/absence models. Regression tree abundance models under-predicted counts and had low accuracy. Hierarchical spatial models predicted abundance of cerulean warblers similar to actual counts, and with better overall accuracy than regression trees. All techniques produced models using similar variables; interior forest and percent forest were most important for identifying areas with cerulean warblers. For the second objective, I compared two model types, differing in the resolution of the species distribution data. I again used NABBS counts with a suite of explanatory variables to predict presence and abundance of cerulean warblers in BCR28. Decision trees were created for route-level and stop-level analyses of presence and abundance. Additionally, output maps have typically been resolved to the resolution of the environmental spatial datasets, with little attention given to the scale that the predictions represent. Using the modeling results, predictive distribution maps were created for cerulean warblers with appropriate resolutions for each model group. Route-level decision trees performed better than stop-level models for predicting both presence and abundance of cerulean warblers. Consistent with the raw NABBS distribution data, cerulean warblers were predicted to occur in highest concentrations in the central portions of the BCR. Poor performance of stop-level models may result from a mismatch between the resolution of the environmental data and that of the species survey data, or from a lack of important environmental covariates at the stop-level scale. The results of this study highlight the importance of correctly matching the resolution of the species distribution data to the resolution of environmental covariates and the extent of analysis. The results and relationships highlighted in this thesis may serve to direct management and monitoring for the cerulean warbler and other migratory passerines.

    A pairwise likelihood approach for the empirical estimation of the underlying variograms in the plurigaussian models

    The plurigaussian model is particularly suited to describing categorical regionalized variables. Starting from a simple principle, the thresholding of one or several Gaussian random fields (GRFs) to obtain categories, the plurigaussian model is well adapted to a wide range of situations. By acting on the form of the thresholding rule and/or the threshold values (which can vary over space) and the variograms of the underlying GRFs, one can generate many spatial configurations for the categorical variables. One difficulty is choosing a variogram model for the underlying GRFs. Indeed, the latter are hidden by the truncation, and we only observe the simple and cross-variograms of the category indicators. In this paper, we propose a semiparametric method based on the pairwise likelihood to estimate the empirical variogram of the GRFs. It provides an exploratory tool for choosing a suitable model for each GRF and later estimating its parameters. We illustrate the efficiency of the method with a Monte Carlo simulation study. The method presented in this paper is implemented in the R package RGeostats. Comment: To be submitted to Spatial Statistics

    A Bayesian marked spatial point processes model for basketball shot chart

    The success rate of a basketball shot may be higher at locations where a player makes more shots. For a marked spatial point process, this means that the mark and the intensity are associated. We propose a Bayesian joint model for the mark and the intensity of marked point processes, where the intensity is incorporated in the mark model as a covariate. Inferences are done with a Markov chain Monte Carlo algorithm. Two Bayesian model comparison criteria, the Deviance Information Criterion and the Logarithm of the Pseudo-Marginal Likelihood, were used to assess the model. The performances of the proposed methods were examined in extensive simulation studies. The proposed methods were applied to the shot charts of four players (Curry, Harden, Durant, and James) in the 2017–2018 regular season of the National Basketball Association to analyze their shot intensity in the field and the field goal percentage in detail. Application to the top 50 most frequent shooters in the season suggests that the field goal percentage and the shot intensity are positively associated for a majority of the players. The fitted parameters were used as inputs in a secondary analysis to cluster the players into different groups.

    ActiveRemediation: The Search for Lead Pipes in Flint, Michigan

    We detail our ongoing work in Flint, Michigan to detect pipes made of lead and other hazardous metals. After elevated levels of lead were detected in residents' drinking water, followed by an increase in blood lead levels in area children, the state and federal governments directed over $125 million to replace water service lines, the pipes connecting each home to the water system. In the absence of accurate records, and with the high cost of determining buried pipe materials, we put forth a number of predictive and procedural tools to aid in the search and removal of lead infrastructure. Alongside these statistical and machine learning approaches, we describe our interactions with government officials in recommending homes for both inspection and replacement, with a focus on the statistical model that adapts to incoming information. Finally, in light of discussions about increased spending on infrastructure development by the federal government, we explore how our approach generalizes beyond Flint to other municipalities nationwide. Comment: 10 pages, 10 figures, To appear in KDD 2018, For associated promotional video, see https://www.youtube.com/watch?v=YbIn_axYu9

    A spliced Gamma-Generalized Pareto model for short-term extreme wind speed probabilistic forecasting

    Renewable sources of energy such as wind power have become a sustainable alternative to fossil fuel-based energy. However, the uncertainty and fluctuation of wind speed, which stem from its intermittent nature, pose a serious threat to the stability of wind power production and to the wind turbines themselves. Lately, much work has been done on developing models to forecast average wind speed values, yet surprisingly little has focused on proposing models to accurately forecast extreme wind speeds, which can damage the turbines. In this work, we develop a flexible spliced Gamma-Generalized Pareto model to forecast extreme and non-extreme wind speeds simultaneously. Our model belongs to the class of latent Gaussian models, for which inference is conveniently performed based on the integrated nested Laplace approximation method. Considering a flexible additive regression structure, we propose two models for the latent linear predictor to capture the spatio-temporal dynamics of wind speeds. Our models are fast to fit and can describe both the bulk and the tail of the wind speed distribution while producing short-term extreme and non-extreme wind speed probabilistic forecasts. Comment: 25 pages