Hierarchical spatial models for predicting tree species assemblages across large domains
Spatially explicit data layers of tree species assemblages, referred to as
forest types or forest type groups, are a key component in large-scale
assessments of forest sustainability, biodiversity, timber biomass, carbon
sinks and forest health monitoring. This paper explores the utility of coupling
georeferenced national forest inventory (NFI) data with readily available and
spatially complete environmental predictor variables through spatially-varying
multinomial logistic regression models to predict forest type groups across
large forested landscapes. These models exploit underlying spatial associations
within the NFI plot array and the spatially-varying impact of predictor
variables to improve the accuracy of forest type group predictions. The
richness of these models incurs onerous computational burdens, and we discuss
dimension-reducing spatial processes that retain the richness in modeling. We
illustrate using NFI data from Michigan, USA, where we provide a comprehensive
analysis of this large study area and demonstrate improved prediction with
associated measures of uncertainty.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS250 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Predictive modeling techniques with application to the Cerulean Warbler (Dendroica cerulea) in the Appalachian Mountains Bird Conservation Region
Many statistical approaches have been used for developing predictive models for wildlife presence/absence and abundance, each with varying levels of accuracy and complexity. As concerns for declining species intensify and anthropogenic impacts on habitats increase, the ability to quickly quantify and map species distributions and abundances over large regions will become increasingly important. To date, there is no set of best practices for modeling specific wildlife groups. My primary objectives with this thesis were to (1) compare modeling techniques for ease of use and accuracy, and (2) compare resolutions of species occurrence data and their effect on model accuracy.
For the first objective, I compared two modeling techniques that range from moderately quick and simplistic (decision trees) to conceptually and computationally complex (hierarchical spatial models). I used North American Breeding Bird Survey counts with a suite of explanatory variables to predict presence and abundance of cerulean warblers (Dendroica cerulea) in the Appalachian Mountains Bird Conservation Region. Of the decision tree methods, cerulean warbler occurrence was most accurately described by presence/absence models. Regression tree abundance models under-predicted counts and had low accuracy. Hierarchical spatial models predicted cerulean warbler abundances similar to actual counts, and with better overall accuracy than regression trees. All techniques produced models using similar variables; interior forest and percent forest were most important for identifying areas with cerulean warblers.
For the second objective, I compared two model types, differing in the resolution of the species distribution data. I used North American Breeding Bird Survey (NABBS) counts with a suite of explanatory variables to predict presence and abundance of cerulean warblers (Dendroica cerulea) in the Appalachian Mountains Bird Conservation Region (BCR28). Decision trees were created for route-level and stop-level analyses of presence and abundance. Additionally, output maps have typically been resolved to the resolution of the environmental spatial datasets, with little attention given to the scale that the predictions represent. Using the modeling results, predictive distribution maps were created for cerulean warblers with appropriate resolutions for each model group. Route-level decision trees performed better than stop-level models for predicting both presence and abundance of cerulean warblers. Similar to raw NABBS distribution data, cerulean warblers were predicted to occur in highest concentrations in the central portions of the BCR. Poor performance of stop-level models may result from a mismatch between the resolution of the environmental data and that of the species survey data, or from a lack of important environmental covariates at the stop-level scale. The results of this study highlight the importance of correctly matching the resolution of the species distribution data to the resolution of environmental covariates and the extent of analysis.
The results and relationships highlighted in this thesis may serve to direct management and monitoring for the cerulean warbler, and other migratory passerines.
A pairwise likelihood approach for the empirical estimation of the underlying variograms in the plurigaussian models
The plurigaussian model is particularly suited to describe categorical
regionalized variables. Starting from a simple principle, the thresholding of
one or several Gaussian random fields (GRFs) to obtain categories, the
plurigaussian model is well adapted for a wide range of situations. By acting on
the form of the thresholding rule and/or the threshold values (which can vary
along space) and the variograms of the underlying GRFs, one can generate many
spatial configurations for the categorical variables. One difficulty is to
choose a variogram model for the underlying GRFs. Indeed, these latter are hidden
by the truncation and we only observe the simple and cross-variograms of the
category indicators. In this paper, we propose a semiparametric method based on
the pairwise likelihood to estimate the empirical variogram of the GRFs. It
provides an exploratory tool in order to choose a suitable model for each GRF
and later to estimate its parameters. We illustrate the efficiency of the
method with a Monte-Carlo simulation study. The method presented in this paper
is implemented in the R package RGeostats.
Comment: To be submitted to Spatial Statistics
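The thresholding principle described in this abstract can be illustrated with a minimal sketch. It is not the paper's method and does not use RGeostats: it assumes a single 1-D Gaussian field (the truncated Gaussian special case of a plurigaussian model), simulated as a stationary AR(1) sequence whose correlation at lag h is rho**|h|, and two fixed thresholds. The function names, rho, and the cut values are hypothetical choices made for illustration.

```python
import math
import random


def simulate_ar1_gaussian(n, rho, seed=0):
    """Simulate a stationary 1-D Gaussian sequence with unit variance.

    Assumption for this sketch: an AR(1) recursion, which gives an
    exponential-type correlation rho**|h| between points h steps apart.
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)  # keeps the marginal variance at 1
    field = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        field.append(rho * field[-1] + scale * rng.gauss(0.0, 1.0))
    return field


def threshold(field, cuts):
    """Map each Gaussian value to a category index.

    The category is the number of cut points strictly below the value,
    so two cuts produce the categories 0, 1 and 2.
    """
    return [sum(value > cut for cut in cuts) for value in field]


# Hypothetical settings: 200 locations, strong spatial correlation.
field = simulate_ar1_gaussian(200, rho=0.9)
categories = threshold(field, cuts=(-0.5, 0.5))
```

Only `categories` would be observed in practice; the hidden `field` and its variogram are what the abstract's estimation method targets. Changing `rho` or the cut values changes the spatial pattern of the categories, which is the flexibility the abstract refers to.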
A Bayesian marked spatial point processes model for basketball shot chart
The success rate of a basketball shot may be higher at locations where a
player makes more shots. For a marked spatial point process, this means that
the mark and the intensity are associated. We propose a Bayesian joint model
for the mark and the intensity of marked point processes, where the intensity
is incorporated in the mark model as a covariate. Inferences are done with a
Markov chain Monte Carlo algorithm. Two Bayesian model comparison criteria, the
Deviance Information Criterion and the Logarithm of the Pseudo-Marginal
Likelihood, were used to assess the model. The performances of the proposed
methods were examined in extensive simulation studies. The proposed methods
were applied to the shot charts of four players (Curry, Harden, Durant, and
James) in the 2017--2018 regular season of the National Basketball Association
to analyze their shot intensity in the field and the field goal percentage in
detail. Application to the top 50 most frequent shooters in the season suggests
that the field goal percentage and the shot intensity are positively associated
for a majority of the players. The fitted parameters were used as inputs in a
secondary analysis to cluster the players into different groups.
ActiveRemediation: The Search for Lead Pipes in Flint, Michigan
We detail our ongoing work in Flint, Michigan to detect pipes made of lead
and other hazardous metals. After elevated levels of lead were detected in
residents' drinking water, followed by an increase in blood lead levels in area
children, the state and federal governments directed over $125 million to
replace water service lines, the pipes connecting each home to the water
system. In the absence of accurate records, and with the high cost of
determining buried pipe materials, we put forth a number of predictive and
procedural tools to aid in the search and removal of lead infrastructure.
Alongside these statistical and machine learning approaches, we describe our
interactions with government officials in recommending homes for both
inspection and replacement, with a focus on the statistical model that adapts
to incoming information. Finally, in light of discussions about increased
spending on infrastructure development by the federal government, we explore
how our approach generalizes beyond Flint to other municipalities nationwide.
Comment: 10 pages, 10 figures, To appear in KDD 2018, For associated
promotional video, see https://www.youtube.com/watch?v=YbIn_axYu9
A spliced Gamma-Generalized Pareto model for short-term extreme wind speed probabilistic forecasting
Renewable sources of energy such as wind power have become a sustainable
alternative to fossil fuel-based energy. However, the uncertainty and
fluctuation of the wind speed derived from its intermittent nature bring a
great threat to the wind power production stability, and to the wind turbines
themselves. Lately, much work has been done on developing models to forecast
average wind speed values, yet surprisingly little has focused on proposing
models to accurately forecast extreme wind speeds, which can damage the
turbines. In this work, we develop a flexible spliced Gamma-Generalized Pareto
model to forecast extreme and non-extreme wind speeds simultaneously. Our model
belongs to the class of latent Gaussian models, for which inference is
conveniently performed based on the integrated nested Laplace approximation
method. Considering a flexible additive regression structure, we propose two
models for the latent linear predictor to capture the spatio-temporal dynamics
of wind speeds. Our models are fast to fit and can describe both the bulk and
the tail of the wind speed distribution while producing short-term extreme and
non-extreme wind speed probabilistic forecasts.
Comment: 25 pages
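The idea of splicing a Gamma bulk with a Generalized Pareto tail can be sketched as a simple sampler. This is an illustrative construction only, not the paper's parametrization or its latent Gaussian/INLA machinery: it assumes a fixed threshold u, a fixed exceedance probability, a Gamma distribution truncated at u for the bulk, and a Generalized Pareto tail with nonzero shape xi above u. All names and parameter values are hypothetical.

```python
import random


def sample_spliced(n, u, p_exceed, shape, scale, sigma, xi, seed=0):
    """Draw n values from a spliced Gamma / Generalized Pareto distribution.

    With probability p_exceed a value comes from the tail (above u),
    otherwise from a Gamma(shape, scale) distribution truncated to (0, u].
    Assumes xi != 0 so the tail inverse CDF below is valid.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        if rng.random() < p_exceed:
            # Tail: inverse-CDF sampling of the Generalized Pareto excess.
            v = 1.0 - rng.random()  # uniform on (0, 1]
            draws.append(u + sigma * (v ** (-xi) - 1.0) / xi)
        else:
            # Bulk: rejection sampling of a Gamma truncated at u.
            while True:
                y = rng.gammavariate(shape, scale)
                if y <= u:
                    draws.append(y)
                    break
    return draws


# Hypothetical wind-speed-like settings (units are arbitrary here).
speeds = sample_spliced(10000, u=10.0, p_exceed=0.1,
                        shape=2.0, scale=3.0, sigma=2.0, xi=0.1)
exceed_fraction = sum(y > 10.0 for y in speeds) / len(speeds)
```

In this sketch the bulk and tail parameters are constants; in a regression setting such as the one the abstract describes, they would instead depend on covariates and latent effects.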