Advocating better habitat use and selection models in bird ecology
Studies on habitat use and habitat selection represent a basic aspect of bird ecology, due to their importance in natural history, distribution, response to environmental changes, management and conservation. Basically, a statistical model that identifies environmental variables linked to a species' presence is sought. In this sense, there is a wide array of analytical methods that identify important explanatory variables within a model, with higher explanatory and predictive power than classical regression approaches. However, some of these powerful models are not widespread in ornithological studies, partly because of their complex theory and, in some cases, difficulties in their implementation and interpretation. Here, I describe generalized linear models and five other statistical models for the analysis of bird habitat use and selection that outperform classical approaches: generalized additive models, mixed effects models, occupancy models, binomial N-mixture models and decision trees (classification and regression trees, bagging, random forests and boosting). Each of these models has its benefits and drawbacks, but major advantages include dealing with non-normal distributions (presence-absence and abundance data typically found in habitat use and selection studies), heterogeneous variances, non-linear and complex relationships among variables, lack of statistical independence and imperfect detection. To aid ornithologists in making use of the methods described, a readable description of each method is provided, as well as a flowchart along with some recommendations to help them decide on the most appropriate analysis. The use of these models in ornithological studies is encouraged, given their huge potential as statistical tools in bird ecology.
Fil: Palacio, Facundo Xavier. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de La Plata. Facultad de Ciencias Naturales y Museo. División Zoología de Vertebrados. Sección Ornitología; Argentin
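The abstract above lists occupancy models as one way to handle imperfect detection. As a rough illustration only (not taken from the paper), the sketch below writes out the likelihood of a basic single-season occupancy model, assuming a constant occupancy probability `psi` and a constant per-visit detection probability `p`; both names and the function names are hypothetical.

```python
import math


def site_likelihood(history, psi, p):
    """Likelihood of one site's detection history (list of 0/1 per visit).

    psi: probability the site is occupied (assumed constant across sites)
    p:   per-visit detection probability given occupancy (assumed constant)
    """
    detections = sum(history)
    visits = len(history)
    # Occupied site: each visit is an independent Bernoulli(p) detection.
    occupied = psi * p ** detections * (1 - p) ** (visits - detections)
    # Unoccupied site: only an all-zero history is possible.
    unoccupied = (1 - psi) if detections == 0 else 0.0
    return occupied + unoccupied


def log_likelihood(histories, psi, p):
    """Sum of log-likelihoods over independent sites."""
    return sum(math.log(site_likelihood(h, psi, p)) for h in histories)
```

An all-zero history gets probability mass from both branches, which is how the model separates "absent" from "present but undetected".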
Detecting multivariate interactions in spatial point patterns with Gibbs models and variable selection
We propose a method for detecting significant interactions in very large
multivariate spatial point patterns. This methodology develops high dimensional
data understanding in the point process setting. The method is based on
modelling the patterns using a flexible Gibbs point process model to directly
characterise point-to-point interactions at different spatial scales. By using
the Gibbs framework significant interactions can also be captured at small
scales. Subsequently, the Gibbs point process is fitted using a
pseudo-likelihood approximation, and we select significant interactions
automatically using the group lasso penalty with this likelihood approximation.
Thus we estimate the multivariate interactions stably even in this setting. We
demonstrate the feasibility of the method with a simulation study and show its
power by applying it to a large and complex rainforest plant population data
set of 83 species.
Distributed multinomial regression
This article introduces a model-based approach to distributed computing for
multinomial logistic (softmax) regression. We treat counts for each response
category as independent Poisson regressions via plug-in estimates for fixed
effects shared across categories. The work is driven by the
high-dimensional-response multinomial models that are used in analysis of a
large number of random counts. Our motivating applications are in text
analysis, where documents are tokenized and the token counts are modeled as
arising from a multinomial dependent upon document attributes. We estimate such
models for a publicly available data set of reviews from Yelp, with text
regressed onto a large set of explanatory variables (user, business, and rating
information). The fitted models serve as a basis for exploring the connection
between words and variables of interest, for reducing dimension into supervised
factor scores, and for prediction. We argue that the approach herein provides
an attractive option for social scientists and other text analysts who wish to
bring familiar regression tools to bear on text data.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS831 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
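The idea of treating category counts as independent Poisson regressions rests on a standard identity: conditional on their total, independent Poisson counts are multinomial, with probabilities proportional to the rates. The sketch below checks this numerically for rates of the form exp(mu + eta_j), where `mu` stands in for a shift shared across categories; the variable and function names are illustrative assumptions, not the article's notation.

```python
import math


def softmax(eta):
    """Multinomial logistic probabilities for linear predictors eta."""
    m = max(eta)  # subtract the max for numerical stability
    exps = [math.exp(e - m) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]


def poisson_rates(eta, mu):
    """Independent Poisson rate for each category: exp(mu + eta_j)."""
    return [math.exp(mu + e) for e in eta]


def implied_probabilities(rates):
    """Category probabilities conditional on the total count."""
    total = sum(rates)
    return [r / total for r in rates]
```

Whatever value the shared shift `mu` takes, `implied_probabilities(poisson_rates(eta, mu))` matches `softmax(eta)`, which is why each category's regression can be handled separately once the shared term is fixed.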
Boosting insights in insurance tariff plans with tree-based machine learning methods
Pricing actuaries typically operate within the framework of generalized
linear models (GLMs). With the upswing of data analytics, our study puts focus
on machine learning methods to develop full tariff plans built from both the
frequency and severity of claims. We adapt the loss functions used in the
algorithms such that the specific characteristics of insurance data are
carefully incorporated: highly unbalanced count data with excess zeros and
varying exposure on the frequency side combined with scarce, but potentially
long-tailed data on the severity side. A key requirement is the need for
transparent and interpretable pricing models which are easily explainable to
all stakeholders. We therefore focus on machine learning with decision trees:
starting from simple regression trees, we work towards more advanced ensembles
such as random forests and boosted trees. We show how to choose the optimal
tuning parameters for these models in an elaborate cross-validation scheme, we
present visualization tools to obtain insights from the resulting models and
the economic value of these new modeling approaches is evaluated. Boosted trees
outperform the classical GLMs, allowing the insurer to form profitable
portfolios and to guard against potential adverse risk selection.
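To make the idea of a loss function adapted to claim counts with varying exposure concrete, here is a minimal sketch of the Poisson deviance with an exposure term. This is one common choice for frequency data and is an assumption here; the abstract does not state which loss the authors use, and the function and argument names are hypothetical.

```python
import math


def poisson_deviance(claims, exposure, rate):
    """Poisson deviance for claim counts with policy-level exposure.

    claims:   observed claim counts per policy
    exposure: fraction of the period each policy was in force
    rate:     predicted claim frequency per unit of exposure
    """
    total = 0.0
    for y, e, lam in zip(claims, exposure, rate):
        mu = e * lam  # expected claim count scales with exposure
        # y * log(y / mu) is taken as 0 when y == 0
        term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2.0 * (term - (y - mu))
    return total
```

The deviance is zero when every expected count equals the observed count, and policies with zero claims still contribute through their expected count, so the many zeros typical of insurance data are not ignored.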
Private Trees as Household Assets and Determinants of Tree-Growing Behavior in Rural Ethiopia
This study looked into the tree-growing behavior of rural households in Ethiopia. With data collected at household and parcel levels from the four major regions of Ethiopia, we analyzed the decision to grow trees and the number of trees grown, using such econometric strategies as a zero-inflated negative binomial model, Heckman's two-step procedure, and panel data techniques. Our findings show the importance of analysis at the parcel level in addition to the more common household level. Moreover, the empirical analysis indicates that the determinants of the decision to grow trees are not necessarily the same as those involved in deciding the number of trees grown. Land certification, as an indicator of tenure security, increases the likelihood that households will grow trees, but is not a significant determinant of the number of trees grown. Other variables, such as risk aversion, land size, adult male labor, and education of household head, also influence the number of trees grown. In general, the results suggest the need to use education and/or awareness of the role and importance of trees and point out the importance of household endowments and behavior, such as land, labor, and risk aversion, for tree growing. Finally, we observed that, while tree planting is practiced in all four regions covered, there are variations across regions.
trees as assets, tree growing, Ethiopia
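The distinction between the decision to grow trees and the number grown maps naturally onto a zero-inflated negative binomial model, which mixes a point mass at zero with a count distribution. The sketch below writes out that probability mass function under one common parameterization (mean `mu`, dispersion `r`, zero-inflation probability `pi_zero`); these names are illustrative assumptions, not the study's specification.

```python
import math


def nb_pmf(k, mu, r):
    """Negative binomial pmf with mean mu > 0 and dispersion (size) r > 0."""
    log_p = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k, pi_zero, mu, r):
    """Zero-inflated negative binomial pmf.

    With probability pi_zero the count is a structural zero (for example,
    a household that does not grow trees at all); otherwise it follows
    the negative binomial count process.
    """
    base = (1 - pi_zero) * nb_pmf(k, mu, r)
    return pi_zero + base if k == 0 else base
```

Because the zero process and the count process have separate parameters, a covariate can affect one without affecting the other.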
Variable Selection for Nonparametric Gaussian Process Priors: Models and Computational Strategies
This paper presents a unified treatment of Gaussian process models that
extends to data from the exponential dispersion family and to survival data.
Our specific interest is in the analysis of data sets with predictors that have
an a priori unknown form of possibly nonlinear associations to the response.
The modeling approach we describe incorporates Gaussian processes in a
generalized linear model framework to obtain a class of nonparametric
regression models where the covariance matrix depends on the predictors. We
consider, in particular, continuous, categorical and count responses. We also
look into models that account for survival outcomes. We explore alternative
covariance formulations for the Gaussian process prior and demonstrate the
flexibility of the construction. Next, we focus on the important problem of
selecting variables from the set of possible predictors and describe a general
framework that employs mixture priors. We compare alternative MCMC strategies
for posterior inference and achieve a computationally efficient and practical
approach. We demonstrate performance on simulated and benchmark data sets.
Comment: Published at http://dx.doi.org/10.1214/11-STS354 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Estimation and Regularization Techniques for Regression Models with Multidimensional Prediction Functions
Boosting is one of the most important methods for fitting
regression models and building prediction rules from
high-dimensional data. A notable feature of boosting is that the
technique has a built-in mechanism for shrinking coefficient
estimates and variable selection. This regularization mechanism
makes boosting a suitable method for analyzing data characterized by
small sample sizes and large numbers of predictors. We extend the
existing methodology by developing a boosting method for prediction
functions with multiple components. Such multidimensional functions
occur in many types of statistical models, for example in count data
models and in models involving outcome variables with a mixture
distribution. As will be demonstrated, the new algorithm is suitable
for both the estimation of the prediction function and
regularization of the estimates. In addition, nuisance parameters
can be estimated simultaneously with the prediction function.
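The built-in shrinkage and variable selection mentioned above can be illustrated with ordinary componentwise L2 boosting for a single prediction function. This is the standard one-component technique, not the multidimensional extension the abstract proposes; the function name, the step length `nu` and the default values are assumptions made for illustration.

```python
def componentwise_l2_boost(X, y, steps=200, nu=0.1):
    """Componentwise L2 boosting for a linear predictor.

    X: list of rows of centered predictors, y: list of centered responses.
    Each step fits every predictor alone to the current residuals, keeps
    the one that reduces the residual sum of squares most, and moves its
    coefficient a fraction nu of the way towards the least-squares fit.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    resid = list(y)
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(steps):
        best_j, best_beta, best_gain = None, 0.0, -1.0
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # a constant-zero column cannot explain anything
            beta = sum(X[i][j] * resid[i] for i in range(n)) / col_ss[j]
            gain = beta * beta * col_ss[j]  # reduction in residual SS
            if gain > best_gain:
                best_j, best_beta, best_gain = j, beta, gain
        if best_j is None:
            break
        coef[best_j] += nu * best_beta
        for i in range(n):
            resid[i] -= nu * best_beta * X[i][best_j]
    return coef
```

Predictors that are never selected keep a coefficient of exactly zero, and stopping after few steps leaves the selected coefficients shrunk towards zero, which is the regularization effect described in the abstract.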
VALUING IDAHO WINERIES WITH A TRAVEL COST MODEL
Many commercial wineries produce a dual product: commercial wine and wine tourism. Growth of wine tourism throughout the US has been phenomenal. In contrast to the price of wine, which is reflected in the market, the demand for wine tourism can only be ascertained with a shadow price for winery visitation. The demand for wine tourism visits for Canyon County in southern Idaho was estimated using the Travel Cost Method. The value of wine tourism in Canyon County was estimated to be $5.40 per person per trip and trip demand was highly inelastic at 0.5. Elasticities of other trip demand function variables were estimated and analyzed, with a view to informing the marketing of Idaho's emerging wine tourism industry.
Community/Rural/Urban Development, Crop Production/Industries,
Risk factor analysis and spatiotemporal CART model of cryptosporidiosis in Queensland, Australia
Background: It remains unclear whether it is possible to develop a spatiotemporal epidemic prediction model for cryptosporidiosis. This paper examined the impact of socio-economic and weather factors on cryptosporidiosis and explored the possibility of developing such a model using socio-economic and weather data in Queensland, Australia.
Methods: Data on weather variables, notified cryptosporidiosis cases and socio-economic factors in Queensland were supplied by the Australian Bureau of Meteorology, Queensland Department of Health, and Australian Bureau of Statistics, respectively. Three-stage spatiotemporal classification and regression tree (CART) models were developed to examine the association between socio-economic and weather factors and monthly incidence of cryptosporidiosis in Queensland, Australia. The spatiotemporal CART model was used for predicting the outbreak of cryptosporidiosis in Queensland, Australia.
Results: The results of the classification tree model (with incidence rates defined as binary presence/absence) showed that there was an 87% chance of an occurrence of cryptosporidiosis in a local government area (LGA) if the socio-economic index for the area (SEIFA) exceeded 1021, while the results of the regression tree model (based on non-zero incidence rates) showed that when SEIFA was between 892 and 945, and temperature exceeded 32°C, the relative risk (RR) of cryptosporidiosis was 3.9 (mean morbidity: 390.6/100,000, standard deviation (SD): 310.5), compared to the monthly average incidence of cryptosporidiosis. When SEIFA was less than 892 the RR of cryptosporidiosis was 4.3 (mean morbidity: 426.8/100,000, SD: 319.2). A prediction map for the cryptosporidiosis outbreak was made according to the outputs of the spatiotemporal CART models.
Conclusions: The results of this study suggest that spatiotemporal CART models based on socio-economic and weather variables can be used for predicting the outbreak of cryptosporidiosis in Queensland, Australia.
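Thresholds such as the SEIFA and temperature cut-offs reported above come from the way a regression tree chooses its splits. As a rough sketch of that step only (not the authors' three-stage spatiotemporal procedure), the function below searches one predictor for the threshold that minimizes the summed squared error of the two resulting groups; the function name is hypothetical.

```python
def best_split(x, y):
    """Return (threshold, sse) for the single split on x minimizing SSE of y.

    Candidate thresholds are midpoints between consecutive distinct x values.
    The threshold is None if no split improves on leaving the data unsplit.
    """
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(x, y))
    best = (None, sse([v for _, v in pairs]))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between equal x values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [v for xv, v in pairs if xv <= threshold]
        right = [v for xv, v in pairs if xv > threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best
```

A full tree applies this search over every predictor and then repeats it within each resulting group; the mean response in each final group is what a statistic such as the reported mean morbidity summarizes.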