Poisson point process models solve the "pseudo-absence problem" for presence-only data in ecology
Presence-only data, point locations where a species has been recorded as
being present, are often used in modeling the distribution of a species as a
function of a set of explanatory variables---whether to map species occurrence,
to understand its association with the environment, or to predict its response
to environmental change. Currently, ecologists most commonly analyze
presence-only data by adding randomly chosen "pseudo-absences" to the data such
that it can be analyzed using logistic regression, an approach which has
weaknesses in model specification, in interpretation, and in implementation. To
address these issues, we propose Poisson point process modeling of the
intensity of presences. We also derive a link between the proposed approach and
logistic regression---specifically, we show that as the number of
pseudo-absences increases (in a regular or uniform random arrangement),
logistic regression slope parameters and their standard errors converge to
those of the corresponding Poisson point process model. We discuss the
practical implications of these results. In particular, point process modeling
offers a framework for choice of the number and location of pseudo-absences,
both of which are currently chosen by ad hoc and sometimes ineffective methods
in ecology, a point which we illustrate by example.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS331 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
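For orientation, the standard log-likelihood of an inhomogeneous Poisson point process with a log-linear intensity can be written as below. This is a textbook sketch, not the paper's own derivation. The notation is assumed for illustration: A is the study region, s_1, ..., s_n are the presence locations, x(s) is the vector of explanatory variables at location s, and (s_j, w_j) are quadrature points and weights. Under this reading, pseudo-absences plausibly act as the quadrature points used to approximate the integral.

```latex
\begin{align}
  \log \lambda(s) &= \beta_0 + x(s)^\top \beta, \\
  \ell(\beta_0, \beta) &= \sum_{i=1}^{n} \log \lambda(s_i) - \int_{A} \lambda(s)\, ds, \\
  \int_{A} \lambda(s)\, ds &\approx \sum_{j=1}^{m} w_j\, \lambda(s_j).
\end{align}
```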
A general algorithm for covariance modeling of discrete data
We propose an algorithm that generalizes to discrete data any given covariance modeling algorithm originally intended for Gaussian responses, via a Gaussian copula approach. Covariance modeling is a powerful tool for extracting meaning from multivariate data, and fast algorithms for Gaussian data, such as factor analysis and Gaussian graphical models, are widely available. Our algorithm makes these tools generally available to analysts of discrete data and can combine any likelihood-based covariance modeling method for Gaussian data with any set of discrete marginal distributions. Previously, tools for discrete data were generally specific to one family of distributions or covariance modeling paradigm, or otherwise did not exist. Our algorithm is more flexible than alternative methods and takes advantage of existing fast algorithms for Gaussian data, and simulations suggest that it outperforms competing graphical modeling and factor analysis procedures for count and binomial data. We additionally show that in a Gaussian copula graphical model with discrete margins, conditional independence relationships in the latent Gaussian variables are inherited by the discrete observations. Our method is illustrated with a graphical model and factor analysis on an overdispersed ecological count dataset of species abundances.
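As a rough illustration of the Gaussian copula idea, the sketch below maps a count to a latent standard-normal score by drawing a uniform value between F(y - 1) and F(y) and applying the normal quantile function. It is a minimal sketch and not the authors' algorithm. It assumes a Poisson margin with a known mean, and the function names `poisson_cdf` and `latent_gaussian_score` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def poisson_cdf(k: int, mu: float) -> float:
    """P(Y <= k) for Y ~ Poisson(mu); returns 0.0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)  # P(Y = 0)
    total = term
    for i in range(1, k + 1):
        term *= mu / i  # P(Y = i) from P(Y = i - 1)
        total += term
    return min(total, 1.0)


def latent_gaussian_score(y: int, mu: float, rng: random.Random) -> float:
    """Map a count to a latent standard-normal score (assumed Poisson margin)."""
    lower = poisson_cdf(y - 1, mu)
    upper = poisson_cdf(y, mu)
    u = rng.uniform(lower, upper)
    # Keep u strictly inside (0, 1), where the normal quantile is defined.
    eps = 1e-12
    u = min(max(u, eps), 1.0 - eps)
    return NormalDist().inv_cdf(u)


rng = random.Random(1)
counts = [0, 2, 5, 9]
scores = [latent_gaussian_score(y, mu=3.0, rng=rng) for y in counts]
```

Once every response has been mapped to such a score, any covariance method written for Gaussian data could in principle be applied to the scores. In practice the random draw would need to be repeated or averaged over.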
Thirty years of change in a benthic macroinvertebrate community of southwestern Lake Ontario after invasion by four Ponto-Caspian species
Beginning in the mid-1980s, the Laurentian Great Lakes underwent successive invasions by Ponto-Caspian species. We quantified major changes in the diversity and relative abundance of pre-invasion benthic macroinvertebrates at the same study site in southwestern Lake Ontario from 1983 to 2014. The zebra mussel Dreissena polymorpha Pallas arrived at the study site before 1991, the quagga mussel Dreissena rostriformis bugensis Andrusov and the amphipod Echinogammarus ischnus Stebbing arrived before 1999, and the Round Goby Neogobius melanostomus Pallas arrived about 2004. The macroinvertebrate community in 2014 was very different from 3 earlier communities in 1983, 1991, and 1999. In 2014, pulmonate and prosobranch snails and sphaeriid bivalves were absent, D. r. bugensis replaced D. polymorpha, E. ischnus replaced Gammarus fasciatus Say as the dominant amphipod, and a previously diverse community of benthic fish was replaced by abundant N. melanostomus. From 1983 to 1999, the relative abundance of prosobranchs and pulmonates declined 10-fold and rose 2-fold, respectively. From 1991 to 2014, the relative abundance of oligochaetes and chironomids increased 32- and 78-fold, respectively. The shifts we report probably are attributable to nutrient enrichment of the nearshore of Lake Ontario during the 1990s leading to a thick carpet of macroalgae, a change in the base of the benthic food web from dreissenid feces and pseudofeces to macroalgal detritus, and predation by N. melanostomus on snails.
Order selection and sparsity in latent variable models via the ordered factor LASSO
Generalized linear latent variable models (GLLVMs) offer a general framework for flexibly analyzing data involving multiple responses. When fitting such models, two of the major challenges are selecting the order, that is, the number of factors, and an appropriate structure for the loading matrix, typically a sparse structure. Motivated by the application of GLLVMs to study marine species assemblages in the Southern Ocean, we propose the Ordered Factor LASSO or OFAL penalty for order selection and achieving sparsity in GLLVMs. The OFAL penalty is the first penalty developed specifically for order selection in latent variable models, and achieves this by using a hierarchically structured group LASSO type penalty to shrink entire columns of the loading matrix to zero, while ensuring that non-zero loadings are concentrated on the lower-order factors. Simultaneously, individual element sparsity is achieved through the use of an adaptive LASSO. In conjunction with using an information criterion which promotes aggressive shrinkage, simulation shows that the OFAL penalty performs strongly compared with standard methods and penalties for order selection, achieving sparsity, and prediction in GLLVMs. Applying the OFAL penalty to the Southern Ocean marine species dataset suggests the available environmental predictors explain roughly half of the total covariation between species, thus leading to a smaller number of latent variables and increased sparsity in the loading matrix compared to a model without any covariates.
Untangling direct species associations from indirect mediator species effects with graphical models
Ecologists often investigate co-occurrence patterns in multi-species data in order to gain insight into the ecological causes of observed co-occurrences. Apart from direct associations between the two species of interest, they may co-occur because of indirect effects, where both species respond to another variable, whether environmental or biotic (e.g. a mediator species).
A wide variety of methods are now available for modelling how environmental filtering drives species distributions. In contrast, methods for studying other causes of co-occurrence are much more limited. 'Graphical' methods, which can be used to study how mediator species impact co-occurrence patterns, have recently been proposed for use in ecology. However, available methods are limited to presence/absence data or methods assuming multivariate normality, which is problematic when analysing abundances.
We propose Gaussian copula graphical models (GCGMs) for studying the effect of mediator species on co-occurrence patterns. GCGMs are a flexible type of graphical model which naturally accommodates all data types, for example binary (presence/absence), counts, as well as ordinal data and biomass, in a unified framework. Simulations demonstrate that GCGMs can be applied to a much broader range of data types than the methods currently used in ecology, and perform as well as or better than existing methods in many settings.
We apply GCGMs to counts of hunting spiders, in order to visualise associations between species. We also analyse abundance data of New Zealand native forest cover (on an ordinal scale) to show how GCGMs can be used to analyse large and complex datasets. In these data, we were able to reproduce known species relationships as well as generate new ecological hypotheses about species associations.
F.K.C.H. is supported by an ANU cross-disciplinary research grant. D.I.W. was supported by an Australian Research Council Future Fellowship (FT120100501). G.C.P. was supported by the Australian Postgraduate Award and ARC Discovery Project scheme (DP180103543). A.T.M. is supported by an Australian Research Council Discovery Grant (DP180100836). F.J.T. is supported by the Marsden Fast-Start Fund and the Royal Society of New Zealand.
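The distinction between direct associations and indirect mediator effects can be illustrated with partial correlations, which are read off the inverse covariance (precision) matrix in a Gaussian graphical model. The sketch below is not the GCGM method itself. It uses a made-up 3 x 3 covariance matrix in which hypothetical species A and B both respond to a mediator M and have no direct link, and the helper names `invert` and `partial_correlations` are illustrative.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """Partial correlation of each pair given all other variables."""
    prec = invert(cov)
    n = len(cov)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)] for i in range(n)]


# Hypothetical covariance for (A, M, B): A = M + noise, B = M + noise.
cov = [[2.0, 1.0, 1.0],
       [1.0, 1.0, 1.0],
       [1.0, 1.0, 2.0]]

marginal_ab = cov[0][2] / math.sqrt(cov[0][0] * cov[2][2])  # correlation ignoring M
pcor = partial_correlations(cov)
partial_ab = pcor[0][2]  # association between A and B after conditioning on M
```

In this toy setting A and B are marginally correlated, yet their partial correlation given M is zero, which is the pattern a graphical model would report as the absence of a direct edge.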
A climate of uncertainty: accounting for error in climate variables for species distribution models
1. Spatial climate variables are routinely used in species distribution models (SDMs) without accounting for the fact that they have been predicted with uncertainty, which can lead to biased estimates, erroneous inference and poor performance when predicting to new settings, for example under climate change scenarios. 2. We show how information on uncertainty associated with spatial climate variables can be obtained from climate data models. We then explain different types of uncertainty (i.e. classical and Berkson error) and use two statistical methods that incorporate uncertainty in climate variables into SDMs by means of (i) hierarchical modelling and (ii) simulation-extrapolation. 3. We used simulation to study the consequences of failure to account for measurement error. When uncertainty in explanatory variables was not accounted for, we found that coefficient estimates were biased and the SDM had a loss of statistical power. Further, this bias led to biased predictions when projecting change in distribution under climate change scenarios. The proposed errors-in-variables methods were less sensitive to these issues. 4. We also fit the proposed models to real data (presence/absence data on the Carolina wren, Thryothorus ludovicianus), as a function of temperature variables. 5. The proposed framework allows for many possible extensions and improvements to SDMs. If information on the uncertainty of spatial climate variables is available to researchers, we recommend the following: (i) first identify the type of uncertainty; (ii) consider whether any spatial autocorrelation or independence assumptions are required; and (iii) attempt to incorporate the uncertainty into the SDM through established statistical methods and their extensions.
This is the publisher's final pdf. The published article is copyrighted by the author(s) and published by John Wiley & Sons Ltd on behalf of the British Ecological Society. The published article can be found at: http://onlinelibrary.wiley.com/journal/10.1111/%28ISSN%292041-210X
Keywords: Measurement error, Errors-in-variables, Hierarchical statistical models, Climate maps, SIMEX, Prediction error, PRIS
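The difference between classical and Berkson error can be seen in a small simulation. The sketch below uses ordinary least squares on simulated data as a stand-in; it is not an SDM and not the paper's hierarchical or simulation-extrapolation method. All values (sample size, slope, error variances) and the helper `ols_slope` are assumptions chosen for illustration.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(42)
n = 20000
beta = 2.0  # true slope of the response on the true covariate

# Classical error: the observed value W scatters around the true value X.
x_true = [rng.gauss(0, 1) for _ in range(n)]
w_classical = [x + rng.gauss(0, 1) for x in x_true]
y_classical = [beta * x + rng.gauss(0, 0.5) for x in x_true]
slope_classical = ols_slope(w_classical, y_classical)

# Berkson error: the true value X scatters around the assigned value W.
w_berkson = [rng.gauss(0, 1) for _ in range(n)]
x_berkson = [w + rng.gauss(0, 1) for w in w_berkson]
y_berkson = [beta * x + rng.gauss(0, 0.5) for x in x_berkson]
slope_berkson = ols_slope(w_berkson, y_berkson)
```

With equal variances for the true covariate and the error, the classical-error slope is attenuated towards about half of `beta`, while the Berkson-error slope stays close to `beta` in this linear setting. Nonlinear models such as logistic regression can behave differently, which is one reason to identify the error type first.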