
105 research outputs found

Poisson point process models solve the "pseudo-absence problem" for presence-only data in ecology

Authors: Shepherd Leah C.; Warton David I.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 15/11/2010

Presence-only data, point locations where a species has been recorded as being present, are often used in modeling the distribution of a species as a function of a set of explanatory variables, whether to map species occurrence, to understand its association with the environment, or to predict its response to environmental change. Currently, ecologists most commonly analyze presence-only data by adding randomly chosen "pseudo-absences" to the data so that they can be analyzed using logistic regression, an approach which has weaknesses in model specification, in interpretation, and in implementation. To address these issues, we propose Poisson point process modeling of the intensity of presences. We also derive a link between the proposed approach and logistic regression. Specifically, we show that as the number of pseudo-absences increases (in a regular or uniform random arrangement), logistic regression slope parameters and their standard errors converge to those of the corresponding Poisson point process model. We discuss the practical implications of these results. In particular, point process modeling offers a framework for choice of the number and location of pseudo-absences, both of which are currently chosen by ad hoc and sometimes ineffective methods in ecology, a point which we illustrate by example.

Comment: Published at http://dx.doi.org/10.1214/10-AOAS331 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref
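
The convergence described in the abstract above (logistic regression slopes approaching the Poisson point process slope as pseudo-absences are added) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' code: the one-dimensional study region [0, 1], the single covariate x, the values `TRUE_SLOPE` and `N_PRES`, and the helper names `design`, `fit_logistic`, and `fit_ppm` are all hypothetical, and the point process integral is approximated by an equal-weight average over a regular grid.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_SLOPE = 2.0   # assumed log-linear effect of the covariate x on intensity
N_PRES = 300       # assumed number of presence records

# Presence locations on [0, 1] with density proportional to exp(TRUE_SLOPE * x),
# drawn by inverting the cumulative distribution function.
u = rng.uniform(size=N_PRES)
x_pres = np.log1p(u * np.expm1(TRUE_SLOPE)) / TRUE_SLOPE


def design(x):
    """Design matrix with an intercept column and the covariate x."""
    return np.column_stack([np.ones_like(x), x])


def fit_logistic(x_pres, x_abs, n_iter=50):
    """Logistic regression of presence (1) vs pseudo-absence (0), Newton's method."""
    X = design(np.concatenate([x_pres, x_abs]))
    y = np.concatenate([np.ones(x_pres.size), np.zeros(x_abs.size)])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def fit_ppm(x_pres, x_quad, n_iter=50):
    """Poisson point process fit; the integral of the intensity over [0, 1]
    is approximated by an equal-weight average over the quadrature points."""
    Xp, Xq = design(x_pres), design(x_quad)
    w = 1.0 / x_quad.size
    beta = np.array([np.log(x_pres.size), 0.0])  # start from a constant intensity
    for _ in range(n_iter):
        lam = np.exp(Xq @ beta)
        grad = Xp.sum(axis=0) - w * (Xq.T @ lam)
        hess = w * ((Xq * lam[:, None]).T @ Xq)  # negative Hessian of the log-likelihood
        beta = beta + np.linalg.solve(hess, grad)
    return beta


x_fine = (np.arange(100_000) + 0.5) / 100_000
slope_ppm = fit_ppm(x_pres, x_fine)[1]

slopes_logit = {}
for m in (100, 1_000, 100_000):
    x_abs = (np.arange(m) + 0.5) / m  # regular grid of m pseudo-absences
    slopes_logit[m] = fit_logistic(x_pres, x_abs)[1]
    print(m, slopes_logit[m] - slope_ppm)
```

The printed differences show how far each logistic slope sits from the point process slope for a given number of pseudo-absences. Only the slopes are compared, because the logistic intercept also absorbs the number of pseudo-absences.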

A general algorithm for covariance modeling of discrete data

Authors: Hui Francis; Popovic Gordana C.; Warton David I.
Publication venue: 'Elsevier BV'
Publication date: 01/05/2018

We propose an algorithm that generalizes to discrete data any given covariance modeling algorithm originally intended for Gaussian responses, via a Gaussian copula approach. Covariance modeling is a powerful tool for extracting meaning from multivariate data, and fast algorithms for Gaussian data, such as factor analysis and Gaussian graphical models, are widely available. Our algorithm makes these tools generally available to analysts of discrete data and can combine any likelihood-based covariance modeling method for Gaussian data with any set of discrete marginal distributions. Previously, tools for discrete data were generally specific to one family of distributions or covariance modeling paradigm, or otherwise did not exist. Our algorithm is more flexible than alternate methods, takes advantage of existing fast algorithms for Gaussian data, and simulations suggest that it outperforms competing graphical modeling and factor analysis procedures for count and binomial data. We additionally show that in a Gaussian copula graphical model with discrete margins, conditional independence relationships in the latent Gaussian variables are inherited by the discrete observations. Our method is illustrated with a graphical model and factor analysis on an overdispersed ecological count dataset of species abundances.

The Australian National University

Thirty years of change in a benthic macroinvertebrate community of southwestern Lake Ontario after invasion by four Ponto-Caspian species

Authors: Bailey Barrett Katherine; Haynes James M.; Warton David I.
Publication venue: Digital Commons @Brockport
Publication date: 01/03/2017

Beginning in the mid-1980s, the Laurentian Great Lakes underwent successive invasions by Ponto-Caspian species. We quantified major changes in the diversity and relative abundance of pre-invasion benthic macroinvertebrates at the same study site in southwestern Lake Ontario from 1983–2014. The zebra mussel Dreissena polymorpha Pallas arrived at the study site before 1991, the quagga mussel Dreissena rostriformis bugensis Andrusov and the amphipod Echinogammarus ischnus Stebbing arrived before 1999, and the Round Goby Neogobius melanostomus Pallas arrived about 2004. The macroinvertebrate community in 2014 was very different from the 3 earlier communities in 1983, 1991, and 1999. In 2014, pulmonate and prosobranch snails and sphaeriid bivalves were absent, D. r. bugensis replaced D. polymorpha, E. ischnus replaced Gammarus fasciatus Say as the dominant amphipod, and a previously diverse community of benthic fish was replaced by abundant N. melanostomus. From 1983 to 1999, the relative abundance of prosobranchs and pulmonates declined 10-fold and rose 2-fold, respectively. From 1991 to 2014, the relative abundance of oligochaetes and chironomids increased 32- and 78-fold, respectively. The shifts we report probably are attributable to nutrient enrichment of the nearshore of Lake Ontario during the 1990s leading to a thick carpet of macroalgae, a change in the base of the benthic food web from dreissenid feces and pseudofeces to macroalgal detritus, and predation by N. melanostomus on snails.

The College at Brockport, State University of New York: Digital Commons @Brockport

Order selection and sparsity in latent variable models via the ordered factor LASSO

Authors: Hui Francis; Tanaka Emi; Warton David I.
Publication venue: 'Wiley'
Publication date: 02/11/2020

Generalized linear latent variable models (GLLVMs) offer a general framework for flexibly analyzing data involving multiple responses. When fitting such models, two of the major challenges are selecting the order, that is, the number of factors, and an appropriate structure for the loading matrix, typically a sparse structure. Motivated by the application of GLLVMs to study marine species assemblages in the Southern Ocean, we propose the Ordered Factor LASSO or OFAL penalty for order selection and achieving sparsity in GLLVMs. The OFAL penalty is the first penalty developed specifically for order selection in latent variable models, and achieves this by using a hierarchically structured group LASSO type penalty to shrink entire columns of the loading matrix to zero, while ensuring that non‐zero loadings are concentrated on the lower‐order factors. Simultaneously, individual element sparsity is achieved through the use of an adaptive LASSO. In conjunction with using an information criterion which promotes aggressive shrinkage, simulation shows that the OFAL penalty performs strongly compared with standard methods and penalties for order selection, achieving sparsity, and prediction in GLLVMs. Applying the OFAL penalty to the Southern Ocean marine species dataset suggests the available environmental predictors explain roughly half of the total covariation between species, thus leading to a smaller number of latent variables and increased sparsity in the loading matrix compared to a model without any covariates.

The Australian National University

Untangling direct species associations from indirect mediator species effects with graphical models

Authors: Hui Francis; Moles Angela T.; Popovic Gordana C.; Thomson Fiona J.; Warton David I.
Publication venue: 'Wiley'
Publication date: 02/11/2020

Ecologists often investigate co‐occurrence patterns in multi‐species data in order to gain insight into the ecological causes of observed co‐occurrences. Apart from direct associations between the two species of interest, they may co‐occur because of indirect effects, where both species respond to another variable, whether environmental or biotic (e.g. a mediator species). A wide variety of methods are now available for modelling how environmental filtering drives species distributions. In contrast, methods for studying other causes of co‐occurrence are much more limited. “Graphical” methods, which can be used to study how mediator species impact co‐occurrence patterns, have recently been proposed for use in ecology. However, available methods are limited to presence/absence data or methods assuming multivariate normality, which is problematic when analysing abundances. We propose Gaussian copula graphical models (GCGMs) for studying the effect of mediator species on co‐occurrence patterns. GCGMs are a flexible type of graphical model which naturally accommodates all data types, for example binary (presence/absence), counts, as well as ordinal data and biomass, in a unified framework. Simulations demonstrate that GCGMs can be applied to a much broader range of data types than the methods currently used in ecology, and perform as well as or better than existing methods in many settings. We apply GCGMs to counts of hunting spiders, in order to visualise associations between species. We also analyse abundance data of New Zealand native forest cover (on an ordinal scale) to show how GCGMs can be used to analyse large and complex datasets. In these data, we were able to reproduce known species relationships as well as generate new ecological hypotheses about species associations.

F.K.C.H. is supported by an ANU cross‐disciplinary research grant. D.I.W. was supported by an Australian Research Council Future Fellowship (FT120100501). G.C.P. was supported by the Australian Postgraduate Award and ARC Discovery Project scheme (DP180103543). A.T.M. is supported by an Australian Research Council Discovery Grant (DP180100836). F.J.T. is supported by the Marsden Fast‐Start Fund and the Royal Society of New Zealand.

The Australian National University


A climate of uncertainty: accounting for error in climate variables for species distribution models

Authors: Ashcroft Michael B.; Daly Christopher; Foster Scott D.; Stoklosa Jakub; Warton David I.
Publication venue: John Wiley & Sons Ltd.

1. Spatial climate variables are routinely used in species distribution models (SDMs) without accounting for the fact that they have been predicted with uncertainty, which can lead to biased estimates, erroneous inference and poor performances when predicting to new settings – for example under climate change scenarios. 2. We show how information on uncertainty associated with spatial climate variables can be obtained from climate data models. We then explain different types of uncertainty (i.e. classical and Berkson error) and use two statistical methods that incorporate uncertainty in climate variables into SDMs by means of (i) hierarchical modelling and (ii) simulation–extrapolation. 3. We used simulation to study the consequences of failure to account for measurement error. When uncertainty in explanatory variables was not accounted for, we found that coefficient estimates were biased and the SDM had a loss of statistical power. Further, this bias led to biased predictions when projecting change in distribution under climate change scenarios. The proposed errors-in-variables methods were less sensitive to these issues. 4. We also fit the proposed models to real data (presence/absence data on the Carolina wren, Thryothorus ludovicianus), as a function of temperature variables. 5. The proposed framework allows for many possible extensions and improvements to SDMs. If information on the uncertainty of spatial climate variables is available to researchers, we recommend the following: (i) first identify the type of uncertainty; (ii) consider whether any spatial autocorrelation or independence assumptions are required; and (iii) attempt to incorporate the uncertainty into the SDM through established statistical methods and their extensions.

This is the publisher’s final pdf. The published article is copyrighted by the author(s) and published by John Wiley & Sons Ltd on behalf of the British Ecological Society. The published article can be found at: http://onlinelibrary.wiley.com/journal/10.1111/%28ISSN%292041-210X.

Keywords: Measurement error, Errors-in-variables, Hierarchical statistical models, Climate maps, SIMEX, Prediction error, PRIS

ScholarsArchive@OSU
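
The distinction between classical and Berkson error drawn in the abstract above can be made concrete with a short simulation. This is a minimal sketch under stated assumptions, not the hierarchical or simulation–extrapolation methods from the article: it swaps in an ordinary least-squares fit of a continuous response in place of a presence/absence SDM, and the slope `beta`, the error standard deviations, and the helper `ols_slope` are all hypothetical. Under classical error the observed covariate is the truth plus noise, so the naive slope is attenuated; under Berkson error the truth is the observed value plus noise, and the naive linear slope stays close to the true one.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta = 1.5             # assumed true slope
sd_x, sd_u = 1.0, 1.0  # assumed spread of the covariate and of its error


def ols_slope(w, y):
    """Least-squares slope from regressing y on the observed covariate w."""
    return np.cov(w, y)[0, 1] / np.var(w, ddof=1)


# Classical error: observed W = true X + U.
x = rng.normal(0.0, sd_x, n)
w_classical = x + rng.normal(0.0, sd_u, n)
y = beta * x + rng.normal(0.0, 0.5, n)
slope_classical = ols_slope(w_classical, y)

# Berkson error: true X = observed W + U.
w_berkson = rng.normal(0.0, sd_x, n)
x_b = w_berkson + rng.normal(0.0, sd_u, n)
y_b = beta * x_b + rng.normal(0.0, 0.5, n)
slope_berkson = ols_slope(w_berkson, y_b)

# Theoretical attenuation of the classical-error slope in this linear setting.
expected_classical = beta * sd_x**2 / (sd_x**2 + sd_u**2)
print(slope_classical, expected_classical, slope_berkson)
```

In nonlinear models such as logistic regression, Berkson error can also distort estimates, so the linear result here should not be read as a general statement about SDMs.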