
    Poisson point process models solve the "pseudo-absence problem" for presence-only data in ecology

    Presence-only data, point locations where a species has been recorded as being present, are often used in modeling the distribution of a species as a function of a set of explanatory variables, whether to map species occurrence, to understand its association with the environment, or to predict its response to environmental change. Currently, ecologists most commonly analyze presence-only data by adding randomly chosen "pseudo-absences" to the data so that they can be analyzed using logistic regression, an approach which has weaknesses in model specification, in interpretation, and in implementation. To address these issues, we propose Poisson point process modeling of the intensity of presences. We also derive a link between the proposed approach and logistic regression: specifically, we show that as the number of pseudo-absences increases (in a regular or uniform random arrangement), logistic regression slope parameters and their standard errors converge to those of the corresponding Poisson point process model. We discuss the practical implications of these results. In particular, point process modeling offers a framework for choice of the number and location of pseudo-absences, both of which are currently chosen by ad hoc and sometimes ineffective methods in ecology, a point which we illustrate by example. Comment: Published at http://dx.doi.org/10.1214/10-AOAS331 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
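
The convergence result described in this abstract can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the authors' code: the study region is the unit interval, the only covariate is the location `s` itself, the intensity is proportional to `exp(slope * s)`, and pseudo-absences sit on a regular grid. All function names are hypothetical. The Poisson point process slope is obtained from its exact score equation, and the logistic regression slope from a hand-written Newton iteration.

```python
import math
import random


def simulate_presences(n, slope, rng):
    """Draw n locations on [0, 1] with density proportional to exp(slope * s)
    (inverse-CDF sampling; assumes slope != 0)."""
    return [math.log1p(rng.random() * math.expm1(slope)) / slope for _ in range(n)]


def ppm_slope(presences, lo=1e-3, hi=50.0, iters=200):
    """Point process MLE of the slope for this toy setup.

    Solves mean(s) = 1 / (1 - exp(-b)) - 1 / b by bisection.
    Assumption: the solution is positive and lies in (lo, hi).
    """
    target = sum(presences) / len(presences)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        model_mean = 1.0 / (1.0 - math.exp(-mid)) - 1.0 / mid
        if model_mean < target:
            lo = mid  # model mean increases with b, so move up
        else:
            hi = mid
    return 0.5 * (lo + hi)


def logistic_slope(presences, m, iters=50, tol=1e-10):
    """Logistic regression slope with m pseudo-absences on a regular grid."""
    grid = [(j + 0.5) / m for j in range(m)]
    xs = list(presences) + grid
    ys = [1.0] * len(presences) + [0.0] * m
    # Start from the intercept-only fit.
    b0, b1 = math.log(len(presences) / m), 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # Newton step, 2x2 solve
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b1


if __name__ == "__main__":
    rng = random.Random(0)
    s = simulate_presences(300, 2.0, rng)
    print("point process slope:", ppm_slope(s))
    for m in (20, 200, 2000, 20000):
        print(m, "pseudo-absences -> logistic slope:", logistic_slope(s, m))
```

In this toy setting the gap between the two slope estimates should shrink as `m` grows; the sketch does not reproduce the standard-error result or the paper's guidance on choosing pseudo-absence locations.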

    A general algorithm for covariance modeling of discrete data

    We propose an algorithm that generalizes to discrete data any given covariance modeling algorithm originally intended for Gaussian responses, via a Gaussian copula approach. Covariance modeling is a powerful tool for extracting meaning from multivariate data, and fast algorithms for Gaussian data, such as factor analysis and Gaussian graphical models, are widely available. Our algorithm makes these tools generally available to analysts of discrete data and can combine any likelihood-based covariance modeling method for Gaussian data with any set of discrete marginal distributions. Previously, tools for discrete data were generally specific to one family of distributions or covariance modeling paradigm, or otherwise did not exist. Our algorithm is more flexible than alternative methods and takes advantage of existing fast algorithms for Gaussian data, and simulations suggest that it outperforms competing graphical modeling and factor analysis procedures for count and binomial data. We additionally show that in a Gaussian copula graphical model with discrete margins, conditional independence relationships in the latent Gaussian variables are inherited by the discrete observations. Our method is illustrated with a graphical model and factor analysis on an overdispersed ecological count dataset of species abundances.
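
To make the Gaussian copula construction concrete, here is a small simulation sketch. It shows only the generative side of the idea (correlated latent Gaussian variables pushed through discrete marginal quantile functions), not the estimation algorithm proposed in the paper. The Poisson margins, the two-variable setup, and all function names are assumptions chosen for illustration.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def poisson_quantile(u, mean):
    """Smallest count k whose Poisson(mean) CDF is at least u (requires u < 1)."""
    k = 0
    pmf = math.exp(-mean)
    cdf = pmf
    while cdf < u:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k


def sample_copula_counts(n, rho, mean_a, mean_b, rng):
    """Draw n pairs of counts whose dependence comes from a latent
    bivariate Gaussian with correlation rho."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Clamp just below 1 so the quantile search always terminates.
        u1 = min(normal_cdf(z1), 1.0 - 1e-12)
        u2 = min(normal_cdf(z2), 1.0 - 1e-12)
        pairs.append((poisson_quantile(u1, mean_a), poisson_quantile(u2, mean_b)))
    return pairs


def correlation(pairs):
    """Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    rng = random.Random(1)
    counts = sample_copula_counts(2000, 0.8, 3.0, 3.0, rng)
    print("correlation of simulated counts:", correlation(counts))
```

The margins stay Poisson whatever the latent correlation is, which is the separation between marginal model and dependence structure that lets Gaussian covariance tools be reused for discrete data.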

    Thirty years of change in a benthic macroinvertebrate community of southwestern Lake Ontario after invasion by four Ponto-Caspian species

    Beginning in the mid-1980s, the Laurentian Great Lakes underwent successive invasions by Ponto-Caspian species. We quantified major changes in the diversity and relative abundance of pre-invasion benthic macroinvertebrates at the same study site in southwestern Lake Ontario from 1983 to 2014. The zebra mussel Dreissena polymorpha Pallas arrived at the study site before 1991, the quagga mussel Dreissena rostriformis bugensis Andrusov and the amphipod Echinogammarus ischnus Stebbing arrived before 1999, and the Round Goby Neogobius melanostomus Pallas arrived about 2004. The macroinvertebrate community in 2014 was very different from 3 earlier communities in 1983, 1991, and 1999. In 2014, pulmonate and prosobranch snails and sphaeriid bivalves were absent, D. r. bugensis replaced D. polymorpha, E. ischnus replaced Gammarus fasciatus Say as the dominant amphipod, and a previously diverse community of benthic fish was replaced by abundant N. melanostomus. From 1983 to 1999, the relative abundance of prosobranchs and pulmonates declined 10-fold and rose 2-fold, respectively. From 1991 to 2014, the relative abundance of oligochaetes and chironomids increased 32- and 78-fold, respectively. The shifts we report are probably attributable to nutrient enrichment of the nearshore of Lake Ontario during the 1990s leading to a thick carpet of macroalgae, a change in the base of the benthic food web from dreissenid feces and pseudofeces to macroalgal detritus, and predation by N. melanostomus on snails.

    Order selection and sparsity in latent variable models via the ordered factor LASSO

    Generalized linear latent variable models (GLLVMs) offer a general framework for flexibly analyzing data involving multiple responses. When fitting such models, two of the major challenges are selecting the order, that is, the number of factors, and an appropriate structure for the loading matrix, typically a sparse structure. Motivated by the application of GLLVMs to study marine species assemblages in the Southern Ocean, we propose the Ordered Factor LASSO or OFAL penalty for order selection and achieving sparsity in GLLVMs. The OFAL penalty is the first penalty developed specifically for order selection in latent variable models, and achieves this by using a hierarchically structured group LASSO type penalty to shrink entire columns of the loading matrix to zero, while ensuring that non‐zero loadings are concentrated on the lower‐order factors. Simultaneously, individual element sparsity is achieved through the use of an adaptive LASSO. When used in conjunction with an information criterion that promotes aggressive shrinkage, the OFAL penalty performs strongly in simulations compared with standard methods and penalties for order selection, achieving sparsity, and prediction in GLLVMs. Applying the OFAL penalty to the Southern Ocean marine species dataset suggests the available environmental predictors explain roughly half of the total covariation between species, thus leading to a smaller number of latent variables and increased sparsity in the loading matrix compared to a model without any covariates.

    Untangling direct species associations from indirect mediator species effects with graphical models

    Ecologists often investigate co‐occurrence patterns in multi‐species data in order to gain insight into the ecological causes of observed co‐occurrences. Apart from direct associations between the two species of interest, they may co‐occur because of indirect effects, where both species respond to another variable, whether environmental or biotic (e.g. a mediator species). A wide variety of methods are now available for modelling how environmental filtering drives species distributions. In contrast, methods for studying other causes of co‐occurrence are much more limited. “Graphical” methods, which can be used to study how mediator species impact co‐occurrence patterns, have recently been proposed for use in ecology. However, available methods are limited to presence/absence data or methods assuming multivariate normality, which is problematic when analysing abundances. We propose Gaussian copula graphical models (GCGMs) for studying the effect of mediator species on co‐occurrence patterns. GCGMs are a flexible type of graphical model which naturally accommodates all data types, for example binary (presence/absence), counts, as well as ordinal data and biomass, in a unified framework. Simulations demonstrate that GCGMs can be applied to a much broader range of data types than the methods currently used in ecology, and perform as well as or better than existing methods in many settings. We apply GCGMs to counts of hunting spiders, in order to visualise associations between species. We also analyse abundance data of New Zealand native forest cover (on an ordinal scale) to show how GCGMs can be used to analyse large and complex datasets. In these data, we were able to reproduce known species relationships as well as generate new ecological hypotheses about species associations. F.K.C.H. is supported by an ANU cross‐disciplinary research grant. D.I.W. was supported by an Australian Research Council Future Fellowship (FT120100501). G.C.P. was supported by the Australian Postgraduate Award and ARC Discovery Project scheme (DP180103543). A.T.M. is supported by an Australian Research Council Discovery Grant (DP180100836). F.J.T. is supported by the Marsden Fast‐Start Fund and the Royal Society of New Zealand.
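
The distinction between direct associations and indirect, mediator-driven ones can be sketched on the latent Gaussian scale. The example below is an illustrative assumption, not the paper's analysis: three hypothetical species A, B, C, where A and C each depend only on the mediator B, with a hand-picked covariance matrix. It shows that A and C are marginally correlated while their partial correlation, read off the inverse covariance (precision) matrix, is zero. The function names are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """Partial correlation of each pair given all other variables,
    computed from the precision matrix."""
    prec = invert(cov)
    n = len(cov)
    return [[1.0 if i == j
             else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)]
            for i in range(n)]


# Assumed latent covariance, order (A, B, C): A and C each correlate 0.8
# with the mediator B, and with each other only through B (0.8 * 0.8 = 0.64).
LATENT_COV = [
    [1.00, 0.80, 0.64],
    [0.80, 1.00, 0.80],
    [0.64, 0.80, 1.00],
]

if __name__ == "__main__":
    partial = partial_correlations(LATENT_COV)
    print("marginal correlation A-C:", LATENT_COV[0][2])
    print("partial correlation A-C given B:", partial[0][2])
    print("partial correlation A-B given C:", partial[0][1])
```

A graphical model would draw edges A-B and B-C but none between A and C, which is the sense in which such models separate direct associations from effects passed through a mediator species.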