A new method for faster and more accurate inference of species associations from big community data
1. Joint Species Distribution models (JSDMs) explain spatial variation in
community composition by contributions of the environment, biotic associations,
and possibly spatially structured residual covariance. They show great promise
as a general analytical framework for community ecology and macroecology, but
current JSDMs, even when approximated by latent variables, scale poorly on
large datasets, limiting their usefulness for currently emerging big (e.g.,
metabarcoding and metagenomics) community datasets. 2. Here, we present a
novel, more scalable JSDM (sjSDM) that circumvents the need to use latent
variables by using a Monte-Carlo integration of the joint JSDM likelihood and
allows flexible elastic net regularization on all model components. We
implemented sjSDM in PyTorch, a modern machine learning framework that can make
use of CPU and GPU calculations. Using simulated communities with known
species-species associations and different numbers of species and sites, we
compare sjSDM with state-of-the-art JSDM implementations to determine
computational runtimes and accuracy of the inferred species-species and
species-environmental associations. 3. We find that sjSDM is orders of
magnitude faster than existing JSDM algorithms (even when run on the CPU) and
can be scaled to very large datasets. Despite the dramatically improved speed,
sjSDM produces more accurate estimates of species association structures than
alternative JSDM implementations. We demonstrate the applicability of sjSDM to
big community data using an eDNA case study with thousands of fungal operational
taxonomic units (OTUs). 4. Our sjSDM approach makes the application of JSDMs to
large community datasets with hundreds or thousands of species possible,
substantially extending the applicability of JSDMs in ecology. We provide our
method in an R package to facilitate its use in practical data
analysis.
Comment: 65 pages, 5 figures
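The Monte-Carlo integration mentioned in this abstract can be illustrated with a minimal sketch. It is not the sjSDM implementation. It assumes a multivariate probit model for one site, with a residual covariance of the form Sigma = L L^T + I, so that the species are independent once a standard-normal vector z is fixed. The function names, the argument `L`, and the sample count are hypothetical choices made for illustration only.

```python
import math
import random


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mc_probit_likelihood(y, eta, L, n_samples=2000, rng=None):
    """Monte Carlo estimate of P(y) for one site.

    Assumed model (illustrative): y_j = 1 if eta_j + e_j > 0, with
    e ~ N(0, L L^T + I). Writing e = L z + eps with z, eps standard
    normal, the species are conditionally independent given z, so
    P(y) = E_z[ prod_j Phi(eta_j + (L z)_j)^y_j * (1 - Phi(...))^(1 - y_j) ].
    The expectation over z is replaced by an average over random draws.
    """
    if rng is None:
        rng = random.Random(0)
    n_species = len(y)
    n_latent = len(L[0])
    total = 0.0
    for _ in range(n_samples):
        # One draw of the standard-normal vector that induces the covariance.
        z = [rng.gauss(0.0, 1.0) for _ in range(n_latent)]
        prob = 1.0
        for j in range(n_species):
            shift = sum(L[j][k] * z[k] for k in range(n_latent))
            p = normal_cdf(eta[j] + shift)
            prob *= p if y[j] == 1 else 1.0 - p
        total += prob
    return total / n_samples


if __name__ == "__main__":
    # Two species whose residuals are positively correlated through L.
    estimate = mc_probit_likelihood([1, 1], [0.0, 0.0], [[1.0], [1.0]])
    print("Estimated P(both present):", estimate)
```

In this toy setting the residual correlation is 0.5, for which the exact probability that both species are present is 1/3, so the estimate can be checked against a known value.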
Clustering species with residual covariance matrix in Joint Species Distribution models
Modelling species distributions over space and time is one of the major research topics in both ecology and conservation biology. Joint Species Distribution models (JSDMs) have recently been introduced as a tool to better model community data, by inferring a residual covariance matrix between species, after accounting for species' response to the environment. However, these models are computationally demanding, even when latent factors, a common tool for dimension reduction, are used. To address this issue, Taylor-Rodriguez et al. (2017) proposed to use a Dirichlet process, a Bayesian nonparametric prior, to further reduce model dimension by clustering species in the residual covariance matrix. Here, we built on this approach to include prior knowledge on the potential number of clusters, and used a Pitman-Yor process instead to address some critical limitations of the Dirichlet process. We therefore propose a framework that includes prior knowledge in the residual covariance matrix, providing a tool to analyze clusters of species that share the same residual associations with respect to other species. We applied our methodology to a case study of plant communities in a protected area of the French Alps (the Bauges Regional Park), and demonstrated that our extensions improve dimension reduction and reveal additional information from the residual covariance matrix, notably showing how the estimated clusters are compatible with plant traits, endorsing their importance in shaping communities.
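As a rough illustration of the clustering priors named in this abstract, the sketch below draws a random partition of species from a Pitman-Yor process using its sequential ("Chinese restaurant") construction. It is not the authors' model or code. The function name, parameter names, and the restriction to a positive concentration are assumptions made for this example. Setting `discount` to 0 recovers the Dirichlet process, while a positive discount lets the number of clusters grow faster with the number of species.

```python
import random


def sample_pitman_yor_partition(n_items, discount, concentration, rng=None):
    """Draw cluster labels for n_items from a Pitman-Yor process.

    With n items already placed in k clusters, the next item joins
    existing cluster c with probability (size_c - discount) / (n + concentration)
    and opens a new cluster with probability
    (concentration + discount * k) / (n + concentration).
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= 0.0:
        # Simplifying assumption of this sketch; the general process
        # only requires concentration > -discount.
        raise ValueError("concentration must be positive in this sketch")
    if rng is None:
        rng = random.Random(0)

    cluster_sizes = []
    labels = []
    for _ in range(n_items):
        k = len(cluster_sizes)
        # Unnormalised weights: one per existing cluster, then the new one.
        weights = [size - discount for size in cluster_sizes]
        weights.append(concentration + discount * k)
        choice = rng.choices(range(k + 1), weights=weights)[0]
        if choice == k:
            cluster_sizes.append(1)
        else:
            cluster_sizes[choice] += 1
        labels.append(choice)
    return labels, cluster_sizes


if __name__ == "__main__":
    # Compare the Dirichlet process (discount 0) with a Pitman-Yor draw.
    for d in (0.0, 0.5):
        _, sizes = sample_pitman_yor_partition(200, d, 1.0, random.Random(1))
        print("discount", d, "clusters", len(sizes))
```

Each label indexes a cluster in order of first appearance, so species with the same label would share one row pattern in a clustered residual covariance matrix.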