4 research outputs found

    How much should one sample to accurately predict the distribution of species assemblages? A virtual community approach

    Correlative species distribution models (SDMs) are widely used to predict species distributions and assemblages, with many fundamental and applied uses. Different factors have been shown to affect SDM prediction accuracy. However, real data cannot give unambiguous answers on these issues, and for this reason artificial data have been increasingly used in recent years. Here, we move one step further by assessing how different factors affect the prediction accuracy of virtual assemblages obtained by stacking individual SDM predictions (stacked SDMs, S-SDMs). We modelled 100 virtual species in a real study area, testing five factors: sample size (200, 800 and 3200), sampling method (nested, non-nested), sampling prevalence (25%, 50%, 75% and species true prevalence), modelling technique (GAM, GLM, BRT and RF) and thresholding method (ROC, MaxTSS and MaxKappa). We showed that the accuracy of S-SDM predictions is mostly affected by modelling technique, followed by sample size. Models fitted by GAM/GLM had a higher accuracy and lower variance than those fitted by BRT/RF. Model accuracy increased with sample size, and a sampling strategy reflecting the true prevalence of the species was most successful. However, even with sample sizes above 3000 sites, residual uncertainty remained in the predictions, potentially reflecting a bias introduced by creating and/or resampling the virtual species. Therefore, when evaluating the accuracy of predictions from S-SDMs fitted with real field data, one can hardly expect to reach perfect accuracy, and reasonably high values of similarity or predictive success can already be seen as valuable predictions. We recommend using a ‘plot-like’ sampling method (the best approximation of the species’ true prevalence) rather than simply increasing the number of presences-absences of species. As presented here, virtual simulations might be used more systematically in future studies to indicate the best accuracy level that one could expect given the characteristics of the data and the methods used to fit and stack SDMs.
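
The stacking step described in this abstract can be illustrated with a minimal sketch. It assumes MaxTSS as the thresholding method (one of the three listed) and the Sørensen index as the measure of assemblage similarity; the species names, probabilities and observations are hypothetical toy values, not data from the study.

```python
# Toy S-SDM sketch: threshold each species with MaxTSS, then stack per site.
# All names and numbers below are hypothetical illustrations.

def max_tss_threshold(probs, observed):
    """Return the cut-off maximising TSS = sensitivity + specificity - 1."""
    best_t, best_tss = 0.5, -1.0
    for t in sorted(set(probs)):
        tp = sum(1 for p, o in zip(probs, observed) if p >= t and o == 1)
        fn = sum(1 for p, o in zip(probs, observed) if p < t and o == 1)
        tn = sum(1 for p, o in zip(probs, observed) if p < t and o == 0)
        fp = sum(1 for p, o in zip(probs, observed) if p >= t and o == 0)
        if tp + fn == 0 or tn + fp == 0:
            continue  # TSS is undefined without both presences and absences
        tss = tp / (tp + fn) + tn / (tn + fp) - 1
        if tss > best_tss:
            best_t, best_tss = t, tss
    return best_t

def stack_binary(prob_by_species, obs_by_species):
    """Binarise every species, then stack into one predicted set per site."""
    n_sites = len(next(iter(prob_by_species.values())))
    assemblages = [set() for _ in range(n_sites)]
    for species, probs in prob_by_species.items():
        t = max_tss_threshold(probs, obs_by_species[species])
        for site, p in enumerate(probs):
            if p >= t:
                assemblages[site].add(species)
    return assemblages

def sorensen(predicted, observed):
    """Sørensen similarity between two species sets (1.0 if both are empty)."""
    if not predicted and not observed:
        return 1.0
    return 2 * len(predicted & observed) / (len(predicted) + len(observed))

probs = {"sp_a": [0.9, 0.8, 0.3, 0.1],
         "sp_b": [0.2, 0.7, 0.6, 0.1],
         "sp_c": [0.4, 0.1, 0.9, 0.8]}
obs = {"sp_a": [1, 1, 0, 0], "sp_b": [0, 1, 1, 0], "sp_c": [0, 0, 1, 1]}

predicted = stack_binary(probs, obs)
observed = [{s for s in obs if obs[s][i] == 1} for i in range(4)]
similarity = [sorensen(p, o) for p, o in zip(predicted, observed)]
richness = [len(p) for p in predicted]
```

Here the thresholds are tuned and evaluated on the same toy observations for brevity; an actual assessment would evaluate against independent data.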

    How to evaluate community predictions without thresholding?

    Stacked species distribution models (S-SDMs) provide a tool to make spatial predictions about communities by first modelling individual species and then stacking the modelled predictions to form assemblages. The evaluation of predictive performance is usually based on a comparison of the observed and predicted community properties (e.g. species richness, composition). However, the most readily available and widely used evaluation metrics require thresholding each species' predicted probability of occurrence to obtain binary outcomes (i.e. presence/absence). This binarization can introduce unnecessary bias and error. Herein, we present and demonstrate the use of several groups of new or rarely used evaluation approaches and metrics for both species richness and community composition that do not require thresholding but instead directly compare the predicted probabilities of occurrence of species to the presence/absence observations in the assemblages. Community AUC, which is based on traditional AUC, measures the ability of a model to differentiate between species presences and absences at a given site according to their predicted probabilities of occurrence. Summing the probabilities gives the expected species richness and allows the estimation of the probability that the observed species richness is not different from the expected species richness based on the species' probabilities of occurrence. The traditional Sorensen and Jaccard similarity indices (which are based on presences/absences) were adapted to maxSorensen and maxJaccard and to probSorensen and probJaccard (which use probabilities directly). A further approach (improvement over null models) compares the predictions based on S-SDMs with the expectations from null models to estimate the improvement in both species richness and composition predictions. Additionally, all metrics can be plotted against the environmental conditions of sites (e.g. elevation) to highlight the ability of models to detect variation in the strength of community assembly processes in different environments. These metrics offer an unbiased view of the performance of community predictions compared to metrics that require thresholding. As such, they allow more straightforward comparisons of model performance among studies (i.e. they are not influenced by any subjective thresholding decisions).
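
Two of the threshold-free quantities described in this abstract can be sketched for a single site as follows. The sketch assumes that species occurrences are independent, so that richness follows a Poisson-binomial distribution, and it computes community AUC as the usual pairwise-ranking AUC across species at the site. Both choices are assumptions made for illustration and may differ in detail from the authors' implementation; the probabilities are hypothetical.

```python
# Threshold-free evaluation at one site (illustrative sketch, toy values).

def expected_richness(site_probs):
    """Expected species richness: the sum of occurrence probabilities."""
    return sum(site_probs)

def richness_pmf(site_probs):
    """P(richness = k) for k = 0..n, assuming independent species
    (Poisson-binomial), built up one species at a time."""
    pmf = [1.0]
    for p in site_probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # species absent
            nxt[k + 1] += mass * p     # species present
        pmf = nxt
    return pmf

def community_auc(site_probs, site_obs):
    """Share of (present, absent) species pairs ranked correctly; ties
    count as half. Returns None if the site lacks presences or absences."""
    present = [p for p, o in zip(site_probs, site_obs) if o == 1]
    absent = [p for p, o in zip(site_probs, site_obs) if o == 0]
    if not present or not absent:
        return None
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in present for b in absent)
    return wins / (len(present) * len(absent))

site_probs = [0.9, 0.7, 0.4, 0.2, 0.1]   # predicted probabilities, 5 species
site_obs = [1, 1, 0, 1, 0]               # observed presence/absence

exp_sr = expected_richness(site_probs)
pmf = richness_pmf(site_probs)
prob_of_observed_sr = pmf[sum(site_obs)]
auc = community_auc(site_probs, site_obs)
```

In this toy site, five of the six present-absent species pairs are ranked correctly, so the community AUC is 5/6.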

    Ecological indicator values reveal missing predictors of species distributions.

    The questions of how much the abiotic environment contributes to explaining species distributions, and which abiotic factors are the most influential, are key when projecting species' realized niches in space and time. Here, we show that answers to these questions can be obtained by using species' ecological indicator values (EIVs). By calculating community averages of plant EIVs (397 plant species and 3988 vegetation plots), we found that substituting mapped environmental predictors with site EIVs led to a doubling of explained variation (22.5% to 44%). EIVs representing light and soil showed the highest model improvement, while EIVs representing temperature did not explain additional variance, suggesting that current temperature maps are already fairly accurate. Therefore, although temperature is frequently reported as having a dominant effect on species distributions over other factors, our results suggest that this might primarily result from limitations in our capacity to map other key environmental factors, such as light and soil properties, over large areas.
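
The community averaging mentioned in this abstract can be sketched as below. The sketch assumes an unweighted mean over the species recorded in each plot (a cover-weighted mean would be an equally plausible reading); the species names, plot names and indicator values are hypothetical.

```python
# Community-mean ecological indicator value per plot (illustrative sketch).

def community_mean_eiv(plot_species, eiv):
    """Unweighted mean EIV of the species in a plot that have a value;
    returns None when no species in the plot has an indicator value."""
    values = [eiv[s] for s in plot_species if s in eiv]
    return sum(values) / len(values) if values else None

# Hypothetical light indicator values and plot compositions.
eiv_light = {"sp_a": 4, "sp_b": 2, "sp_c": 3}
plots = {"plot_1": ["sp_a", "sp_b"],
         "plot_2": ["sp_c"],
         "plot_3": ["sp_x"]}   # sp_x has no indicator value

site_eiv = {name: community_mean_eiv(species, eiv_light)
            for name, species in plots.items()}
```

The resulting per-plot values are what would then replace a mapped predictor (e.g. a light or soil layer) when fitting the distribution models.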

    Uncertainty, errors and virtual ecology: using artificial data to improve species distribution models

    With the growing pressures exerted by anthropogenic activities (e.g. land-use changes, habitat fragmentation, greenhouse gas emissions) and environmental changes (e.g. climate change, biological invasions), biodiversity is being threatened worldwide. It is therefore important to understand which factors influence the distribution and composition of species assemblages, and to develop tools that allow us to predict them accurately under current and future environmental conditions. Species distribution models (SDMs) are especially useful for tackling these challenges, since they allow the distribution of species and their assemblages to be modelled at different spatial and temporal scales. This is done by relating species observations to the environmental conditions where they occur. However, different factors (e.g. sample size, modelling technique) and errors/bias (i.e. false presences/absences) have been shown to affect the prediction accuracy of single-species SDMs and assemblage SDMs (i.e. S-SDMs). SDMs can also provide biased projections when predicting to regions or time periods with environmental conditions outside the range of the data used for model calibration (i.e. model transferability) or when those data do not capture the full range of conditions occupied by the species (i.e. truncated datasets). While the majority of SDMs use real species data, it is important to assess their accuracy with complete control over the data and the factors influencing species distributions, hence the use of virtual or simulated species. In the first chapter of my thesis, I used virtual species data to test SDMs/S-SDMs and determine the degree to which different types and levels of errors in species data (i.e. false presences or absences) affect the predictions of individual species models, and how this is reflected in metrics that are frequently used to evaluate the prediction accuracy of SDMs. 
I found that the interpretation of model performance depended on the data and metrics used to evaluate the models, with model performance being more affected by false positives. In the second chapter, I assessed how different factors (sample size, sampling method, sampling prevalence, modelling technique and thresholding method) affect the prediction accuracy of S-SDMs. I found that prediction accuracy is mostly affected by modelling technique, followed by sample size, and that a ‘plot-like’ sampling method is recommended when sampling species data (i.e. the best approximation of the species’ true prevalence). In my third chapter, I tested the potential effects of increasingly truncated datasets on the predictive accuracy of species assemblages, and whether the variables used to calibrate the models also influence that accuracy, finding that the degree of truncation has more influence on species with wide realized niches. Finally, in my last main chapter, I tested and compared how accurately different modelling strategies predict species assemblages under current and future climatic conditions, thereby assessing their transferability. I found that with presence/pseudo-absence data all the strategies failed to predict species assemblages accurately, whereas predictions were better with presence-absence data, mainly under current environmental conditions.