11 research outputs found

    Imputing missing data in plant traits: A guide to improve gap‐filling

    Aim: Globally distributed plant trait data are increasingly used to understand relationships between biodiversity and ecosystem processes. However, global trait databases are sparse because they are compiled from many, mostly small databases. This sparsity in both trait space completeness and geographical distribution limits the potential for both multivariate and global analyses. Thus, ‘gap-filling’ approaches are often used to impute missing trait data. Recent methods, like Bayesian hierarchical probabilistic matrix factorization (BHPMF), can impute large and sparse data sets using side information. We investigate whether BHPMF imputation leads to biases in trait space and identify aspects influencing bias to provide guidance for its usage. Innovation: We use a fully observed trait data set from which entries are randomly removed, along with extensive but sparse additional data. We use BHPMF for imputation and evaluate bias by: (1) accuracy (residuals, RMSE, trait means), (2) correlations (bi- and multivariate) and (3) taxonomic and functional clustering (valuewise, uni- and multivariate). BHPMF preserves general patterns of trait distributions but induces taxonomic clustering. Data set–external trait data had little effect on induced taxonomic clustering and stabilized trait–trait correlations. Main Conclusions: Our study extends the criteria for the evaluation of gap-filling beyond RMSE, providing insight into statistical data structure and allowing better informed use of imputed trait data, with improved practice for imputation. We expect our findings to be valuable beyond applications in plant ecology, for any study using hierarchical side information for imputation.

    Global signals in plant traits

    BHPMF – a hierarchical Bayesian approach to gap-filling and trait prediction for macroecology and functional biogeography

    Aim: Functional traits of organisms are key to understanding and predicting biodiversity and ecological change, which motivates continuous collection of traits and their integration into global databases. Such trait matrices are inherently sparse, severely limiting their usefulness for further analyses. On the other hand, traits are characterized by the phylogenetic trait signal, trait–trait correlations and environmental constraints, all of which provide information that could be used to statistically fill gaps. We propose the application of probabilistic models which, for the first time, utilize all three characteristics to fill gaps in trait databases and predict trait values at larger spatial scales. Innovation: For this purpose we introduce BHPMF, a hierarchical Bayesian extension of probabilistic matrix factorization (PMF). PMF is a machine learning technique which exploits the correlation structure of sparse matrices to impute missing entries. BHPMF additionally utilizes the taxonomic hierarchy for trait prediction and provides uncertainty estimates for each imputation. In combination with multiple regression against environmental information, BHPMF allows for extrapolation from point measurements to larger spatial scales. We demonstrate the applicability of BHPMF in ecological contexts, using different plant functional trait datasets, also comparing results to taking the species mean and PMF. Main conclusions: Sensitivity analyses validate the robustness and accuracy of BHPMF: our method captures the correlation structure of the trait matrix as well as the phylogenetic trait signal – also for extremely sparse trait matrices – and provides a robust measure of confidence in prediction accuracy for each missing entry. The combination of BHPMF with environmental constraints provides a promising concept to extrapolate traits beyond sampled regions, accounting for intraspecific trait variability. We conclude that BHPMF and its derivatives have a high potential to support future trait-based research in macroecology and functional biogeography.

    Climatic and soil factors explain the two-dimensional spectrum of global plant trait variation

    Plant functional traits can predict community assembly and ecosystem functioning and are thus widely used in global models of vegetation dynamics and land–climate feedbacks. Still, we lack a global understanding of how land and climate affect plant traits. A previous global analysis of six traits observed two main axes of variation: (1) size variation at the organ and plant level and (2) leaf economics balancing leaf persistence against plant growth potential. The orthogonality of these two axes suggests they are differently influenced by environmental drivers. We find that these axes persist in a global dataset of 17 traits across more than 20,000 species. We find a dominant joint effect of climate and soil on trait variation. Additional independent climate effects are also observed across most traits, whereas independent soil effects are almost exclusively observed for economics traits. Variation in size traits correlates well with a latitudinal gradient related to water or energy limitation. In contrast, variation in economics traits is better explained by interactions of climate with soil fertility. These findings have the potential to improve our understanding of biodiversity patterns and our predictions of climate change impacts on biogeochemical cycles.

    The global spectrum of plant form and function: enhanced species-level trait dataset

    Here we provide the 'Global Spectrum of Plant Form and Function Dataset', containing species mean values for six vascular plant traits. Together, these traits - plant height, stem specific density, leaf area, leaf mass per area, leaf nitrogen content per dry mass, and diaspore (seed or spore) mass - define the primary axes of variation in plant form and function. The dataset is based on ca. 1 million trait records received via the TRY database (representing ca. 2,500 original publications) and additional unpublished data. It provides 92,159 species mean values for the six traits, covering 46,047 species. The data are complemented by higher-level taxonomic classification and six categorical traits (woodiness, growth form, succulence, adaptation to terrestrial or aquatic habitats, nutrition type and leaf type). Data quality management is based on a probabilistic approach combined with comprehensive validation against expert knowledge and external information. Intense data acquisition and thorough quality control produced the largest and, to our knowledge, most accurate compilation of empirically observed vascular plant species mean traits to date.
