86 research outputs found

    A new method for faster and more accurate inference of species associations from big community data

    1. Joint Species Distribution models (JSDMs) explain spatial variation in community composition by contributions of the environment, biotic associations, and possibly spatially structured residual covariance. They show great promise as a general analytical framework for community ecology and macroecology, but current JSDMs, even when approximated by latent variables, scale poorly on large datasets, limiting their usefulness for currently emerging big (e.g., metabarcoding and metagenomics) community datasets. 2. Here, we present a novel, more scalable JSDM (sjSDM) that circumvents the need to use latent variables by using a Monte-Carlo integration of the joint JSDM likelihood and allows flexible elastic net regularization on all model components. We implemented sjSDM in PyTorch, a modern machine learning framework that can make use of CPU and GPU calculations. Using simulated communities with known species-species associations and different numbers of species and sites, we compare sjSDM with state-of-the-art JSDM implementations to determine computational runtimes and accuracy of the inferred species-species and species-environmental associations. 3. We find that sjSDM is orders of magnitude faster than existing JSDM algorithms (even when run on the CPU) and can be scaled to very large datasets. Despite the dramatically improved speed, sjSDM produces more accurate estimates of species association structures than alternative JSDM implementations. We demonstrate the applicability of sjSDM to big community data using an eDNA case study with thousands of fungi operational taxonomic units (OTU). 4. Our sjSDM approach makes the application of JSDMs to large community datasets with hundreds or thousands of species possible, substantially extending the applicability of JSDMs in ecology. We provide our method in an R package to facilitate its applicability for practical data analysis. Comment: 65 pages, 5 figures

    cito: An R package for training neural networks using torch

    Deep Neural Networks (DNNs) have become a central method for regression and classification tasks. Some packages exist that allow fitting DNNs directly in R, but those are rather limited in their functionality. Most current deep learning applications rely on one of the major deep learning frameworks, in particular PyTorch or TensorFlow, to build and train DNNs. Using these frameworks, however, requires substantially more training and time than typical regression or machine learning functions in the R environment. Here, we present 'cito', a user-friendly R package for deep learning that allows specifying deep neural networks in the familiar formula syntax used in many R packages. To fit the models, 'cito' uses 'torch', taking advantage of the numerically optimized torch library, including the ability to switch between training models on CPUs or GPUs. Moreover, 'cito' includes many user-friendly functions for model plotting and analysis, including optional confidence intervals (CIs) based on bootstraps on predictions as well as explainable AI (xAI) metrics for effect sizes and variable importance with CIs and p-values. To showcase a typical analysis pipeline using 'cito', including its built-in xAI features to explore the trained DNN, we build a species distribution model of the African elephant. We hope that by providing a user-friendly R framework to specify, deploy and interpret deep neural networks, 'cito' will make this interesting model class more accessible to ecological data analysis. A stable version of 'cito' can be installed from the Comprehensive R Archive Network (CRAN). Comment: 15 pages, 4 figures, 2 tables

    Machine learning algorithms to infer trait-matching and predict species interactions in ecological networks

    Ecologists have long suspected that species are more likely to interact if their traits match in a particular way. For example, a pollination interaction may be more likely if the proportions of a bee's tongue fit a plant's flower shape. Empirical estimates of the importance of trait‐matching for determining species interactions, however, vary significantly among different types of ecological networks. Here, we show that ambiguity among empirical trait‐matching studies may have arisen at least in part from using overly simple statistical models. Using simulated and real data, we contrast conventional generalized linear models (GLM) with more flexible Machine Learning (ML) models (Random Forest, Boosted Regression Trees, Deep Neural Networks, Convolutional Neural Networks, Support Vector Machines, naïve Bayes, and k‐Nearest‐Neighbor), testing their ability to predict species interactions based on traits and to infer the trait combinations causally responsible for species interactions. We found that the best ML models can successfully predict species interactions in plant–pollinator networks, outperforming GLMs by a substantial margin. Our results also demonstrate that ML models can better identify the causally responsible trait‐matching combinations than GLMs. In two case studies, the best ML models successfully predicted species interactions in a global plant–pollinator database and inferred ecologically plausible trait‐matching rules for a plant–hummingbird network from Costa Rica, without any prior assumptions about the system. We conclude that flexible ML models offer many advantages over traditional regression models for understanding interaction networks. We anticipate that these results extrapolate to other ecological network types. More generally, our results highlight the potential of machine learning and artificial intelligence for inference in ecology, beyond standard tasks such as image or pattern recognition.

    The FinderApp WiTTFind for Wittgenstein’s Nachlass

    Since 2010, the Wittgenstein Archives at the University of Bergen (WAB, Alois Pichler) and the Centre for Information and Language Processing at the Ludwig-Maximilians University Munich (CIS, Max Hadersbeck et al.) have cooperated in the research group “Wittgenstein Advanced Search Tools” (WAST). The WAST research group develops the web frontend FinderApp WiTTFind, together with specialized search tools, so that scholars in the humanities can investigate WAB’s transcriptions of the Nachlass of Ludwig Wittgenstein with advanced computational search tools. The FinderApp WiTTFind (http://wittfind.cis.lmu.de) displays facsimile extracts on the hit page and allows double-sided paging through the facsimile with its WiTTReader application. In this paper, we present the research work around the FinderApp WiTTFind, the WiTTReader, and our latest developments within WAST: the synonym lexicon and the similarity search tools.

    ‘cito’: an R package for training neural networks using ‘torch’

    Deep neural networks (DNNs) have become a central method in ecology. To build and train DNNs in deep learning (DL) applications, most users rely on one of the major deep learning frameworks, in particular PyTorch or TensorFlow. Using these frameworks, however, requires substantial experience and time. Here, we present ‘cito’, a user-friendly R package for DL that allows specifying DNNs in the familiar formula syntax used by many R packages. To fit the models, ‘cito’ takes advantage of the numerically optimized ‘torch’ library, including the ability to switch between training models on the CPU or the graphics processing unit (GPU), which allows the efficient training of large DNNs. Moreover, ‘cito’ includes many user-friendly functions for model plotting and analysis, including explainable AI (xAI) metrics for effect sizes and variable importance. All xAI metrics as well as predictions can optionally be bootstrapped to generate confidence intervals and p-values. To showcase a typical analysis pipeline using ‘cito’, with its built-in xAI features, we built a species distribution model of the African elephant. We hope that by providing a user-friendly R framework to specify, deploy and interpret DNNs, ‘cito’ will make this interesting class of models more accessible to ecological data analysis. A stable version of ‘cito’ can be installed from the Comprehensive R Archive Network (CRAN).

    Fixed or random? On the reliability of mixed‐effects models for a small number of levels in grouping variables

    Biological data are often intrinsically hierarchical (e.g., species from different genera, plants within different mountain regions), which has made mixed-effects models a common analysis tool in ecology and evolution because they can account for the non-independence. Many questions around their practical application are solved, but one is still debated: Should we treat a grouping variable with a low number of levels as a random or fixed effect? In such situations, the variance estimate of the random effect can be imprecise, but it is unknown whether this affects statistical power and type I error rates of the fixed effects of interest. Here, we analyzed the consequences of treating a grouping variable with 2–8 levels as a fixed or random effect in correctly specified and alternative models (under- or overparametrized models). We calculated type I error rates and statistical power for all model specifications and quantified the influences of study design on these quantities. We found no influence of model choice on type I error rate and power for the population-level effect (slope) in random intercept-only models. However, with varying intercepts and slopes in the data-generating process, using a random slope and intercept model, and switching to a fixed-effects model in case of a singular fit, avoids overconfidence in the results. Additionally, the number of levels and the difference between levels strongly influence power and type I error. We conclude that inferring the correct random-effect structure is of great importance to obtain correct type I error rates. We encourage starting with a mixed-effects model independent of the number of levels in the grouping variable and switching to a fixed-effects model only in case of a singular fit. With these recommendations, we allow for more informed choices about study design and data analysis and make ecological inference with mixed-effects models more robust for a small number of levels.

    Machine‐learning algorithms predict soil seed bank persistence from easily available traits

    Question: Soil seed banks (SSB), i.e. pools of viable seeds in the soil and on its surface, play a crucial role in plant biology and ecology. Information on seed persistence in soil is of great importance for fundamental and applied research, yet compiling data sets on this trait still requires enormous efforts. We asked whether the machine-learning (ML) approach could be used to infer and predict SSB properties of a regional flora based on easily available data. Location: Eighteen calcareous grasslands located along an elevational gradient of almost 2000 m in the Bavarian Alps, Germany. Methods: We compared a commonly used ML model (random forest) with a conventional model (linear regression model) as to their ability to predict SSB presence/absence and density using empirical data on SSB characteristics (environmental, seed traits and phylogenetic predictors). Further, we identified the most important determinants of seed persistence in soil for predicting qualitative and quantitative SSB characteristics using the ML approach. Results: We demonstrated that the ML model predicts SSB characteristics significantly better than the linear regression model. A single set of predictors (either environment, or seed traits, or phylogenetic eigenvectors) was sufficient for the ML model to achieve high performance in predicting SSB characteristics. Importantly, we established that a few widely available SSB predictors can achieve high predictive power in the ML approach, suggesting a high flexibility of the developed approach for use in various study systems. Conclusions: Our study provides a novel methodological approach that combines empirical knowledge on the determinants of SSB characteristics with a modern, flexible statistical approach based on ML. It clearly demonstrates that ML can be developed into a key tool to facilitate labor-intensive, costly and time-consuming functional trait research.

    Ohmic heating - a novel approach for gluten-free bread baking

    Gluten-free (GF) batters usually present several technological challenges that limit the performance during conventional baking and the resulting product quality. Due to the volumetric heating principle and faster heating rates, ohmic heating (OH) may be advantageous compared with conventional baking. Therefore, the potential of using ohmic heating as a novel approach for gluten-free bread baking was explored. In detail, the effect of different OH process parameters (power input, holding time) on the chemical and functional properties (specific volume, crumb firmness and relative elasticity, pore properties, color, starch gelatinization) and digestibility of breads was investigated. Results showed that GF breads could benefit from the uniform rapid heating during processing, as these breads showed superior functional properties (specific volume, 2.86–3.44 cm³/g; relative elasticity, 45.05–56.83%; porosity, 35.17–40.92%) compared with conventional oven-baked GF bread (specific volume, 2.60 cm³/g; relative elasticity, 44.23%; porosity, 37.63%). In order to maximize bread expansion and the OH performance, it was found that the OH process could be improved by applying the electrical energy in three descending power steps: first step with high power input (in this study, 2–6 kW for 15 s), followed by 1 kW for 10 s, and 0.3 kW for 1–30 min. In total, ohmic baking only needed a few minutes to obtain a fully expanded GF bread. The determination of pasting properties and starch digestibility demonstrated that these breads were comparable or even superior to GF breads baked in a conventional baking oven.