A new method for faster and more accurate inference of species associations from big community data
1. Joint species distribution models (JSDMs) explain spatial variation in community composition by contributions of the environment, biotic associations, and possibly spatially structured residual covariance. They show great promise as a general analytical framework for community ecology and macroecology, but current JSDMs, even when approximated by latent variables, scale poorly on large datasets, limiting their usefulness for currently emerging big community datasets (e.g., from metabarcoding and metagenomics).
2. Here, we present a novel, more scalable JSDM (sjSDM) that circumvents the need for latent variables by using Monte Carlo integration of the joint JSDM likelihood and allows flexible elastic net regularization on all model components. We implemented sjSDM in PyTorch, a modern machine learning framework that can make use of both CPU and GPU calculations. Using simulated communities with known species-species associations and different numbers of species and sites, we compare sjSDM with state-of-the-art JSDM implementations to determine computational runtimes and the accuracy of the inferred species-species and species-environment associations.
3. We find that sjSDM is orders of magnitude faster than existing JSDM algorithms (even when run on the CPU) and can be scaled to very large datasets. Despite the dramatically improved speed, sjSDM produces more accurate estimates of species association structures than alternative JSDM implementations. We demonstrate the applicability of sjSDM to big community data using an eDNA case study with thousands of fungal operational taxonomic units (OTUs).
4. Our sjSDM approach makes it possible to analyze large community datasets with hundreds or thousands of species using JSDMs, substantially extending the applicability of JSDMs in ecology. We provide our method in an R package to facilitate its use in practical data analysis.
Comment: 65 pages, 5 figures
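The sjSDM abstract replaces latent variables with Monte Carlo integration of the joint likelihood. A minimal plain-Python sketch of that idea for a single site is shown below, assuming a multivariate probit model (a species is present when its environmental linear predictor plus a correlated normal residual exceeds zero). The function names `cholesky` and `mc_joint_likelihood` are hypothetical; this is not the sjSDM code, which is written in PyTorch, and fitting parameters by gradient descent would require a smoothed, differentiable version of the simple hit-counting estimator used here.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular Cholesky factor L of a small positive-definite matrix (cov = L L^T)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def mc_joint_likelihood(y, eta, cov, n_draws=20000, seed=1):
    """Monte Carlo estimate of P(y | eta, cov) under a multivariate probit model.

    y:   0/1 occurrence of each species at one site
    eta: environmental linear predictor of each species at that site
    cov: species-species residual covariance (association) matrix
    """
    rng = random.Random(seed)
    L = cholesky(cov)
    n = len(y)
    hits = 0
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlated residuals eps = L z, so that Cov(eps) = cov
        eps = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        # A species is present when its latent value is positive; count the
        # draws in which every species matches the observed pattern y.
        if all((eta[i] + eps[i] > 0) == bool(y[i]) for i in range(n)):
            hits += 1
    return hits / n_draws


# Two co-occurring species with neutral environmental effects (eta = 0)
p_indep = mc_joint_likelihood([1, 1], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
p_assoc = mc_joint_likelihood([1, 1], [0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]])
print(round(p_indep, 2), round(p_assoc, 2))
```

With independent residuals the estimate should land near 0.25; with a residual correlation of 0.8 it should land near 1/4 + arcsin(0.8)/(2π) ≈ 0.40, which is how a positive species association raises the joint probability of co-occurrence.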
cito: An R package for training neural networks using torch
Deep neural networks (DNNs) have become a central method for regression and classification tasks. Some packages allow fitting DNNs directly in R, but they are rather limited in their functionality. Most current deep learning applications rely on one of the major deep learning frameworks, in particular PyTorch or TensorFlow, to build and train DNNs. Using these frameworks, however, requires substantially more user training and time than typical regression or machine learning functions in the R environment. Here, we present 'cito', a user-friendly R package for deep learning that allows users to specify deep neural networks in the familiar formula syntax used in many R packages. To fit the models, 'cito' uses 'torch', taking advantage of the numerically optimized torch library, including the ability to switch between training models on CPUs or GPUs. Moreover, 'cito' includes many user-friendly functions for model plotting and analysis, including optional confidence intervals (CIs) based on bootstrapped predictions, as well as explainable AI (xAI) metrics for effect sizes and variable importance with CIs and p-values. To showcase a typical analysis pipeline using 'cito', including its built-in xAI features to explore the trained DNN, we build a species distribution model of the African elephant. We hope that by providing a user-friendly R framework to specify, deploy and interpret deep neural networks, 'cito' will make this interesting model class more accessible to ecological data analysis. A stable version of 'cito' can be installed from the Comprehensive R Archive Network (CRAN).
Comment: 15 pages, 4 figures, 2 tables
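The percentile bootstrap behind prediction CIs can be sketched in a few lines. This minimal Python illustration (the R package itself is not used) resamples observations with replacement, refits a simple least-squares line standing in for a DNN, and reads the interval off the sorted bootstrap predictions. `fit_line` and `bootstrap_prediction_ci` are hypothetical names, not part of the 'cito' API, and 'cito' may differ in how it resamples and summarizes.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def bootstrap_prediction_ci(xs, ys, x_new, n_boot=1000, level=0.95, seed=42):
    """Percentile bootstrap confidence interval for the prediction at x_new."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    while len(preds) < n_boot:
        # Resample observations (rows) with replacement and refit the model
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        if len(set(bx)) < 2:  # slope undefined for this resample; draw again
            continue
        a, b = fit_line(bx, by)
        preds.append(a + b * x_new)
    preds.sort()
    alpha = (1 - level) / 2
    lower = preds[round(alpha * (n_boot - 1))]
    upper = preds[round((1 - alpha) * (n_boot - 1))]
    return lower, upper


noise = random.Random(0)
xs = list(range(30))
ys = [2 * x + 1 + noise.gauss(0, 1) for x in xs]
a, b = fit_line(xs, ys)
lower, upper = bootstrap_prediction_ci(xs, ys, x_new=15)
print(f"prediction {a + b * 15:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Refitting on every resample is what makes the approach model-agnostic, and also why it is costly for DNNs: each bootstrap replicate is a full training run.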
Machine learning algorithms to infer trait-matching and predict species interactions in ecological networks
Ecologists have long suspected that species are more likely to interact if their traits match in a particular way. For example, a pollination interaction may be more likely if the proportions of a bee's tongue fit a plant's flower shape. Empirical estimates of the importance of trait-matching for determining species interactions, however, vary significantly among different types of ecological networks.
Here, we show that ambiguity among empirical trait-matching studies may have arisen at least in part from using overly simple statistical models. Using simulated and real data, we contrast conventional generalized linear models (GLMs) with more flexible machine learning (ML) models (Random Forest, Boosted Regression Trees, Deep Neural Networks, Convolutional Neural Networks, Support Vector Machines, naïve Bayes, and k-Nearest-Neighbor), testing their ability to predict species interactions based on traits and to infer the trait combinations causally responsible for species interactions.
We found that the best ML models can successfully predict species interactions in plant-pollinator networks, outperforming GLMs by a substantial margin. Our results also demonstrate that ML models can identify the causally responsible trait-matching combinations better than GLMs. In two case studies, the best ML models successfully predicted species interactions in a global plant-pollinator database and inferred ecologically plausible trait-matching rules for a plant-hummingbird network from Costa Rica, without any prior assumptions about the system.
We conclude that flexible ML models offer many advantages over traditional regression models for understanding interaction networks. We anticipate that these results extrapolate to other ecological network types. More generally, our results highlight the potential of machine learning and artificial intelligence for inference in ecology, beyond standard tasks such as image or pattern recognition.
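One reason simple GLMs can miss trait matching is that a rule such as "interact when tongue length is close to flower depth" depends on the difference between traits. A logistic GLM with additive linear terms for the two traits, and no hand-crafted interaction or distance term, can only draw a single straight decision boundary through trait space and cannot describe such a band. The plain-Python sketch below assumes this hypothetical rule with traits scaled to 0-1 and uses a k-nearest-neighbor classifier, one of the ML model types named above; all names, thresholds, and sample sizes are illustrative and not taken from the study.

```python
import random


def simulate_interactions(n, threshold=0.2, seed=0):
    """Simulate plant-pollinator pairs under a hypothetical trait-matching rule:
    a pair interacts when tongue length and flower depth (both scaled to 0-1)
    differ by less than `threshold`."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        tongue, depth = rng.random(), rng.random()
        pairs.append(((tongue, depth), int(abs(tongue - depth) < threshold)))
    return pairs


def knn_predict(train, traits, k=5):
    """Predict an interaction by majority vote of the k nearest training pairs."""
    def sq_dist(pair):
        (t, d), _ = pair
        return (t - traits[0]) ** 2 + (d - traits[1]) ** 2

    nearest = sorted(train, key=sq_dist)[:k]
    votes = sum(label for _, label in nearest)
    return int(votes > k / 2)


def accuracy(train, test, k=5):
    """Share of test pairs whose interaction status is predicted correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


train = simulate_interactions(500, seed=1)
test = simulate_interactions(200, seed=2)
baseline = 1 - sum(y for _, y in test) / len(test)  # always predict "no interaction"
print(f"k-NN accuracy {accuracy(train, test):.2f} vs. baseline {baseline:.2f}")
```

The classifier is never told the matching rule; it recovers the band only from labeled pairs, which loosely mirrors the abstract's point about inference without prior assumptions about the system.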
The FinderApp WiTTFind for Wittgenstein's Nachlass
Since 2010, the Wittgenstein Archives at the University of Bergen (WAB, Alois Pichler) and the Centre for Information and Language Processing at the Ludwig-Maximilians University Munich (CIS, Max Hadersbeck et al.) have cooperated in the research group "Wittgenstein Advanced Search Tools" (WAST). The WAST research group develops the web frontend FinderApp WiTTFind together with specialized search tools that allow scholars in the humanities to investigate WAB's transcriptions of the Nachlass of Ludwig Wittgenstein with advanced computational methods. The FinderApp WiTTFind (http://wittfind.cis.lmu.de) displays facsimile extracts on the hit page and allows double-sided paging through the facsimile with its WiTTReader application. In this paper, we present the research work around the FinderApp WiTTFind, the WiTTReader, and our latest developments within WAST: the synonym lexicon and the similarity search tools.
'cito': an R package for training neural networks using 'torch'
Deep neural networks (DNNs) have become a central method in ecology. To build and train DNNs in deep learning (DL) applications, most users rely on one of the major deep learning frameworks, in particular PyTorch or TensorFlow. Using these frameworks, however, requires substantial experience and time. Here, we present 'cito', a user-friendly R package for DL that allows specifying DNNs in the familiar formula syntax used by many R packages. To fit the models, 'cito' takes advantage of the numerically optimized 'torch' library, including the ability to switch between training models on the CPU or the graphics processing unit (GPU), which allows the efficient training of large DNNs. Moreover, 'cito' includes many user-friendly functions for model plotting and analysis, including explainable AI (xAI) metrics for effect sizes and variable importance. All xAI metrics as well as predictions can optionally be bootstrapped to generate confidence intervals, including p-values. To showcase a typical analysis pipeline using 'cito', with its built-in xAI features, we built a species distribution model of the African elephant. We hope that by providing a user-friendly R framework to specify, deploy and interpret DNNs, 'cito' will make this interesting class of models more accessible to ecological data analysis. A stable version of 'cito' can be installed from the Comprehensive R Archive Network (CRAN).
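The abstract does not specify which variable-importance metric 'cito' computes. One widely used, model-agnostic option is permutation importance: shuffle one predictor and measure how much the prediction error grows. The Python sketch below assumes a mean-squared-error loss and a hypothetical fitted model `fitted_model` that uses only its first predictor; it illustrates the general technique, not the package's implementation. Bootstrapping such a metric would mean repeating the calculation for models refitted to resampled data.

```python
import random


def mse(model, X, y):
    """Mean squared error of a model (a callable on one feature row)."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, feature, n_repeats=20, seed=0):
    """Average increase in MSE after randomly shuffling one feature column.

    Shuffling breaks the feature's link to the response while keeping its
    distribution; a larger increase means the model relies more on it.
    """
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    increases = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        X_perm = [row[:feature] + [value] + row[feature + 1:]
                  for row, value in zip(X, column)]
        increases.append(mse(model, X_perm, y) - baseline)
    return sum(increases) / n_repeats


def fitted_model(row):
    """Hypothetical fitted model that depends only on the first predictor."""
    return 3 * row[0]


data_rng = random.Random(3)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [3 * row[0] for row in X]
print(permutation_importance(fitted_model, X, y, feature=0),
      permutation_importance(fitted_model, X, y, feature=1))
```

The unused second predictor gets an importance of exactly zero, because shuffling it leaves every prediction unchanged.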
Fixed or random? On the reliability of mixed-effects models for a small number of levels in grouping variables
Biological data are often intrinsically hierarchical (e.g., species from different genera, plants within different mountain regions), which has made mixed-effects models a common analysis tool in ecology and evolution because they can account for this non-independence. Many questions around their practical application have been resolved, but one is still debated: should we treat a grouping variable with a low number of levels as a random or a fixed effect? In such situations, the variance estimate of the random effect can be imprecise, but it is unknown whether this affects the statistical power and type I error rates of the fixed effects of interest. Here, we analyzed the consequences of treating a grouping variable with 2–8 levels as a fixed or random effect in correctly specified and alternative models (under- or overparametrized models). We calculated type I error rates and statistical power for all model specifications and quantified the influence of study design on these quantities. We found no influence of model choice on the type I error rate and power of the population-level effect (slope) for random intercept-only models. However, with varying intercepts and slopes in the data-generating process, using a random slope and intercept model, and switching to a fixed-effects model in case of a singular fit, avoids overconfidence in the results. Additionally, the number of levels and the difference between levels strongly influence power and type I error. We conclude that inferring the correct random-effect structure is of great importance to obtain correct type I error rates. We recommend starting with a mixed-effects model independent of the number of levels in the grouping variable and switching to a fixed-effects model only in case of a singular fit. With these recommendations, we allow for more informative choices about study design and data analysis and make ecological inference with mixed-effects models more robust for a small number of levels.
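The premise that the random-effect variance is estimated imprecisely when a grouping variable has few levels can be checked with a small simulation. The plain-Python sketch below is not the authors' design, which fitted full mixed models and assessed type I error and power; it swaps in a zero-truncated ANOVA (method-of-moments) variance estimator for REML and assumes hypothetical settings of 10 observations per level with unit group and residual standard deviations. Estimates truncated at zero are only loosely analogous to singular fits.

```python
import random
import statistics


def estimate_group_variance(n_levels, n_per_level, sd_group, sd_resid, rng):
    """ANOVA (method-of-moments) estimate of the random-intercept variance
    from one simulated data set, truncated at zero."""
    means, within = [], []
    for _ in range(n_levels):
        intercept = rng.gauss(0.0, sd_group)  # random intercept of this level
        obs = [intercept + rng.gauss(0.0, sd_resid) for _ in range(n_per_level)]
        means.append(statistics.fmean(obs))
        within.append(statistics.variance(obs))
    # Variance of level means minus the share expected from residual noise alone
    estimate = statistics.variance(means) - statistics.fmean(within) / n_per_level
    return max(estimate, 0.0)


def summarize(n_levels, reps=2000, seed=0):
    """Spread of the variance estimates and share of zero (boundary) estimates."""
    rng = random.Random(seed)
    estimates = [estimate_group_variance(n_levels, 10, 1.0, 1.0, rng)
                 for _ in range(reps)]
    share_zero = sum(e == 0.0 for e in estimates) / reps
    return statistics.stdev(estimates), share_zero


for k in (2, 4, 8):
    sd, share_zero = summarize(k)
    print(f"{k} levels: SD of estimates {sd:.2f}, zero estimates {share_zero:.1%}")
```

With only two levels, the variance estimate rests on a single difference between level means, so its spread and the share of zero estimates are far larger than with eight levels.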
Machine-learning algorithms predict soil seed bank persistence from easily available traits
Question
Soil seed banks (SSB), i.e. pools of viable seeds in the soil and on its surface, play a crucial role in plant biology and ecology. Information on seed persistence in soil is of great importance for fundamental and applied research, yet compiling data sets on this trait still requires enormous effort. We asked whether a machine-learning (ML) approach could be used to infer and predict SSB properties of a regional flora based on easily available data.
Location
Eighteen calcareous grasslands located along an elevational gradient of almost 2000 m in the Bavarian Alps, Germany.
Methods
We compared a commonly used ML model (random forest) with a conventional model (linear regression) in their ability to predict SSB presence/absence and density, using empirical data on SSB characteristics (environmental, seed-trait and phylogenetic predictors). Further, we used the ML approach to identify the most important determinants of seed persistence in soil for predicting qualitative and quantitative SSB characteristics.
Results
We demonstrated that the ML model predicts SSB characteristics significantly better than the linear regression model. A single set of predictors (either environment, or seed traits, or phylogenetic eigenvectors) was sufficient for the ML model to achieve high performance in predicting SSB characteristics. Importantly, we established that a few widely available SSB predictors can achieve high predictive power in the ML approach, suggesting a high flexibility of the developed approach for use in various study systems.
Conclusions
Our study provides a novel methodological approach that combines empirical knowledge on the determinants of SSB characteristics with a modern, flexible statistical approach based on ML. It clearly demonstrates that ML can be developed into a key tool to facilitate labor-intensive, costly and time-consuming functional trait research.
Ohmic heating - a novel approach for gluten-free bread baking
Gluten-free (GF) batters usually present several technological challenges that limit their performance during conventional baking and the resulting product quality. Due to its volumetric heating principle and faster heating rates, ohmic heating (OH) may be advantageous compared with conventional baking. Therefore, the potential of ohmic heating as a novel approach for gluten-free bread baking was explored. In detail, the effect of different OH process parameters (power input, holding time) on the chemical and functional properties (specific volume, crumb firmness and relative elasticity, pore properties, color, starch gelatinization) and digestibility of the breads was investigated. Results showed that GF breads could benefit from the uniform, rapid heating during processing, as these breads showed superior functional properties (specific volume, 2.86–3.44 cm3/g; relative elasticity, 45.05–56.83%; porosity, 35.17–40.92%) compared with conventional oven-baked GF bread (specific volume, 2.60 cm3/g; relative elasticity, 44.23%; porosity, 37.63%). To maximize bread expansion and OH performance, the OH process could be improved by applying the electrical energy in three descending power steps: a first step with high power input (in this study, 2–6 kW for 15 s), followed by 1 kW for 10 s and 0.3 kW for 1–30 min. In total, ohmic baking needed only a few minutes to obtain a fully expanded GF bread. The determination of pasting properties and starch digestibility demonstrated that these breads were comparable or even superior to GF breads baked in a conventional oven.
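The three-step power profile translates into a nominal electrical energy input by simple arithmetic (power × time, with 1 kW applied for 1 s equal to 1 kJ). The sketch below computes this for the bounds of the reported ranges; it is an illustration that ignores heat losses and the changing conductivity of the batter, and the function name `ohmic_energy_kj` is hypothetical. The abstract does not report these energy figures itself.

```python
def ohmic_energy_kj(steps):
    """Nominal electrical energy input (kJ) of a stepped ohmic-heating profile.

    steps: sequence of (power_kw, duration_s) pairs; 1 kW for 1 s equals 1 kJ.
    Heat losses and the batter's changing conductivity are ignored.
    """
    return sum(power_kw * duration_s for power_kw, duration_s in steps)


# Bounds of the reported three-step ranges:
# 2-6 kW for 15 s, then 1 kW for 10 s, then 0.3 kW for 1-30 min
lowest = ohmic_energy_kj([(2, 15), (1, 10), (0.3, 1 * 60)])
highest = ohmic_energy_kj([(6, 15), (1, 10), (0.3, 30 * 60)])
print(f"{lowest:.0f}-{highest:.0f} kJ")
```

With these bounds the nominal input spans roughly 58 to 640 kJ, and for long holding times the low-power third step dominates: 30 min at 0.3 kW alone accounts for 540 kJ.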
- …