Search CORE

86 research outputs found

A new method for faster and more accurate inference of species associations from big community data

Authors: Hartig Florian; Pichler Maximilian
Publication date: 02/07/2021

1. Joint Species Distribution models (JSDMs) explain spatial variation in community composition by contributions of the environment, biotic associations, and possibly spatially structured residual covariance. They show great promise as a general analytical framework for community ecology and macroecology, but current JSDMs, even when approximated by latent variables, scale poorly on large datasets. This limits their usefulness for currently emerging big community datasets (e.g., from metabarcoding and metagenomics).
2. Here, we present a novel, more scalable JSDM (sjSDM) that circumvents the need for latent variables by using a Monte-Carlo integration of the joint JSDM likelihood and allows flexible elastic net regularization on all model components. We implemented sjSDM in PyTorch, a modern machine learning framework that can make use of CPU and GPU calculations. Using simulated communities with known species-species associations and different numbers of species and sites, we compare sjSDM with state-of-the-art JSDM implementations to determine computational runtimes and the accuracy of the inferred species-species and species-environment associations.
3. We find that sjSDM is orders of magnitude faster than existing JSDM algorithms (even when run on the CPU) and can be scaled to very large datasets. Despite the dramatically improved speed, sjSDM produces more accurate estimates of species association structures than alternative JSDM implementations. We demonstrate the applicability of sjSDM to big community data using an eDNA case study with thousands of fungal operational taxonomic units (OTUs).
4. Our sjSDM approach makes it possible to analyze large community datasets with hundreds or thousands of species using JSDMs, substantially extending the applicability of JSDMs in ecology. We provide our method in an R package to facilitate its use in practical data analysis.
Comment: 65 pages, 5 figures

arXiv.org e-Print Archive
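The sjSDM abstract states that the method avoids latent variables by integrating the joint JSDM likelihood with Monte-Carlo sampling. As a rough illustration only, and not the sjSDM code, the NumPy sketch below estimates the log-likelihood of a multivariate probit JSDM for binary occurrences. It draws correlated latent values with species covariance `sigma` and replaces the hard 0/1 threshold with a steep sigmoid (sharpness `alpha`) so the estimate stays differentiable. It then averages the joint probability of each site's observed presences and absences over the draws. The function name `mc_log_likelihood`, the sigmoid smoothing, and all parameter defaults are assumptions for this example.

```python
import numpy as np


def mc_log_likelihood(y, eta, sigma, n_samples=1000, alpha=20.0, seed=0):
    """Monte-Carlo estimate of log P(y) under a multivariate probit JSDM.

    y     : (n_sites, n_species) 0/1 occurrence matrix
    eta   : (n_sites, n_species) linear predictor, e.g. X @ beta
    sigma : (n_species, n_species) species association (covariance) matrix
    alpha : sharpness of the sigmoid that smooths the 0/1 threshold
            (a hypothetical choice for this sketch)
    """
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(sigma)
    # Correlated latent noise: each row of eps has covariance sigma
    eps = rng.standard_normal((n_samples,) + y.shape) @ chol.T
    z = eta[None, :, :] + eps
    # Smoothed P(latent > 0); 0.5 * (1 + tanh(x / 2)) is a stable sigmoid
    p = 0.5 * (1.0 + np.tanh(0.5 * alpha * z))
    # Probability of the observed presence (1) or absence (0) per species
    p_obs = np.where(y[None, :, :] == 1, p, 1.0 - p)
    # Joint probability of each site's observed community, per draw
    joint = np.prod(p_obs, axis=2)  # shape (n_samples, n_sites)
    # Average over draws, then sum log-probabilities over sites
    site_lik = joint.mean(axis=0)
    return float(np.sum(np.log(site_lik + 1e-12)))


# Usage: simulate a small community in which species 0 and 1 are associated
rng = np.random.default_rng(1)
n_sites, n_species = 50, 3
X = rng.standard_normal((n_sites, 2))
beta = rng.standard_normal((2, n_species))
sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
eta = X @ beta
latent = eta + rng.standard_normal((n_sites, n_species)) @ np.linalg.cholesky(sigma).T
y = (latent > 0).astype(int)
print(mc_log_likelihood(y, eta, sigma))
```

With an identity `sigma` the species are independent, and the estimate approaches the sum of univariate probit log-likelihoods. According to the abstract, sjSDM itself runs on PyTorch and adds elastic net regularization; neither is shown here.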

cito: An R package for training neural networks using torch

Authors: Amesoeder Christian; Hartig Florian; Pichler Maximilian
Publication date: 08/10/2023

Deep Neural Networks (DNN) have become a central method for regression and classification tasks. Some packages exist that allow fitting DNNs directly in R, but they are rather limited in their functionality. Most current deep learning applications rely on one of the major deep learning frameworks, in particular PyTorch or TensorFlow, to build and train DNNs. Using these frameworks, however, requires substantially more training and time than typical regression or machine learning functions in the R environment. Here, we present 'cito', a user-friendly R package for deep learning that allows users to specify deep neural networks in the familiar formula syntax used in many R packages. To fit the models, 'cito' uses 'torch', taking advantage of the numerically optimized torch library, including the ability to switch between training models on CPUs or GPUs. Moreover, 'cito' includes many user-friendly functions for model plotting and analysis, including optional confidence intervals (CIs) based on bootstrapped predictions, as well as explainable AI (xAI) metrics for effect sizes and variable importance with CIs and p-values. To showcase a typical analysis pipeline using 'cito', including its built-in xAI features to explore the trained DNN, we build a species distribution model of the African elephant. We hope that by providing a user-friendly R framework to specify, deploy and interpret deep neural networks, 'cito' will make this interesting model class more accessible to ecological data analysis. A stable version of 'cito' can be installed from the Comprehensive R Archive Network (CRAN).
Comment: 15 pages, 4 figures, 2 tables

arXiv.org e-Print Archive

Machine learning algorithms to infer trait-matching and predict species interactions in ecological networks

Authors: Boreux V.; Hartig Florian; Klein A.-M.; Pichler Maximilian
Publication venue: Wiley
Publication date: 01/01/2020

Ecologists have long suspected that species are more likely to interact if their traits match in a particular way. For example, a pollination interaction may be more likely if the proportions of a bee's tongue fit a plant's flower shape. Empirical estimates of the importance of trait‐matching for determining species interactions, however, vary significantly among different types of ecological networks. Here, we show that ambiguity among empirical trait‐matching studies may have arisen at least in part from using overly simple statistical models. Using simulated and real data, we contrast conventional generalized linear models (GLM) with more flexible Machine Learning (ML) models (Random Forest, Boosted Regression Trees, Deep Neural Networks, Convolutional Neural Networks, Support Vector Machines, naïve Bayes, and k‐Nearest‐Neighbor), testing their ability to predict species interactions based on traits, and infer trait combinations causally responsible for species interactions. We found that the best ML models can successfully predict species interactions in plant–pollinator networks, outperforming GLMs by a substantial margin. Our results also demonstrate that ML models can better identify the causally responsible trait‐matching combinations than GLMs. In two case studies, the best ML models successfully predicted species interactions in a global plant–pollinator database and inferred ecologically plausible trait‐matching rules for a plant–hummingbird network from Costa Rica, without any prior assumptions about the system. We conclude that flexible ML models offer many advantages over traditional regression models for understanding interaction networks. We anticipate that these results extrapolate to other ecological network types. More generally, our results highlight the potential of machine learning and artificial intelligence for inference in ecology, beyond standard tasks such as image or pattern recognition.

University of Regensburg Publication Server

The FinderApp WiTTFind for Wittgenstein’s Nachlass

Authors: Gangopadhyay Nivedita; Hadersbeck Maximilian; Pichler Alois; Röhrer Ines; Ullrich Sabine
Publication date: 02/03/2020

Since 2010, the Wittgenstein Archives at the University of Bergen (WAB, Alois Pichler) and the Centre for Information and Language Processing at the Ludwig-Maximilians University Munich (CIS, Max Hadersbeck et al.) have cooperated in the research group “Wittgenstein Advanced Search Tools” (WAST). The WAST research group develops the web frontend FinderApp WiTTFind, together with specialized computational search tools that allow scholars in the humanities to investigate WAB’s transcriptions of Ludwig Wittgenstein’s Nachlass. The FinderApp WiTTFind (http://wittfind.cis.lmu.de) displays facsimile extracts on the hit page and allows double-sided paging through the facsimile with its WiTTReader application. In this paper, we present the research work around the FinderApp WiTTFind, the WiTTReader, and our latest developments within WAST: the synonym lexicon and the similarity search tools.

KITopen

‘cito’: an R package for training neural networks using ‘torch’

Authors: Amesöder Christian; Hartig Florian; Pichler Maximilian
Publication venue: Wiley
Publication date: 06/05/2024

Deep neural networks (DNN) have become a central method in ecology. To build and train DNNs in deep learning (DL) applications, most users rely on one of the major deep learning frameworks, in particular PyTorch or TensorFlow. Using these frameworks, however, requires substantial experience and time. Here, we present ‘cito’, a user-friendly R package for DL that allows specifying DNNs in the familiar formula syntax used by many R packages. To fit the models, ‘cito’ takes advantage of the numerically optimized ‘torch’ library, including the ability to switch between training models on the CPU or the graphics processing unit (GPU), which allows the efficient training of large DNNs. Moreover, ‘cito’ includes many user-friendly functions for model plotting and analysis, including explainable AI (xAI) metrics for effect sizes and variable importance. All xAI metrics as well as predictions can optionally be bootstrapped to generate confidence intervals, including p-values. To showcase a typical analysis pipeline using ‘cito’, with its built-in xAI features, we built a species distribution model of the African elephant. We hope that by providing a user-friendly R framework to specify, deploy and interpret DNNs, ‘cito’ will make this interesting class of models more accessible to ecological data analysis. A stable version of ‘cito’ can be installed from the Comprehensive R Archive Network (CRAN).

University of Regensburg Publication Server

Fixed or random? On the reliability of mixed‐effects models for a small number of levels in grouping variables

Authors: de Souza Leite Melina; Oberpriller Johannes; Pichler Maximilian
Publication venue: Wiley
Publication date: 01/07/2022

Biological data are often intrinsically hierarchical (e.g., species from different genera, plants within different mountain regions), which has made mixed-effects models a common analysis tool in ecology and evolution because they can account for the non-independence. Many questions about their practical application have been resolved, but one is still debated: should we treat a grouping variable with a low number of levels as a random or a fixed effect? In such situations, the variance estimate of the random effect can be imprecise, but it is unknown whether this affects the statistical power and type I error rates of the fixed effects of interest. Here, we analyzed the consequences of treating a grouping variable with 2–8 levels as a fixed or random effect in correctly specified and alternative (under- or overparametrized) models. We calculated type I error rates and statistical power for all model specifications and quantified the influence of study design on these quantities. We found no influence of model choice on type I error rate and power for the population-level effect (slope) in random intercept-only models. However, when both intercepts and slopes vary in the data-generating process, using a random slope and intercept model, and switching to a fixed-effects model in case of a singular fit, avoids overconfidence in the results. Additionally, the number of levels and the difference between them strongly influence power and type I error. We conclude that inferring the correct random-effect structure is of great importance to obtain correct type I error rates. We recommend starting with a mixed-effects model independent of the number of levels in the grouping variable and switching to a fixed-effects model only in case of a singular fit. With these recommendations, we allow for more informed choices about study design and data analysis and make ecological inference with mixed-effects models more robust for a small number of levels.

University of Regensburg Publication Server

Directory of Open Access Journals

PubMed Central

Machine‐learning algorithms predict soil seed bank persistence from easily available traits

Authors: Pichler Maximilian; Poschlod Peter; Rosbakh Sergey; Török Péter
Publication venue: Wiley
Publication date: 07/04/2022

Question: Soil seed banks (SSB), i.e. pools of viable seeds in the soil and on its surface, play a crucial role in plant biology and ecology. Information on seed persistence in soil is of great importance for fundamental and applied research, yet compiling data sets on this trait still requires enormous efforts. We asked whether the machine-learning (ML) approach could be used to infer and predict SSB properties of a regional flora based on easily available data.
Location: Eighteen calcareous grasslands located along an elevational gradient of almost 2000 m in the Bavarian Alps, Germany.
Methods: We compared a commonly used ML model (random forest) with a conventional model (linear regression model) as to their ability to predict SSB presence/absence and density using empirical data on SSB characteristics (environmental, seed traits and phylogenetic predictors). Further, we identified the most important determinants of seed persistence in soil for predicting qualitative and quantitative SSB characteristics using the ML approach.
Results: We demonstrated that the ML model predicts SSB characteristics significantly better than the linear regression model. A single set of predictors (either environment, or seed traits, or phylogenetic eigenvectors) was sufficient for the ML model to achieve high performance in predicting SSB characteristics. Importantly, we established that a few widely available SSB predictors can achieve high predictive power in the ML approach, suggesting a high flexibility of the developed approach for use in various study systems.
Conclusions: Our study provides a novel methodological approach that combines empirical knowledge on the determinants of SSB characteristics with a modern, flexible statistical approach based on ML. It clearly demonstrates that ML can be developed into a key tool to facilitate labor-intensive, costly and time-consuming functional trait research.

University of Regensburg Publication Server

Ohmic heating - a novel approach for gluten-free bread baking

Authors: Bender Denisse; Fauster Thomas; Gratz Maximilian; Jäger Henry; Kinner Mathias; Pichler Stefanie; Schoenlechner Regine; Vogt Silvan; Wicki Beata
Publication venue: Springer Science and Business Media LLC
Publication date: 21/08/2019

Gluten-free (GF) batters usually present several technological challenges that limit the performance during conventional baking and the resulting product quality. Due to the volumetric heating principle and faster heating rates, ohmic heating (OH) may be advantageous compared with conventional baking. Therefore, the potential of using ohmic heating as a novel approach for gluten-free bread baking was explored. In detail, the effect of different OH process parameters (power input, holding time) on the chemical and functional properties (specific volume, crumb firmness and relative elasticity, pore properties, color, starch gelatinization) and digestibility of breads was investigated. Results showed that GF breads could benefit from the uniform rapid heating during processing, as these breads showed superior functional properties (specific volume, 2.86–3.44 cm³/g; relative elasticity, 45.05–56.83%; porosity, 35.17–40.92%) compared with conventional oven-baked GF bread (specific volume, 2.60 cm³/g; relative elasticity, 44.23%; porosity, 37.63%). In order to maximize bread expansion and the OH performance, it was found that the OH process could be improved by applying the electrical energy in three descending power steps: first step with high power input (in this study, 2–6 kW for 15 s), followed by 1 kW for 10 s, and 0.3 kW for 1–30 min. In total, ohmic baking only needed a few minutes to obtain a fully expanded GF bread. The determination of pasting properties and starch digestibility demonstrated that these breads were comparable or even superior to GF breads baked in a conventional baking oven.

ZHAW digitalcollection

Long non-coding RNA PANTR1 is associated with poor prognosis and influences angiogenesis and apoptosis in clear-cell renal cell cancer

Authors: Barth Dominik A.; Bauernhofer Thomas; Foßelteder Johannes; Hutterer Georg C.; Klec Christiane; Pichler Martin; Pichler Renate; Pummer Karl; Resel Margit; Seles Maximilian; Slaby Ondrej; Svoboda Marek; Zigeuner Richard E.
Publication venue: MDPI AG
Publication date: 01/01/2020

OPUS Augsburg