10 research outputs found

    The SoftWipe tool and benchmark for assessing coding standards adherence of scientific software

    Scientific software from all areas of scientific research is pivotal to obtaining novel insights. Yet the coding standards adherence of scientific software is rarely assessed, even though poor adherence might, in the worst case, lead to incorrect scientific results. Therefore, we have developed an open source tool and benchmark called SoftWipe that provides a relative software coding standards adherence ranking of 48 computational tools from diverse research areas. SoftWipe can be used in the review process of software papers and to inform the scientific software selection process.

    From Easy to Hopeless - Predicting the Difficulty of Phylogenetic Analyses

    Phylogenetic analyses under the Maximum-Likelihood (ML) model are time and resource intensive. To adequately capture the vastness of tree space, one needs to infer multiple independent trees. On some datasets, multiple tree inferences converge to similar tree topologies, on others to multiple, topologically highly distinct yet statistically indistinguishable topologies. At present, no method exists to quantify and predict this behavior. We introduce a method to quantify the degree of difficulty for analyzing a dataset and present Pythia, a Random Forest Regressor that accurately predicts this difficulty. Pythia predicts the degree of difficulty of analyzing a dataset prior to initiating ML-based tree inferences. Pythia can be used to increase user awareness with respect to the amount of signal and uncertainty to be expected in phylogenetic analyses, and hence inform an appropriate (post-)analysis setup. Further, it can be used to select appropriate search algorithms for easy-, intermediate-, and hard-to-analyze datasets.

    RAxML Grove: an empirical phylogenetic tree database

    SUMMARY: The assessment of novel phylogenetic models and inference methods is routinely conducted via experiments on simulated as well as empirical data. When generating synthetic data, it is often unclear how to set simulation parameters for the models and generate trees that appropriately reflect empirical model parameter distributions and tree shapes. As a solution, we present and make available a new database called ‘RAxML Grove’, currently comprising more than 60 000 inferred trees and respective model parameter estimates from fully anonymized empirical datasets that were analyzed using RAxML and RAxML-NG on two web servers. We also describe and make available two simple applications of RAxML Grove to exemplify its usage and highlight its utility for designing realistic simulation studies and analyzing empirical model parameter and tree shape distributions. AVAILABILITY AND IMPLEMENTATION: RAxML Grove is freely available at https://github.com/angtft/RAxMLGrove. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    Adaptive RAxML-NG: Accelerating Phylogenetic Inference under Maximum Likelihood using Dataset Difficulty

    Phylogenetic inferences under the maximum likelihood criterion deploy heuristic tree search strategies to explore the vast search space. Depending on the input dataset, searches from different starting trees might all converge to a single tree topology. Often, though, distinct searches infer multiple topologies with large log-likelihood score differences or yield topologically highly distinct, yet almost equally likely, trees. Recently, Haag et al. introduced an approach to quantify, and implemented machine learning methods to predict, the dataset difficulty with respect to phylogenetic inference. Easy multiple sequence alignments (MSAs) exhibit a single likelihood peak on their likelihood surface, associated with a single tree topology to which most, if not all, independent searches rapidly converge. As difficulty increases, multiple locally optimal likelihood peaks emerge, yet from highly distinct topologies. To make use of this information, we introduce and implement an adaptive tree search heuristic in RAxML-NG, which modifies the thoroughness of the tree search strategy as a function of the predicted difficulty.

    The seasonal cycle of ice-nucleating particles linked to the abundance of biogenic aerosol in boreal forests

    Ice-nucleating particles (INPs) trigger the formation of cloud ice crystals in the atmosphere. Therefore, they strongly influence cloud microphysical and optical properties, precipitation, and the life cycle of clouds. Improving weather forecasting and climate projection requires an appropriate formulation of atmospheric INP concentrations. This remains challenging as the global INP distribution and variability depend on a variety of aerosol types and sources, and neither their short-term variability nor their long-term seasonal cycles are well covered by continuous measurements. Here, we provide the first year-long set of observations with a pronounced INP seasonal cycle in a boreal forest environment. Besides the observed seasonal cycle in INP concentrations with a minimum in wintertime and maxima in early and late summer, we also provide indications for a seasonal variation in the prevalent INP type. We show that the seasonal dependency of INP concentrations and prevalent INP types is most likely driven by the abundance of biogenic aerosol. As current parameterizations do not reproduce this variability, we suggest a new mechanistic description for boreal forest environments which considers the seasonal variation in INP concentrations. For this, we use the ambient air temperature measured close to the ground at 4.2 m height as a proxy for the season, which appears to affect the source strength of biogenic emissions and, thus, the INP abundance over the boreal forest. Furthermore, we provide new INP parameterizations based on the Ice Nucleation Active Surface Site (INAS) approach, which specifically describes the ice nucleation activity of boreal aerosol particles prevalent in different seasons. Our results characterize the boreal forest as an important but variable INP source and provide new perspectives to describe these new findings in atmospheric models. Peer reviewed

    Measurement report: Introduction to the HyICE-2018 campaign for measurements of ice-nucleating particles and instrument inter-comparison in the Hyytiälä boreal forest

    The formation of ice particles in Earth's atmosphere strongly influences the dynamics and optical properties of clouds and their impacts on the climate system. Ice formation in clouds is often triggered heterogeneously by ice-nucleating particles (INPs) that represent a very low number of particles in the atmosphere. To date, many sources of INPs, such as mineral and soil dust, have been investigated and identified in the low and mid latitudes. Although less is known about the sources of ice nucleation at high latitudes, efforts have been made to identify the sources of INPs in the Arctic and boreal environments. In this study, we investigate the INP emission potential from high-latitude boreal forests in the mixed-phase cloud regime. We introduce the HyICE-2018 measurement campaign conducted in the boreal forest of Hyytiälä, Finland, between February and June 2018. The campaign utilized the infrastructure of the Station for Measuring Ecosystem-Atmosphere Relations (SMEAR) II, with additional INP instruments, including the Portable Ice Nucleation Chamber I and II (PINC and PINCii), the SPectrometer for Ice Nuclei (SPIN), the Portable Ice Nucleation Experiment (PINE), the Ice Nucleation SpEctrometer of the Karlsruhe Institute of Technology (INSEKT) and the Microlitre Nucleation by Immersed Particle Instrument (µL-NIPI), used to quantify the INP concentrations and sources in the boreal environment. In this contribution, we describe the measurement infrastructure and operating procedures during HyICE-2018, and we report results from specific time periods where INP instruments were run in parallel for inter-comparison purposes. Our results show that the suite of instruments deployed during HyICE-2018 reports consistent results and therefore lays the foundation for forthcoming results to be considered holistically. In addition, we compare measured INP concentrations to INP parameterizations, and we observe good agreement with the Tobo et al. (2013) parameterization developed from measurements conducted in a ponderosa pine forest ecosystem in Colorado, USA. Peer reviewed

    Simulations of sequence evolution: how (un)realistic they are and why

    Motivation: Simulating Multiple Sequence Alignments (MSAs) using probabilistic models of sequence evolution plays an important role in the evaluation of phylogenetic inference tools, and is crucial to the development of novel learning-based approaches for phylogenetic reconstruction, for instance, neural networks. These models and the resulting simulated data need to be as realistic as possible to be indicative of the performance of the developed tools on empirical data and to ensure that neural networks trained on simulations perform well on empirical data. Over the years, numerous models of evolution have been published with the goal of representing the sequence evolution process as faithfully as possible and thus simulating empirical-like data. In this study, we simulated DNA and protein MSAs under increasingly complex models of evolution with and without insertion/deletion (indel) events using a state-of-the-art sequence simulator. We assessed their realism by quantifying how accurately supervised learning methods are able to predict whether a given MSA is simulated or empirical. Results: Our results show that we can distinguish between empirical and simulated MSAs with high accuracy using two distinct and independently developed classification approaches across all tested models of sequence evolution. Our findings suggest that the current state-of-the-art models fail to accurately replicate several aspects of empirical MSAs, including site-wise rates as well as amino acid and nucleotide composition. Data and Code Availability: All simulated and empirical MSAs, as well as all analysis results, are available at https://cme.h-its.org/exelixis/material/simulation_study.tar.gz. All scripts required to reproduce our results are available at https://github.com/tschuelia/SimulationStudy and https://github.com/JohannaTrost/seqsharp.

    Datasets to: Measurement report: Introduction to the HyICE-2018 campaign for the measurements of ice nucleating particles in the boreal forest of Hyytiälä

    This repository contains the datasets used in the study 'Measurement report: Introduction to the HyICE-2018 campaign for the measurements of ice nucleating particles in the boreal forest of Hyytiälä'. Detailed information and technical aspects of the data can be found in the publication. Update (version 2 - January 2024): The repository was updated and now contains the complete HyICE-2018 datasets for the instruments µL-NIPI and PINE.