18 research outputs found

    Novel comparison of evaluation metrics for gene ontology classifiers reveals drastic performance differences

    Author summary: In the biosciences, predictive methods are becoming increasingly necessary as novel sequences are generated at an ever-increasing rate. The volume of sequence data necessitates Automated Function Prediction (AFP), as manual curation is often impossible. Unfortunately, selecting the best AFP method is complicated by researchers using different evaluation metrics. Furthermore, many commonly used metrics can give misleading results. We argue that the use of poor metrics in AFP evaluation results from the lack of methods to benchmark the metrics themselves. We propose an approach called Artificial Dilution Series (ADS). ADS uses existing data sets to generate multiple artificial AFP results, where each result has a controlled error rate. We use ADS to test whether different metrics can distinguish between results with known quantities of error. Our results highlight dramatic differences in performance between evaluation metrics.
    Automated protein annotation using the Gene Ontology (GO) plays an important role in the biosciences. Evaluation has always been considered central to developing novel annotation methods, but little attention has been paid to the evaluation metrics themselves. Evaluation metrics define how well an annotation method performs and allow methods to be ranked against one another. Unfortunately, most of these metrics were adopted from the machine learning literature without establishing whether they were appropriate for GO annotations. We propose a novel approach for comparing GO evaluation metrics called Artificial Dilution Series (ADS). Our approach uses existing annotation data to generate a series of annotation sets with different levels of correctness (referred to as their signal level). We calculate the evaluation metric being tested for each annotation set in the series, allowing us to identify whether it can separate different signal levels. Finally, we contrast these results with several false positive annotation sets, which are designed to expose systematic weaknesses in GO assessment. We compared 37 evaluation metrics for GO annotation using ADS and identified drastic differences between metrics. We show that some metrics struggle to differentiate between different signal levels, while others give erroneously high scores to the false positive data sets. Based on our findings, we provide guidelines on which evaluation metrics perform well with the Gene Ontology and propose improvements to several well-known evaluation metrics. In general, we argue that evaluation metrics should be tested for their performance, and we provide software for this purpose. ADS is applicable to other areas of science where the evaluation of prediction results is non-trivial. Peer reviewed.
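The dilution-series idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' software: the noise model (swapping whole annotation sets between genes), the mean Jaccard metric, the toy term labels, and the names `dilute`, `mean_jaccard` and `dilution_series` are all assumptions made for illustration only.

```python
import random


def dilute(truth, signal, rng):
    """Return a copy of `truth` in which a (1 - signal) share of genes
    receive another gene's annotation set (an assumed, crude noise model)."""
    genes = sorted(truth)
    n_noise = round((1.0 - signal) * len(genes))
    noisy = rng.sample(genes, n_noise)
    # Rotate the selected genes by one position so each gets a donor's terms.
    # Note: if only one gene is selected, it is its own donor and stays correct.
    donors = noisy[1:] + noisy[:1]
    result = {gene: set(terms) for gene, terms in truth.items()}
    for gene, donor in zip(noisy, donors):
        result[gene] = set(truth[donor])
    return result


def mean_jaccard(truth, predicted):
    """Average Jaccard similarity between true and predicted term sets."""
    scores = []
    for gene, true_terms in truth.items():
        pred_terms = predicted.get(gene, set())
        union = true_terms | pred_terms
        scores.append(len(true_terms & pred_terms) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


def dilution_series(truth, metric, signals, seed=0):
    """Score one diluted annotation set per signal level."""
    rng = random.Random(seed)
    return [(s, metric(truth, dilute(truth, s, rng))) for s in signals]


# Toy data: 20 hypothetical genes, each with one made-up term label.
truth = {f"gene{i}": {f"term{i}"} for i in range(20)}
series = dilution_series(truth, mean_jaccard, [1.0, 0.75, 0.5, 0.25, 0.0])
for signal, score in series:
    print(f"signal={signal:.2f}  score={score:.2f}")
```

In this toy setting a useful metric should decrease as the signal level drops. A metric whose scores barely change across the series would be unable to separate signal levels, which is the kind of weakness the abstract describes.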

    Enhanced Viral Metagenomics with Lazypipe 2

    Viruses are the main agents causing emerging and re-emerging infectious diseases. It is therefore important to screen for and detect them and to uncover the evolutionary processes that support their ability to jump species boundaries and establish themselves in new hosts. Metagenomic next-generation sequencing (mNGS) is a high-throughput, impartial technology that has enabled virologists to detect known or novel, divergent viruses from clinical, animal, wildlife and environmental samples with few a priori assumptions. mNGS is heavily dependent on bioinformatic analysis, with an emerging demand for integrated bioinformatic workflows. Here, we present Lazypipe2, an updated mNGS pipeline that, compared to Lazypipe1, offers significant improvements in code stability and transparency, added functionality, and support for new software components. We also present extensive benchmarking results, including evaluation of a novel simulated canine metagenome and the precision and recall of virus detection at varying sequencing depths and at low to extremely low proportions of viral genetic material. Additionally, we report the accuracy of virus detection with two strategies: homology searches using nucleotide or amino acid sequences. We show that Lazypipe2 with nucleotide-based annotation approaches near-perfect detection for eukaryotic viruses and, in terms of accuracy, outperforms the compared pipelines. We also discuss the importance of homology searches with amino acid sequences for the detection of highly divergent novel viruses.

    Bombali Virus in Mops condylurus Bat, Kenya

    Bombali virus (genus Ebolavirus) was identified in organs and excreta of an Angolan free-tailed bat (Mops condylurus) in Kenya. Complete genome analysis revealed 98% nucleotide sequence similarity to the prototype virus from Sierra Leone. No Ebola virus-specific RNA or antibodies were detected from febrile humans in the area who reported contact with bats. Peer reviewed.

    Detection of novel tick-borne pathogen, Alongshan virus, in Ixodes ricinus ticks, south-eastern Finland, 2019

    The newly identified tick-borne Alongshan virus (ALSV), a segmented Jingmen virus group flavivirus, was recently associated with human disease in China. We report the detection of ALSV RNA in Ixodes ricinus ticks in south-eastern Finland. Screening of sera from patients with suspected tick-borne encephalitis for Jingmen tick virus-like virus RNA and antibodies revealed no human cases. The presence of ALSV in common European ticks warrants further investigation of its role as a human pathogen. Peer reviewed.

    On the reconstruction of residual stresses and strains of a plate after shot peening

    This research concerns a mathematical description of the shape and stress-strain state of a steel plate subjected to one-sided shot peening, its experimental verification, and the use of the results to verify methods for reconstructing residual stress and strain fields from experimental data. Such a plate is used in manufacturing as a calibration sample to determine the shot peening duration required to form the proper compressive tangential stress in the surface layer of the processed product. This calibration method is convenient and widely applied in various surface hardening technologies. In this case, the residual stresses originate from plastic strains produced in the surface layer by shot peening. To state the problem, a plastic strain tensor field is defined up to an arbitrary function. The shape and the stress-strain state of an elastic plate with a surface layer of plastic strains were calculated numerically. The qualitative behavior of the numerical solution allowed us to adopt a set of hypotheses for finding an analytical solution of the spatial problem of elasticity theory and to weaken the boundary conditions. The exact solution was found analytically. Within the framework of the plane stress state along the thickness and transverse directions, the result corresponds exactly to the Davidenkov–Birger formula, which relates the depth distribution of the tangential residual stress to the deflection function. An explicit formula for the dependence of the residual (plastic) deformation on the thickness coordinate is obtained. Sources of error in the derived expressions and methods for correcting them are analyzed. An experiment was carried out on one-sided shot peening of a calibration plate made of hardened 65G steel, with layer-by-layer etching of the treated surface and measurement of the plate's deflection (the Davidenkov method). The profiles of residual stresses and strains were reconstructed numerically with reasonable accuracy using the obtained experimental data. The result is applicable to a wide class of problems for elastic bodies with hardened surface layers. It may serve as a basis for experimental research on such problems and help to formulate hypotheses and test them experimentally, to study the relations between physical fields in the asymptotic case, and to verify the applicability of different ways of accounting for residual stresses in numerical solutions. The solution can be used to verify stress and displacement fields for prestressed shell elements in engineering software that calculates the fatigue endurance of machine parts with hardened surface layers. It may also serve as a reference for studying surface-hardened bodies with a curved free boundary, to which most practically important problems reduce.

    Partial Genome Characterization of Novel Parapoxvirus in Horse, Finland

    We report a sequencing protocol and 121-kb poxvirus sequence from a clinical sample from a horse in Finland with dermatitis. Based on phylogenetic analyses, the virus is a novel parapoxvirus associated with a recent epidemic; previous data suggest zoonotic potential. Increased awareness of this virus and specific diagnostic protocols are needed.

    Photoinduced Processes in Lysine-Tryptophan-Lysine Tripeptide with L and D Tryptophan

    Optical isomers of the short peptide Lysine-Tryptophan-Lysine (Lys-{L/D-Trp}-Lys) and Lys-Trp-Lys with an acetate counter-ion were used to study photoinduced intramolecular and intermolecular processes of interest in photobiology. The comparison of L- and D-amino acid reactivity also attracts attention across several fields because the presence of amyloid proteins containing D-amino acids in the human brain is now considered one of the leading causes of Alzheimer’s disease. Since aggregated amyloids, mainly Aβ42, are highly disordered peptides that cannot be studied with traditional NMR and X-ray techniques, it has become common to explore the reasons for differences between L- and D-amino acids using short peptides, as in our article. NMR, chemically induced dynamic nuclear polarization (CIDNP) and fluorescence techniques allowed us to detect the influence of the tryptophan (Trp) optical configuration on the peptides’ fluorescence quantum yields, the bimolecular quenching rates of the Trp excited state, and the formation of photocleavage products. Thus, compared with the D-analog, the L-isomer shows a greater Trp excited state quenching efficiency by the electron transfer (ET) mechanism. Experimental evidence supports the hypothesis of photoinduced ET between Trp and the CONH peptide bond, as well as between Trp and another amide group.