289 research outputs found

    Gauge your phage: benchmarking of bacteriophage identification tools in metagenomic sequencing data

    Background: The prediction of bacteriophage sequences in metagenomic datasets has become a topic of considerable interest, leading to the development of many novel bioinformatic tools. A comparative analysis of ten state-of-the-art phage identification tools was performed to inform their usage in microbiome research. Methods: Artificial contigs generated from complete RefSeq genomes representing phages, plasmids, and chromosomes, and a previously sequenced mock community containing four phage species, were used to evaluate the precision, recall, and F1 scores of the tools. We also generated a dataset of randomly shuffled sequences to quantify false-positive calls. In addition, a set of previously simulated viromes was used to assess diversity bias in each tool’s output. Results: VIBRANT and VirSorter2 achieved the highest F1 scores (0.93) in the RefSeq artificial contigs dataset, with several other tools also performing well. Kraken2 had the highest F1 score (0.86) in the mock community benchmark by a large margin (0.3 higher than DeepVirFinder in second place), mainly due to its high precision (0.96). Generally, k-mer-based tools performed better than reference similarity tools and gene-based methods. Several tools, most notably PPR-Meta, called a high number of false positives in the randomly shuffled sequences. When analysing the diversity of the genomes that each tool predicted from a virome set, most tools produced a viral genome set that had similar alpha- and beta-diversity patterns to the original population, with Seeker being a notable exception. Conclusions: This study provides key metrics used to assess performance of phage detection tools, offers a framework for further comparison of additional viral discovery tools, and discusses optimal strategies for using these tools.
We highlight that the choice of tool for identification of phages in metagenomic datasets, as well as their parameters, can bias the results and provide pointers for different use case scenarios. We have also made our benchmarking dataset available for download in order to facilitate future comparisons of phage identification tools.
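
    As a reading aid for the metrics named in this abstract, the following is a minimal sketch of how precision, recall, and F1 relate to one another. The function names and the example counts are hypothetical and are not taken from the study. The second function rearranges the standard F1 definition to estimate the recall implied by a reported F1 and precision; applied to the rounded Kraken2 figures quoted above (F1 0.86, precision 0.96), it yields only an approximate value that the abstract itself does not state.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw classification counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def recall_from_f1(f1, precision):
    """Solve F1 = 2PR / (P + R) for R, given F1 and P."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("no valid recall exists for this F1/precision pair")
    return f1 * precision / denom


# Hypothetical counts, chosen only to illustrate the calculation.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)

# Recall implied by the rounded values quoted in the abstract (an estimate).
implied_recall = recall_from_f1(0.86, 0.96)
```

    The rearrangement shows why a tool can lead on F1 mainly through precision: with precision near 1, F1 stays high even when recall is noticeably lower.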

    Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions

    Discovery of genetic variants underlying bacterial phenotypes and the prediction of phenotypes such as antibiotic resistance are fundamental tasks in bacterial genomics. Genome-wide association study (GWAS) methods have been applied to study these relations, but the plastic nature of bacterial genomes and the clonal structure of bacterial populations create challenges. We introduce an alignment-free method which finds sets of loci associated with bacterial phenotypes, quantifies the total effect of genetics on the phenotype, and allows accurate phenotype prediction, all within a single computationally scalable joint modeling framework. Genetic variants covering the entire pangenome are compactly represented by extended DNA sequence words known as unitigs, and model fitting is achieved using elastic net penalization, an extension of standard multiple regression. Using an extensive set of state-of-the-art bacterial population genomic data sets, we demonstrate that our approach performs accurate phenotype prediction, comparable to popular machine learning methods, while retaining both interpretability and computational efficiency. Compared to those of previous approaches, which test each genotype-phenotype association separately for each variant and apply a significance threshold, the variants selected by our joint modeling approach overlap substantially. IMPORTANCE Being able to identify the genetic variants responsible for specific bacterial phenotypes has been the goal of bacterial genetics since its inception and is fundamental to our current level of understanding of bacteria. This identification has been based primarily on painstaking experimentation, but the availability of large data sets of whole genomes with associated phenotype metadata promises to revolutionize this approach, not least for important clinical phenotypes that are not amenable to laboratory analysis.
These models of phenotype-genotype association can in the future be used for rapid prediction of clinically important phenotypes such as antibiotic resistance and virulence by rapid-turnaround or point-of-care tests. However, despite much effort being put into adapting genome-wide association study (GWAS) approaches to cope with bacterium-specific problems, such as strong population structure and horizontal gene exchange, current approaches are not yet optimal. We describe a method that advances methodology for both association and generation of portable prediction models.

    Progress and Prospects for a Nucleic Acid Screening Test Set

    Objective: DNA synthesis companies screen orders to detect controlled sequences with misuse risks. Assessing screening accuracy is challenging owing to the breadth of biological risks and ambiguities in risk definitions. Here, we detail an International Gene Synthesis Consortium working group’s rationale and process to develop a prototype DNA synthesis screening test dataset, aiming to establish a baseline of screening system accuracy to compare with various screening approaches. Methodology: Construction of the prototype test dataset involved four tool developers screening nucleic acid sequences from three taxonomic clusters of controlled organisms (Orbivirus, Francisella tularensis, and Coccidioides). Results were mapped onto predefined, comparable categories, checking for consensus or conflicts. Conflicts were grouped based on gene annotation and resolved through discussion. Results: The process highlighted several long-standing challenges in DNA synthesis screening, including the qualitative differences in approaches taken by screening tools. Our findings highlight the lack of clarity in assessing pathogen sequences with respect to regulatory control language, compounded by scientific uncertainty. We illustrate the current degree of consensus and existing challenges using classification statistics and specific examples. Conclusions and Next Steps: This prototype underscores the necessity of expert-regulator coordination in assessing gene-associated risks, offering a template for creating test sets across all taxonomic groups on international control lists. Expanding the working group would enrich dataset comprehensiveness, enabling a transition from species-focused to function-focused regulatory controls. This sets the foundation for quality control, certification, and improved risk assessment in DNA synthesis screening.

    Evolution of Salmonella enterica serotype Typhimurium driven by anthropogenic selection and niche adaptation

    Salmonella enterica serotype Typhimurium (S. Typhimurium) is a leading cause of gastroenteritis and bacteraemia worldwide, and a model organism for the study of host-pathogen interactions. Two S. Typhimurium strains (SL1344 and ATCC14028) are widely used to study host-pathogen interactions, yet genotypic variation results in strains with diverse host range, pathogenicity and risk to food safety. The population structure of diverse strains of S. Typhimurium revealed a major phylogroup of predominantly sequence type 19 (ST19) and a minor phylogroup of ST36. The major phylogroup had a population structure with two high order clades (α and β) and multiple subclades on extended internal branches, that exhibited distinct signatures of host adaptation and anthropogenic selection. Clade α contained a number of subclades composed of strains from well characterized epidemics in domesticated animals, while clade β contained multiple subclades associated with wild avian species. The contrasting epidemiology of strains in clade α and β was reflected by the distinct distribution of antimicrobial resistance (AMR) genes, accumulation of hypothetically disrupted coding sequences (HDCS), and signatures of functional diversification. These observations were consistent with elevated anthropogenic selection of clade α lineages from adaptation to circulation in populations of domesticated livestock, and the predisposition of clade β lineages to undergo adaptation to an invasive lifestyle by a process of convergent evolution with host-adapted Salmonella serotypes. Gene flux was predominantly driven by acquisition and recombination of prophage and associated cargo genes, with only occasional loss of these elements. The acquisition of large chromosomally-encoded genetic islands was limited, but was notably a feature of two recent pandemic clones (DT104 and monophasic S. Typhimurium ST34) of clade α (SGI-1 and SGI-4).

    A database of microRNA expression patterns in Xenopus laevis

    MicroRNAs (miRNAs) are short, non-coding RNAs around 22 nucleotides long. They inhibit gene expression either by translational repression or by causing the degradation of the mRNAs they bind to. Many are highly conserved amongst diverse organisms and have restricted spatio-temporal expression patterns during embryonic development where they are thought to be involved in generating accuracy of developmental timing and in supporting cell fate decisions and tissue identity. We determined the expression patterns of 180 miRNAs in Xenopus laevis embryos using LNA oligonucleotides. In addition we carried out small RNA-seq on different stages of early Xenopus development, identified 44 miRNAs belonging to 29 new families and characterized the expression of 5 of these. Our analyses identified miRNA expression in many organs of the developing embryo. In particular a large number were expressed in neural tissue and in the somites. Surprisingly none of the miRNAs we have looked at show expression in the heart. Our results have been made freely available as a resource in both XenMARK and Xenbase

    An African Salmonella Typhimurium ST313 sublineage with extensive drug-resistance and signatures of host adaptation

    Bloodstream infections by Salmonella enterica serovar Typhimurium constitute a major health burden in sub-Saharan Africa (SSA). These invasive non-typhoidal (iNTS) infections are dominated by isolates of the antibiotic resistance-associated sequence type (ST) 313. Here, we report emergence of ST313 sublineage II.1 in the Democratic Republic of the Congo. Sublineage II.1 exhibits extensive drug resistance, involving a combination of multidrug resistance, extended spectrum β-lactamase production and azithromycin resistance. ST313 lineage II.1 isolates harbour an IncHI2 plasmid we name pSTm-ST313-II.1, with one isolate also exhibiting decreased ciprofloxacin susceptibility. Whole genome sequencing reveals that ST313 II.1 isolates have accumulated genetic signatures potentially associated with altered pathogenicity and host adaptation, related to changes observed in biofilm formation and metabolic capacity. Sublineage II.1 emerged at the beginning of the 21st century and is involved in on-going outbreaks. Our data provide evidence of further evolution within the ST313 clade associated with iNTS in SSA.

    Enhancing easy-plane anisotropy in bespoke Ni(II) quantum magnets

    We examine the crystal structures and magnetic properties of several S = 1 Ni(II) coordination compounds, molecules and polymers, that include the bridging ligands HF₂⁻, AF₆²⁻ (A = Ti, Zr) and pyrazine or non-bridging ligands F⁻, SiF₆²⁻, glycine, H₂O, 1-vinylimidazole, 4-methylpyrazole and 3-hydroxypyridine. Pseudo-octahedral NiN₄F₂, NiN₄O₂ or NiN₄OF cores consist of equatorial Ni-N bonds that are equal to or slightly longer than the axial Ni-Lax bonds. By design, the zero-field splitting (D) is large in these systems and, in the presence of substantial exchange interactions (J), can be difficult to discriminate from magnetometry measurements on powder samples. Thus, we relied on pulsed-field magnetization in those cases and employed electron-spin resonance (ESR) to confirm D when J 0) and range from ≈ 8-25 K. This work reveals a linear correlation between the ratio d(Ni-Lax)/d(Ni-Neq) and D although the ligand spectrochemical properties may also be important. We assert that this relationship allows us to predict the type of magnetocrystalline anisotropy in tailored Ni(II) quantum magnets.

    Establishing a core outcome set for peritoneal dialysis : report of the SONG-PD (standardized outcomes in nephrology-peritoneal dialysis) consensus workshop

    Outcomes reported in randomized controlled trials in peritoneal dialysis (PD) are diverse, are measured inconsistently, and may not be important to patients, families, and clinicians. The Standardized Outcomes in Nephrology-Peritoneal Dialysis (SONG-PD) initiative aims to establish a core outcome set for trials in PD based on the shared priorities of all stakeholders. We convened an international SONG-PD stakeholder consensus workshop in May 2018 in Vancouver, Canada. Nineteen patients/caregivers and 51 health professionals attended. Participants discussed core outcome domains and implementation in trials in PD. Four themes relating to the formation of core outcome domains were identified: life participation as a main goal of PD, impact of fatigue, empowerment for preparation and planning, and separation of contributing factors from core factors. Considerations for implementation were identified: standardizing patient-reported outcomes, requiring a validated and feasible measure, simplicity of binary outcomes, responsiveness to interventions, and using positive terminology. All stakeholders supported inclusion of PD-related infection, cardiovascular disease, mortality, technique survival, and life participation as the core outcome domains for PD