32 research outputs found

    Development of bioinformatics tools for the rapid and sensitive detection of known and unknown pathogens from next generation sequencing data

    Infectious diseases remain one of the main causes of death across the globe. Despite huge advances in clinical diagnostics, establishing a clear etiology remains impossible in a proportion of cases. Since the emergence of next generation sequencing (NGS), a multitude of new research fields based on this technology have evolved. Especially its application in metagenomics – denoting the research on genomic material taken directly from its environment – has led to a rapid development of new applications. Metagenomic NGS has proven to be a promising tool in the field of pathogen-related research and diagnostics. In this thesis, I present different approaches for the detection of known and the discovery of unknown pathogens from NGS data. These contributions subdivide into three newly developed methods and one publication on a real-world use case of the methodology we developed and the data analysis based on it. First, I present LiveKraken, a real-time read classification tool based on the core algorithm of Kraken. LiveKraken uses streams of raw data from Illumina sequencers to classify reads taxonomically. This way, we are able to produce results identical to those of Kraken the moment the sequencer finishes. We are furthermore able to provide comparable results in early stages of a sequencing run, which can save up to a week of sequencing time. While the number of classified reads grows over time, false classifications appear in negligible numbers and proportions of identified taxa are only affected to a minor extent. In the second project, we designed and implemented PathoLive, a real-time diagnostics pipeline which allows the detection of pathogens from clinical samples before the sequencing procedure is finished. We adapted the core algorithm of HiLive, a real-time read mapper, and enhanced its accuracy for our use case. Furthermore, sequences that are probably irrelevant are automatically marked.
The results are visualized in an interactive taxonomic tree that provides an intuitive overview and detailed metrics regarding the relevance of each identified pathogen. Testing PathoLive on the sequencing of a real plasma sample spiked with viruses, we were able to show that we ranked the results more accurately throughout the complete sequencing run than any other tested tool did at the end of the sequencing run. With PathoLive, we shift the focus of NGS-based diagnostics from read quantification towards a more meaningful assessment of results in unprecedented turnaround time. The third project aims at the detection of novel pathogens from NGS data. We developed RAMBO-K, a tool which allows rapid and sensitive removal of unwanted host sequences from NGS datasets. RAMBO-K is faster than any tool we tested, while showing a consistently high sensitivity and specificity across different datasets. RAMBO-K rapidly and reliably separates reads from different species. It is suitable as a straightforward standard solution for workflows dealing with mixed datasets. In the fourth project, we used RAMBO-K as well as several other data analyses to discover Berlin squirrelpox virus, a divergent new poxvirus establishing a new genus of Poxviridae. Near Berlin, Germany, several juvenile red squirrels (Sciurus vulgaris) were found with moist, crusty skin lesions. Histology, electron microscopy, and cell culture isolation revealed an orthopoxvirus-like infection. After standard workflows yielded no significant results, poxviral reads were assigned using RAMBO-K, enabling the assembly of the genome of the novel virus. With these projects, we established three new application-related methods, each of which closes a different research gap. Taken together, we enhance the available repertoire of NGS-based pathogen-related research tools and facilitate and accelerate a variety of research projects.

    Outcome of Different Sequencing and Assembly Approaches on the Detection of Plasmids and Localization of Antimicrobial Resistance Genes in Commensal Escherichia coli

    Antimicrobial resistance (AMR) is a major threat to public health worldwide. AMR typing is currently shifting from phenotypic testing to whole-genome sequence (WGS)-based detection of resistance determinants for a better understanding of the isolate diversity and elements involved in gene transmission (e.g., plasmids, bacteriophages, transposons). However, the use of WGS data for monitoring purposes requires suitable techniques, standardized parameters and approved guidelines for reliable AMR gene detection and prediction of their association with mobile genetic elements (plasmids). In this study, different sequencing and assembly strategies were tested for their suitability in AMR monitoring in Escherichia coli in the routines of the German National Reference Laboratory for Antimicrobial Resistances. To assess the outcomes of the different approaches, results from in silico predictions were compared with conventional phenotypic and genotypic typing data. With the focus on (fluoro)quinolone-resistant E. coli, five qnrS-positive isolates with multiple extrachromosomal elements were subjected to WGS with NextSeq (Illumina), PacBio (Pacific BioSciences) and ONT (Oxford Nanopore) for in-depth characterization of the qnrS1-carrying plasmids. Raw reads from short- and long-read sequencing were assembled individually by Unicycler or Flye or a combination of both (hybrid assembly). The generated contigs were subjected to bioinformatics analysis. Based on the generated data, assembly of long-read sequences is error-prone and can result in the loss of small plasmid genomes. In contrast, short-read sequencing was shown to be insufficient for the prediction of a linkage of AMR genes (e.g., qnrS1) to specific plasmid sequences. Furthermore, short-read sequencing failed to detect certain duplications and was unsuitable for genome finishing.
Overall, the hybrid assembly led to the most comprehensive typing results, especially in predicting associations of AMR genes and mobile genetic elements. Thus, the use of different sequencing technologies and hybrid assemblies currently represents the best approach for reliable AMR typing and risk assessment.

    Predicting Protests by Disadvantaged Skilled Immigrants: A Test of an Integrated Social Identity, Relative Deprivation, Collective Efficacy (SIRDE) Model

    In Canada, skilled immigrants with foreign credentials tend to experience difficulty in obtaining a suitable job in their chosen profession. This is because employers do not recognize the full value of such qualifications. We used structural equation modeling to test a social identity, relative deprivation, collective efficacy model in a prospective study of a sample of skilled immigrants (N = 234) disadvantaged by this “credentialing” problem. In this model, variables measured at time 1 successfully predicted participation in protest actions during the following 4 months, measured at time 2. First, we conceptualized the affective component of collective relative deprivation (CRD) as (i) the perception of discrimination by the majority group and (ii) the emotional reaction of anger, resentment and frustration in response to that discrimination. The results suggested that the latter positively influenced participation in protest actions but, unexpectedly, the former had the opposite effect. Second, the evidence suggested that respondents’ identification with Canada, but not their cultural group, indirectly influenced such participation through collective efficacy and the two components of affective CRD. Third, the novel hypothesis that status insecurity mediates the relationship between cognitive CRD and the two components of affective CRD was supported. Finally, the results suggest that collective efficacy was a strong and direct determinant of participation in protest actions. The implications of these results for the development of an integrated social psychological theory that can predict participation in political protests are discussed.

    Future-proofing and maximizing the utility of metadata: The PHA4GE SARS-CoV-2 contextual data specification package

    Background: The Public Health Alliance for Genomic Epidemiology (PHA4GE) (https://pha4ge.org) is a global coalition that is actively working to establish consensus standards, document and share best practices, improve the availability of critical bioinformatics tools and resources, and advocate for greater openness, interoperability, accessibility, and reproducibility in public health microbial bioinformatics. In the face of the current pandemic, PHA4GE has identified a need for a fit-for-purpose, open-source SARS-CoV-2 contextual data standard. Results: As such, we have developed a SARS-CoV-2 contextual data specification package based on harmonizable, publicly available community standards. The specification can be implemented via a collection template, as well as an array of protocols and tools to support both the harmonization and submission of sequence data and contextual information to public biorepositories. Conclusions: Well-structured, rich contextual data add value, promote reuse, and enable aggregation and integration of disparate datasets. Adoption of the proposed standard and practices will better enable interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries in SARS-CoV-2 and COVID-19. The package is now supported by the NCBI’s BioSample database.

    Whole Genome Sequence Analysis of a Prototype Strain of the Novel Putative Rotavirus Species L

    Rotaviruses infect humans and animals and are a main cause of diarrhea. They are non-enveloped viruses with a genome of 11 double-stranded RNA segments. Based on genome analysis and amino acid sequence identities of the capsid protein VP6, the rotavirus species A to J (RVA-RVJ) have been defined so far. In addition, rotaviruses putatively assigned to the novel rotavirus species K (RVK) and L (RVL) have been recently identified in common shrews (Sorex araneus), based on partial genome sequences. Here, the complete genome sequence of strain KS14/0241, a prototype strain of RVL, is presented. The deduced amino acid sequence for VP6 of this strain shows only up to 47% identity to that of RVA to RVJ reference strains. Phylogenetic analyses indicate a clustering separated from the established rotavirus species for all 11 genome segments of RVL, with the closest relationship to RVH and RVJ within the phylogenetic RVB-like clade. The non-coding genome segment termini of RVL showed conserved sequences at the 5′-end (positive-sense RNA strand), which are common to all rotaviruses, and those conserved among the RVB-like clade at the 3′-end. The results are consistent with a classification of the virus into a novel rotavirus species L.

    Genome analysis of the novel putative rotavirus species K

    Rotaviruses are causative agents of diarrhea in humans and animals. Currently, the species rotavirus A-J (RVA-RVJ) and the putative species RVK and RVL are defined, mainly based on their genome sequence identities. RVK strains were first identified in 2019 in common shrews (Sorex araneus) in Germany; however, only short sequence fragments were available so far. Here, we analyzed the complete coding regions of strain RVK/shrew-wt/GER/KS14–0241/2013, which showed highest sequence identities with RVC. The amino acid sequence identity of VP6, which is used for rotavirus species definition, reached only 51% with other rotavirus reference strains, thus confirming classification of RVK as a separate species. Phylogenetic analyses for the deduced amino acid sequences of all 11 virus proteins showed that for most of them RVK and RVC formed a common branch within the RVA-like phylogenetic clade. Only the tree for the highly variable NSP4 showed a different branching, although with very low bootstrap support. Comparison of partial nucleotide sequences of other RVK strains from common shrews of different regions in Germany indicated a high degree of sequence variability (61–97% identity) within the putative species. These RVK strains clustered separately from RVC genotype reference strains in phylogenetic trees, indicating diversification of RVK independent from RVC. The results indicate that RVK represents a novel rotavirus species, which is most closely related to RVC.
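    The abstract above relies on pairwise amino acid identity of VP6 for species definition. As a rough illustration only, the following sketch computes percent identity between two sequences that have already been aligned. The function name, the gap symbol, and the choice to ignore gapped columns are assumptions made for this example; they are not taken from the paper, and other identity conventions (for instance, dividing by the full alignment length) give different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Both inputs are assumed to be rows of the same pairwise alignment,
    so they must have equal length. This is a hypothetical helper.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy sequences, not real VP6 data.
    print(percent_identity("MKT-LV", "MRTALV"))
```

    In practice the alignment itself would come from a dedicated alignment tool, and the resulting value would be compared against whatever species demarcation criterion the field uses.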

    PAIPline: pathogen identification in metagenomic and clinical next generation sequencing samples

    Motivation: Next generation sequencing (NGS) has provided researchers with a powerful tool to characterize metagenomic and clinical samples in research and diagnostic settings. NGS allows an open view into samples useful for pathogen detection in an unbiased fashion and without prior hypothesis about possible causative agents. However, NGS datasets for pathogen detection come with different obstacles, such as a very unfavorable ratio of pathogen to host reads. In addition to frequent false positives and irrelevant organisms, such as contaminants, tools are often challenged by samples with low pathogen loads and might not report organisms present below a certain threshold. Furthermore, some metagenomic profiling tools are only focused on one particular set of pathogens, for example bacteria. Results: We present PAIPline, a bioinformatics pipeline specifically designed to address problems associated with detecting pathogens in diagnostic samples. PAIPline particularly focuses on user-friendliness and encapsulates all necessary steps from preprocessing to resolution of ambiguous reads and filtering up to visualization in a single tool. In contrast to existing tools, PAIPline is more specific while maintaining sensitivity. This is shown in a comparative evaluation where PAIPline was benchmarked alongside other well-known metagenomic profiling tools on previously published well-characterized datasets. Additionally, as part of an international cooperation project, PAIPline was applied to an outbreak sample of hemorrhagic fevers of then unknown etiology. The presented results show that PAIPline can serve as a robust, reliable, user-friendly, adaptable and generalizable stand-alone software for diagnostics from NGS samples and as a stepping stone for further downstream analyses. Availability and implementation: PAIPline is freely available under https://gitlab.com/rki_bioinformatics/paipline.

    First Detection of GES-5-Producing Escherichia coli from Livestock—An Increasing Diversity of Carbapenemases Recognized from German Pig Production

    Resistance to carbapenems due to carbapenemase-producing Enterobacteriaceae (CPE) is an increasing threat to human health worldwide. In recent years, CPE have been found only sporadically in livestock, but concern has risen that livestock might become a reservoir for CPE. In 2019, the first GES carbapenemase-producing Escherichia coli from livestock was detected within the German national monitoring on antimicrobial resistance. The isolate was obtained from pig feces and was phenotypically resistant to meropenem and ertapenem. The isolate harbored three successive blaGES genes encoding GES-1, GES-5 and GES-5B in an incomplete class-I integron on a 12 kb plasmid (pEC19-AB02908; Acc. No. MT955355). The strain further carried virulence-associated genes typical for uropathogenic E. coli, which might hint at an increased pathogenic potential. The isolate produced the third carbapenemase detected from German livestock. The finding underlines the importance of CPE monitoring and detailed characterization of new isolates.

    Berlin Squirrelpox Virus, a New Poxvirus in Red Squirrels, Berlin, Germany

    Near Berlin, Germany, several juvenile red squirrels (Sciurus vulgaris) were found with moist, crusty skin lesions. Histology, electron microscopy, and cell culture isolation revealed an orthopoxvirus-like infection. Subsequent PCR and genome analysis identified a new poxvirus (Berlin squirrelpox virus) that could not be assigned to any known poxvirus genus.

    RAMBO-K: Rapid and Sensitive Removal of Background Sequences from Next Generation Sequencing Data

    Background: The assembly of viral or endosymbiont genomes from Next Generation Sequencing (NGS) data is often hampered by the predominant abundance of reads originating from the host organism. These reads increase the memory and CPU time usage of the assembler and can lead to misassemblies. Results: We developed RAMBO-K (Read Assignment Method Based On K-mers), a tool which allows rapid and sensitive removal of unwanted host sequences from NGS datasets. Reaching a speed of 10 Megabases/s on 4 CPU cores and a standard hard drive, RAMBO-K is faster than any tool we tested, while showing a consistently high sensitivity and specificity across different datasets. Conclusions: RAMBO-K rapidly and reliably separates reads from different species without data preprocessing. It is suitable as a straightforward standard solution for workflows dealing with mixed datasets. Binaries and source code (Java and Python) are available from http://sourceforge.net/projects/rambok/.
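    To give a feel for what k-mer based read assignment can look like, here is a minimal sketch that scores a read under two k-mer frequency tables (one built from host sequences, one from target sequences) and assigns it to the higher-scoring side. This is not RAMBO-K's actual algorithm or code: the function names, the add-one smoothing, the value of k, and the tie-breaking rule are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_log_freqs(sequences, k, pseudocount=1.0):
    """Build a smoothed log-frequency table of k-mers (hypothetical helper).

    Returns (table, default), where default is the log-probability
    used for k-mers never seen in the training sequences.
    """
    counts = Counter()
    for s in sequences:
        counts.update(kmers(s.upper(), k))
    # Smoothing spreads pseudocounts over all 4**k possible DNA k-mers.
    total = sum(counts.values()) + pseudocount * 4 ** k
    default = math.log(pseudocount / total)
    table = {km: math.log((c + pseudocount) / total) for km, c in counts.items()}
    return table, default


def score(read, model, k):
    """Sum of log-frequencies of the read's k-mers under one model."""
    table, default = model
    return sum(table.get(km, default) for km in kmers(read.upper(), k))


def assign(read, host_model, target_model, k):
    """Label a read 'target' if it scores strictly higher there, else 'host'."""
    if score(read, target_model, k) > score(read, host_model, k):
        return "target"
    return "host"


if __name__ == "__main__":
    # Toy training sequences, not real genomes.
    host = kmer_log_freqs(["ACACACACACACACAC"], 3)
    target = kmer_log_freqs(["GGTTGGTTGGTTGGTT"], 3)
    print(assign("GGTTGGTT", host, target, 3))
    print(assign("ACACAC", host, target, 3))
```

    A real workflow would train on reference genomes, pick k and a score cutoff empirically, and stream reads from FASTQ files instead of holding strings in memory.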