38 research outputs found

    GalaxyTrakr: a distributed analysis tool for public health whole genome sequence data accessible to non-bioinformaticians

    Background: Processing and analyzing whole genome sequencing (WGS) data is computationally intense: a single Illumina MiSeq WGS run produces ~1 million 250-base-pair reads for each of 24 samples. This poses significant obstacles for smaller laboratories, or laboratories not affiliated with larger projects, which may not have dedicated bioinformatics staff or computing power to effectively use genomic data to protect public health. Building on the success of the cloud-based Galaxy bioinformatics platform (http://galaxyproject.org), already known for its user-friendliness and powerful WGS analytical tools, the Center for Food Safety and Applied Nutrition (CFSAN) at the U.S. Food and Drug Administration (FDA) created a customized ‘instance’ of the Galaxy environment, called GalaxyTrakr (https://www.galaxytrakr.org), for use by laboratory scientists performing food-safety regulatory research. The goal was to enable laboratories outside of the FDA internal network to (1) perform quality assessments of sequence data, (2) identify links between clinical isolates and positive food/environmental samples, including those at the National Center for Biotechnology Information sequence read archive (https://www.ncbi.nlm.nih.gov/sra/), and (3) explore new methodologies such as metagenomics. GalaxyTrakr hosts a variety of free and adaptable tools and provides the data storage and computing power to run the tools. These tools support coordinated analytic methods and consistent interpretation of results across laboratories. Users can create and share tools for their specific needs and use sequence data generated locally and elsewhere. Results: In its first full year (2018), GalaxyTrakr processed over 85,000 jobs and went from 25 to 250 users, representing 53 different public and state health laboratories, academic institutions, international health laboratories, and federal organizations. By mid-2020, it had grown to 600 registered users and processed over 450,000 analytical jobs. To illustrate how laboratories are making use of this resource, we describe how six institutions use GalaxyTrakr to quickly analyze and review their data. Instructions for participating in GalaxyTrakr are provided. Conclusions: GalaxyTrakr advances food safety by providing reliable and harmonized WGS analyses for public health laboratories and promoting collaboration across laboratories with differing resources. Anticipated enhancements to this resource will include workflows for additional foodborne pathogens, viruses, and parasites, as well as new tools and services. Center for Food Safety and Applied Nutrition at the U.S. Food and Drug Administration. Published version (publisher's final version).

    Development of an amplicon-based sequencing approach in response to the global emergence of mpox

    The 2022 multicountry mpox outbreak, concurrent with the ongoing Coronavirus Disease 2019 (COVID-19) pandemic, further highlighted the need for genomic surveillance and rapid pathogen whole-genome sequencing. While metagenomic sequencing approaches have been used to sequence many of the early mpox infections, these methods are resource intensive and require samples with high viral DNA concentrations. Given the atypical clinical presentation of cases associated with the outbreak and uncertainty regarding viral load across both the course of infection and anatomical body sites, there was an urgent need for a more sensitive and broadly applicable sequencing approach. Highly multiplexed amplicon-based sequencing (PrimalSeq) was initially developed for sequencing of Zika virus, and later adapted as the main sequencing approach for Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Here, we used PrimalScheme to develop a primer scheme for human monkeypox virus that can be used with many sequencing and bioinformatics pipelines implemented in public health laboratories during the COVID-19 pandemic. We sequenced clinical specimens that tested presumptively positive for human monkeypox virus with amplicon-based and metagenomic sequencing approaches. We found notably higher genome coverage across the virus genome, with minimal amplicon drop-outs, when using the amplicon-based sequencing approach, particularly in samples with a higher PCR cycle threshold (Ct), that is, a lower DNA titer. Further testing demonstrated that Ct value correlated with the number of sequencing reads and influenced the percent genome coverage. To maximize genome coverage when resources are limited, we recommend selecting samples with a PCR Ct below 31 and generating 1 million sequencing reads per sample. To support national and international public health genomic surveillance efforts, we sent out primer pool aliquots to 10 laboratories across the United States, United Kingdom, Brazil, and Portugal. These public health laboratories successfully implemented the human monkeypox virus primer scheme in various amplicon sequencing workflows and with different sample types across a range of Ct values. Thus, we show that amplicon-based sequencing can provide a rapidly deployable, cost-effective, and flexible approach to pathogen whole-genome sequencing in response to newly emerging pathogens. Importantly, through the implementation of our primer scheme into existing SARS-CoV-2 workflows and across a range of sample types and sequencing platforms, we further demonstrate the potential of this approach for rapid outbreak response. This publication was made possible by CTSA Grant Number UL1 TR001863 from the National Center for Advancing Translational Science (NCATS), a component of the National Institutes of Health (NIH) awarded to CBFV. INSA was partially funded by the HERA project (Grant/2021/PHF/23776) supported by the European Commission through the European Centre for Disease Control (to VB).
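
    The sample-selection recommendation in the abstract above (Ct below 31, 1 million reads per sample) can be sketched as a simple filter. This is a minimal illustration only, not the authors' code: the sample names, Ct values, and function names are hypothetical, and only the two thresholds come from the abstract.

```python
from typing import Dict, List

CT_THRESHOLD = 31.0           # from the abstract: select samples with Ct below 31
READS_PER_SAMPLE = 1_000_000  # from the abstract: 1 million reads per sample


def select_samples(ct_values: Dict[str, float],
                   ct_threshold: float = CT_THRESHOLD) -> List[str]:
    """Return names of samples whose Ct is strictly below the threshold,
    ordered from lowest Ct (highest DNA titer) to highest."""
    eligible = [name for name, ct in ct_values.items() if ct < ct_threshold]
    return sorted(eligible, key=lambda name: ct_values[name])


def total_reads_needed(samples: List[str],
                       reads_per_sample: int = READS_PER_SAMPLE) -> int:
    """Read budget for a run if every selected sample gets the target depth."""
    return len(samples) * reads_per_sample


# Hypothetical Ct values, for illustration only.
example_ct = {"sample_A": 18.2, "sample_B": 30.9,
              "sample_C": 31.0, "sample_D": 35.4}

selected = select_samples(example_ct)
print(selected)                      # names of samples passing the Ct filter
print(total_reads_needed(selected))  # total reads to budget for the run
```

    A strict "below" comparison is used here because the abstract says "below 31"; whether a sample at exactly 31 should be included is a choice the abstract does not settle.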

    Running the Titan_ONT Workflow on Terra.bio v1

    The Titan_ONT workflow is a part of the Public Health Viral Genomics Titan series for SARS-CoV-2 genomic characterization. Titan_ONT was written specifically to process basecalled and demultiplexed Oxford Nanopore Technologies (ONT) read data. Input reads are assumed to be the product of sequencing ARTIC V3 tiled PCR-amplicons designed for the SARS-CoV-2 genome. Upon initiating a Titan_ONT run, input read data provided for each sample will be processed to perform consensus genome assembly, infer the quality of both raw read data and the generated consensus genome, and assign lineage or clade designations as outlined in the Titan_ONT data workflow diagram below. Additional technical documentation for the Titan_ONT workflow is available at: https://public-health-viral-genomics-theiagen.readthedocs.io/en/latest/titan_workflows.html#titan-workflows-for-genomic-characterization. Required input data for Titan_ONT: (1) basecalled and demultiplexed ONT read data files (single FASTQ file per sample); (2) primer sequence coordinates of the PCR scheme utilized, in BED file format. Titan_ONT has not been written to process FAST5 files. Video instruction: Theiagen Genomics: Titan Genomic Characterization (https://www.youtube.com/watch?v=zP9I1r6TNrw); Theiagen Genomics: Titan Outputs QC (https://www.youtube.com/watch?v=Amb-8M71umw). For technical assistance, please contact us at: [email protected]
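
    The BED-format primer coordinates listed among the required inputs can be sanity-checked before a run is submitted. The sketch below is not part of the Titan workflows; it assumes only the standard BED convention (tab-separated chromosome, 0-based start, exclusive end, optional name), and the function name and example primer names are hypothetical.

```python
from typing import List, NamedTuple


class PrimerInterval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive (BED convention)
    name: str   # empty string if the BED line has no name column


def parse_primer_bed(text: str) -> List[PrimerInterval]:
    """Parse BED-formatted primer coordinates and reject malformed lines."""
    intervals: List[PrimerInterval] = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        # Skip blank lines, comments, and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_number}: expected at least 3 tab-separated fields")
        chrom = fields[0]
        start, end = int(fields[1]), int(fields[2])  # int() raises ValueError on non-numbers
        if start < 0 or end <= start:
            raise ValueError(f"line {line_number}: invalid interval {start}-{end}")
        name = fields[3] if len(fields) > 3 else ""
        intervals.append(PrimerInterval(chrom, start, end, name))
    return intervals


# Illustrative two-line primer scheme; the primer names are made up.
EXAMPLE_BED = (
    "MN908947.3\t30\t54\texample_1_LEFT\n"
    "MN908947.3\t385\t410\texample_1_RIGHT\n"
)

primers = parse_primer_bed(EXAMPLE_BED)
for primer in primers:
    print(primer.name, primer.end - primer.start)  # primer name and length in bases
```

    Failing early on a malformed line is a deliberate choice in this sketch: a shifted or truncated primer file is easier to diagnose before submission than from unexpected trimming results afterwards.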

    Running the Titan_ClearLabs Workflow on Terra.bio v1

    The Titan_ClearLabs workflow is a part of the Public Health Viral Genomics Titan series for SARS-CoV-2 genomic characterization. Titan_ClearLabs was written to process Clear Labs read data for SARS-CoV-2 ARTIC V3 amplicon sequencing. Upon initiating a Titan_ClearLabs run, input read data provided for each sample will be processed to perform consensus genome assembly, infer the quality of both raw read data and the generated consensus genome, and assign lineage or clade designations as outlined in the Titan_ClearLabs data workflow below. Additional technical documentation for the Titan_ClearLabs workflow is available at: https://public-health-viral-genomics-theiagen.readthedocs.io/en/latest/titan_workflows.html#titan-clearlabs. Required input data for Titan_ClearLabs: (1) Clear Labs FASTQ read files (single FASTQ file per sample); (2) primer sequence coordinates of the PCR scheme utilized, in BED file format. Video instruction: Theiagen Genomics: Titan Genomic Characterization (https://www.youtube.com/watch?v=zP9I1r6TNrw); Theiagen Genomics: Titan Outputs QC (https://www.youtube.com/watch?v=Amb-8M71umw). For technical assistance, please contact us at: [email protected]

    Running the Titan_Illumina_PE Workflow on Terra.bio v1

    The Titan_Illumina_PE workflow is a part of the Public Health Viral Genomics Titan series for SARS-CoV-2 genomic characterization. Titan_Illumina_PE was written specifically to process Illumina paired-end (PE) read data. Input reads are assumed to be the product of sequencing tiled PCR-amplicons designed for the SARS-CoV-2 genome. The most common read data analyzed by the Titan_Illumina_PE workflow are generated with the ARTIC V3 protocol. However, alternative primer schemes such as the QIAseq Primer Panel are also suitable for this workflow. The primer sequence coordinates of the PCR scheme utilized must be provided in BED format along with the raw Illumina read data. Upon initiating a Titan_Illumina_PE job, the input primer scheme coordinates and raw paired-end Illumina read data provided for each sample will be processed to perform consensus genome assembly, infer the quality of both raw read data and the generated consensus genome, and assign lineage or clade designations as outlined in the Titan_Illumina_PE data workflow diagram below. Additional technical documentation for the Titan_Illumina_PE workflow is available at: https://public-health-viral-genomics-theiagen.readthedocs.io/en/latest/titan_workflows.html#titan-workflows-for-genomic-characterization. Required input data for Titan_Illumina_PE: (1) Illumina paired-end read data (forward and reverse FASTQ files per sample); (2) primer sequence coordinates of the PCR scheme utilized, in BED file format. Video instruction: Theiagen Genomics: Titan Genomic Characterization (https://www.youtube.com/watch?v=zP9I1r6TNrw); Theiagen Genomics: Titan Outputs QC (https://www.youtube.com/watch?v=Amb-8M71umw). For technical assistance, please contact us at: [email protected]

    GalaxyTrakr: a distributed analysis tool for public health whole genome sequence data accessible to non-bioinformaticians

    Background: Processing and analyzing whole genome sequencing (WGS) is computationally intense: a single Illumina MiSeq WGS run produces ~1 million 250-base-pair reads for each of 24 samples. This poses significant obstacles for smaller laboratories, or laboratories not affiliated with larger projects, which may not have dedicated bioinformatics staff or computing power to effectively use genomic data to protect public health. Building on the success of the cloud-based Galaxy bioinformatics platform (http://galaxyproject.org), already known for its user-friendliness and powerful WGS analytical tools, the Center for Food Safety and Applied Nutrition (CFSAN) at the U.S. Food and Drug Administration (FDA) created a customized ‘instance’ of the Galaxy environment, called GalaxyTrakr (https://www.galaxytrakr.org), for use by laboratory scientists performing food-safety regulatory research. The goal was to enable laboratories outside of the FDA internal network to (1) perform quality assessments of sequence data, (2) identify links between clinical isolates and positive food/environmental samples, including those at the National Center for Biotechnology Information sequence read archive (https://www.ncbi.nlm.nih.gov/sra/), and (3) explore new methodologies such as metagenomics. GalaxyTrakr hosts a variety of free and adaptable tools and provides the data storage and computing power to run the tools. These tools support coordinated analytic methods and consistent interpretation of results across laboratories. Users can create and share tools for their specific needs and use sequence data generated locally and elsewhere. Results: In its first full year (2018), GalaxyTrakr processed over 85,000 jobs and went from 25 to 250 users, representing 53 different public and state health laboratories, academic institutions, international health laboratories, and federal organizations. By mid-2020, it has grown to 600 registered users and processed over 450,000 analytical jobs. To illustrate how laboratories are making use of this resource, we describe how six institutions use GalaxyTrakr to quickly analyze and review their data. Instructions for participating in GalaxyTrakr are provided. Conclusions: GalaxyTrakr advances food safety by providing reliable and harmonized WGS analyses for public health laboratories and promoting collaboration across laboratories with differing resources. Anticipated enhancements to this resource will include workflows for additional foodborne pathogens, viruses, and parasites, as well as new tools and services.

    Image_2_The use of whole-genome sequencing and development of bioinformatics to monitor overlapping outbreaks of Candida auris in southern Nevada.TIF

    A Candida auris outbreak has been ongoing in Southern Nevada since August 2021. In this manuscript we describe the sequencing of over 200 C. auris isolates from patients at several facilities. Genetically distinct subgroups of C. auris were detected from Clade I (3 distinct lineages) and III (1 lineage). Open-source bioinformatic tools were developed and implemented to aid in the epidemiological investigation. The work herein compares three methods for C. auris whole genome analysis: Nullarbor, MycoSNP, and a new pipeline, TheiaEuk. We also describe a novel analysis method focused on elucidating phylogenetic linkages between isolates within an ongoing outbreak. Moreover, this study places the ongoing outbreaks in a global context utilizing existing sequences provided worldwide. Lastly, we describe how the generated results were communicated to the epidemiologists and infection control to generate public health interventions.

    Table_2_The use of whole-genome sequencing and development of bioinformatics to monitor overlapping outbreaks of Candida auris in southern Nevada.xlsx

    A Candida auris outbreak has been ongoing in Southern Nevada since August 2021. In this manuscript we describe the sequencing of over 200 C. auris isolates from patients at several facilities. Genetically distinct subgroups of C. auris were detected from Clade I (3 distinct lineages) and III (1 lineage). Open-source bioinformatic tools were developed and implemented to aid in the epidemiological investigation. The work herein compares three methods for C. auris whole genome analysis: Nullarbor, MycoSNP, and a new pipeline, TheiaEuk. We also describe a novel analysis method focused on elucidating phylogenetic linkages between isolates within an ongoing outbreak. Moreover, this study places the ongoing outbreaks in a global context utilizing existing sequences provided worldwide. Lastly, we describe how the generated results were communicated to the epidemiologists and infection control to generate public health interventions.