9 research outputs found

    Open-access bacterial population genomics: BIGSdb software, the PubMLST.org website and their applications [version 1; referees: 2 approved]

    The PubMLST.org website hosts a collection of open-access, curated databases that integrate population sequence data with provenance and phenotype information for over 100 different microbial species and genera. Although the PubMLST website was conceived as part of the development of the first multi-locus sequence typing (MLST) scheme in 1998, the software it uses, the Bacterial Isolate Genome Sequence database (BIGSdb, published in 2010), enables PubMLST to include all levels of sequence data, from single gene sequences up to and including complete, finished genomes. Here we describe developments in the BIGSdb software made from publication to June 2018 and show how the platform realises microbial population genomics for a wide range of applications. The system is based on the gene-by-gene analysis of microbial genomes, with each deposited sequence annotated and curated to identify the genes present and systematically catalogue their variation. Originally intended as a means of characterising isolates with typing schemes, the synthesis of sequences and records of genetic variation with provenance and phenotype data permits highly scalable (whole genome sequence data for tens of thousands of isolates) means of addressing a wide range of functional questions, including: the prediction of antimicrobial resistance; likely cross-reactivity with vaccine antigens; and the functional activities of different variants that lead to key phenotypes. There are no limitations to the number of sequences, genetic loci, allelic variants or schemes (combinations of loci) that can be included, enabling each database to represent an expanding catalogue of the genetic variation of the population in question. In addition to providing web-accessible analyses and links to third-party analysis and visualisation tools, the BIGSdb software includes a RESTful application programming interface (API) that enables access to all the underlying data for third-party applications and data analysis pipelines.
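The gene-by-gene approach described in this abstract can be illustrated with a minimal sketch: each gene sequence found in an isolate is matched to a numbered allele at its locus, and the resulting combination of allele numbers (the allelic profile) is looked up in a scheme. The locus names, sequences, allele numbers and sequence types below are hypothetical placeholders chosen for illustration; this is not BIGSdb code and does not reflect its internal implementation.

```python
from typing import Dict, Optional, Tuple

# Hypothetical allele definitions: locus -> {sequence: allele number}.
ALLELES: Dict[str, Dict[str, int]] = {
    "locusA": {"ATGC": 1, "ATGA": 2},
    "locusB": {"GGCC": 1, "GGCT": 2},
}

# Hypothetical scheme: an ordered set of loci, plus a table mapping
# each known allelic profile to a sequence type (ST).
SCHEME_LOCI: Tuple[str, ...] = ("locusA", "locusB")
PROFILES: Dict[Tuple[int, ...], int] = {(1, 1): 10, (2, 1): 11}


def assign_alleles(genes: Dict[str, str]) -> Dict[str, Optional[int]]:
    """Map each locus sequence to its allele number (None if unknown)."""
    return {
        locus: ALLELES.get(locus, {}).get(sequence)
        for locus, sequence in genes.items()
    }


def sequence_type(designations: Dict[str, Optional[int]]) -> Optional[int]:
    """Look up the ST for a complete allelic profile, else return None."""
    profile = tuple(designations.get(locus) for locus in SCHEME_LOCI)
    if None in profile:
        return None  # at least one locus is missing or has a novel allele
    return PROFILES.get(profile)
```

An isolate whose genes all match known alleles receives a profile and, if that profile is defined, a sequence type; a novel sequence yields `None`, which in a curated database would be the point at which a new allele is submitted for assignment.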

    The global meningitis genome partnership

    Genomic surveillance of bacterial meningitis pathogens is essential for effective disease control globally, enabling identification of emerging and expanding strains and consequent public health interventions. While there has been a rise in the use of whole genome sequencing, this has been driven predominately by a subset of countries with adequate capacity and resources. Global capacity to participate in surveillance needs to be expanded, particularly in low- and middle-income countries with high disease burdens. In light of this, the WHO-led collaboration, Defeating Meningitis by 2030 Global Roadmap, has called for the establishment of a Global Meningitis Genome Partnership that links resources for: N. meningitidis (Nm), S. pneumoniae (Sp), H. influenzae (Hi) and S. agalactiae (Sa) to improve worldwide co-ordination of strain identification and tracking. Existing platforms containing relevant genomes include: PubMLST: Nm (31,622), Sp (15,132), Hi (1935), Sa (9026); The Wellcome Sanger Institute: Nm (13,711), Sp (> 24,000), Sa (6200), Hi (1738); and BMGAP: Nm (8785), Hi (2030). A steering group is being established to coordinate the initiative and encourage high-quality data curation. Next steps include: developing guidelines on open-access sharing of genomic data; defining a core set of metadata; and facilitating development of user-friendly interfaces that represent publicly available data.


    Validation of a Bioinformatics Workflow for Routine Analysis of Whole-Genome Sequencing Data and Related Challenges for Pathogen Typing in a European National Reference Center: Neisseria meningitidis as a Proof-of-Concept

    Despite being a well-established research method, the use of whole-genome sequencing (WGS) for routine molecular typing and pathogen characterization remains a substantial challenge due to the required bioinformatics resources and/or expertise. Moreover, many national reference laboratories and centers, as well as other laboratories working under a quality system, require extensive validation to demonstrate that employed methods are “fit-for-purpose” and provide high-quality results. A harmonized framework with guidelines for the validation of WGS workflows does not, however, currently exist, despite several recent case studies highlighting the urgent need for one. We present a validation strategy focusing specifically on the exhaustive characterization of the bioinformatics analysis of a WGS workflow designed to replace conventionally employed molecular typing methods for microbial isolates in a representative small-scale laboratory, using the pathogen Neisseria meningitidis as a proof-of-concept. We adapted several classically employed performance metrics specifically toward three different bioinformatics assays: resistance gene characterization (based on the ARG-ANNOT, ResFinder, CARD, and NDARO databases), several commonly employed typing schemas (including, among others, core genome multilocus sequence typing), and serogroup determination. We analyzed a core validation dataset of 67 well-characterized samples typed by means of classical genotypic and/or phenotypic methods that were sequenced in-house, allowing us to evaluate repeatability, reproducibility, accuracy, precision, sensitivity, and specificity of the different bioinformatics assays. We also analyzed an extended validation dataset composed of publicly available WGS data for 64 samples by comparing results of the different bioinformatics assays against results obtained from commonly used bioinformatics tools. We demonstrate high performance, with values for all performance metrics >87%, >97%, and >90% for the resistance gene characterization, sequence typing, and serogroup determination assays, respectively, for both validation datasets. Our WGS workflow has been made publicly available as a “push-button” pipeline for Illumina data at https://galaxy.sciensano.be to showcase its implementation for non-profit and/or academic usage. Our validation strategy can be adapted to other WGS workflows for other pathogens of interest and demonstrates the added value and feasibility of employing WGS with the aim of being integrated into routine use in an applied public health setting.
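For readers unfamiliar with the performance metrics named in this abstract, the sketch below computes four of them from a confusion matrix using their standard textbook definitions. The abstract states that the authors adapted the classical metrics to each bioinformatics assay, so this should be read as the generic form only, not as their adapted definitions; the class name and the counts used in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of assay results compared against a reference method."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    def accuracy(self) -> float:
        # Fraction of all results that agree with the reference.
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        # Fraction of positive calls that are correct.
        return self.tp / (self.tp + self.fp)

    def sensitivity(self) -> float:
        # Fraction of true positives in the reference that were detected.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of true negatives in the reference that were called negative.
        return self.tn / (self.tn + self.fp)
```

With hypothetical counts `ConfusionCounts(tp=90, fp=5, tn=95, fn=10)`, sensitivity is 90/100 and specificity is 95/100. The methods raise `ZeroDivisionError` when a denominator is zero, which a real validation report would need to handle explicitly.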

    A RESTful application programming interface for the PubMLST molecular typing and genome databases

    Molecular typing is used to differentiate microorganisms at the subspecies or strain level for epidemiological investigations, infection control, public health and environmental sampling. DNA sequence-based typing methods require authoritative databases that link sequence variants to nomenclature in order to facilitate communication and comparison of identified types in national or global settings. The PubMLST website (https://pubmlst.org/) fulfils this role for over a hundred microorganisms for which it hosts curated molecular sequence typing data, providing sequence and allelic profile definitions for multi-locus sequence typing (MLST) and single-gene typing approaches. In recent years, these have expanded to cover the whole genome with schemes such as core genome MLST (cgMLST) and whole genome MLST (wgMLST) which catalogue the allelic diversity found in hundreds to thousands of genes. These approaches provide a common nomenclature for high-resolution strain characterization and comparison. Molecular typing information is linked to isolate provenance, phenotype, and increasingly genome assemblies, providing a resource for outbreak investigation and research into population structure, gene association, global epidemiology and vaccine coverage. A Representational State Transfer (REST) Application Programming Interface (API) has been developed for the PubMLST website to make these large quantities of structured molecular typing and whole genome sequence data available for programmatic access by any third party application. The API is an integral component of the Bacterial Isolate Genome Sequence Database (BIGSdb) platform that is used to host PubMLST resources, and exposes all public data within the site. In addition to data browsing, searching and download, the API supports authentication and submission of new data to curator queues.
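As a rough illustration of what programmatic access to such an API might look like, the sketch below builds the URL of a scheme profile record and fetches it as JSON using only the Python standard library. The abstract does not give the API's address or routes: the base URL, the database name in the usage note and the `/db/{database}/schemes/{id}/profiles/{id}` route are assumptions that should be checked against the BIGSdb API documentation before use.

```python
import json
from typing import Any
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base address of the REST service (not stated in the abstract).
BASE_URL = "https://rest.pubmlst.org"


def profile_url(database: str, scheme_id: int, profile_id: int) -> str:
    """Build the URL for one profile in a scheme (route is an assumption)."""
    return (
        f"{BASE_URL}/db/{quote(database, safe='')}"
        f"/schemes/{scheme_id}/profiles/{profile_id}"
    )


def fetch_json(url: str, timeout: float = 30.0) -> Any:
    """Perform a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

A call such as `fetch_json(profile_url("pubmlst_neisseria_seqdef", 1, 11))` would then request a single public record; the database name and identifiers here are illustrative. Authenticated operations such as submission to curator queues are outside the scope of this sketch.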