
    Public data and open source tools for multi-assay genomic investigation of disease

    Molecular interrogation of a biological sample through DNA sequencing, RNA and microRNA profiling, proteomics and other assays has the potential to provide a systems-level approach to predicting treatment response and disease progression, and to developing precision therapies. Large publicly funded projects have generated extensive and freely available multi-assay data resources; however, bioinformatic and statistical methods for the analysis of such experiments are still nascent. We review multi-assay genomic data resources in clinical oncology, pharmacogenomics and other perturbation experiments, population genomics, regulatory genomics, and other areas, as well as tools for data acquisition. Finally, we review bioinformatic tools that are explicitly geared toward integrative genomic data visualization and analysis. This review provides starting points for accessing publicly available data and tools to support development of needed integrative methods.

    Rice Galaxy: An open resource for plant science

    Background: Rice molecular genetics, breeding, genetic diversity, and allied research (such as rice-pathogen interaction) have adopted sequencing technologies and high-density genotyping platforms for genome variation analysis and gene discovery. Germplasm collections representing rice diversity, improved varieties, and elite breeding materials are accessible through rice gene banks for use in research and breeding, with many having genome sequences and high-density genotype data available. Combining phenotypic and genotypic information on these accessions enables genome-wide association analysis, which is driving quantitative trait loci discovery and molecular marker development. Comparative sequence analyses across quantitative trait loci regions facilitate the discovery of novel alleles. Analyses involving DNA sequences and large genotyping matrices for thousands of samples, however, pose a challenge to non-computer-savvy rice researchers. Findings: The Rice Galaxy resource has shared datasets that include high-density genotypes from the 3,000 Rice Genomes project and sequences with corresponding annotations from 9 published rice genomes. The Rice Galaxy web server and deployment installer include tools for designing single-nucleotide polymorphism assays, analyzing genome-wide association studies, population diversity, rice-bacterial pathogen diagnostics, and a suite of published genomic prediction methods. A prototype Rice Galaxy compliant with Open Access, Open Data, and Findable, Accessible, Interoperable, and Reproducible principles is also presented. Conclusions: Rice Galaxy is a freely available resource that empowers the plant research community to perform state-of-the-art analyses and utilize publicly available big datasets for both fundamental and applied science.

    Molecular Investigations of a Locally Acquired Case of Melioidosis in Southern AZ, USA

    Melioidosis is caused by Burkholderia pseudomallei, a Gram-negative bacillus primarily found in soils in Southeast Asia and northern Australia. A recent case of melioidosis in non-endemic Arizona was determined to be the result of locally acquired infection, as the patient had no travel history to endemic regions and no previous history of disease. Diagnosis of the case was confirmed through multiple microbiologic and molecular techniques. To enhance the epidemiological analysis, we conducted several molecular genotyping procedures, including multi-locus sequence typing, SNP-profiling, and whole genome sequence typing. Each technique has different molecular epidemiologic advantages, and all of them provided evidence that the infecting strain was most similar to those found in Southeast Asia, possibly originating in, or around, Malaysia. Advancements in new typing technologies provide genotyping resolution not previously available to public health investigators, allowing for more accurate source identification.

    PolyTB: a genomic variation map for Mycobacterium tuberculosis

    Tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), is the second major cause of death from an infectious disease worldwide. Recent advances in DNA sequencing are leading to the ability to generate whole genome information in clinical isolates of M. tuberculosis complex (MTBC). The identification of informative genetic variants such as phylogenetic markers and those associated with drug resistance or virulence will help barcode Mtb in the context of epidemiological, diagnostic and clinical studies. Mtb genomic datasets are increasingly available as raw sequences, which are potentially difficult and computationally intensive to process and to compare across studies. Here we have processed the raw sequence data (>1500 isolates, eight studies) to compile a catalogue of SNPs (n = 74,039, 63% non-synonymous, 51.1% in more than one isolate, i.e. non-private), small indels (n = 4810) and larger structural variants (n = 800). We have developed the PolyTB web-based tool (http://pathogenseq.lshtm.ac.uk/polytb) to visualise the resulting variation and important meta-data (e.g. in silico inferred strain-types, location) within geographical map and phylogenetic views. This resource will allow researchers to identify polymorphisms within candidate genes of interest, as well as examine the genomic diversity and distribution of strains. PolyTB source code is freely available to researchers wishing to develop similar tools for their pathogen of interest.
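
    The PolyTB abstract reports its SNP catalogue as a total plus percentages rather than absolute counts. As a rough illustration only, the sketch below converts those percentages into approximate counts. The function and variable names are hypothetical, and because the published percentages are rounded, the results are estimates, not figures taken from the paper.

```python
# Approximate SNP counts implied by the percentages quoted in the abstract.
# The inputs come from the abstract; the derived counts are estimates only,
# because the published percentages are themselves rounded.

TOTAL_SNPS = 74_039          # SNPs in the catalogue (from the abstract)
NONSYN_FRACTION = 0.63       # 63% non-synonymous
NONPRIVATE_FRACTION = 0.511  # 51.1% found in more than one isolate


def approx_count(total: int, fraction: float) -> int:
    """Return the nearest whole number of items for a given fraction."""
    return round(total * fraction)


nonsynonymous = approx_count(TOTAL_SNPS, NONSYN_FRACTION)
non_private = approx_count(TOTAL_SNPS, NONPRIVATE_FRACTION)
private = TOTAL_SNPS - non_private  # SNPs seen in exactly one isolate

print(f"non-synonymous (approx.): {nonsynonymous}")
print(f"non-private (approx.):    {non_private}")
print(f"private (approx.):        {private}")
```

    Since "63%" is given to the nearest whole percent, the non-synonymous estimate could be off by a few hundred SNPs in either direction.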

    Status and potential of bacterial genomics for public health practice : a scoping review

    Background: Next-generation sequencing (NGS) is increasingly being translated into routine public health practice, affecting the surveillance and control of many pathogens. The purpose of this scoping review is to identify and characterize the recent literature concerning the application of bacterial pathogen genomics for public health practice and to assess the added value, challenges, and needs related to its implementation from an epidemiologist’s perspective. Methods: In this scoping review, a systematic PubMed search with forward and backward snowballing was performed to identify manuscripts in English published between January 2015 and September 2018. Included studies had to describe the application of NGS on bacterial isolates within a public health setting. The studied pathogen, year of publication, country, number of isolates, sampling fraction, setting, public health application, study aim, level of implementation, time orientation of the NGS analyses, and key findings were extracted from each study. Due to a large heterogeneity of settings, applications, pathogens, and study measurements, a descriptive narrative synthesis of the eligible studies was performed. Results: Out of the 275 included articles, 164 were outbreak investigations, 70 focused on strategy-oriented surveillance, and 41 on control-oriented surveillance. Main applications included the use of whole-genome sequencing (WGS) data for (1) source tracing, (2) early outbreak detection, (3) unraveling transmission dynamics, (4) monitoring drug resistance, (5) detecting cross-border transmission events, (6) identifying the emergence of strains with enhanced virulence or zoonotic potential, and (7) assessing the impact of prevention and control programs. The superior resolution over conventional typing methods to infer transmission routes was reported as an added value, as well as the ability to simultaneously characterize the resistome and virulome of the studied pathogen. However, the full potential of pathogen genomics can only be reached through its integration with high-quality contextual data. Conclusions: For several pathogens, it is time for a shift from proof-of-concept studies to routine use of WGS during outbreak investigations and surveillance activities. However, some implementation challenges from the epidemiologist’s perspective remain, such as data integration, quality of contextual data, sampling strategies, and meaningful interpretations. Interdisciplinary, inter-sectoral, and international collaborations are key for an appropriate genomics-informed surveillance.

    Development and Validation of Clinical Whole-Exome and Whole-Genome Sequencing for Detection of Germline Variants in Inherited Disease

    Context: With the decrease in the cost of sequencing, the clinical testing paradigm has shifted from single gene to gene panel and now whole-exome and whole-genome sequencing. Clinical laboratories are rapidly implementing next-generation sequencing-based whole-exome and whole-genome sequencing. Because a large number of targets are covered by whole-exome and whole-genome sequencing, it is critical that a laboratory perform appropriate validation studies, develop a quality assurance and quality control program, and participate in proficiency testing. Objective: To provide recommendations for whole-exome and whole-genome sequencing assay design, validation, and implementation for the detection of germline variants associated with inherited disorders. Data Sources: An example of trio sequencing, filtration and annotation of variants, and phenotypic consideration to arrive at clinical diagnosis is discussed. Conclusions: It is critical that clinical laboratories planning to implement whole-exome and whole-genome sequencing design and validate the assay to specifications and ensure adequate performance prior to implementation. Test design specifications, including variant filtering and annotation, phenotypic consideration, guidance on consenting options, and reporting of incidental findings, are provided. These are important steps a laboratory must take to validate and implement whole-exome and whole-genome sequencing in a clinical setting for germline variants in inherited disorders.

    Standardized metadata for human pathogen/vector genomic sequences

    High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.
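
    The abstract above names four groups of metadata fields: specimen source, isolation event, isolate phenotype, and project information. To make that grouping concrete, here is a minimal sketch of how such a record might be modeled and checked for completeness. Every class name, field name, and example value is a hypothetical placeholder chosen for illustration; none is taken from the GSCID/BRC standard itself.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# All names below are illustrative assumptions, not fields of the real standard.


@dataclass
class SpecimenSource:
    """Organism or environmental source of the specimen."""
    organism: str
    host_or_environment: Optional[str] = None


@dataclass
class IsolationEvent:
    """Spatial-temporal information about the isolation event."""
    collection_date: Optional[str] = None  # e.g. an ISO 8601 date string
    country: Optional[str] = None


@dataclass
class IsolatePhenotype:
    """Phenotypic characteristics of the isolated pathogen/vector."""
    resistance_notes: List[str] = field(default_factory=list)


@dataclass
class ProjectInfo:
    """Project leadership and support."""
    project_title: str
    funding_source: Optional[str] = None


@dataclass
class SampleRecord:
    source: SpecimenSource
    isolation: IsolationEvent
    phenotype: IsolatePhenotype
    project: ProjectInfo


def missing_fields(record: SampleRecord) -> List[str]:
    """Return dotted names of fields that are unset (None or an empty list)."""
    missing = []
    for group, values in asdict(record).items():
        for name, value in values.items():
            if value is None or value == []:
                missing.append(f"{group}.{name}")
    return missing


# Made-up example record, used only to exercise the completeness check.
example = SampleRecord(
    source=SpecimenSource(organism="example organism",
                          host_or_environment="example host"),
    isolation=IsolationEvent(country="example country"),
    phenotype=IsolatePhenotype(),
    project=ProjectInfo(project_title="example project"),
)
print(missing_fields(example))
```

    Grouping the fields by category keeps any project-specific additions local to one group, which is in the spirit of the extensibility the abstract describes.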