32 research outputs found

    SRST2: Rapid genomic surveillance for public health and hospital microbiology labs.

    Rapid molecular typing of bacterial pathogens is critical for public health epidemiology, surveillance and infection control, yet routine use of whole genome sequencing (WGS) for these purposes poses significant challenges. Here we present SRST2, a read mapping-based tool for fast and accurate detection of genes, alleles and multi-locus sequence types (MLST) from WGS data. Using >900 genomes from common pathogens, we show SRST2 is highly accurate and outperforms assembly-based methods in terms of both gene detection and allele assignment. We include validation of SRST2 within a public health laboratory, and demonstrate its use for microbial genome surveillance in the hospital setting. In the face of rising threats of antimicrobial resistance and emerging virulence among bacterial pathogens, SRST2 represents a powerful tool for rapidly extracting clinically useful information from raw WGS data. Source code is available from http://katholt.github.io/srst2/
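    In MLST, an isolate's sequence type (ST) is defined by the combination of allele numbers called at a fixed set of housekeeping loci. Assuming the per-locus allele calls have already been made (for example by read mapping), the final lookup step might be sketched in Python as below. The locus names, allele numbers and ST values are hypothetical placeholders, and this is an illustration of the general idea, not SRST2's own implementation.

```python
from typing import Dict, Optional, Tuple

# Hypothetical MLST scheme: locus names and profiles are illustrative placeholders,
# not a real scheme definition.
LOCI: Tuple[str, ...] = ("locA", "locB", "locC")

# Profile table mapping a tuple of allele numbers (in LOCI order) to a sequence type.
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 2, 4): 7,
}


def assign_st(allele_calls: Dict[str, Optional[int]]) -> str:
    """Return the ST for a set of allele calls, 'novel' for an unseen
    combination, or 'incomplete' if any locus lacks a confident call."""
    profile = []
    for locus in LOCI:
        allele = allele_calls.get(locus)
        if allele is None:
            # A missing or unconfident allele call means no ST can be assigned.
            return "incomplete"
        profile.append(allele)
    st = PROFILES.get(tuple(profile))
    # A complete profile absent from the table is reported as a novel combination.
    return str(st) if st is not None else "novel"


if __name__ == "__main__":
    print(assign_st({"locA": 1, "locB": 2, "locC": 1}))     # 2
    print(assign_st({"locA": 1, "locB": 2, "locC": 9}))     # novel
    print(assign_st({"locA": 1, "locB": None, "locC": 1}))  # incomplete
```

    Distinguishing "novel" from "incomplete" keeps a genuinely new allele combination separate from a locus that simply could not be called.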

    STRetch reference data

    Bioinformatics methods and approaches to discover disease variants from DNA sequencing data

    © 2019 Harriet Dashnow
    Next-generation sequencing is increasingly used to diagnose patients with suspected genetic disease. Yet, even after exome or whole genome sequencing, many patients remain undiagnosed. In many cases a genetic diagnosis is not made because we either failed to detect the causal variant, or detected it but failed to identify it as causative. There is a clear need to develop novel bioinformatics methods and sequencing strategies to address these shortcomings and increase diagnostic rates. In this thesis I develop several such strategies. I propose a pooled-parent exome sequencing approach to prioritise de novo variants for genetic disease diagnosis. In this strategy, each proband undergoes individual exome sequencing, while DNA from all of the probands' parents is pooled, exome captured and sequenced together. The variants called in this pool are used to filter out inherited variants in the probands, so that the remaining list is enriched for de novo variants. Short tandem repeat (STR) expansions are a class of disease-causing variants that are frequently missed in short-read sequencing data. Here I develop and validate STRetch, a new bioinformatics method that detects STR expansions using STR decoy chromosomes. I show that STRetch can detect both known pathogenic STR expansions and novel expansions at other annotated STR loci across the genome. I further use STRetch to explore STR variation across hundreds of individuals, to inform our understanding of which variation is common and which is potentially pathogenic, and so to aid in prioritising STR variants in a gene-discovery setting. Some of the methods developed and described in this thesis have already been used to help patients receive a genetic diagnosis.
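    The pooled-parent filtering step reduces to a set difference: any proband variant also called in the parent pool is treated as inherited and removed. The Python sketch below assumes variants have already been called and normalised to (chromosome, position, ref, alt) keys; the function name and data layout are hypothetical. Real pooled data would also need care with allele-fraction thresholds, since each parent's alleles are diluted in the pool, and a variant missed in the pool (e.g. through low coverage) would wrongly survive the filter.

```python
from typing import Dict, List, Set, Tuple

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def candidate_de_novo(
    probands: Dict[str, Set[Variant]], parent_pool: Set[Variant]
) -> Dict[str, List[Variant]]:
    """Remove every proband variant that was also called in the pooled-parent sample.

    Variants seen in the pool are treated as inherited; what remains is
    enriched for (but not guaranteed to be) de novo variants.
    """
    return {
        sample: sorted(variants - parent_pool)
        for sample, variants in probands.items()
    }


if __name__ == "__main__":
    pool = {("1", 1000, "A", "G"), ("2", 5000, "C", "T")}
    probands = {
        "proband1": {("1", 1000, "A", "G"), ("3", 42, "G", "A")},
        "proband2": {("2", 5000, "C", "T")},
    }
    print(candidate_de_novo(probands, pool))
    # {'proband1': [('3', 42, 'G', 'A')], 'proband2': []}
```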

    STRetch summary statistics from 97 PCR-free whole genomes

    Robust median and variance estimates for all STR loci (based on the hg19 TRF annotation) at which any evidence of expansion was observed in 97 PCR-free whole genomes. This file was generated using the STRetch estimateSTR.py script with the --emit flag (see https://github.com/Oshlack/STRetch).
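    The exact estimators used by estimateSTR.py are not described in this entry. As an assumption for illustration only, the Python sketch below computes a per-locus median and a variance derived from the scaled median absolute deviation (MAD), then scores a new observation against that background. Robust estimates of this kind are not pulled far by a few samples with very large values, which suits a background made up mostly of non-expanded genomes. The input values are hypothetical per-sample measurements at one locus.

```python
import statistics
from typing import Sequence, Tuple

# Scaling constant that makes the MAD a consistent estimator of the standard
# deviation for normally distributed data.
MAD_TO_SD = 1.4826


def robust_median_variance(values: Sequence[float]) -> Tuple[float, float]:
    """Return (median, robust variance) using the scaled median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med, (MAD_TO_SD * mad) ** 2


def robust_z(value: float, median: float, variance: float) -> float:
    """Number of robust standard deviations a new observation lies from the median."""
    if variance == 0:
        raise ValueError("zero robust variance; z-score undefined")
    return (value - median) / variance ** 0.5


if __name__ == "__main__":
    # Hypothetical measurements at one locus; the single outlier barely moves the median.
    med, var = robust_median_variance([1.0, 2.0, 3.0, 4.0, 100.0])
    print(med, round(var, 4))  # 3.0 2.1981
```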

    ABACBS2016_education_Harriet_Dashnow.pptx

    Software Carpentry and Data Carpentry in Bioinformatics Training
    Harriet Dashnow
    Presentation at GOBLET Best Practices in Bioinformatics Training workshop, AB3CBS 2017, Brisbane, Australia

    Many Roads to Bioinformatics - Lorne Genome 2016

    A talk about my transition (from genetics/molecular biology to bioinformatics), about common career paths into bioinformatics, and about training, particularly COMBINE, Software Carpentry and Data Carpentry.
    Bioinformatics Technology Workshop - Lorne Genome 2016

    Comparing algorithms to genotype short tandem repeats in next-generation sequencing data

    Short tandem repeats (STRs) are short (2-6 bp) DNA sequences repeated in tandem, which make up approximately 3% of the human genome. These loci have high mutation rates and are highly polymorphic. Dozens of neurological and developmental disorders have been attributed to STR expansions. STRs have also been implicated in a range of functions such as DNA replication and repair, chromatin organisation and regulation of gene expression.
    Traditionally, STR variation has been measured using capillary gel electrophoresis. This process is time-consuming and expensive, which has tended to limit STR analysis to a handful of loci.
    Next-generation sequencing has the potential to address these problems. However, determining STR lengths from next-generation sequencing data is difficult: for example, many callers are limited by sequencing read length, and polymerase slippage during PCR amplification introduces stutter noise.
    Recently, a small number of software tools have been developed to genotype STRs in next-generation sequencing data. We have performed a general comparison of the tools published to date, identifying their application domains, assumptions and limitations.
    We have assessed the performance of some of the most popular STR genotyping tools on human next-generation sequencing data. When comparing STR callers, we observed drastic differences in which STR loci are identified as variant. Surprisingly, even for variant loci reported by multiple tools, concordance between the specific genotype calls is markedly low.
    Finally, we draw together our findings to discuss the considerations when choosing and running an STR genotyping tool, with an emphasis on applications to human disease.
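    Genotype concordance between two STR callers can be measured over the loci that both report. The Python sketch below assumes each caller's output has already been parsed into a mapping from a hypothetical locus ID to a diploid genotype written as two repeat-unit counts; this representation and the exact-match criterion are illustrative assumptions, not the metric used in the comparison above.

```python
from typing import Dict, Tuple

# Hypothetical representation: locus ID -> diploid genotype as two repeat-unit counts.
Genotype = Tuple[int, int]


def genotype_concordance(
    calls_a: Dict[str, Genotype], calls_b: Dict[str, Genotype]
) -> Tuple[int, float]:
    """Return (number of loci called by both tools, fraction with identical genotypes).

    Genotypes are compared as unordered pairs, so (10, 12) matches (12, 10).
    Returns NaN for the fraction when the tools share no loci.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return 0, float("nan")
    matches = sum(
        sorted(calls_a[locus]) == sorted(calls_b[locus]) for locus in shared
    )
    return len(shared), matches / len(shared)


if __name__ == "__main__":
    tool_a = {"L1": (10, 12), "L2": (8, 8), "L3": (15, 20)}
    tool_b = {"L1": (12, 10), "L2": (8, 9), "L4": (5, 5)}
    print(genotype_concordance(tool_a, tool_b))  # (2, 0.5)
```

    Reporting the number of shared loci alongside the fraction matters here, since the callers can disagree both on which loci are variant and on the genotypes at the loci they share.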

    Have ten rules.
