231 research outputs found

    A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data.

    BACKGROUND: The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. RESULTS: Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. CONCLUSIONS: In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data
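The overdispersion claim in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names and the Gamma(shape, scale) parameterisation are assumptions made here for illustration. It compares the variance-to-mean ratio of observed coverage (about 1 under a pure Poisson model) with the marginal moments of a Poisson-Gamma mixture, whose variance exceeds its mean.

```python
from statistics import mean, pvariance


def dispersion_index(coverage):
    """Variance-to-mean ratio of per-window coverage.

    A value near 1 is what a pure Poisson model predicts;
    values well above 1 indicate overdispersion.
    """
    return pvariance(coverage) / mean(coverage)


def poisson_gamma_moments(shape, scale):
    """Marginal mean and variance of a count whose Poisson rate is
    Gamma(shape, scale) distributed (hypothetical parameterisation).

    By the law of total variance:
    Var = E[rate] + Var[rate] = mu + mu**2 / shape.
    """
    mu = shape * scale
    return mu, mu + mu ** 2 / shape


# Hypothetical coverage values, for illustration only.
example_coverage = [10, 20, 30]
print(dispersion_index(example_coverage))
print(poisson_gamma_moments(4, 5))
```

Because the extra term mu**2 / shape is always positive, the mixture can absorb variance that a single-rate Poisson model cannot, which is the flexibility the abstract refers to.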

    SV-Pop: population-based structural variant analysis and visualization.

    BACKGROUND: Genetic structural variation underpins a multitude of phenotypes, with significant implications for a range of biological outcomes. Despite their crucial role, structural variants (SVs) are often neglected and overshadowed by single nucleotide polymorphisms (SNPs), which are used in large-scale analysis such as genome-wide association and population genetic studies. RESULTS: To facilitate the high-throughput analysis of structural variation we have developed an analytical pipeline and visualisation tool, called SV-Pop. The utility of this pipeline was then demonstrated through application with a large, multi-population P. falciparum dataset. CONCLUSIONS: Designed to facilitate downstream analysis and visualisation post-discovery, SV-Pop allows for straightforward integration of multi-population analysis, method and sample-based concordance metrics, and signals of selection

    Large-scale genomic analysis of global Klebsiella pneumoniae plasmids reveals multiple simultaneous clusters of carbapenem-resistant hypervirulent strains

    BACKGROUND: Klebsiella pneumoniae (Kp) Gram-negative bacteria cause nosocomial infections and rapidly acquire antimicrobial resistance (AMR), which makes it a global threat to human health. It also has a comparatively rare hypervirulent phenotype that can lead to severe disease in otherwise healthy individuals. Unlike classic Kp, canonical hypervirulent strains usually have limited AMR. However, after initial case reports in 2015, carbapenem-resistant hypervirulent Kp has increased in prevalence, including in China, but there is limited understanding of its burden in other geographical regions. METHODS: Here, we examined the largest collection of publicly available sequenced Kp isolates (n=13,178), containing 1603 different sequence types (e.g. ST11 15.0%, ST258 9.5%), and 2174 (16.5%) hypervirulent strains. We analysed the plasmid replicons and carbapenemase and siderophore encoding genes to understand the movement of hypervirulence and AMR genes located on plasmids, and their convergence in carbapenem-resistant hypervirulent Kp. RESULTS: We identified and analysed 3034 unique plasmid replicons to inform the epidemiology and transmission dynamics of carbapenem-resistant hypervirulent Kp (n=1028, 7.8%). We found several outbreaks globally, including one involving ST11 strains in China and another of ST231 in Asia centred on India, Thailand, and Pakistan. There was evidence of global flow of Kp, including across multiple continents. In most cases, clusters of Kp isolates are the result of hypervirulence genes entering classic strains, instead of carbapenem resistance genes entering canonical hypervirulent ones. CONCLUSIONS: Our analysis demonstrates the importance of plasmid analysis in the monitoring of carbapenem-resistant and hypervirulent strains of Kp. With the growing adoption of omics-based technologies for clinical and surveillance applications, including in geographical regions with gaps in data and knowledge (e.g. sub-Saharan Africa), the identification of the spread of AMR will inform infection control globally

    Genomic analysis of hypervirulent Klebsiella pneumoniae reveals potential genetic markers for differentiation from classical strains.

    The majority of Klebsiella pneumoniae (Kp) infections are nosocomial, but a growing number of community-acquired infections are caused by hypervirulent strains (hvKp) characterised by liver invasion and rapid metastasis. Unlike nosocomial Kp infections, hvKp are generally susceptible to antibiotics. Due to the rapid progression of hvKp infections, timely and accurate diagnosis is required for effective treatment. To identify potential drivers of the hypervirulent phenotype, we performed a genome-wide association study (GWAS) analysis on single nucleotide variants and accessory genome loci across 79 publicly available Kp isolates collected from patients' liver and a diverse global Kp dataset (n = 646). The GWAS analysis revealed 29 putative genes (P < 10^-10) associated with higher risk of liver phenotype, including hypervirulence linked salmochelin iro (odds ratio (OR): 29.8) and aerobactin iuc (OR: 14.1) loci. A minority of liver isolates (n = 15, 19%) had neither of these siderophores nor any other shared biomarker, suggesting possible unknown drivers of hypervirulence and an intrinsic ability of Kp to invade the liver. Despite identifying potential novel loci linked to a liver invasive Kp phenotype, our work highlights the need for large-scale studies involving more sequence types to identify further hypervirulence biomarkers to assist clinical decision making
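As a reminder of what the reported odds ratios measure, the sketch below computes an odds ratio from a 2x2 table of locus presence against phenotype. The function name and the example counts are hypothetical and are not taken from the study. The 0.5 continuity correction applied when a cell is zero (the Haldane-Anscombe correction) is an assumption of this sketch, not something the abstract describes.

```python
def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds ratio for carrying a locus among cases versus controls.

    If any cell is zero, 0.5 is added to every cell
    (Haldane-Anscombe correction) to avoid division by zero.
    """
    cells = [case_with, case_without, control_with, control_without]
    if 0 in cells:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    return (a * d) / (b * c)


# Hypothetical counts, for illustration only:
# 30 of 40 liver isolates and 20 of 60 other isolates carry the locus.
print(odds_ratio(30, 10, 20, 40))
```

An odds ratio above 1 means the locus is over-represented among liver isolates; a GWAS would additionally attach a significance test and correct for population structure, which this sketch omits.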

    estMOI: estimating multiplicity of infection using parasite deep sequencing data.

    Individuals living in endemic areas generally harbour multiple parasite strains. Multiplicity of infection (MOI) can be an indicator of immune status and transmission intensity. It has a potentially confounding effect on a number of population genetic analyses, which often assume isolates are clonal. Polymerase chain reaction-based approaches to estimate MOI can lack sensitivity. For example, in the human malaria parasite Plasmodium falciparum, genotyping of the merozoite surface protein (MSP1/2) genes is a standard method for assessing MOI, despite the apparent problem of underestimation. The availability of deep coverage data from massively parallelizable sequencing technologies means that MOI can be detected genome wide by considering the abundance of heterozygous genotypes. Here, we present a method to estimate MOI, which considers unique combinations of polymorphisms from sequence reads. The method is implemented within the estMOI software. When applied to clinical P. falciparum isolates from three continents, we find that multiple infections are common, especially in regions with high transmission
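The idea of counting unique combinations of polymorphisms on sequence reads can be sketched as follows. This is not the estMOI code: the read representation (a dictionary mapping position to observed allele), the function name, and the `min_support` threshold are all assumptions chosen for illustration.

```python
from collections import Counter


def count_haplotypes(reads, positions, min_support=2):
    """Count distinct allele combinations seen across `positions`.

    Only reads covering every position are used, and a combination
    must be seen on at least `min_support` reads to be counted, which
    guards against combinations created by sequencing errors.
    """
    combos = Counter(
        tuple(read[p] for p in positions)
        for read in reads
        if all(p in read for p in positions)
    )
    return sum(1 for n in combos.values() if n >= min_support)


# Hypothetical reads: each maps a SNP position to the allele observed.
reads = (
    [{100: "A", 150: "C"}] * 3    # first combination, 3 reads
    + [{100: "G", 150: "C"}] * 2  # second combination, 2 reads
    + [{100: "A", 150: "T"}]      # seen once: below the threshold
    + [{100: "A"}]                # does not span both positions
)
print(count_haplotypes(reads, (100, 150)))  # 2
```

In a clonal infection every spanning read should carry the same combination, so more than one well-supported combination at a locus is evidence of multiple strains. A genome-wide estimate would aggregate such counts over many loci.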

    Genetic diversity of next generation antimalarial targets: A baseline for drug resistance surveillance programmes.

    Drug resistance is a recurrent problem in the fight against malaria. Genetic and epidemiological surveillance of antimalarial resistant parasite alleles is crucial to guide drug therapies and clinical management. New antimalarial compounds are currently at various stages of clinical trials and regulatory evaluation. Using ~2000 Plasmodium falciparum genome sequences, we investigated the genetic diversity of eleven gene-targets of promising antimalarial compounds and assessed their potential efficiency across malaria endemic regions. We determined if the loci are under selection prior to the introduction of new drugs and established a baseline of genetic variance, including potential resistant alleles, for future surveillance programmes

    Robust detection of point mutations involved in multidrug-resistant Mycobacterium tuberculosis in the presence of co-occurrent resistance markers

    Tuberculosis disease is a major global public health concern and the growing prevalence of drug-resistant Mycobacterium tuberculosis is making disease control more difficult. However, the increasing application of whole-genome sequencing as a diagnostic tool is leading to the profiling of drug resistance to inform clinical practice and treatment decision making. Computational approaches for identifying established and novel resistance-conferring mutations in genomic data include genome-wide association study (GWAS) methodologies, tests for convergent evolution and machine learning techniques. These methods may be confounded by extensive co-occurrent resistance, where statistical models for a drug include unrelated mutations known to be causing resistance to other drugs. Here, we introduce a novel ‘cannibalistic’ elimination algorithm (“Hungry, Hungry SNPos”) that attempts to remove these co-occurrent resistant variants. Using an M. tuberculosis genomic dataset for the virulent Beijing strain-type (n=3,574) with phenotypic resistance data across five drugs (isoniazid, rifampicin, ethambutol, pyrazinamide, and streptomycin), we demonstrate that this new approach is considerably more robust than traditional methods and detects resistance-associated variants too rare to be likely picked up by correlation-based techniques like GWA

    A modified decision tree approach to improve the prediction and mutation discovery for drug resistance in Mycobacterium tuberculosis.

    BACKGROUND: Drug resistant Mycobacterium tuberculosis is complicating the effective treatment and control of tuberculosis disease (TB). With the adoption of whole genome sequencing as a diagnostic tool, machine learning approaches are being employed to predict M. tuberculosis resistance and identify underlying genetic mutations. However, machine learning approaches can overfit and fail to identify causal mutations if they are applied out of the box and not adapted to the disease-specific context. We introduce a machine learning approach that is customized to the TB setting, which extracts a library of genomic variants re-occurring across individual studies to improve genotypic profiling. RESULTS: We developed a customized decision tree approach, called Treesist-TB, that performs TB drug resistance prediction by extracting and evaluating genomic variants across multiple studies. The application of Treesist-TB to rifampicin (RIF), isoniazid (INH) and ethambutol (EMB) drugs, for which resistance mutations are known, demonstrated a level of predictive accuracy similar to the widely used TB-Profiler tool (Treesist-TB vs. TB-Profiler tool: RIF 97.5% vs. 97.6%; INH 96.8% vs. 96.5%; EMB 96.8% vs. 95.8%). Application of Treesist-TB to less understood second-line drugs of interest, ethionamide (ETH), cycloserine (CYS) and para-aminosalicylic acid (PAS), led to the identification of new variants (52, 6 and 11, respectively), with a high number absent from the TB-Profiler library (45, 4, and 6, respectively). Thereby, Treesist-TB had improved predictive sensitivity (Treesist-TB vs. TB-Profiler tool: PAS 64.3% vs. 38.8%; CYS 45.3% vs. 30.7%; ETH 72.1% vs. 71.1%). CONCLUSION: Our work reinforces the utility of machine learning for drug resistance prediction, while highlighting the need to customize approaches to the disease-specific context. Through applying a modified decision learning approach (Treesist-TB) across a range of anti-TB drugs, we identified plausible resistance-encoding genomic variants with high predictive ability, whilst potentially overcoming the overfitting challenges that can affect standard machine learning applications

    COVID-profiler: a webserver for the analysis of SARS-CoV-2 sequencing data.

    BACKGROUND: SARS-CoV-2 virus sequencing has been applied to track the COVID-19 pandemic spread and assist the development of PCR-based diagnostics, serological assays, and vaccines. With sequencing becoming routine globally, bioinformatic tools are needed to assist in the robust processing of resulting genomic data. RESULTS: We developed a web-based bioinformatic pipeline ("COVID-Profiler") that inputs raw or assembled sequencing data, displays raw alignments for quality control, annotates mutations found and performs phylogenetic analysis. The pipeline software can be applied to other (re-)emerging pathogens. CONCLUSIONS: The webserver is available at http://genomics.lshtm.ac.uk/. The source code is available at https://github.com/jodyphelan/covid-profiler

    Whole genome sequencing reveals large deletions and other loss of function mutations in Mycobacterium tuberculosis drug resistance genes.

    Drug resistance in Mycobacterium tuberculosis, the causative agent of tuberculosis disease, arises from genetic mutations in genes coding for drug-targets or drug-converting enzymes. SNPs linked to drug resistance have been extensively studied and form the basis of molecular diagnostics and sequencing-based resistance profiling. However, alternative forms of functional variation such as large deletions and other loss of function (LOF) mutations have received much less attention, but if incorporated into diagnostics they are likely to improve their predictive performance. Our work aimed to characterize the contribution of LOF mutations found in 42 established drug resistance genes linked to 19 anti-tuberculous drugs across 32689 sequenced clinical isolates. The analysed LOF mutations included large deletions (n=586), frameshifts (n=4764) and premature stop codons (n=826). We found LOF mutations in genes strongly linked to pyrazinamide (pncA), isoniazid (katG), capreomycin (tlyA), streptomycin (e.g. gid) and ethionamide (ethA, mshA) (P < 10^-5), but also in some loci linked to drugs where relatively less phenotypic data is available [e.g. cycloserine, delamanid, bedaquiline, para-aminosalicylic acid (PAS), and clofazimine]. This study reports that large deletions (median size 1115 bp) account for a significant portion of resistance variants found for PAS (+7.1% of phenotypic resistance percentage explained), pyrazinamide (+3.5%) and streptomycin (+2.6%) drugs, and can be used to improve the prediction of cryptic resistance. Overall, our work highlights the importance of including LOF mutations (e.g. large deletions) in predicting genotypic drug resistance, thereby informing tuberculosis infection control and clinical decision-making