406 research outputs found

    A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data.

    BACKGROUND: The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. RESULTS: Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. CONCLUSIONS: In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data
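
    The overdispersion argument in the abstract above can be made concrete with a minimal sketch, under stated assumptions. If the Poisson rate is itself Gamma-distributed, the marginal count distribution is negative binomial, and its variance exceeds its mean. The function names, the shape/rate parameterisation and the coverage values below are hypothetical illustrations; they are not taken from the paper or its software.

```python
import math
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson data."""
    return variance(counts) / mean(counts)


def poisson_gamma_logpmf(k, shape, rate):
    """Log-probability of count k when lambda ~ Gamma(shape, rate) and
    k | lambda ~ Poisson(lambda); this marginal is negative binomial."""
    p = rate / (rate + 1.0)
    return (math.lgamma(k + shape) - math.lgamma(shape) - math.lgamma(k + 1)
            + shape * math.log(p) + k * math.log(1.0 - p))


def poisson_gamma_moments(shape, rate):
    """Mean and variance of the Poisson-Gamma marginal distribution."""
    m = shape / rate
    return m, m + shape / rate ** 2


# Hypothetical per-window read depths, invented for illustration only.
coverage = [10, 30, 20, 45, 5]
print(dispersion_index(coverage))        # well above 1 means overdispersed
print(poisson_gamma_moments(10.0, 0.5))  # variance larger than the mean
```

    A dispersion index well above 1 is the kind of departure from the plain Poisson model that the abstract reports for empirical coverage distributions.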

    On the performance of multiple imputation based on chained equations in tackling missing data of the African α3.7-globin deletion in a malaria association study.

    Multiple imputation based on chained equations (MICE) is an alternative method for imputing missing genotypes that can use genetic and nongenetic auxiliary data to inform the imputation process. Previously, MICE was successfully tested on strongly linked genetic data. We have now tested it on data of the HBA2 gene which, by the experimental design used in a malaria association study in Tanzania, shows a high missing data percentage and is weakly linked with the remaining genetic markers in the data set. We constructed different imputation models and studied their performance under different missing data conditions. Overall, MICE failed to accurately predict the true genotypes. However, using the best imputation model for the data, we obtained unbiased estimates for the genetic effects, and association signals of the HBA2 gene on malaria positivity. When the whole data set was analyzed with the same imputation model, the association signal increased from 0.80 before imputation to 2.70 after imputation. Conversely, postimputation estimates for the genetic effects remained the same in relation to the complete case analysis but showed increased precision. We argue that these postimputation estimates are reasonably unbiased, as a result of a good study design based on matching key socio-environmental factors

    Structural and Genomic Insights Into Pyrazinamide Resistance in Mycobacterium tuberculosis Underlie Differences Between Ancient and Modern Lineages.

    Resistance to drugs used to treat tuberculosis disease (TB) remains a public health burden, with missense point mutations in the underlying Mycobacterium tuberculosis bacteria described for nearly all anti-TB drugs. The post-genomics era along with advances in computational and structural biology provide opportunities to understand the interrelationships between the genetic basis and the structural consequences of M. tuberculosis mutations linked to drug resistance. Pyrazinamide (PZA) is a crucial first line antibiotic currently used in TB treatment regimens. The mutational promiscuity exhibited by the pncA gene (target for PZA) necessitates computational approaches to investigate the genetic and structural basis for PZA resistance development. We analysed 424 missense point mutations linked to PZA resistance derived from ∼35K M. tuberculosis clinical isolates sourced globally, which comprised the four main M. tuberculosis lineages (Lineage 1-4). Mutations were annotated to reflect their association with PZA resistance. Genomic measures (minor allele frequency and odds ratio), structural features (surface area, residue depth and hydrophobicity) and biophysical effects (change in stability and ligand affinity) of point mutations on pncA protein stability and ligand affinity were assessed. Missense point mutations within pncA were distributed throughout the gene, with the majority (>80%) of mutations with a destabilising effect on protomer stability and on ligand affinity. Active site residues involved in PZA binding were associated with multiple point mutations highlighting mutational diversity due to selection pressures at these functionally important sites. There were weak associations between genomic measures and biophysical effect of mutations. However, mutations associated with PZA resistance showed statistically significant differences between structural features (surface area and residue depth), but not hydrophobicity score for mutational sites. Most interestingly, M. tuberculosis lineage 1 (ancient lineage) exhibited a distinct protein stability profile for mutations associated with PZA resistance, compared to modern lineages

    Case report: A successfully treated case of community-acquired urinary tract infection due to Klebsiella aerogenes in Bangladesh

    Klebsiella aerogenes, a nosocomial pathogen, is increasingly associated with extensive drug resistance and virulence profiles. It is responsible for high morbidity and mortality. This report describes the first successfully treated case of community-acquired urinary tract infection (UTI) caused by Klebsiella aerogenes in an elderly housewife with Type-2 diabetes (T2D) from Dhaka, Bangladesh. The patient was empirically treated with intravenous ceftriaxone (500 mg/8 h). However, she did not respond to the treatment. The urine culture and sensitivity tests, coupled with bacterial whole-genome sequencing (WGS) and analysis, revealed the bacteria to be K. aerogenes which was extensively drug-resistant but was susceptible to carbapenems and polymyxins. Based on these findings, meropenem (500 mg/8 h) was administered to the patient, who then responded to the treatment and recovered successfully without having a relapse. This case raises awareness of the importance of diagnosis of not-so-common etiological agents, correct identification of the pathogens, and targeted antibiotic therapy. In conclusion, correctly identifying etiological agents of UTI using WGS approaches that are otherwise difficult to diagnose could help improve the identification of infectious agents and improve the management of infectious diseases

    SVAMP: sequence variation analysis, maps and phylogeny.

    SUMMARY: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis in real time from variant call format (VCF) and associated metadata files. Allele frequency map, geographical map of isolates, Tajima's D metric, single nucleotide polymorphism density, GC and variation density are also available for visualization in real time. We demonstrate the utility of SVAMP in tracking a methicillin-resistant Staphylococcus aureus outbreak from published next-generation sequencing data across 15 countries. We also demonstrate the scalability and accuracy of our software on 245 Plasmodium falciparum malaria isolates from three continents. AVAILABILITY AND IMPLEMENTATION: The Qt/C++ software code, binaries, user manual and example datasets are available at http://cbrc.kaust.edu.sa/svamp CONTACT: [email protected] or [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online

    SV-Pop: population-based structural variant analysis and visualization.

    BACKGROUND: Genetic structural variation underpins a multitude of phenotypes, with significant implications for a range of biological outcomes. Despite their crucial role, structural variants (SVs) are often neglected and overshadowed by single nucleotide polymorphisms (SNPs), which are used in large-scale analysis such as genome-wide association and population genetic studies. RESULTS: To facilitate the high-throughput analysis of structural variation we have developed an analytical pipeline and visualisation tool, called SV-Pop. The utility of this pipeline was then demonstrated through application with a large, multi-population P. falciparum dataset. CONCLUSIONS: Designed to facilitate downstream analysis and visualisation post-discovery, SV-Pop allows for straightforward integration of multi-population analysis, method and sample-based concordance metrics, and signals of selection

    Genetic Diversity of norA, Coding for a Main Efflux Pump of Staphylococcus aureus

    NorA is the best studied efflux system of Staphylococcus aureus and therefore frequently used as a model for investigating efflux-mediated resistance in this pathogen. NorA activity is associated with resistance to fluoroquinolones, several antiseptics and disinfectants and several reports have pointed out the role of efflux systems, including NorA, as a first-line response to antimicrobials in S. aureus. Genetic diversity studies of the gene norA have described three alleles: norAI, norAII and norAIII. However, the epidemiology of these alleles and their impact on NorA activity remains unclear. Additionally, an increasing number of studies do not account for norA variability when establishing relations between resistance phenotypes and norA presence or reported absence, which actually corresponds, as we now demonstrate, to different norA alleles. In the present study we assessed the variability of the norA gene present in the genome of over 1,000 S. aureus isolates, corresponding to 112 S. aureus strains with whole genome sequences publicly available, 917 MRSA strains sourced from a London-based study and nine MRSA isolates collected in a major Hospital in Lisbon, Portugal. Our analyses show that norA is part of the core genome of S. aureus. They also suggest that occurrence of norA variants reflects the population structure of this major pathogen. Overall, this work highlights the ubiquitous nature of norA in S. aureus which must be taken into account when studying the role played by this important determinant on S. aureus resistance to antimicrobials

    Large-scale genomic analysis of global Klebsiella pneumoniae plasmids reveals multiple simultaneous clusters of carbapenem-resistant hypervirulent strains

    BACKGROUND: Klebsiella pneumoniae (Kp) is a Gram-negative bacterium that causes nosocomial infections and rapidly acquires antimicrobial resistance (AMR), which makes it a global threat to human health. It also has a comparatively rare hypervirulent phenotype that can lead to severe disease in otherwise healthy individuals. Unlike classic Kp, canonical hypervirulent strains usually have limited AMR. However, after initial case reports in 2015, carbapenem-resistant hypervirulent Kp has increased in prevalence, including in China, but there is limited understanding of its burden in other geographical regions. METHODS: Here, we examined the largest collection of publicly available sequenced Kp isolates (n=13,178), containing 1603 different sequence types (e.g. ST11 15.0%, ST258 9.5%), and 2174 (16.5%) hypervirulent strains. We analysed the plasmid replicons and carbapenemase and siderophore encoding genes to understand the movement of hypervirulence and AMR genes located on plasmids, and their convergence in carbapenem-resistant hypervirulent Kp. RESULTS: We identified and analysed 3034 unique plasmid replicons to inform the epidemiology and transmission dynamics of carbapenem-resistant hypervirulent Kp (n=1028, 7.8%). We found several outbreaks globally, including one involving ST11 strains in China and another of ST231 in Asia centred on India, Thailand, and Pakistan. There was evidence of global flow of Kp, including across multiple continents. In most cases, clusters of Kp isolates are the result of hypervirulence genes entering classic strains, instead of carbapenem resistance genes entering canonical hypervirulent ones. CONCLUSIONS: Our analysis demonstrates the importance of plasmid analysis in the monitoring of carbapenem-resistant and hypervirulent strains of Kp. With the growing adoption of omics-based technologies for clinical and surveillance applications, including in geographical regions with gaps in data and knowledge (e.g. sub-Saharan Africa), the identification of the spread of AMR will inform infection control globally

    Genomic analysis of hypervirulent Klebsiella pneumoniae reveals potential genetic markers for differentiation from classical strains.

    The majority of Klebsiella pneumoniae (Kp) infections are nosocomial, but a growing number of community-acquired infections are caused by hypervirulent strains (hvKp) characterised by liver invasion and rapid metastasis. Unlike nosocomial Kp infections, hvKp are generally susceptible to antibiotics. Due to the rapid progression of hvKp infections, timely and accurate diagnosis is required for effective treatment. To identify potential drivers of the hypervirulent phenotype, we performed a genome-wide association study (GWAS) analysis on single nucleotide variants and accessory genome loci across 79 publicly available Kp isolates collected from patients' liver and a diverse global Kp dataset (n = 646). The GWAS analysis revealed 29 putative genes (P < 10^-10) associated with higher risk of liver phenotype, including hypervirulence-linked salmochelin iro (odds ratio (OR): 29.8) and aerobactin iuc (OR: 14.1) loci. A minority of liver isolates (n = 15, 19%) had neither of these siderophores nor any other shared biomarker, suggesting possible unknown drivers of hypervirulence and an intrinsic ability of Kp to invade the liver. Despite identifying potential novel loci linked to a liver invasive Kp phenotype, our work highlights the need for large-scale studies involving more sequence types to identify further hypervirulence biomarkers to assist clinical decision making
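
    For readers unfamiliar with the odds ratios quoted above, the following is a minimal sketch of how an odds ratio and an approximate confidence interval can be computed from a 2x2 table of locus presence by phenotype. The counts are hypothetical and are not the study's data, and the abstract does not state which GWAS software or model produced the reported values, so this is only an illustration of the quantity itself. The sketch assumes no cell of the table is zero.

```python
import math


def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds ratio for a 2x2 table of locus presence by phenotype."""
    return (case_with * control_without) / (case_without * control_with)


def woolf_ci(case_with, case_without, control_with, control_without, z=1.96):
    """Approximate 95% interval computed on the log scale (Woolf method)."""
    log_or = math.log(
        odds_ratio(case_with, case_without, control_with, control_without)
    )
    se = math.sqrt(
        1 / case_with + 1 / case_without
        + 1 / control_with + 1 / control_without
    )
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: the locus is present in 20 of 25 liver isolates
# and in 10 of 60 comparison isolates.
print(odds_ratio(20, 5, 10, 50))
print(woolf_ci(20, 5, 10, 50))
```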

    estMOI: estimating multiplicity of infection using parasite deep sequencing data.

    Individuals living in endemic areas generally harbour multiple parasite strains. Multiplicity of infection (MOI) can be an indicator of immune status and transmission intensity. It has a potentially confounding effect on a number of population genetic analyses, which often assume isolates are clonal. Polymerase chain reaction-based approaches to estimate MOI can lack sensitivity. For example, in the human malaria parasite Plasmodium falciparum, genotyping of the merozoite surface protein (MSP1/2) genes is a standard method for assessing MOI, despite the apparent problem of underestimation. The availability of deep coverage data from massively parallelizable sequencing technologies means that MOI can be detected genome wide by considering the abundance of heterozygous genotypes. Here, we present a method to estimate MOI, which considers unique combinations of polymorphisms from sequence reads. The method is implemented within the estMOI software. When applied to clinical P. falciparum isolates from three continents, we find that multiple infections are common, especially in regions with high transmission
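
    The idea of counting unique combinations of polymorphisms from sequence reads can be sketched as follows. This is a simplified illustration and not the estMOI implementation: the function name, the read-support threshold and the example reads are all assumptions made here, and the abstract does not describe how the software handles sequencing error or combines evidence across loci.

```python
from collections import Counter


def estimate_moi(read_haplotypes, min_reads=3):
    """Count distinct allele combinations seen on at least min_reads reads.

    read_haplotypes holds one tuple per read (or read pair), giving the
    alleles that read carries at a fixed set of nearby SNP positions.
    The min_reads threshold is a hypothetical guard against combinations
    that arise from sequencing errors.
    """
    counts = Counter(read_haplotypes)
    return sum(1 for n in counts.values() if n >= min_reads)


# Hypothetical reads spanning two SNPs: two well-supported combinations
# and one combination seen only once.
reads = [("A", "C")] * 10 + [("G", "T")] * 8 + [("A", "T")]
print(estimate_moi(reads))
```

    In this toy input the singleton combination falls below the threshold, so the sketch reports two distinct haplotypes at the locus.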