41 research outputs found

    Machine learning predicts accurately mycobacterium tuberculosis drug resistance from whole genome sequencing data

    Background: Tuberculosis disease, caused by Mycobacterium tuberculosis, is a major public health problem. The emergence of M. tuberculosis strains resistant to existing treatments threatens to derail control efforts. Resistance is mainly conferred by mutations in genes coding for drug targets or converting enzymes, but our knowledge of these mutations is incomplete. Whole genome sequencing (WGS) is an increasingly common approach to rapidly characterize isolates and identify mutations predicting antimicrobial resistance, thereby providing a diagnostic tool to assist clinical decision making. Methods: We applied machine learning approaches to 16,688 M. tuberculosis isolates that have undergone WGS and laboratory drug-susceptibility testing (DST) across 14 antituberculosis drugs, with 22.5% of samples being multidrug resistant and 2.1% being extensively drug resistant. We used non-parametric classification-tree and gradient-boosted-tree models to predict drug resistance and uncover any associated novel putative mutations. We fitted separate models for each drug, with and without “co-occurrent resistance” markers known to be causing resistance to drugs other than the one of interest. Predictive performance was measured using sensitivity, specificity, and the area under the receiver operating characteristic curve, assuming DST results as the gold standard. Results: The predictive performance was highest for resistance to first-line drugs, amikacin, kanamycin, ciprofloxacin, moxifloxacin, and multidrug-resistant tuberculosis (area under the receiver operating characteristic curve above 96%), and lowest for third-line drugs such as D-cycloserine and para-aminosalicylic acid (area under the curve below 85%). The inclusion of co-occurrent resistance markers led to improved performance for some drugs and superior results when compared to similar models in other large-scale studies, which had smaller sample sizes. Overall, the gradient-boosted-tree models performed better than the classification-tree models. The mutation-rank analysis detected no new single nucleotide polymorphisms linked to drug resistance. Discordance between DST and genotypically inferred resistance may be explained by DST errors, novel rare mutations, hetero-resistance, and nongenomic drivers such as efflux-pump upregulation. Conclusion: Our work demonstrates the utility of machine learning as a flexible approach to drug resistance prediction that is able to accommodate a much larger number of predictors and to summarize their predictive ability, thus assisting clinical decision making and single nucleotide polymorphism detection in an era of increasing WGS data generation
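
The three performance measures named in this abstract (sensitivity, specificity, and area under the ROC curve) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy labels, and the 0.5 decision threshold are all assumptions made for illustration, with 1 standing for "resistant by DST" and 0 for "susceptible".

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = resistant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the probability that a random resistant isolate receives a
    higher score than a random susceptible one (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: DST labels and model scores for seven isolates.
dst_labels = [1, 1, 1, 0, 0, 0, 0]
model_scores = [0.9, 0.8, 0.3, 0.1, 0.2, 0.4, 0.7]
predicted = [1 if s >= 0.5 else 0 for s in model_scores]  # assumed threshold

sens, spec = sensitivity_specificity(dst_labels, predicted)
auc = roc_auc(dst_labels, model_scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is presumably why the abstract reports all three.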

    VivaxGEN: An open access platform for comparative analysis of short tandem repeat genotyping data in Plasmodium vivax populations.

    BACKGROUND: The control and elimination of Plasmodium vivax will require a better understanding of its transmission dynamics, through the application of genotyping and population genetics analyses. This paper describes VivaxGEN (http://vivaxgen.menzies.edu.au), a web-based platform that has been developed to support P. vivax short tandem repeat data sharing and comparative analyses. RESULTS: The VivaxGEN platform provides a repository for raw data generated by capillary electrophoresis (FSA files), with fragment analysis and standardized allele calling tools. The query system of the platform enables users to filter, select and differentiate samples and alleles based on their specified criteria. Key population genetic analyses are supported, including measures of population differentiation (FST), expected heterozygosity (HE), linkage disequilibrium (IAS), neighbor-joining analysis and Principal Coordinate Analysis. Datasets can also be formatted and exported for use in commonly used population genetic software including GENEPOP, Arlequin and STRUCTURE. To date, data from 10 countries, including 5 publicly available data sets, have been shared with VivaxGEN. CONCLUSIONS: VivaxGEN is well placed to facilitate regional overviews of P. vivax transmission dynamics in different endemic settings and can be adapted for similar genetic studies of P. falciparum and other organisms
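
As an illustration of one of the statistics listed in this abstract, the sketch below computes expected heterozygosity (HE) at a single short tandem repeat locus using a commonly used estimator, HE = [n/(n-1)] * (1 - sum of squared allele frequencies). Whether VivaxGEN uses exactly this estimator is an assumption, and the function name and allele sizes are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Expected heterozygosity for one locus.

    `alleles` holds one allele call (e.g. fragment size in bp) per sample.
    Applies the n/(n-1) small-sample correction to 1 - sum(p_i^2).
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two samples are required")
    counts = Counter(alleles)
    sum_sq_freq = sum((c / n) ** 2 for c in counts.values())
    return (n / (n - 1)) * (1.0 - sum_sq_freq)


# Hypothetical allele calls at one marker for four samples.
he = expected_heterozygosity([120, 120, 124, 128])
```

HE is 0 when every sample carries the same allele and reaches 1 when every sample carries a different one, so it serves as a per-locus measure of genetic diversity.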

    An integrated whole genome analysis of Mycobacterium tuberculosis reveals insights into relationship between its genome, transcriptome and methylome.

    Human tuberculosis disease (TB), caused by Mycobacterium tuberculosis (Mtb), is a complex disease, with a spectrum of outcomes. Genomic, transcriptomic and methylation studies have revealed differences between Mtb lineages, likely to impact on transmission, virulence and drug resistance. However, so far no studies have integrated sequence-based genomic, transcriptomic and methylation characterisation across a common set of samples, which is critical to understand how DNA sequence and methylation affect RNA expression and, ultimately, Mtb pathogenesis. Here we perform such an integrated analysis across 22 M. tuberculosis clinical isolates, representing ancient (lineage 1) and modern (lineages 2 and 4) strains. The results confirm the presence of lineage-specific differential gene expression, linked to specific SNP-based expression quantitative trait loci: with 10 eQTLs involving SNPs in promoter regions or transcriptional start sites; and 12 involving potential functional impairment of transcriptional regulators. Methylation status was also found to have a role in transcription, with evidence of differential expression in 50 genes across lineage 4 samples. Lack of methylation was associated with three novel variants in mamA, likely to cause loss of function of this enzyme. Overall, our work shows the relationship of DNA sequence and methylation to RNA expression, and differences between ancient and modern lineages. Further studies are needed to verify the functional consequences of the identified mechanisms of gene expression regulation

    Mammalian Sperm Head Formation Involves Different Polarization of Two Novel LINC Complexes

    Background: LINC complexes are nuclear envelope bridging protein structures formed by interaction of SUN and KASH proteins. They physically connect the nucleus with the peripheral cytoskeleton and are critically involved in a variety of dynamic processes, such as nuclear anchorage, movement and positioning and meiotic chromosome dynamics. Moreover, they have been shown to be essential for maintaining nuclear shape. Findings: Based on detailed expression analysis and biochemical approaches, we show here that during mouse sperm development, a terminal cell differentiation process characterized by profound morphogenic restructuring, two novel distinctive LINC complexes are established. They consist either of spermiogenesis-specific Sun3 and Nesprin1 or of Sun1g, a novel non-nuclear Sun1 isoform, and Nesprin3. We found that these two LINC complexes specifically polarize to opposite spermatid poles, likely linking to sperm-specific cytoskeletal structures. Although, as shown in co-transfection/immunoprecipitation experiments, SUN proteins appear to arbitrarily interact with various KASH partners, our study demonstrates that they actually are able to confine their binding to form distinct LINC complexes. Conclusions: Formation of the mammalian sperm head involves assembly and different polarization of two novel spermiogenesis-specific LINC complexes. Together, our findings suggest that these LINC complexes connect the differentiating spermatid nucleus to surrounding cytoskeletal structures to enable its well-directed shaping and elongation

    Silencing of Vlaro2 for chorismate synthase revealed that the phytopathogen Verticillium longisporum induces the cross-pathway control in the xylem

    The first leaky auxotrophic mutant for aromatic amino acids of the near-diploid fungal plant pathogen Verticillium longisporum (VL) has been generated. VL enters its host Brassica napus through the roots and colonizes the xylem vessels. The xylem contains few nutrients, including low concentrations of amino acids. We isolated the gene Vlaro2 encoding chorismate synthase by complementation of the corresponding yeast mutant strain. Chorismate synthase produces the first branch point intermediate of aromatic amino acid biosynthesis. A novel RNA-mediated gene silencing method reduced gene expression of both isogenes by 80% and resulted in a bradytrophic mutant, which is a leaky auxotroph due to impaired expression of chorismate synthase. In contrast to the wild type, silencing resulted in increased expression of the cross-pathway regulatory gene VlcpcA (similar to cpcA/GCN4) during saprotrophic life. The mutant fungus is still able to infect the host plant B. napus and the model Arabidopsis thaliana, but with reduced efficiency. VlcpcA expression is increased in planta in both the mutant and the wild-type fungus. We assume that xylem colonization requires induction of the cross-pathway control, presumably because the fungus has to overcome imbalanced amino acid supply in the xylem