949 research outputs found

Machine learning for classifying tuberculosis drug-resistance from DNA sequencing data

Author: Clifton DA
Crook DW
Iqbal Z
Niehaus KE
Peto TEA
Smith EG
Walker AS
Walker TM
Wilson DJ
Yang Y
Zhu T
Publication venue: 'Oxford University Press (OUP)'
Publication date: 12/12/2017

Motivation: Correct and rapid determination of Mycobacterium tuberculosis (MTB) resistance against available tuberculosis (TB) drugs is essential for the control and management of TB. Conventional molecular diagnostic tests assume that the presence of any well-studied single nucleotide polymorphism is sufficient to cause resistance, which yields low sensitivity for resistance classification. Methods: Given the availability of DNA sequencing data from MTB, we developed machine learning models for a cohort of 1839 UK bacterial isolates to classify MTB resistance against eight anti-TB drugs (isoniazid, rifampicin, ethambutol, pyrazinamide, ciprofloxacin, moxifloxacin, ofloxacin, streptomycin) and to classify multi-drug resistance. Results: Compared to the previous rules-based approach, the sensitivities of the best-performing models increased by 2-4% for isoniazid, rifampicin and ethambutol, to 97% (p<0.01); for ciprofloxacin and multi-drug resistant TB, they increased to 96%. For moxifloxacin and ofloxacin, sensitivities increased by 12% and 15%, from 83% and 81% based on existing known resistance alleles to 95% and 96% (p<0.01), respectively. In particular, our models improved sensitivities compared to the previous rules-based approach by 15% and 24%, to 84% and 87%, for pyrazinamide and streptomycin (p<0.01), respectively. The best-performing models increased the area under the ROC curve by 10% for pyrazinamide and streptomycin (p<0.01), and by 4-8% for other drugs (p<0.01). Availability: The details of source code are provided at http://www.robots.ox.ac.uk/davidc/code.ph
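
The rules-based baseline described in this abstract (call an isolate resistant if it carries any catalogued polymorphism) and the sensitivity metric it is judged by can be sketched in a few lines. This is only an illustration, not the authors' code: the SNP names, the catalogue, and the toy isolates are hypothetical placeholders.

```python
# Hypothetical catalogue of known resistance SNPs for a single drug.
# The names are placeholders, not real mutation identifiers.
KNOWN_RESISTANCE_SNPS = {"snp_A", "snp_B"}


def rule_based_call(isolate_snps):
    """Call an isolate resistant if it carries any catalogued SNP."""
    return bool(KNOWN_RESISTANCE_SNPS & set(isolate_snps))


def sensitivity(predicted, actual):
    """Fraction of truly resistant isolates that were predicted resistant.

    Assumes at least one truly resistant isolate is present.
    """
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    false_neg = sum(1 for p, a in zip(predicted, actual) if not p and a)
    return true_pos / (true_pos + false_neg)


# Toy data: (SNPs observed in the isolate, phenotypically resistant?)
isolates = [
    (["snp_A"], True),
    (["snp_B", "snp_X"], True),
    (["snp_Y"], True),   # resistant through an uncatalogued SNP: missed
    (["snp_X"], False),
    ([], False),
]

predicted = [rule_based_call(snps) for snps, _ in isolates]
actual = [label for _, label in isolates]
print(f"rule-based sensitivity: {sensitivity(predicted, actual):.3f}")
```

The third isolate shows why a fixed catalogue can lose sensitivity: resistance caused by a variant outside the catalogue is never detected, which is the gap a learned classifier tries to close.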

Oxford University Research Archive

Machine learning predicts accurately mycobacterium tuberculosis drug resistance from whole genome sequencing data

Author: Benavente ED
Campino S
Christakoudi S
Clark TG
Deelder W
McNerney R
Palla L
Phelan J
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2019

Background: Tuberculosis disease, caused by Mycobacterium tuberculosis, is a major public health problem. The emergence of M. tuberculosis strains resistant to existing treatments threatens to derail control efforts. Resistance is mainly conferred by mutations in genes coding for drug targets or converting enzymes, but our knowledge of these mutations is incomplete. Whole genome sequencing (WGS) is an increasingly common approach to rapidly characterize isolates and identify mutations predicting antimicrobial resistance, thereby providing a diagnostic tool to assist clinical decision making. Methods: We applied machine learning approaches to 16,688 M. tuberculosis isolates that have undergone WGS and laboratory drug-susceptibility testing (DST) across 14 antituberculosis drugs, with 22.5% of samples being multidrug resistant and 2.1% being extensively drug resistant. We used non-parametric classification-tree and gradient-boosted-tree models to predict drug resistance and uncover any associated novel putative mutations. We fitted separate models for each drug, with and without “co-occurrent resistance” markers known to cause resistance to drugs other than the one of interest. Predictive performance was measured using sensitivity, specificity, and the area under the receiver operating characteristic curve, assuming DST results as the gold standard. Results: The predictive performance was highest for resistance to first-line drugs, amikacin, kanamycin, ciprofloxacin, moxifloxacin, and multidrug-resistant tuberculosis (area under the receiver operating characteristic curve above 96%), and lowest for third-line drugs such as D-cycloserine and para-aminosalicylic acid (area under the curve below 85%). The inclusion of co-occurrent resistance markers led to improved performance for some drugs and superior results when compared to similar models in other large-scale studies, which had smaller sample sizes. Overall, the gradient-boosted-tree models performed better than the classification-tree models. The mutation-rank analysis detected no new single nucleotide polymorphisms linked to drug resistance. Discordance between DST and genotypically inferred resistance may be explained by DST errors, novel rare mutations, hetero-resistance, and nongenomic drivers such as efflux-pump upregulation. Conclusion: Our work demonstrates the utility of machine learning as a flexible approach to drug resistance prediction that is able to accommodate a much larger number of predictors and to summarize their predictive ability, thus assisting clinical decision making and single nucleotide polymorphism detection in an era of increasing WGS data generation.
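
The three performance measures named in this abstract (sensitivity, specificity, and area under the ROC curve, with DST as the reference) can be computed as in the sketch below. It is a minimal standard-library illustration under stated assumptions: the labels and scores are invented toy values, 1 means DST-resistant, and the 0.5 decision threshold is an arbitrary choice, none of which comes from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = resistant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc_roc(y_true, scores):
    """AUC as the probability that a randomly chosen resistant isolate
    receives a higher score than a randomly chosen susceptible one
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: DST labels and model scores for six isolates.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarises ranking quality across all thresholds, which is one reason studies often report all three.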

Archivio della ricerca- Università di Roma La Sapienza

Tuberculosis Prediction by Machine Learning Techniques

Author: Kuldeep Godiyal
Surabhi Pokhriyal
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 09/12/2022

Tuberculosis is one of the leading causes of death worldwide. It is caused by Mycobacterium tuberculosis, a bacterium that infects the lungs. For medical professionals, identifying tuberculosis accurately and predicting it in time are major challenges. The course of treatment also varies from patient to patient, since some patients develop drug resistance. Machine learning can give doctors algorithmic support to help them diagnose, treat patients appropriately, and make quicker and better judgments. This paper discusses the causes and symptoms of tuberculosis, as well as how accurate and fast prediction and diagnostic investigations have been carried out in recent years with the aid of machine learning (ML) techniques.

International Journal on Recent and Innovation Trends in Computing and Communication

Machine learning and applications in microbiology

Author: Barratt JLN
Calarco L
Ellis JT
Goodswen SJ
Kaufer A
Kennedy PJ
Publication venue: 'Oxford University Press (OUP)'
Publication date: 04/06/2021

Understanding the intricacies of microorganisms at the molecular level requires making sense of such copious volumes of data that it may now be humanly impossible to detect insightful data patterns without an artificial intelligence application called machine learning. Applying machine learning to address biological problems is expected to grow at an unprecedented rate, yet it is perceived by the uninitiated as a mysterious and daunting entity entrusted to the domain of mathematicians and computer scientists. The aim of this review is to identify key points required to start the journey of becoming an effective machine learning practitioner. These key points are further reinforced with an evaluation of how machine learning has been applied so far in a broad scope of real-life microbiology examples. This includes predicting drug targets or vaccine candidates, diagnosing microorganisms causing infectious diseases, classifying drug resistance against antimicrobial medicines, predicting disease outbreaks and exploring microbial interactions. Our hope is to inspire microbiologists and other related researchers to join the emerging machine learning revolution.

OPUS - University of Technology Sydney

Application of machine learning techniques to tuberculosis drug resistance analysis

Author: Clifton DA
Crook DW
Cryptic Consortium
Kouchaki S
Peto TEA
Walker AS
Walker TM
Wilson DJ
Yang Y
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2018

Timely identification of Mycobacterium tuberculosis (MTB) resistance to existing drugs is vital to decrease mortality and prevent the amplification of existing antibiotic resistance. Machine learning methods have been widely applied to predict MTB resistance to a specific drug in a timely manner and to identify resistance markers. However, they have not been validated on a large cohort of MTB samples from multiple centers across the world in terms of resistance prediction and resistance marker identification. Several machine learning classifiers and linear dimension reduction techniques were developed and compared for a cohort of 13 402 isolates collected from 16 countries across 6 continents and tested against 11 drugs. Results: Compared to the conventional molecular diagnostic test, the area under curve of the best machine learning classifier increased for all drugs, especially by 23.11%, 15.22% and 10.14% for pyrazinamide, ciprofloxacin and ofloxacin, respectively (P < 0.01). Logistic regression and gradient tree boosting were found to perform better than other techniques. Moreover, logistic regression/gradient tree boosting with a sparse principal component analysis/non-negative matrix factorization step, compared with the classifier alone, enhanced the best performance in terms of F1-score by 12.54%, 4.61%, 7.45% and 9.58% for amikacin, moxifloxacin, ofloxacin and capreomycin, respectively, as well as increasing area under curve for amikacin and capreomycin. Results provided a comprehensive comparison of various techniques and confirmed the application of machine learning for better prediction of the large diverse tuberculosis data. Furthermore, mutation ranking showed the possibility of finding new resistance/susceptible markers. Availability and implementation: The source code can be found at http://www.robots.ox.ac.uk/ davidc/code.php Supplementary information: Supplementary data are available at Bioinformatics online.
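
Because this abstract reports gains in F1-score, a short sketch of how that metric is derived from precision and recall may help. The labels below are invented toy values (1 = resistant), and the function is a generic illustration rather than the authors' evaluation code.

```python
def f1_score(y_true, y_pred):
    """F1 is the harmonic mean of precision and recall (1 = resistant).

    Assumes at least one predicted and one true positive-class isolate.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)  # of predicted resistant, how many truly are
    recall = tp / (tp + fn)     # of truly resistant, how many were found
    return 2 * precision * recall / (precision + recall)


# Toy example: 2 true positives, 2 false positives, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0]
f1 = f1_score(y_true, y_pred)
print(f"F1 = {f1:.4f}")
```

Unlike accuracy, F1 ignores true negatives, so it is often preferred when resistant isolates are a small minority of the cohort.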

Oxford University Research Archive

Spiral - Imperial College Digital Repository

A modified decision tree approach to improve the prediction and mutation discovery for drug resistance in Mycobacterium tuberculosis.

Author: Campino Susana
Clark Taane G
Deelder Wouter
Napier Gary
Palla Luigi
Phelan Jody
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

BACKGROUND: Drug-resistant Mycobacterium tuberculosis is complicating the effective treatment and control of tuberculosis disease (TB). With the adoption of whole genome sequencing as a diagnostic tool, machine learning approaches are being employed to predict M. tuberculosis resistance and identify underlying genetic mutations. However, machine learning approaches can overfit and fail to identify causal mutations if they are applied out of the box and not adapted to the disease-specific context. We introduce a machine learning approach that is customized to the TB setting, which extracts a library of genomic variants re-occurring across individual studies to improve genotypic profiling. RESULTS: We developed a customized decision tree approach, called Treesist-TB, that performs TB drug resistance prediction by extracting and evaluating genomic variants across multiple studies. The application of Treesist-TB to rifampicin (RIF), isoniazid (INH) and ethambutol (EMB), drugs for which resistance mutations are known, demonstrated a level of predictive accuracy similar to the widely used TB-Profiler tool (Treesist-TB vs. TB-Profiler tool: RIF 97.5% vs. 97.6%; INH 96.8% vs. 96.5%; EMB 96.8% vs. 95.8%). Application of Treesist-TB to less understood second-line drugs of interest, ethionamide (ETH), cycloserine (CYS) and para-aminosalicylic acid (PAS), led to the identification of new variants (52, 6 and 11, respectively), with a high number absent from the TB-Profiler library (45, 4, and 6, respectively). Thereby, Treesist-TB had improved predictive sensitivity (Treesist-TB vs. TB-Profiler tool: PAS 64.3% vs. 38.8%; CYS 45.3% vs. 30.7%; ETH 72.1% vs. 71.1%). CONCLUSION: Our work reinforces the utility of machine learning for drug resistance prediction, while highlighting the need to customize approaches to the disease-specific context. Through applying a modified decision learning approach (Treesist-TB) across a range of anti-TB drugs, we identified plausible resistance-encoding genomic variants with high predictive ability, whilst potentially overcoming the overfitting challenges that can affect standard machine learning applications.

Archivio della ricerca- Università di Roma La Sapienza

Computational de-orphaning of Mycobacterium tuberculosis targets

Author: Bishi Lorraine Yamurai
Blundell Tom L
Mugumbate Grace Chitima
Vedithi SC
Publication venue: Tuberculosis - Beyond the Biomedical
Publication date: 01/02/2019

Tuberculosis (TB) continues to be a major health hazard worldwide due to the resurgence of drug-resistant strains of Mycobacterium tuberculosis (Mtb) and co-infection. For decades drug discovery has concentrated on identifying ligands for ~10 Mtb targets, hence most of the identified essential proteins are not utilised in TB chemotherapy. Here computational techniques were used to identify ligands for the orphan Mtb proteins. These range from ligand-based and structure-based virtual screening to modelling the proteome of the bacterium. Identification of ligands for most of the Mtb proteins will provide novel TB drugs and targets and hence address drug resistance, toxicity and the duration of TB treatment.

Feature Weighted Models (FWM) to address lineage dependency in drug-resistance prediction from Mycobacterium tuberculosis genome sequences.

Author: Billows Nina
Chang Yu-Mei
Clark Taane G
Peng Yonghong
Phelan Jody E
Xia Dong
Publication venue: Oxford University Press
Publication date: 10/07/2023

MOTIVATION: Tuberculosis (TB) is caused by members of the Mycobacterium tuberculosis complex (MTBC), which has a strain- or lineage-based clonal population structure. The evolution of drug resistance in the MTBC poses a threat to successful treatment and eradication of TB. Machine learning approaches are being increasingly adopted to predict drug resistance and characterise underlying mutations from whole genome sequences. However, such approaches may not generalise well in clinical practice due to confounding from the population structure of the MTBC. RESULTS: To investigate how population structure affects machine learning prediction, we compared three different approaches to reduce lineage dependency in random forest (RF) models: stratification, feature selection and feature weighted models. All RF models achieved moderate to high performance (AUC-ROC range: 0.60-0.98). First-line drugs had higher performance than second-line drugs, but this varied depending on the lineages in the training dataset. Lineage-specific models generally had higher sensitivity than global models, which may be underpinned by strain-specific drug-resistance mutations or sampling effects. The application of feature weights and feature selection approaches reduced lineage dependency in the model and had comparable performance to unweighted RF models. AVAILABILITY AND IMPLEMENTATION: https://github.com/NinaMercedes/RF_lineages. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
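
One simple way to see the lineage dependency this abstract describes is to break a model's sensitivity down by lineage instead of reporting a single pooled figure. The sketch below does only that evaluation step; it is not the feature-weighting method of the paper, and the lineage labels, record layout, and predictions are hypothetical toy data.

```python
from collections import defaultdict


def sensitivity(records):
    """Sensitivity over a list of records; None if no resistant isolates."""
    resistant = [r for r in records if r["resistant"]]
    if not resistant:
        return None
    hits = sum(1 for r in resistant if r["predicted"])
    return hits / len(resistant)


def sensitivity_by_lineage(records):
    """Group records by lineage and compute sensitivity within each group."""
    groups = defaultdict(list)
    for r in records:
        groups[r["lineage"]].append(r)
    return {lineage: sensitivity(rs) for lineage, rs in sorted(groups.items())}


# Toy records: lineage label, phenotype, and model prediction per isolate.
records = [
    {"lineage": "L2", "resistant": True, "predicted": True},
    {"lineage": "L2", "resistant": True, "predicted": True},
    {"lineage": "L2", "resistant": False, "predicted": False},
    {"lineage": "L4", "resistant": True, "predicted": False},
    {"lineage": "L4", "resistant": True, "predicted": True},
    {"lineage": "L4", "resistant": False, "predicted": False},
]

pooled = sensitivity(records)
per_lineage = sensitivity_by_lineage(records)
print("pooled:", pooled)
print("per lineage:", per_lineage)
```

In this toy data the pooled figure hides a gap between the two lineages, which is the kind of effect that stratified or lineage-aware modelling is meant to expose and reduce.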