Exploiting HIV-1 protease and reverse transcriptase cross-resistance information for improved drug resistance prediction by means of multi-label classification
DeepAMR for predicting co-occurrent resistance of Mycobacterium tuberculosis.
MOTIVATION: Resistance co-occurrence within first-line anti-tuberculosis (TB) drugs is a common phenomenon. Existing methods based on genetic data analysis of Mycobacterium tuberculosis (MTB) have been able to predict resistance of MTB to individual drugs, but have not considered resistance co-occurrence and cannot capture the latent structure of genomic data that corresponds to lineages.
RESULTS: We used a large cohort of TB patients from 16 countries across six continents, where whole-genome sequences for each isolate and the associated phenotype to anti-TB drugs were obtained using drug susceptibility testing recommended by the World Health Organization. We then proposed an end-to-end multi-task model with a deep denoising auto-encoder (DeepAMR) for multiple drug classification and developed DeepAMR_cluster, a clustering variant based on DeepAMR, for learning clusters in the latent space of the data. The results showed that DeepAMR outperformed the baseline model and four machine learning models, with mean AUROC from 94.4% to 98.7% for predicting resistance to four first-line drugs [i.e. isoniazid (INH), ethambutol (EMB), rifampicin (RIF), pyrazinamide (PZA)], multi-drug resistant TB (MDR-TB) and pan-susceptible TB (PANS-TB: MTB that is susceptible to all four first-line anti-TB drugs). In the case of INH, EMB, PZA and MDR-TB, DeepAMR achieved its best mean sensitivity of 94.3%, 91.5%, 87.3% and 96.3%, respectively. In the case of RIF and PANS-TB, it achieved 94.2% and 92.2% sensitivity, which were lower than the baseline model by 0.7% and 1.9%, respectively. t-SNE visualization shows that DeepAMR_cluster captures lineage-related clusters in the latent space.
AVAILABILITY AND IMPLEMENTATION: The details of source code are provided at http://www.robots.ox.ac.uk/~davidc/code.php.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
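The two composite targets named in this abstract can be derived from the four per-drug phenotypes. The sketch below is only illustrative: PANS-TB follows the definition given in the abstract, while the MDR-TB rule (resistance to at least INH and RIF) is the commonly used definition and is an assumption here, since the abstract does not spell it out. The function name `derive_labels` is hypothetical.

```python
from typing import Dict

FIRST_LINE = ("INH", "EMB", "RIF", "PZA")


def derive_labels(resistant: Dict[str, bool]) -> Dict[str, bool]:
    """Add MDR-TB and PANS-TB flags to per-drug resistance calls.

    `resistant` maps each first-line drug to True (resistant)
    or False (susceptible).
    """
    out = dict(resistant)
    # Assumption: MDR-TB means resistant to at least INH and RIF.
    out["MDR-TB"] = resistant["INH"] and resistant["RIF"]
    # From the abstract: PANS-TB is susceptible to all four first-line drugs.
    out["PANS-TB"] = not any(resistant[d] for d in FIRST_LINE)
    return out


isolate = {"INH": True, "EMB": False, "RIF": True, "PZA": False}
labels = derive_labels(isolate)
```

With these six binary targets per isolate, the task becomes a multi-label problem in which the labels are logically coupled, which is one reason to model them jointly rather than one drug at a time.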
HIV drug resistance prediction with weighted categorical kernel functions
Background:
Antiretroviral drugs are a very effective therapy against HIV infection. However, the high mutation rate of HIV permits the emergence of variants that can be resistant to the drug treatment. Predicting drug resistance to previously unobserved variants is therefore very important for an optimum medical treatment. In this paper, we propose the use of weighted categorical kernel functions to predict drug resistance from virus sequence data. These kernel functions are very simple to implement and are able to take into account HIV data particularities, such as allele mixtures, and to weigh the different importance of each protein residue, as it is known that not all positions contribute equally to the resistance.
Results:
We analyzed 21 drugs of four classes: protease inhibitors (PI), integrase inhibitors (INI), nucleoside reverse transcriptase inhibitors (NRTI) and non-nucleoside reverse transcriptase inhibitors (NNRTI). We compared two categorical kernel functions, Overlap and Jaccard, against two well-known noncategorical kernel functions (Linear and RBF) and Random Forest (RF). Weighted versions of these kernels were also considered, where the weights were obtained from the RF decrease in node impurity. The Jaccard kernel was the best method, either in its weighted or unweighted form, for 20 out of the 21 drugs.
Conclusions:
Results show that kernels that take into account both the categorical nature of the data and the presence of mixtures consistently result in the best prediction model. The advantage of including weights depended on the protein targeted by the drug. In the case of reverse transcriptase, weights based on the relative importance of each position clearly increased the prediction performance, while the improvement in the protease was much smaller. This seems to be related to the distribution of weights, as measured by the Gini index. All methods described, together with documentation and examples, are freely available at https://bitbucket.org/elies_ramon/catkern.
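To make the idea of weighted categorical kernels concrete, here is a minimal sketch, not necessarily the exact formulation used in the paper. It assumes each sequence is a list of positions, each position is a non-empty set of observed amino acids (a set with more than one element encodes an allele mixture), and the optional weights are non-negative per-position importances such as those obtained from a Random Forest. The function names are hypothetical.

```python
from typing import FrozenSet, List, Optional, Sequence

Position = FrozenSet[str]  # alleles seen at one residue; size > 1 is a mixture


def _normalise(weights: Optional[Sequence[float]], n: int) -> List[float]:
    """Return weights summing to 1; uniform when none are given."""
    if weights is None:
        return [1.0 / n] * n
    total = float(sum(weights))
    return [w / total for w in weights]


def overlap_kernel(x: Sequence[Position], y: Sequence[Position],
                   weights: Optional[Sequence[float]] = None) -> float:
    """Weighted share of positions whose allele sets are identical."""
    assert len(x) == len(y)
    w = _normalise(weights, len(x))
    return sum(wi for wi, a, b in zip(w, x, y) if a == b)


def jaccard_kernel(x: Sequence[Position], y: Sequence[Position],
                   weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean of per-position Jaccard indices |a & b| / |a | b|."""
    assert len(x) == len(y)
    w = _normalise(weights, len(x))
    return sum(wi * len(a & b) / len(a | b) for wi, a, b in zip(w, x, y))


# Toy example: the second position of seq_a is a K/R mixture.
seq_a = [frozenset("M"), frozenset("KR"), frozenset("L")]
seq_b = [frozenset("M"), frozenset("K"), frozenset("V")]
k_overlap = overlap_kernel(seq_a, seq_b)
k_jaccard = jaccard_kernel(seq_a, seq_b, weights=[1.0, 2.0, 1.0])
```

In this toy case the Overlap kernel scores the mixture position as a complete mismatch, whereas the Jaccard kernel gives partial credit for the shared allele, which is the behaviour the abstract credits for handling mixtures.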
Explainable deep learning approach for multilabel classification of antimicrobial resistance with missing labels
Predicting Antimicrobial Resistance (AMR) from genomic sequence data has become a significant component of overcoming the AMR challenge, especially given its potential for facilitating more rapid diagnostics and personalised antibiotic treatments. With the recent advances in sequencing technologies and computing power, deep learning models for genomic sequence data have been widely adopted to predict AMR more reliably and with fewer errors. There are many different types of AMR; therefore, any practical AMR prediction system must be able to identify multiple AMRs present in a genomic sequence. Unfortunately, most genomic sequence datasets do not have all the labels marked, thereby making a deep learning modelling approach challenging owing to its reliance on labels for reliability and accuracy. This paper addresses this issue by presenting an effective deep learning solution, Mask-Loss 1D convolution neural network (ML-ConvNet), for AMR prediction on datasets with many missing labels. The core component of ML-ConvNet utilises a masked loss function that overcomes the effect of missing labels in predicting AMR. The proposed ML-ConvNet is demonstrated to outperform state-of-the-art methods in the literature by 10.5%, according to the F1 score. The proposed model’s performance is evaluated using different degrees of label missingness and is found to outperform the conventional approach by 76% in the F1 score when 86.68% of labels are missing. Furthermore, the ML-ConvNet was established with an explainable artificial intelligence (XAI) pipeline, thereby making it ideally suited for hospital and healthcare settings, where model interpretability is an essential requirement.
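The general idea of a masked loss can be sketched in a few lines. This is an illustration of the technique only, assuming a binary cross-entropy objective and missing labels encoded as `None`; the abstract does not specify the exact loss used in ML-ConvNet, and the function name `masked_bce` is hypothetical.

```python
import math
from typing import Optional, Sequence


def masked_bce(probs: Sequence[float],
               labels: Sequence[Optional[int]],
               eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over observed labels only.

    `labels` holds 1, 0, or None (missing). Missing entries contribute
    nothing to the loss, so they produce no gradient in training.
    """
    total, observed = 0.0, 0
    for p, y in zip(probs, labels):
        if y is None:
            continue  # mask out the missing label
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        observed += 1
    return total / observed if observed else 0.0


# One isolate, three antibiotics; the phenotype for the second is unknown.
loss = masked_bce([0.9, 0.2, 0.5], [1, None, 0])
```

Because the average is taken over observed entries only, the prediction for an unlabelled antibiotic can take any value without changing the loss, instead of being pushed towards an arbitrary default such as "susceptible".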
An investigation of multi-label classification techniques for predicting HIV drug resistance in resource-limited settings.
M.Sc. University of KwaZulu-Natal, Durban, 2014. South Africa has one of the highest HIV infection rates in the world with more than 5.6 million infected
people and consequently has the largest antiretroviral treatment program with more than 1.5 million people
on treatment. The development of drug resistance is a major factor impeding the efficacy of antiretroviral
treatment. While genotype resistance testing (GRT) is the standard method to determine resistance, access
to these tests is limited in resource-limited settings. This research investigates the efficacy of multi-label
machine learning techniques at predicting HIV drug resistance from routine treatment and laboratory data.
Six techniques, namely, binary relevance, HOMER, MLkNN, predictive clustering trees (PCT), RAkEL and
ensemble of classifier chains (ECC) have been tested and evaluated on data from medical records of patients
enrolled in an HIV treatment failure clinic in rural KwaZulu-Natal in South Africa. The performance is
measured using five scalar evaluation measures and receiver operating characteristic (ROC) curves. The
techniques were found to provide useful predictive information in most cases. The PCT and ECC techniques
perform best and have true positive prediction rates of 97% and 98% respectively for specific drugs. The
ECC method also achieved an AUC value of 0.83, which is comparable to the current state of the art. All
models have been validated using 10-fold cross-validation and show increased performance when additional
data is added. In order to make use of these techniques in the field, a tool is presented that may, with small
modifications, be integrated into public HIV treatment programs in South Africa and could assist clinicians
to identify patients with a high probability of drug resistance.
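Two of the techniques listed in this abstract, binary relevance and classifier chains, differ only in whether earlier label predictions are fed to later classifiers. The sketch below illustrates that difference under stated assumptions: the base learner is a toy 1-nearest-neighbour classifier chosen for brevity (the thesis does not say which base learners were used), a single chain in fixed label order is shown rather than an ensemble of chains, and all class names are hypothetical.

```python
from typing import List, Sequence


class OneNN:
    """Toy base learner: 1-nearest neighbour with squared Euclidean distance."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "OneNN":
        self.X = [list(row) for row in X]
        self.y = list(y)
        return self

    def predict_one(self, x: Sequence[float]) -> int:
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in self.X]
        return self.y[dists.index(min(dists))]


class BinaryRelevance:
    """One independent binary classifier per label (e.g. per drug)."""

    def fit(self, X, Y) -> "BinaryRelevance":
        n_labels = len(Y[0])
        self.models = [OneNN().fit(X, [row[j] for row in Y])
                       for j in range(n_labels)]
        return self

    def predict_one(self, x) -> List[int]:
        return [m.predict_one(x) for m in self.models]


class ClassifierChain:
    """Classifier j also sees labels 0..j-1 as extra input features."""

    def fit(self, X, Y) -> "ClassifierChain":
        self.models = []
        for j in range(len(Y[0])):
            # Training uses the true values of the earlier labels.
            Xj = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
            self.models.append(OneNN().fit(Xj, [y[j] for y in Y]))
        return self

    def predict_one(self, x) -> List[int]:
        preds: List[int] = []
        for m in self.models:
            # Prediction uses the chain's own earlier outputs.
            preds.append(m.predict_one(list(x) + preds))
        return preds


# Tiny synthetic data: two features, two labels.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]
br_pred = BinaryRelevance().fit(X, Y).predict_one([1, 0])
cc_pred = ClassifierChain().fit(X, Y).predict_one([1, 0])
```

An ensemble of classifier chains would train several such chains with different label orders and combine their votes, which reduces the sensitivity of a single chain to the order in which labels are predicted.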