    Improving fold resistance prediction of HIV-1 against protease and reverse transcriptase inhibitors using artificial neural networks

    Drug resistance in HIV treatment remains a worldwide problem. Predicting resistance to antiretrovirals (ARVs) before starting any treatment is important. Prediction accuracy is essential, as low-accuracy predictions increase the risk of prescribing sub-optimal drug regimens, leading patients to develop resistance sooner. Artificial Neural Networks (ANNs) are a powerful tool that can assist in drug resistance prediction. In this study, we constrained the dataset to subtype B, sacrificing generalizability for higher predictive performance, and demonstrated that the predictive quality of the ANN regression models shows a definite improvement for most ARVs.

    Drug resistance mutations in HIV: new bioinformatics approaches and challenges

    Drug resistance mutations appear in HIV under treatment pressure. Resistant variants can be transmitted to treatment-naive individuals, which can lead to rapid virological failure and can limit treatment options. Consequently, quantifying the prevalence, emergence and transmission of drug resistance is critical to treating patients effectively and to shaping health policies. We review recent bioinformatics developments and in particular describe: (1) the machine learning approaches intended to predict and explain the level of resistance of HIV variants from their sequence data; (2) the phylogenetic methods used to survey the emergence and dynamics of resistant HIV transmission clusters; (3) the impact of deep sequencing in studying within-host and between-host genetic diversity of HIV variants, notably regarding minority resistant variants.

    Computational approaches for improving treatment and prevention of viral infections

    The treatment of infections with HIV or HCV is challenging. Thus, novel drugs and new computational approaches that support the selection of therapies are required. This work presents methods that support therapy selection as well as methods that advance novel antiviral treatments. geno2pheno[ngs-freq] identifies drug resistance from HIV-1 or HCV samples that were subjected to next-generation sequencing by interpreting their sequences either via support vector machines or a rules-based approach. geno2pheno[coreceptor-hiv2] determines the coreceptor that is used for viral cell entry by analyzing a segment of the HIV-2 surface protein with a support vector machine. openPrimeR is capable of finding optimal combinations of primers for multiplex polymerase chain reaction by solving a set cover problem and accessing a new logistic regression model for determining amplification events arising from polymerase chain reaction. geno2pheno[ngs-freq] and geno2pheno[coreceptor-hiv2] enable the personalization of antiviral treatments and support clinical decision making. The application of openPrimeR on human immunoglobulin sequences has resulted in novel primer sets that improve the isolation of broadly neutralizing antibodies against HIV-1. The methods that were developed in this work thus constitute important contributions towards improving the prevention and treatment of viral infectious diseases.
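
The abstract above states that primer selection is posed as a set cover problem but does not say how it is solved. As a minimal sketch of the idea only, the following uses the classic greedy approximation; this is a swapped-in illustrative technique, not a description of openPrimeR's actual solver. The primer names, template names and the `greedy_set_cover` helper are hypothetical.

```python
def greedy_set_cover(universe, candidates):
    """Greedily pick candidates until `universe` is covered or no progress is possible.

    universe   -- set of items to cover (here: hypothetical template IDs)
    candidates -- dict mapping a candidate name to the set of items it covers
    Returns (chosen names in pick order, items left uncovered).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the candidate covering the most still-uncovered items.
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            break  # nothing left that any candidate can cover
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Hypothetical toy data: templates each primer is assumed to amplify.
coverage = {
    "primer_A": {"t1", "t2", "t3"},
    "primer_B": {"t3", "t4"},
    "primer_C": {"t4", "t5"},
    "primer_D": {"t1", "t5"},
}
templates = {"t1", "t2", "t3", "t4", "t5"}

chosen, uncovered = greedy_set_cover(templates, coverage)
print(chosen, uncovered)
```

The greedy rule is not guaranteed to find the smallest primer set; an exact formulation would typically use integer programming instead.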

    HIV drug resistance prediction with weighted categorical kernel functions

    Background: Antiretroviral drugs are a very effective therapy against HIV infection. However, the high mutation rate of HIV permits the emergence of variants that can be resistant to the drug treatment. Predicting the drug resistance of previously unobserved variants is therefore very important for optimal medical treatment. In this paper, we propose the use of weighted categorical kernel functions to predict drug resistance from virus sequence data. These kernel functions are very simple to implement and are able to take into account HIV data particularities, such as allele mixtures, and to weigh the different importance of each protein residue, as it is known that not all positions contribute equally to the resistance. Results: We analyzed 21 drugs of four classes: protease inhibitors (PI), integrase inhibitors (INI), nucleoside reverse transcriptase inhibitors (NRTI) and non-nucleoside reverse transcriptase inhibitors (NNRTI). We compared two categorical kernel functions, Overlap and Jaccard, against two well-known noncategorical kernel functions (Linear and RBF) and Random Forest (RF). Weighted versions of these kernels were also considered, where the weights were obtained from the RF decrease in node impurity. The Jaccard kernel was the best method, either in its weighted or unweighted form, for 20 out of the 21 drugs. Conclusions: Results show that kernels that take into account both the categorical nature of the data and the presence of mixtures consistently result in the best prediction model. The advantage of including weights depended on the protein targeted by the drug. In the case of reverse transcriptase, weights based on the relative importance of each position clearly increased the prediction performance, while the improvement in the protease was much smaller. This seems to be related to the distribution of weights, as measured by the Gini index.
All methods described, together with documentation and examples, are freely available at https://bitbucket.org/elies_ramon/catkern.
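
To make the two categorical similarity measures concrete, here is a minimal sketch under stated assumptions: each sequence position holds a set of observed residues (so an allele mixture is a set with more than one element), the Overlap measure is the fraction of positions with identical residues, and the Jaccard measure is a weighted mean of per-position Jaccard indices. The exact normalisation and any further transformation used in the paper may differ, and the function names are hypothetical rather than taken from the catkern package.

```python
def overlap_kernel(x, y):
    """Fraction of positions at which two equal-length sequences match exactly."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(a == b for a, b in zip(x, y)) / len(x)


def jaccard_kernel(x, y, weights=None):
    """Weighted mean over positions of |A & B| / |A | B| between residue sets.

    x, y    -- lists of non-empty sets of residues, one set per position
    weights -- optional per-position importances (assumed non-negative)
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    if weights is None:
        weights = [1.0] * len(x)
    total = sum(weights)
    score = sum(w * len(a & b) / len(a | b) for a, b, w in zip(x, y, weights))
    return score / total


# Hypothetical three-position example; position 2 of s1 is a K/R mixture.
s1 = [{"L"}, {"K", "R"}, {"M"}]
s2 = [{"L"}, {"K"}, {"V"}]

unweighted = jaccard_kernel(s1, s2)
weighted = jaccard_kernel(s1, s2, weights=[2.0, 1.0, 1.0])
overlap = overlap_kernel("LKM", "LKV")
print(unweighted, weighted, overlap)
```

Unlike the Overlap measure, the Jaccard measure gives partial credit when a mixture shares only some residues with the other sequence, which is how mixtures enter the similarity.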

    Application of machine learning, molecular modelling and structural data mining against antiretroviral drug resistance in HIV-1

    Millions are affected by the Human Immunodeficiency Virus (HIV) worldwide, even though the death toll is on the decline. Antiretrovirals (ARVs), and more specifically protease inhibitors, have shown tremendous success since their introduction into therapy in the mid-1990s by slowing progression to the Acquired Immune Deficiency Syndrome (AIDS). However, Drug Resistance Mutations (DRMs) are constantly selected for due to viral adaptation, making drugs less effective over time. The current challenge is to manage the infection optimally with a limited set of drugs, with differing associated levels of toxicity, in the face of a virus that (1) exists as a quasispecies, (2) may transmit acquired DRMs to drug-naive individuals and (3) can manifest class-wide resistance due to similarities in design. The presence of latent reservoirs, unawareness of infection status, education and various socio-economic factors make the problem even more complex. Adequate timing and choice of drug prescription together with treatment adherence are very important, as drug toxicities, drug failure and sub-optimal treatment regimens leave room for further development of drug resistance. While CD4 cell count and the determination of viral load from patients in resource-limited settings are very helpful to track how well a patient’s immune system is able to keep the virus in check, they can be slow to show whether an ARV is effective. PhenoSense assay kits address this problem by using viruses engineered to contain the patient sequences and evaluating their growth in the presence of different ARVs, but this can be expensive and too involved for routine checks. As a cheaper and faster alternative, genotypic assays provide similar information from HIV pol sequences obtained from blood samples, inferring ARV efficacy on the basis of drug resistance mutation patterns. 
However, these are inherently complex and the various methods of in silico prediction, such as Geno2pheno, REGA and Stanford HIVdb do not always agree in every case, even though this gap decreases as the list of resistance mutations is updated. A major gap in HIV treatment is that the information used for predicting drug resistance is mainly computed from data containing an overwhelming majority of B subtype HIV, when these only comprise about 12% of the worldwide HIV infections. In addition to growing evidence that drug resistance is subtype-related, it is intuitive to hypothesize that as subtyping is a phylogenetic classification, the more divergent a subtype is from the strains used in training prediction models, the less their resistance profiles would correlate. For the aforementioned reasons, we used a multi-faceted approach to attack the virus in multiple ways. This research aimed to (1) improve resistance prediction methods by focusing solely on the available subtype, (2) mine structural information pertaining to resistance in order to find any exploitable weak points and increase knowledge of the mechanistic processes of drug resistance in HIV protease. Finally, (3) we screen for protease inhibitors amongst a database of natural compounds [the South African natural compound database (SANCDB)] to find molecules or molecular properties usable to come up with improved inhibition against the drug target. In this work, structural information was mined using the Anisotropic Network Model, Dynamics Cross-Correlation, Perturbation Response Scanning, residue contact network analysis and the radius of gyration. These methods failed to give any resistance-associated patterns in terms of natural movement, internal correlated motions, residue perturbation response, relational behaviour and global compaction respectively. 
Applications of drug docking, homology modelling and energy minimization for generating features suitable for machine learning were not very promising, and rather suggest that binding energies from Vina may not, by themselves, be quantitatively reliable. All these failures led to a refinement that resulted in a highly sensitive, statistically-guided network construction and analysis, which led to key findings in the early dynamics associated with resistance across all PI drugs. The latter experiment unravelled a conserved lateral expansion motion occurring at the flap elbows, and an associated contraction that drives the base of the dimerization domain towards the catalytic site’s floor in the case of drug resistance. Interestingly, we found that despite the conserved movement, bond angles were degenerate. In parallel, 16 Artificial Neural Network models were optimised for HIV protease and reverse transcriptase inhibitors, with performances on par with Stanford HIVdb. Finally, we prioritised 9 compounds with potential protease inhibitory activity using virtual screening and molecular dynamics (MD), and additionally suggest a promising modification to one of the compounds. This yielded another molecule inhibiting both open and closed receptor target conformations equally well, whereby each of the compounds had been selected against an array of multi-drug-resistant receptor variants. While a main hurdle was a lack of non-B subtype data, our findings, especially from the statistically-guided network analysis, may extrapolate to them to a certain extent, as the level of conservation was very high within subtype B, despite all the present variations. This network construction method lays down a sensitive approach for analysing a pair of alternate phenotypes for which complex patterns prevail, given a sufficient number of experimental units. 
During the course of research, a weighted contact mapping tool was developed to compare renin-angiotensinogen variants and packaged as part of the MD-TASK tool suite. Finally, the functionality, compatibility and performance of the MODE-TASK tool were evaluated and confirmed for both Python 2.7.x and Python 3.x, for the analysis of normal modes from single protein structures and essential modes from MD trajectories. These techniques and tools collectively add to the conventional means of MD analysis.

    AI-based multi-PRS models outperform classical single-PRS models

    Polygenic risk scores (PRS) calculate the risk for a specific disease based on the weighted sum of associated alleles from different genetic loci in the germline, with weights estimated by regression models. Recent advances in genetics have made it possible to create polygenic predictors of complex human traits, including risks for many important complex diseases, such as cancer, diabetes, or cardiovascular diseases, which are typically influenced by many genetic variants, each of which has a negligible effect on overall risk. In the current study, we analyzed whether adding PRS from other diseases to the prediction models and replacing the regressions with machine learning models can improve overall predictive performance. Results showed that multi-PRS models significantly outperform single-PRS models on different diseases. Moreover, replacing regression models with machine learning models, i.e., deep learning, can also improve overall accuracy.
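
The weighted sum that defines a single PRS can be written in a few lines. The sketch below assumes the common convention that each variant contributes its effect-allele count (0, 1 or 2) multiplied by a regression-derived effect size; the function name and all numbers are made up for illustration and do not come from the study.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of effect-allele counts: sum_i beta_i * dosage_i."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("need one effect size per variant")
    return sum(beta * dose for beta, dose in zip(effect_sizes, dosages))


# Hypothetical individual: effect-allele counts at four variants.
dosages = [0, 1, 2, 1]
# Hypothetical per-allele effect sizes (e.g. log odds ratios from a regression).
effect_sizes = [0.10, -0.05, 0.20, 0.03]

score = polygenic_risk_score(dosages, effect_sizes)
print(round(score, 2))
```

In a multi-PRS setting as described above, several such scores (one per disease) would become the input features of a downstream model rather than being used one at a time.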

    The differential influence of HIV-1 subtype C, nucleoside analog resistance mutations: K65R, A62V, S68N and Y115F susceptibility to tenofovir.

    Masters Degree. University of KwaZulu-Natal, Durban. The use of Tenofovir Disoproxil Fumarate (TDF) for the treatment of HIV-1 infection has been recommended for the first-line as well as a second-line antiretroviral regimen in South Africa, due to its high antiretroviral activity and low toxicity level. However, the efficacy of the drug could be threatened by the emergence of drug resistance mutations. The development of TDF resistance poses a public health threat. TDF resistance can be acquired through selection of the K65R mutation or, less frequently, the K70E mutation under TDF selection pressure. Besides the K65R and K70E mutations, recent studies have identified other mutations associated with TDF resistance such as A62V, K65N, S68G/N/D, K70E/Q/T, L74I, V75L, and Y115F. These mutations were particularly observed to be in association with the K65R mutation and were reported to be more common in HIV-1 subtype C viruses. Also, these mutations could cause high-level resistance to TDF, especially when in combination with K65R. However, in-vitro studies are required to demonstrate their influence on viral fitness and TDF susceptibility. In this study, we investigated the impact of K65R, A62V, S68D, Y115F, and K65R+S68N on replication capacity and TDF susceptibility. The reverse transcriptase (RT) region was amplified from a drug-naive HIV-1 subtype C isolate obtained from a patient enrolled in the Tropism study (BREC: BF088/07) and cloned into a TOPO vector using a TOPO TA cloning kit. The HIV-1 RT mutations (K65R, A62V, S68D, Y115F, K65R+A62V, K65R+S68D, K65R+S68G, K65R+S68N, and K65R+Y115F) were introduced into the TOPO+RTsubC recombinant using the QuikChange Lightning Multi Site-Directed Mutagenesis Kit. Next, recombinant viruses were created by co-transfection of the mutant RT amplicons and a pNL4-3-deleted-reverse transcriptase (RT) (pNL43ΔRT) backbone into GXR cells by electroporation. 
The replication capacity of the mutant viruses was assessed using a replication method that utilized a green fluorescent protein (GFP) reporter cell line and flow cytometry. We evaluated the replication capacity using the exponential growth curve function in Excel, applied to the percentage of GFP-expressing cells between days 2 and 6. The impact of the mutant viruses on susceptibility to TDF was assessed in a luciferase-based assay. The 50% inhibitory concentration (IC50) was calculated using GraphPad Prism. Drug susceptibility was expressed as the fold change in IC50 of the mutant virus compared with the wild-type virus. Of the 5 TDF-selected mutants analysed, the A62V, K65R, and Y115F mutants displayed a reduction in replicative fitness, whereas S68D and K65R+S68N showed high viral fitness. Interestingly, the TDF-selected resistance mutations we analysed showed either high susceptibility (A62V, S68D, and Y115F) or reduced susceptibility (K65R and K65R+S68N) to TDF. Our findings support the hypothesis that TDF-selected mutations only confer reduced susceptibility to TDF. Hence, further study is needed on various combinations of TDF-selected resistance mutations to further solidify this claim. Ethical Approval for thesis is on page iv.
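
The fold-change measure described above is a simple ratio of IC50 values. The following sketch shows the arithmetic only; the function name and the IC50 numbers are hypothetical placeholders, not results from the thesis.

```python
def fold_change(ic50_mutant, ic50_wild_type):
    """Fold change in IC50 of a mutant virus relative to the wild-type virus."""
    if ic50_wild_type <= 0:
        raise ValueError("wild-type IC50 must be positive")
    return ic50_mutant / ic50_wild_type


# Hypothetical IC50 values in the same concentration unit (made up for illustration).
ic50_wt = 2.0
ic50_mut = 6.0

fc = fold_change(ic50_mut, ic50_wt)
print(fc)  # values above 1 indicate reduced susceptibility relative to wild type
```

A fold change below 1 would correspondingly indicate higher susceptibility than the wild-type virus.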

    Going viral: an integrated view on virological data analysis from basic research to clinical applications

    Viruses are of considerable interest for several fields of life science research. The genomic richness of these entities, their environmental abundance, as well as their high adaptability and, potentially, pathogenicity make treatment of viral diseases challenging. This thesis proposes three novel contributions to antiviral research that each concern analysis procedures of high-throughput experimental genomics data. First, a sensitive approach for detecting viral genomes and transcripts in sequencing data of human cancers is presented that improves upon prior approaches by allowing detection of viral nucleotide sequences that consist of human-viral homologs or are diverged from known reference sequences. Second, a computational method for inferring physical protein contacts from experimental protein complex purification assays is put forward that allows statistically meaningful integration of multiple data sets and is able to infer protein contacts of transiently binding protein classes such as kinases and molecular chaperones. Third, an investigation of minute changes in viral genomic populations upon treatment of patients with the mutagen ribavirin is presented that characterizes, for the first time, the mutagenic effect of this drug on the hepatitis C virus based on deep sequencing data.