69 research outputs found

    HIV Drug Resistant Prediction and Featured Mutants Selection using Machine Learning Approaches

    HIV/AIDS is widespread and ranks as the sixth biggest killer worldwide. Moreover, because HIV replicates rapidly and lacks a proofreading mechanism, drug resistance is common and is one of the causes of treatment failure. Although drug resistance tests are available to patients and help in choosing more effective drugs, such experiments are expensive and may take up to two weeks to finish. With advances in computing, predicting drug resistance using machine learning has become feasible. To predict HIV drug resistance accurately, two main tasks need to be solved: how to encode the protein structure, extracting the most useful information and feeding it into the machine learning tools; and which kinds of machine learning tools to choose. In our research, we first proposed a new protein encoding algorithm, which converts proteins of various sizes into a fixed-size vector. This algorithm enables feeding protein structure information to most state-of-the-art machine learning algorithms. Next, we proposed a new classification algorithm based on sparse representation. Following that, mean shift and quantile regression were included to help extract feature information from the data. Our results show that encoding protein structure using our newly proposed method is very efficient and gives consistently higher accuracy regardless of the type of machine learning tool. Furthermore, our new classification algorithm based on sparse representation is the first application of sparse representation to biological data, and its results are comparable to those of other state-of-the-art classification algorithms, for example ANN, SVM, and multiple regression. Finally, mean shift and quantile regression identified the potentially most important drug-resistant mutants, and such results might help biologists and chemists determine which mutants are the most representative candidates for further research.

    Investigation of Super Learner Methodology on HIV-1 Small Sample: Application on Jaguar Trial Data


    Computational approaches for improving treatment and prevention of viral infections

    The treatment of infections with HIV or HCV is challenging. Thus, novel drugs and new computational approaches that support the selection of therapies are required. This work presents methods that support therapy selection as well as methods that advance novel antiviral treatments. geno2pheno[ngs-freq] identifies drug resistance from HIV-1 or HCV samples that were subjected to next-generation sequencing by interpreting their sequences either via support vector machines or a rules-based approach. geno2pheno[coreceptor-hiv2] determines the coreceptor that is used for viral cell entry by analyzing a segment of the HIV-2 surface protein with a support vector machine. openPrimeR is capable of finding optimal combinations of primers for multiplex polymerase chain reaction by solving a set cover problem and accessing a new logistic regression model for determining amplification events arising from polymerase chain reaction. geno2pheno[ngs-freq] and geno2pheno[coreceptor-hiv2] enable the personalization of antiviral treatments and support clinical decision making. The application of openPrimeR on human immunoglobulin sequences has resulted in novel primer sets that improve the isolation of broadly neutralizing antibodies against HIV-1. The methods that were developed in this work thus constitute important contributions towards improving the prevention and treatment of viral infectious diseases.
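The set cover formulation mentioned in the abstract above can be illustrated with a minimal sketch. It uses the classic greedy approximation, which is an assumption made for illustration and not necessarily the solver openPrimeR uses; the primer names, template IDs, and coverage sets are hypothetical.

```python
def greedy_set_cover(universe, candidates):
    """Pick candidate sets greedily until every element of `universe` is covered.

    `candidates` maps a (hypothetical) primer name to the set of template IDs
    it is assumed to amplify. Returns the chosen names and any templates that
    no candidate could cover.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Choose the primer that covers the most still-uncovered templates;
        # iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda name: len(candidates[name] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            break  # the remaining templates cannot be covered by any candidate
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Hypothetical coverage data, for illustration only.
primers = {
    "p1": {"t1", "t2", "t3"},
    "p2": {"t3", "t4"},
    "p3": {"t4", "t5"},
    "p4": {"t1"},
}
selected, missed = greedy_set_cover({"t1", "t2", "t3", "t4", "t5"}, primers)
print(selected, missed)
```

The greedy choice gives an approximate cover rather than a guaranteed minimum one; an exact solution would typically require integer programming or exhaustive search.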

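    Prediksi Perkembangan Kondisi Pasien Terapi HIV dengan Menggunakan Representase ALE-index sebagai Invariant Nucleotida Sequence dan Support Vector Machine [Predicting the Progression of HIV Therapy Patients' Condition Using the ALE-index Representation as a Nucleotide Sequence Invariant and a Support Vector Machine]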

    Abstract: The Human Immunodeficiency Virus (HIV) is a highly dangerous virus. HIV attacks the immune system, causing immune failure in infected patients. In recent years, HIV infection has been managed with various therapies. One of the most effective is treatment with antiretroviral drugs, which suppress the virus so that it neither replicates nor infects white blood cells. However, the virus usually mutates in response to the drugs administered, so it becomes resistant to the drugs commonly given in therapy. A prediction system is therefore needed to predict whether the condition of a patient on therapy will improve, in order to support early treatment decisions. Using four parameters, namely CD4 count, viral load, PR sequence, and RT sequence, the author builds a system for predicting the progression of HIV therapy patients' condition. The system is built with a machine learning classification method, the Support Vector Machine (SVM), and a numerical representation of nucleotide sequences, the ALE-index. The ALE-index method translates the RT sequence and PR sequence parameters, which are still nucleotide sequences, into numerical data that can be fed into the SVM. The ALE-index method also includes several ways of handling characters that are not one of the four main nucleotide symbols. Test results show that the combination of Random-Delete row handling with the RBF kernel in the SVM achieved an accuracy of 77.5%, higher than the other combinations of handling and parameters. Using all four parameters also gave higher accuracy than removing any one of the features.

    Statistical learning methods for bias-aware HIV therapy screening

    The human immunodeficiency virus (HIV) is the causative agent of the acquired immunodeficiency syndrome (AIDS), which has claimed nearly 30 million lives and is arguably among the worst plagues in human history. With no cure or vaccine in sight, HIV patients are treated by administration of combinations of antiretroviral drugs. The very large number of such combinations makes the manual search for an effective therapy practically impossible, especially in advanced stages of the disease. Therapy selection can be supported by statistical methods that predict the outcomes of candidate therapies. However, these methods are based on clinical data sets that are biased in many ways. The main sources of bias are the evolving trends of treating HIV patients, the sparse, uneven therapy representation, the different treatment backgrounds of the clinical samples and the differing abundances of the various therapy-experience levels. In this thesis we focus on the problem of devising bias-aware statistical learning methods for HIV therapy screening -- predicting the effectiveness of HIV combination therapies. For this purpose we develop five novel approaches that address the aforementioned biases in the clinical data sets when predicting outcomes of HIV therapies. Three of the approaches aim for good prediction performance for every drug combination independent of its abundance in the HIV clinical data set. To achieve this, they balance the sparse and uneven therapy representation by using different routes of sharing common knowledge among related therapies. The remaining two approaches additionally account for the bias originating from the differing treatment histories of the samples making up the HIV clinical data sets. For this purpose, both methods predict the response of an HIV combination therapy by taking into account not only the most recent (target) therapy but also available information from preceding therapies. In this way they provide good predictions for advanced patients in mid to late stages of HIV treatment, and for rare drug combinations. All our methods use the time-oriented evaluation scenario, where models are trained on data from the less recent past while their performance is evaluated on data from the more recent past. This is the approach we adopt to account for the evolving treatment trends in HIV clinical practice and thus offer a realistic model assessment.
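The time-oriented evaluation scenario described in the abstract above can be sketched as a split by date: everything before a cutoff is used for training and everything from the cutoff onward for testing. This is a minimal illustration; the record layout (`id`, `start`) and the cutoff date are assumptions, not details from the thesis.

```python
from datetime import date


def time_oriented_split(samples, cutoff):
    """Split samples so that all training data precedes all test data in time.

    `samples` is a list of dicts with a hypothetical "start" field holding the
    therapy start date; `cutoff` is the first date assigned to the test set.
    """
    train = [s for s in samples if s["start"] < cutoff]
    test = [s for s in samples if s["start"] >= cutoff]
    return train, test


# Hypothetical therapy records, for illustration only.
samples = [
    {"id": "a", "start": date(2001, 5, 1)},
    {"id": "b", "start": date(2004, 9, 12)},
    {"id": "c", "start": date(2007, 1, 30)},
    {"id": "d", "start": date(2009, 6, 3)},
]
train, test = time_oriented_split(samples, cutoff=date(2006, 1, 1))
print([s["id"] for s in train], [s["id"] for s in test])
```

Unlike a random split, this keeps information from later treatment trends out of the training set, which is the reason the abstract gives for calling the assessment realistic.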

    Computational analysis of anti-HIV-1 antibody neutralization panel data to identify potential functional epitope residues

    Advances in single-cell antibody cloning methods have led to the identification of a variety of broadly neutralizing anti–HIV-1 antibodies. We developed a computational tool (Antibody Database) to help identify critical residues on the HIV-1 envelope protein whose natural variation affects antibody activity. Our simplifying assumption was that, for a given antibody, a significant portion of the dispersion of neutralization activity across a panel of HIV-1 strains is due to the amino acid identity or glycosylation state at a small number of specific sites, each acting independently. A model of an antibody’s neutralization IC50 was developed in which each site contributes a term to the logarithm of the modeled IC50. The analysis program attempts to determine the set of rules that minimizes the sum of the residuals between observed and modeled IC50 values. The predictive quality of the identified rules may be assessed in part by whether there is support for rules within individual viral clades. As a test case, we analyzed antibody 8ANC195, an anti-glycoprotein gp120 antibody of unknown specificity. The model for this antibody indicated that several glycosylation sites were critical for neutralization. We evaluated this prediction by measuring neutralization potencies of 8ANC195 against HIV-1 in vitro and in an antibody therapy experiment in humanized mice. These experiments confirmed that 8ANC195 represents a distinct class of glycan-dependent anti–HIV-1 antibody and validated the utility of computational analysis of neutralization panel data.
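The additive model in the abstract above, in which each site contributes a term to the logarithm of the modeled IC50, can be sketched as follows. This is only an illustration under stated assumptions: the site names, residues, rule terms, and panel values are hypothetical, the residual is taken as an absolute difference in log10 units, and the abstract does not say how the tool searches for the best rule set.

```python
import math


def modeled_log_ic50(strain, baseline, rules):
    """Baseline plus one additive term per rule whose (site, residue) matches."""
    total = baseline
    for (site, residue), term in rules.items():
        if strain.get(site) == residue:
            total += term
    return total


def sum_abs_residuals(panel, baseline, rules):
    """Sum of |observed - modeled| log10 IC50 over a panel of (strain, IC50) pairs."""
    return sum(
        abs(math.log10(ic50) - modeled_log_ic50(strain, baseline, rules))
        for strain, ic50 in panel
    )


# Hypothetical rules: residue "D" at either site adds 1.0 to the log10 IC50.
rules = {("siteA", "D"): 1.0, ("siteB", "D"): 1.0}

# Hypothetical panel of (strain, observed IC50) pairs.
panel = [
    ({"siteA": "N", "siteB": "N"}, 0.1),
    ({"siteA": "D", "siteB": "N"}, 1.0),
    ({"siteA": "D", "siteB": "D"}, 10.0),
]
score = sum_abs_residuals(panel, baseline=-1.0, rules=rules)
print(score)
```

Because the terms add in log space, each matching site multiplies the modeled IC50 by a fixed factor, which is one way to read the abstract's assumption that sites act independently. A rule search would compare `score` across candidate rule sets and keep the lowest.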

    Computational ligand design and analysis in protein complexes using inverse methods, combinatorial search, and accurate solvation modeling

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Chemistry, 2006. Vita. Includes bibliographical references (p. 207-230). This thesis presents the development and application of several computational techniques to aid in the design and analysis of small molecules and peptides that bind to protein targets. First, an inverse small-molecule design algorithm is presented that can explore the space of ligands compatible with binding to a target protein using fast combinatorial search methods. The inverse design method was applied to design inhibitors of HIV-1 protease that should be less likely to induce resistance mutations because they fit inside a consensus substrate envelope. Fifteen designed inhibitors were chemically synthesized, and four of the tightest binding compounds to the wild-type protease exhibited broad specificity against a panel of drug resistance mutant proteases in experimental tests. Inverse protein design methods and charge optimization were also applied to improve the binding affinity of a substrate peptide for an inactivated mutant of HIV-1 protease, in an effort to learn more about the thermodynamics and mechanisms of peptide binding. A single mutant peptide calculated to have improved binding electrostatics exhibited greater than 10-fold improved affinity experimentally. The second half of this thesis presents an accurate method for evaluating the electrostatic component of solvation and binding in molecular systems, based on curved boundary-element method solutions of the linearized Poisson-Boltzmann equation. Using the presented FFTSVD matrix compression algorithm and other techniques, a full linearized Poisson-Boltzmann equation solver is described that is capable of solving multi-region problems in molecular continuum electrostatics to high precision. Michael Darren Altman, Ph.D.

    Application of machine learning, molecular modelling and structural data mining against antiretroviral drug resistance in HIV-1

    Millions are affected by the Human Immunodeficiency Virus (HIV) worldwide, even though the death toll is on the decline. Antiretrovirals (ARVs), more specifically protease inhibitors, have shown tremendous success since their introduction into therapy in the mid-1990s by slowing down progression to the Acquired Immune Deficiency Syndrome (AIDS). However, Drug Resistance Mutations (DRMs) are constantly selected for due to viral adaptation, making drugs less effective over time. The current challenge is to manage the infection optimally with a limited set of drugs, with differing associated levels of toxicity, in the face of a virus that (1) exists as a quasispecies, (2) may transmit acquired DRMs to drug-naive individuals and (3) can manifest class-wide resistance due to similarities in design. The presence of latent reservoirs, unawareness of infection status, education and various socio-economic factors make the problem even more complex. Adequate timing and choice of drug prescription together with treatment adherence are very important, as drug toxicities, drug failure and sub-optimal treatment regimens leave room for further development of drug resistance. While CD4 cell count and the determination of viral load from patients in resource-limited settings are very helpful for tracking how well a patient’s immune system is able to keep the virus in check, they can take a long time to show whether an ARV is effective. Phenosense assay kits answer this problem using viruses engineered to contain the patient sequences and evaluating their growth in the presence of different ARVs, but this can be expensive and too involved for routine checks. As a cheaper and faster alternative, genotypic assays provide similar information from HIV pol sequences obtained from blood samples, inferring ARV efficacy on the basis of drug resistance mutation patterns. However, these are inherently complex, and the various methods of in silico prediction, such as Geno2pheno, REGA and Stanford HIVdb, do not always agree, even though this gap decreases as the list of resistance mutations is updated. A major gap in HIV treatment is that the information used for predicting drug resistance is mainly computed from data containing an overwhelming majority of subtype B HIV, even though subtype B accounts for only about 12% of HIV infections worldwide. In addition to growing evidence that drug resistance is subtype-related, it is intuitive to hypothesize that, as subtyping is a phylogenetic classification, the more divergent a subtype is from the strains used in training prediction models, the less their resistance profiles would correlate. For the aforementioned reasons, we used a multi-faceted approach to attack the virus in multiple ways. This research aimed to (1) improve resistance prediction methods by focusing solely on the available subtype, (2) mine structural information pertaining to resistance in order to find any exploitable weak points and increase knowledge of the mechanistic processes of drug resistance in HIV protease, and (3) screen for protease inhibitors amongst a database of natural compounds [the South African natural compound database (SANCDB)] to find molecules or molecular properties usable to come up with improved inhibition against the drug target. In this work, structural information was mined using the Anisotropic Network Model, Dynamics Cross-Correlation, Perturbation Response Scanning, residue contact network analysis and the radius of gyration. These methods failed to give any resistance-associated patterns in terms of natural movement, internal correlated motions, residue perturbation response, relational behaviour and global compaction respectively. Applications of drug docking, homology modelling and energy minimization for generating features suitable for machine learning were not very promising, and rather suggest that binding energies from Vina may not, by themselves, be quantitatively reliable. All these failures led to a refinement that resulted in a highly sensitive, statistically-guided network construction and analysis, which led to key findings in the early dynamics associated with resistance across all PI drugs. The latter experiment unravelled a conserved lateral expansion motion occurring at the flap elbows, and an associated contraction that drives the base of the dimerization domain towards the catalytic site’s floor in the case of drug resistance. Interestingly, we found that despite the conserved movement, bond angles were degenerate. In parallel, 16 Artificial Neural Network models were optimised for HIV protease and reverse transcriptase inhibitors, with performances on par with Stanford HIVdb. Finally, we prioritised 9 compounds with potential protease inhibitory activity using virtual screening and molecular dynamics (MD), and additionally suggest a promising modification to one of the compounds. This yielded another molecule inhibiting both opened and closed receptor target conformations equally well, whereby each of the compounds had been selected against an array of multi-drug-resistant receptor variants. While a main hurdle was a lack of non-B subtype data, our findings, especially from the statistically-guided network analysis, may extrapolate to non-B subtypes to a certain extent, as the level of conservation was very high within subtype B despite all the variation present. This network construction method lays down a sensitive approach for analysing a pair of alternate phenotypes for which complex patterns prevail, given a sufficient number of experimental units. During the course of research, a weighted contact mapping tool was developed to compare renin-angiotensinogen variants and packaged as part of the MD-TASK tool suite. Finally, the functionality, compatibility and performance of the MODE-TASK tool were evaluated and confirmed for both Python 2.7.x and Python 3.x, for the analysis of normal modes from single protein structures and essential modes from MD trajectories. These techniques and tools collectively add to the conventional means of MD analysis.

    Bioinformatical approaches to ranking of anti-HIV combination therapies and planning of treatment schedules

    The human immunodeficiency virus (HIV) pandemic is one of the most serious health challenges humanity is facing today. Combination therapy comprising multiple antiretroviral drugs has resulted in a dramatic decline in HIV-related mortality in developed countries. However, the emergence of drug-resistant HIV variants during treatment remains a problem for permanent treatment success and seriously hampers the composition of new active regimens. In this thesis we use statistical learning to develop novel methods that rank combination therapies according to their chance of achieving treatment success. These methods depend on information regarding the treatment composition, the viral genotype, features of viral evolution, and the patient's therapy history. Moreover, we investigate different definitions of response to antiretroviral therapy and their impact on the prediction performance of our method. We address the problem of extending purely data-driven approaches to support novel drugs with little available data. In addition, we explore the prospect of prediction systems that are centered on the patient's treatment history instead of the viral genotype. We present a framework for rapidly simulating resistance development during combination therapy that will eventually allow application of combination therapies in the best order. Finally, we analyze surface proteins of HIV regarding their susceptibility to neutralizing antibodies with the aim of supporting HIV vaccine development.
