HIV Drug Resistant Prediction and Featured Mutants Selection using Machine Learning Approaches

Author: Yu Xiaxia
Publication venue: ScholarWorks @ Georgia State University
Publication date: 16/12/2014

HIV/AIDS is widespread and ranks as the sixth biggest killer worldwide. Moreover, because HIV replicates rapidly and lacks a proofreading mechanism, drug resistance is common and is one of the reasons treatment fails. Although drug resistance tests are offered to patients and help in choosing more effective drugs, such experiments may take up to two weeks to finish and are expensive. With the rapid development of computing, drug resistance prediction using machine learning has become feasible. To predict HIV drug resistance accurately, two main tasks need to be solved: how to encode the protein structure so that the most useful information is extracted and fed into the machine learning tools, and which machine learning tools to choose. In our research, we first proposed a new protein encoding algorithm that converts proteins of various sizes into a fixed-size vector. This algorithm makes it possible to feed protein structure information to most state-of-the-art machine learning algorithms. Next, we proposed a new classification algorithm based on sparse representation. Mean shift and quantile regression were then included to help extract feature information from the data. Our results show that encoding protein structure with our newly proposed method is very efficient and gives consistently higher accuracy regardless of the type of machine learning tool. Furthermore, our new classification algorithm based on sparse representation is the first application of sparse representation to biological data, and its results are comparable to those of other state-of-the-art classification algorithms, for example ANN, SVM and multiple regression. Finally, mean shift and quantile regression identified the potentially most important drug-resistant mutants, and such results might help biologists and chemists determine which mutants are the most representative candidates for further research.

ScholarWorks @ Georgia State University

Multi-dimensional classification of GABAergic interneurons with Bayesian network-modeled label uncertainty

Author: Benavides-Piccione Ruth
Bielza Lozoya María Concepción
De Felipe Oroquieta Javier
Larrañaga Múgica Pedro María
Mihaljevic Bojan
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2014

Interneuron classification is an important and long-debated topic in neuroscience. A recent study provided a data set of digitally reconstructed interneurons classified by 42 leading neuroscientists according to a pragmatic classification scheme composed of five categorical variables, namely, the interneuron type and four features of axonal morphology. From this data set we learned a model that can classify interneurons, on the basis of their axonal morphometric parameters, into these five descriptive variables simultaneously. Because of differences in opinion among the neuroscientists, especially regarding neuronal type, for many interneurons we lacked a unique, agreed-upon classification that we could use to guide model learning. Instead, we guided model learning with a probability distribution over the neuronal type and the axonal features, obtained, for each interneuron, from the neuroscientists' classification choices. We conveniently encoded such probability distributions with Bayesian networks, calling them label Bayesian networks (LBNs), and developed a method to predict them. This method predicts an LBN by forming a probabilistic consensus among the LBNs of the interneurons most similar to the one being classified. We used 18 axonal morphometric parameters as predictor variables, 13 of which we introduce in this paper as quantitative counterparts to the categorical axonal features. We were able to accurately predict interneuronal LBNs. Furthermore, when extracting crisp (i.e., non-probabilistic) predictions from the predicted LBNs, our method outperformed related work on interneuron classification. Our results indicate that our method is adequate for multi-dimensional classification of interneurons with probabilistic labels. Moreover, the introduced morphometric parameters are good predictors of interneuron type and the four features of axonal morphology and thus may serve as objective counterparts to the subjective, categorical axonal features.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Computational approaches to study drug resistance mechanisms

Author: Khalid Zoya
Publication date: 01/01/2017

Drug resistance is a major obstacle faced by therapists in treating complex diseases such as cancer, epilepsy, arthritis and HIV infection. The cause is either protein mutation or changes in gene expression level that induce resistance to drug treatments. These mutations affect drug binding activity and hence result in treatment failure. All this information is stored in PubMed as text data. Extracting useful knowledge from unstructured textual data is a challenging task for biologists, since the biomedical literature is growing exponentially. Building automated methods for such tasks is gaining much attention among researchers. In this thesis we developed a disease-categorized database, ZK DrugResist, that automatically extracts mutations and expression changes associated with drug resistance from PubMed. The tool also includes semantic relations extracted from biomedical text covering drug resistance, and we established a server that includes both of these features. Our system was tested for three relations, Resistance (R), Intermediate (I) and Susceptible (S), by applying a hybrid feature set. Over the last few decades the focus has shifted to hybrid approaches because they provide better results. In our case this approach combines rule-based methods with machine learning techniques. The results showed 97.7% accuracy with 96% precision, recall and F-measure. These results outperform previously existing relation extraction systems, thus facilitating computational analysis of drug resistance in complex diseases, and the approach can further be applied to other areas of biomedicine. The literature on HIV drug resistance is extensive and provides more training data than is available for other diseases; hence we developed a computational method to predict HIV resistance. For this we combined sequence and structural features and applied SVM and Random Forest classifiers. The model was tested on mutants of HIV-1 protease and reverse transcriptase. Among the features used in our method, total contact energies among multiple mutations have a strong impact on predicting resistance, as they are crucial to understanding the interactions of HIV mutants. The combination of sequence and structure features offers higher accuracy with support vector machines than with the Random Forest classifier. Both single mutations and the acquisition of multiple mutations are important in predicting HIV resistance to certain drug treatments. We have demonstrated the practicality of these features; hence they can be used in the future to predict resistance in other complex diseases. Another way to deal with drug resistance is drug repurposing. A drug often binds to more than one target, a property known as polypharmacology, which can be applied to drug repositioning, also referred to as therapeutic switching. Traditional drug discovery and development is an expensive and tedious process, which makes drug repurposing a popular alternative strategy. We proposed a method based on a similarity scheme that predicts both approved and novel targets for drugs and new drug-disease associations. We combined PPI, biological pathways, binding site structural similarities and disease-disease similarity measures. We used sixty drugs for training the algorithm and tested it on eight separate drugs. The results showed 95% accuracy in predicting the approved and novel targets, surpassing the existing methods. All these parameters help in elucidating unknown associations between drugs and diseases in order to find new uses for old drugs. Hence repurposing offers novel candidates from the existing pool of drugs, providing a ray of hope in combating drug resistance.

Sabanci University Research Database

Drug Target Interaction Prediction Using Machine Learning Techniques – A Review

Author: Idhaya T.
Raja S. P.
Suruliandi A.
Publication venue: 'Universidad Internacional de La Rioja'
Publication date: 15/03/2023

Drug discovery is a key process, given the rising and ubiquitous demand for medication to stay in good shape right through the course of one's life. Drugs are small molecules that inhibit or activate the function of a protein, offering patients a host of therapeutic benefits. Drug design is the inventive process of finding new medication, based on targets or proteins. Identifying new drugs is a process that involves time and money. This is where computer-aided drug design helps cut time and costs. Drug design needs a drug target, which is a protein, and a drug compound, between which an interaction is established. Interaction, in this context, refers to the process of discovering protein binding sites, which are protein pockets that bind with drugs. Pockets are regions on a protein macromolecule that bind to drug molecules. Researchers have been at work trying to determine new Drug Target Interactions (DTI), that is, to predict whether or not a given drug molecule will bind to a target. Machine learning (ML) techniques help establish the interaction between drugs and their targets, using computer-aided drug design. This paper aims to better explore ML techniques for DTI prediction and to boost future research. Qualitative and quantitative analyses of ML techniques show that several have been applied to predict DTIs, employing a range of classifiers. Though DTI prediction improves with negative drug target pairs (DTP), the lack of true negative DTPs has led to the use of a particular dataset of drugs and targets. Using dynamic DTPs improves DTI prediction. Little attention has so far been paid to developing a new classifier for DTI classification, and there is, unquestionably, a need for better ones.

Re-UNIR

Support vector machine prediction of HIV-1 drug resistance using The Viral Nucleotide patterns

Author: Araya Seare Tesfamichael
Publication date: 23/02/2007

Student Number: 0213068F - MSc Dissertation - School of Computer Science - Faculty of Science. Drug resistance of the HI virus, due to its fast replication and error-prone mutation, is a key factor in the failure to combat the HIV epidemic. For this reason, performing pre-therapy drug resistance testing and administering appropriate drugs or combinations of drugs accordingly is very useful. There are two approaches to HIV drug resistance testing: phenotypic (clinical) and genotypic (based on the particular virus's DNA). Genotyping tests HIV drug resistance by detecting specific mutations known to confer drug resistance. It is cheaper and can be computerised. However, it requires knowing or learning which mutations confer drug resistance. Previous research using pattern recognition techniques has been promising, but the performance needs to be improved. It is also important to have techniques that can quickly learn new rules when faced with new mutations or drugs. A relatively recent addition to these techniques is the Support Vector Machine (SVM). SVMs have proved very successful in many benchmark applications such as face recognition and text recognition, and have also performed well in many computational biology problems where the number of features is large compared to the number of available samples. This paper explores the use of SVMs in predicting the drug resistance of an HIV strain extracted from a patient, based on the genetic sequence of those parts of the viral DNA encoding the two enzymes, Reverse Transcriptase and Protease, which are critical for the replication of the HIV virus. In particular, the aim of this research is to design the model without incorporating the biological knowledge at hand, so that the resulting classifier can accommodate new drugs and mutations. To evaluate the performance of SVMs we used cross-validation to obtain an unbiased estimate on 2045 data points. The classification accuracy and the area under the receiver operating characteristic curve (AUC) were used as performance measures. Furthermore, to compare the performance of our SVM model we also developed other prediction models based on popular classification algorithms, namely neural networks, decision trees and logistic regression. The results show that SVMs are a highly successful classifier and outperform the other techniques, with accuracy ranging from 94.13% to 96.33% and AUC ranging from 81.26% to 97.49%. Decision trees were rated second and logistic regression performed the worst.
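The two performance measures named in this abstract, classification accuracy and AUC, can be illustrated with a small sketch. It is not taken from the dissertation: the function names `accuracy` and `roc_auc` are hypothetical, the labels are assumed to be coded 1 (resistant) and 0 (susceptible), and the AUC is computed with the standard pairwise-ranking definition rather than any particular library routine.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that equal the true labels."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and equal in length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Labels are assumed to be 1 (positive) or 0 (negative).
    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

In a cross-validation setting, both functions would be applied to the held-out fold only, with `scores` being the classifier's decision values and `predictions` the thresholded class labels. The pairwise loop is quadratic in the number of samples, which is acceptable for a few thousand data points but would be replaced by a rank-based formula for larger sets.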

Wits Institutional Repository on DSPACE

Predictive models for anti-tubercular molecules using machine learning on high-throughput biological screening datasets

Author: Abdul UC Jaleel
AC Schierz
B Waszkowycz
C Elkan
DM Iseman
IH Witten
JA Maddry
JC Platt
Jinuraj K Rajappan
JL Melville
JP Vert
JR Quinlan
K Liu
L Breiman
N Friedman
N Japkowicz
O Ivanciuc
P Domingos
P Vasanthanathan
R Lahana
R Lowe
RR Bouckaert
S Ananthan
S Ekins
TAACF
TM Mitchell
Vinita Periwal
Vinod Scaria
VS Sheng
Y Murakami
Y Saeys
Y Wang
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Tuberculosis is a contagious disease caused by Mycobacterium tuberculosis (Mtb). It affects more than two billion people around the globe and is one of the major causes of morbidity and mortality in the developing world. Recent reports suggest that Mtb has been developing resistance to the widely used anti-tubercular drugs, resulting in the emergence and spread of multidrug-resistant (MDR) and extensively drug-resistant (XDR) strains throughout the world. In view of this global epidemic, there is an urgent need for fast and efficient lead identification methodologies. Target-based screening of large compound libraries has been widely used as a fast and efficient approach for lead identification, but it is restricted by the available knowledge of the target structure. Whole-organism screens, on the other hand, are target-agnostic and are now widely employed as an alternative for lead identification, but they are limited by the time and cost involved in running the screens for large compound libraries. This could possibly be circumvented by using computational approaches to prioritize molecules for screening programmes. Results: We utilized physicochemical properties of compounds to train four supervised classifiers (Naïve Bayes, Random Forest, J48 and SMO) on three publicly available bioassay screens of Mtb inhibitors and validated the robustness of the predictive models using various statistical measures. Conclusions: This study is a comprehensive analysis of high-throughput bioassay data for anti-tubercular activity and of the application of machine learning approaches to create target-agnostic predictive models for anti-tubercular agents.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Predicting and analyzing HIV-1 adaptation to broadly neutralizing antibodies and the host immune system using machine learning

Author: Hake Anna
Publication venue: Saarländische Universitäts- und Landesbibliothek
Publication date: 01/01/2022

Thanks to its extraordinarily high mutation and replication rate, the human immunodeficiency virus type 1 (HIV-1) is able to rapidly adapt to the selection pressure imposed by the host immune system or antiretroviral drug exposure. With neither a cure nor a vaccine at hand, viral control is a major pillar in the fight against the HIV-1 pandemic. Without drug exposure, interindividual differences in viral control are partly influenced by host genetic factors such as the human leukocyte antigen (HLA) system, and by viral genetic factors such as the predominant coreceptor usage of the virus. Thus, close monitoring of the viral population within patients, adjustments to treatment regimens where necessary, and continuous development of new drug components are indispensable measures to counteract the emergence of viral escape variants. To this end, a fast and accurate determination of viral adaptation is essential for successful treatment. This thesis is based upon four studies that aim to develop and apply statistical learning methods to (i) predict adaptation of the virus to broadly neutralizing antibodies (bNAbs), a promising new treatment option, (ii) advance antibody-mediated immunotherapy for clinical usage, and (iii) predict viral adaptation to the HLA system to further understand the switch in HIV-1 coreceptor usage. In total, this thesis comprises several statistical learning approaches to predict HIV-1 adaptation, thereby enabling better control of HIV-1 infections.

Universaar

7th German Conference on Chemoinformatics: 25 CIC-Workshop : Goslar, Germany, 6 - 8 November 2011 ; meeting abstracts / Edited by Frank Oellien, Uli Fechner and Thomas Engel

Author: Engel Thomas
Fechner Uli
Oellien Frank
Publication date: 01/05/2012

Hochschulschriftenserver - Universität Frankfurt am Main

Learning feature dependencies for noise correction in biomedical prediction

Author: PANG Hwee Hwa
TAN Ah-Hwee
YAP Ghim-Eng
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/04/2011

Institutional Knowledge at Singapore Management University

Virtual screening of potential bioactive substances using the support vector machine approach

Author: Byvatov Evgeny
Publication date: 11/01/2006

The present dissertation is a cumulative work set out in a total of eight scientific publications (five published, two submitted and one in preparation). In this research project, machine learning was applied to the virtual screening of molecular databases. The primary goal was to introduce and evaluate the support vector machine (SVM) approach for virtual screening for potential drug candidates. The introduction describes the role of virtual screening in drug design. Virtual screening methods can be applied in almost every area of pharmaceutical research. Machine learning can be used from the selection of the first molecules and the optimization of lead structures through to the prediction of ADMET (Absorption, Distribution, Metabolism, Toxicity) properties. Section 4.2 presents possible methods for describing chemical structures so as to bring them into a format (descriptors) that can be used as input for machine learning methods such as neural networks or SVM. The focus is on the methods that were used in this work. Most methods compute descriptors based only on the two-dimensional (2D) structure. Standard examples are physicochemical properties, atom and bond counts, etc. (Section 4.2.1). CATS descriptors, a topological pharmacophore concept, are also 2D-based (Section 4.2.2). Another type of descriptor describes properties derived from a three-dimensional (3D) molecular model. The success of this description depends very strongly on how representative the 3D conformation used to compute the descriptor is. A further description that we used in our work was fingerprints. In our case the fingerprints used were unsuitable for training neural networks, because the fingerprint vector had too many dimensions (~10^5). In contrast, training SVMs with fingerprints worked. Compared with other methods, SVM has the advantage that it can classify well in very high-dimensional spaces. This combination of SVM and fingerprints was a novelty and was introduced into cheminformatics by us for the first time. In Section 4.3 I focus on the SVM method, which was a central topic of the dissertation and was used for almost all classification tasks in this work. Because of space limitations, the attached publications omit a detailed description of SVM. For this reason, Section 4.3 gives a complete introduction to SVM. It includes a complete discussion of SVM theory: the optimal hyperplane, the soft-margin hyperplane, and quadratic programming as a technique for finding this optimal hyperplane. Section 4.3 also contains a discussion of kernel functions, which determine the exact form of the optimal hyperplane. Section 4.4 gives an introduction to the various methods we used for descriptor selection. This section works out the difference between "filter"-based and "wrapper"-based descriptor selection. In Publication 3 (Section 7.3) we compared the advantages and disadvantages of filter- and wrapper-based methods in virtual screening. Section 7 consists of the publications that contain our research results. Our first publication (Publication 1) was a review article (Section 7.1). In this article we gave an overview of the applications of SVM in bio- and cheminformatics. We discuss applications of SVM to gene chip analysis, DNA sequence analysis, and the prediction of protein structures and protein interactions. We also described examples in which SVM was used to predict the localization of proteins in the cell. It becomes clear that SVM was not yet widespread in virtual screening. To justify the use of SVM as the main method of our research, in our next publication (Publication 2, Section 7.2) we carried out a detailed comparison between SVM and various neural networks, which have become established as a standard method in virtual screening. We compared the separation of drug-like and non-drug-like molecules ("drug-likeness" prediction). The SVM classified 82% of all molecules correctly. The classification was also more robust than with three-layer feedforward ANNs when different numbers of hidden neurons were used. In this project we computed various descriptors to describe the molecules: Ghose-Crippen fragment descriptors [86], physicochemical properties [9] and topological pharmacophores (CATS) [10]. The development of further methods building on the SVM concept is described in the publications in Sections 7.3 and 7.8. Publication 3 presents the development of a new SVM-based method for selecting descriptors relevant to a particular activity. The same descriptors were used as in the project described above. As characteristic groups of molecules we selected various subsets of the COBRA database: 195 thrombin inhibitors, 226 kinase inhibitors and 227 factor Xa inhibitors. We succeeded in reducing the number of descriptors from an original 407 to about 50 without a significant loss of classification accuracy. We compared our method with a standard method for this application, the Kolmogorov-Smirnov statistic. In every case considered, the SVM-based method proved better than the comparison methods in terms of prediction accuracy for the same number of descriptors. A detailed description is given in Section 4.4, which also describes various "wrappers" for descriptor selection. Publication 8 describes the application of active learning with SVM. The idea of active learning is to select molecules for the learning procedure from the region at the boundary between the molecule classes to be distinguished. In this way the local classification can be improved. The following groups of molecules were used: ACE (angiotensin converting enzyme), COX2 (cyclooxygenase 2), CRF (corticotropin releasing factor) antagonists, DPP (dipeptidyl peptidase) IV, HIV (human immunodeficiency virus) protease, nuclear receptors, NK (neurokinin receptors), PPAR (peroxisome proliferator-activated receptor), thrombin, GPCR and matrix metalloproteinases. Active learning improved the performance of virtual screening, as this retrospective study showed. It remains to be seen whether the method will become established, because despite the gain in prediction accuracy it is costly owing to the repeated SVM training. The publications in Sections 7.5, 7.6 and 7.7 (Publications 5-7) show practical applications of our SVM methods in drug design in combination with other methods, such as similarity searching and neural networks for property prediction. In two cases we used the method to find novel ligands for COX-2 (cyclooxygenase 2) and dopamine D3/D2 receptors. We were thus able to show clearly that SVM methods can be usefully applied to the virtual screening of compound collections. As part of this work, a fast method for generating large combinatorial molecule libraries, based on SMILES notation, was also developed. In the early stage of drug design it is important to test as "diverse" a group of molecules as possible. There are various established methods that can select such a subset. We developed a new method that was intended to be more accurate than the well-known MaxMin method. As a first step, the probability density estimation (PDE) was computed for the available molecules [78]. For this we described each molecule with descriptors and computed the PDE in the N-dimensional descriptor space. The molecules were selected with the Metropolis algorithm [87]. The idea is to select few molecules from high-density regions and more molecules from low-density regions. The results, however, pointed to two disadvantages. First, molecules with unrealistic descriptor values were selected, and second, our algorithm was too slow. This aspect of the work was therefore not pursued further. In Publication 6 (Section 7.6), in collaboration with the molecular modeling group of Aventis Pharma Deutschland (Frankfurt), we developed an SVM-based ADME filter for the early detection of CYP 2C9 ligands. This nonlinear SVM filter achieved a significantly higher prediction accuracy (q2 = 0.48) than a PLS model developed on the same data (q2 = 0.34). Three-point pharmacophore descriptors based on a three-dimensional molecular model were used. One of the important problems in computer-based drug design is the selection of a suitable conformation for a molecule. We tried to apply SVM to this problem. To this end, the training data set was enriched with several conformations per molecule and an SVM model was computed. The conformations with the worst-predicted IC50 values were then removed. However, the remaining conformations, which the SVM model preferred, were unrealistic. This result shows the limits of the SVM approach. We believe, however, that further research in this area can lead to better results.
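The abstract names the MaxMin method as the established baseline for picking a diverse subset of molecules. The following is a minimal sketch of greedy MaxMin selection, not the thesis's own density-based method. It assumes each molecule is already represented as a numeric descriptor vector and that Euclidean distance is an acceptable dissimilarity; the names `euclidean` and `maxmin_select` and the `start` parameter are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maxmin_select(points, k, start=0):
    """Greedy MaxMin: return the indices of k mutually distant points.

    Starting from points[start], repeatedly add the unselected point whose
    distance to its nearest already-selected point is largest.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    selected = [start]
    chosen = {start}
    # min_dist[i] holds the distance from point i to its nearest selected point.
    min_dist = [euclidean(p, points[start]) for p in points]
    while len(selected) < k:
        remaining = [i for i in range(len(points)) if i not in chosen]
        candidate = max(remaining, key=lambda i: min_dist[i])
        selected.append(candidate)
        chosen.add(candidate)
        # Adding a point can only shrink nearest-selected distances.
        for i, p in enumerate(points):
            d = euclidean(p, points[candidate])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected
```

For five made-up one-dimensional descriptors at 0, 0.1, 5, 5.1 and 10, selecting three points from index 0 picks the extremes first and then a point near the middle, skipping the near-duplicates. The result depends on the starting point and on how the descriptors are scaled, so in practice descriptors would usually be normalized first.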

Hochschulschriftenserver - Universität Frankfurt am Main