2,603 research outputs found

    Development and application of fast fuzzy pharmacophore-based virtual screening methods for scaffold hopping

    The goal of this thesis was the development, evaluation and application of novel virtual screening approaches for the rational compilation of high-quality pharmacological screening libraries. The criteria for high quality were a higher probability of activity among the selected molecules than among randomly selected molecules, and diversity in the retrieved chemotypes, so that the library is prepared for the attrition of single lead structures. To meet the latter criterion, the virtual screening approach had to perform “scaffold hopping”. The first molecular descriptor explicitly reported for that purpose was the topological pharmacophore CATS descriptor, a correlation vector (CV) over all pharmacophore points in a molecule. The representation is alignment-free and thus makes fast screening of large databases feasible. In a first series of experiments the CATS descriptor was conceptually extended to the three-dimensional pharmacophore-pair CATS3D descriptor and the molecular-surface-based SURFCATS descriptor. The scaling of the CATS3D descriptor, the combination of CATS3D with different similarity metrics and the dependence of the CATS3D descriptor on the three-dimensional conformations of the molecules in the virtual screening database were evaluated in retrospective screening experiments. The “scaffold hopping” capabilities of CATS3D and SURFCATS were compared to those of CATS and the substructure fingerprint MACCS keys. Prospective virtual screening with CATS3D similarity searching was applied to TAR RNA and the metabotropic glutamate receptor 5 (mGluR5). A combination of supervised and unsupervised neural networks trained on CATS3D descriptors was applied prospectively to compile a focused but still diverse library of mGluR5 modulators. In a second series of experiments the SQUID fuzzy pharmacophore model method was developed, which aimed to provide a more general query for virtual screening than the CATS family descriptors. 
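The idea of an alignment-free pharmacophore-pair correlation vector can be illustrated with a minimal sketch. It is not the CATS3D implementation: the feature alphabet `FEATURE_TYPES`, the bin count and the bin width are assumptions chosen for illustration, and the function name `pair_correlation_vector` is hypothetical. The sketch only shows why such a vector needs no superposition: it depends on pairwise distances, not on absolute coordinates.

```python
from itertools import combinations_with_replacement
from math import dist

# Hypothetical feature alphabet for illustration; not the typing scheme of the thesis.
FEATURE_TYPES = ("donor", "acceptor", "lipophilic")


def pair_correlation_vector(points, n_bins=5, bin_width=2.0):
    """Count feature pairs per (type pair, distance bin).

    points: list of (feature_type, (x, y, z)) tuples.
    Returns a flat list of counts with one block of n_bins per unordered type pair.
    """
    type_pairs = list(combinations_with_replacement(FEATURE_TYPES, 2))
    block_of = {pair: i for i, pair in enumerate(type_pairs)}
    vector = [0] * (len(type_pairs) * n_bins)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (type_a, xyz_a), (type_b, xyz_b) = points[i], points[j]
            # Order the two types consistently so (a, b) and (b, a) share a block.
            pair = tuple(sorted((type_a, type_b), key=FEATURE_TYPES.index))
            bin_index = int(dist(xyz_a, xyz_b) // bin_width)
            if bin_index < n_bins:  # pairs beyond the last bin are ignored
                vector[block_of[pair] * n_bins + bin_index] += 1
    return vector


molecule = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.0, 0.0, 0.0)),
    ("lipophilic", (0.0, 1.0, 0.0)),
]
cv = pair_correlation_vector(molecule)
```

Because only inter-point distances enter the vector, translating or rotating the molecule leaves `cv` unchanged, which is what allows two molecules to be compared without aligning them.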
A prospective application of the fuzzy pharmacophore models was performed for TAR RNA ligands. In a last experiment a structure-/ligand-based pharmacophore model was developed for taspase1 based on a homology model of the enzyme. This model was applied prospectively to screen for the first inhibitors of taspase1. The effect of different similarity metrics (Euc: Euclidean distance, Manh: Manhattan distance and Tani: Tanimoto similarity) and different scaling methods (unscaled, scaling1: scaling by the number of atoms, and scaling2: scaling by the added incidences of potential pharmacophore points of atom pairs) on CATS3D similarity searching was evaluated in retrospective virtual screening experiments. Twelve target classes of the COBRA database of annotated ligands from recent scientific literature were used for that purpose. Scaling2, a new development for the CATS3D descriptor, performed best on average in combination with all three similarity metrics (enrichment factor ef (1%): Manh = 11.8 ± 4.3, Euc = 11.9 ± 4.6, Tani = 12.8 ± 5.1). The Tanimoto coefficient performed best with the new scaling method. With the other scaling methods the Manhattan distance performed best (ef (1%): unscaled: Manh = 9.6 ± 4.0, Euc = 8.1 ± 3.5, Tani = 8.3 ± 3.8; scaling1: Manh = 10.3 ± 4.1, Euc = 8.8 ± 3.6, Tani = 9.1 ± 3.8). Since CATS3D is independent of an alignment, its dependence on a “receptor-relevant” conformation might also be weaker than that of other methods such as docking. Using such methods might be a way to overcome problems such as protein flexibility or the computationally expensive calculation of many conformers. To test this hypothesis, co-crystal structures of 11 target classes served as queries for virtual screening of the COBRA database. Different numbers of conformations were calculated for the COBRA database. 
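The three similarity metrics and the enrichment factor ef (1%) compared above can be written down in a few lines. This is a sketch under common textbook definitions, not code from the thesis: the Tanimoto form used here is the usual one for real-valued vectors, the enrichment factor is taken as the hit rate in the top fraction of the ranked list divided by the hit rate in the whole database, and the function names are illustrative. The scaling variants are not sketched because the abstract does not define them precisely.

```python
from math import sqrt


def manhattan(a, b):
    """Manhattan (city-block) distance; smaller means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance; smaller means more similar."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def tanimoto(a, b):
    """Tanimoto similarity for real-valued vectors: a.b / (a.a + b.b - a.b)."""
    ab = sum(x * y for x, y in zip(a, b))
    denominator = sum(x * x for x in a) + sum(y * y for y in b) - ab
    return ab / denominator if denominator else 1.0


def enrichment_factor(ranked_is_active, fraction=0.01):
    """Hit rate in the top `fraction` of a ranked list relative to the overall hit rate.

    ranked_is_active: booleans ordered from most to least similar to the query.
    """
    n_total = len(ranked_is_active)
    n_top = max(1, round(n_total * fraction))
    actives_total = sum(ranked_is_active)
    if actives_total == 0:
        return 0.0
    hit_rate_top = sum(ranked_is_active[:n_top]) / n_top
    return hit_rate_top / (actives_total / n_total)


# Invented ranking: 200 molecules, 10 actives, both top-1% positions occupied by actives.
ranking = [True, True] + [False] * 190 + [True] * 8
ef_1_percent = enrichment_factor(ranking, fraction=0.01)
```

An enrichment factor of 1 corresponds to random selection, so the values around 10 reported above mean that the top 1% of the ranked database contained roughly ten times more actives than a random 1% sample would.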
Using only a single conformation already resulted in a significant enrichment of isofunctional molecules on average (ef (1%) = 6.0 ± 6.5). This observation was also made for ligand classes with many rotatable bonds (e.g. HIV protease: 19.3 ± 6.2 rotatable bonds in COBRA, ef (1%) = 12.2 ± 11.8). On average, using the maximum number of conformations (on average 37 conformations per molecule) improved enrichment only 1.1-fold over using single conformations. With more conformations, actives and inactives became more similar to the reference compounds to the same degree, according to the CATS3D representations. Applying the same parameters as before to calculate conformations for the crystal structure ligands resulted in an average Cartesian RMSD of the single conformations to the crystal structure conformations of 1.7 ± 0.7 Å. For the maximum number of conformations, the RMSD decreased to 1.0 ± 0.5 Å (a 1.8-fold improvement on average). To assess the virtual screening performance and the scaffold hopping potential of CATS3D and SURFCATS, these descriptors were compared to CATS and the MACCS keys, a fingerprint based on exact chemical substructures. Retrospective screening of ten classes of the COBRA database was performed. According to the average enrichment factors the MACCS keys performed best (ef (1%): MACCS = 17.4 ± 6.4, CATS = 14.6 ± 5.4, CATS3D = 13.9 ± 4.9, SURFCATS = 12.2 ± 5.5). The classes where MACCS performed best had a lower average fraction of different scaffolds relative to the number of molecules (0.44 ± 0.13) than the classes where CATS performed best (0.65 ± 0.13). CATS3D was the best performing method for only a single target class, which had an intermediate fraction of scaffolds (0.55). SURFCATS did not perform best for any class. These results indicate that the CATS and CATS3D descriptors might be better suited to finding novel scaffolds than the MACCS keys. 
All methods were also shown to complement each other by retrieving scaffolds that were not found by the other methods. A prospective evaluation of CATS3D similarity searching was carried out for allosteric modulators of metabotropic glutamate receptor 5 (mGluR5). Seven known antagonists of mGluR5 with sub-micromolar IC50 were used as reference ligands for virtual screening of the 20,000 compounds of the Asinex vendor database (194,563 compounds) predicted to be most drug-like by an artificial neural network approach. Eight of 29 virtual screening hits were found to have a Ki below 50 µM in a binding assay. Most of the ligands were only moderately selective for mGluR5 (maximum of > 4.2-fold selectivity) relative to mGluR1, the receptor most similar to mGluR5. One ligand even exhibited a better Ki for mGluR1 than for mGluR5 (mGluR5: Ki > 100 µM, mGluR1: Ki = 14 µM). All hits had scaffolds different from those of the reference molecules. It was demonstrated that the compiled library contained molecules that were different from the reference structures, as estimated by MACCS substructure fingerprints, but were still considered isofunctional by both the CATS and CATS3D pharmacophore approaches. Artificial neural networks (ANN) provide an alternative to similarity searching in virtual screening, with the advantage that they incorporate knowledge from a learning procedure. A combination of artificial neural networks for the compilation of a focused but still structurally diverse screening library was employed prospectively for mGluR5. Ensembles of neural networks were trained on CATS3D representations of the training data for the prediction of “mGluR5-likeness” and of “mGluR5/mGluR1 selectivity” (mGluR1 being the receptor most similar to mGluR5), yielding Matthews correlation coefficients between 0.88 and 0.92 and between 0.88 and 0.91, respectively. The best 8,403 hits (the focused library: the intersection of the best hits from both prediction tasks) from virtually ranking the Enamine vendor database (ca. 
1,000,000 molecules) were further analyzed by two self-organizing maps (SOMs), trained on CATS3D descriptors and on MACCS substructure fingerprints. A diverse and representative subset of the hits was obtained by selecting the molecules most similar to each SOM neuron. Binding studies of the selected compounds (16 molecules from each map) showed that three of the molecules from the CATS3D SOM and two of the molecules from the MACCS SOM bound mGluR5. The best hit, with a Ki of 21 µM, was found in the CATS3D SOM. The selectivity of the compounds for mGluR5 over mGluR1 was low. Since the binding pockets in the two receptors are similar, the general CATS3D representation might not have been appropriate for the prediction of selectivity. In both SOMs new active molecules were found in neurons that did not contain molecules from the training set, i.e. the approach was able to enter new areas of chemical space with respect to mGluR5. The combination of supervised and unsupervised neural networks and CATS3D seemed to be suited to the retrieval of dissimilar molecules with the same class of biological activity, rather than to the optimization of molecules with respect to activity or selectivity. A new virtual screening approach was developed with the SQUID (Sophisticated Quantification of Interaction Distributions) fuzzy pharmacophore method. In SQUID, pairs of Gaussian probability densities are used for the construction of a CV descriptor. The Gaussians represent clusters of atoms comprising the same pharmacophoric feature within an alignment of several active reference molecules. The fuzzy representation of the molecules should enhance the performance in scaffold hopping. Pharmacophore models with different degrees of fuzziness (resolution) can be defined, which might be an appropriate means to compensate for ligand and receptor flexibility. 
For virtual screening the 3D distribution of Gaussian densities is transformed into a two-point correlation vector representation which describes the probability density for the presence of atom pairs comprising defined pharmacophoric features. The fuzzy pharmacophore CV was used to rank CATS3D representations of molecules. The approach was validated by retrospective screening for cyclooxygenase 2 (COX-2) and thrombin ligands. A variety of models with different degrees of fuzziness were calculated and tested for both classes of molecules. The best performance was obtained with pharmacophore models reflecting an intermediate degree of fuzziness. Appropriately weighted fuzzy pharmacophore models performed better in retrospective screening than CATS3D similarity searching using single query molecules, for both COX-2 and thrombin (ef (1%): COX-2: SQUID = 39.2, best CATS3D result = 26.6; thrombin: SQUID = 18.0, best CATS3D result = 16.7). The new pharmacophore method was shown to complement MOE pharmacophore models. SQUID fuzzy pharmacophore and CATS3D virtual screening were applied prospectively to retrieve novel scaffolds of RNA-binding molecules that inhibit the Tat-TAR interaction. A pharmacophore model was built from one ligand (acetylpromazine, IC50 = 500 µM) and a fragment of another known ligand (CGP40336A), which was assumed to bind in a binding mode comparable to that of acetylpromazine. The fragment was flexibly aligned to the TAR-bound NMR conformation of acetylpromazine. Using an optimized SQUID pharmacophore model, the 20,000 most drug-like molecules from the SPECS database (229,658 compounds) were screened for Tat-TAR ligands. Both reference inhibitors were also used for CATS3D similarity searching. A set of 19 molecules from the SQUID and CATS3D results was selected for experimental testing. 
In a fluorescence resonance energy transfer (FRET) assay the best SQUID hit showed an IC50 value of 46 µM, which represents an approximately tenfold improvement over the reference acetylpromazine. The best hit from CATS3D similarity searching showed an IC50 comparable to that of acetylpromazine (IC50 = 500 µM). Both hits contained molecular scaffolds different from those of the reference molecules. Structure-based pharmacophores provide an alternative to ligand-based approaches, with the advantage that no ligands have to be known in advance and no topological bias is introduced. The latter is favorable, for example, for hopping from peptide-like substrates to drug-like molecules. A homology model of the threonine aspartase taspase1 was calculated based on the crystal structures of a homologous isoaspartyl peptidase. Docking studies of the substrate with GOLD identified a binding mode where the cleaved bond was situated directly above the reactive N-terminal threonine. The predicted enzyme-substrate complex was used to derive a pharmacophore model for virtual screening for novel taspase1 inhibitors. Virtual screening with the pharmacophore model identified 85 molecules as potential taspase1 inhibitors; however, biochemical data were not available before the end of this thesis. In summary, this thesis demonstrated the successful development, improvement and application of pharmacophore-based virtual screening methods for the compilation of molecule libraries for early-phase drug development. The highest potential of such methods seemed to be in scaffold hopping, the non-trivial task of finding different molecules with the same biological activity.
The aim of this work was the development, investigation and application of new virtual screening methods for the rational design of high-quality molecule databases for pharmacological screening. 
The requirements for high quality were a high a priori probability of containing active molecules compared with randomly assembled libraries, and a variety of different scaffolds among the selected molecules, as a safeguard against the failure of individual lead structures in further development. The latter property requires that a method be capable of “scaffold hopping”. The first molecular descriptor explicitly used for scaffold hopping was the CATS descriptor, a topological correlation vector (CV) over all pharmacophore points of a molecule. Molecules are compared via the CATS descriptor without superimposing them, which allows such methods to be applied efficiently to very large molecule databases. In a first series of experiments the CATS descriptor was extended to the three-dimensional CATS3D descriptor and the molecular-surface-based SURFCATS descriptor. In retrospective studies, the influence of different scaling methods, the combination with different similarity metrics and the effect of different three-dimensional conformations were investigated for these descriptors. In addition, the scaffold-hopping potential of the newly developed CATS3D and SURFCATS descriptors was compared with that of CATS and the substructure fingerprint MACCS keys. Prospective applications of CATS3D similarity searching were carried out for TAR RNA and the metabotropic glutamate receptor 5 (mGluR5). A combination of supervised and unsupervised neural networks was used prospectively to compile a focused yet diverse library of mGluR5 modulators. 
In a second series of experiments the SQUID fuzzy pharmacophore approach was developed, with the aim of arriving at an even more general molecular description than that of the CATS family descriptors. A prospective application of the fuzzy pharmacophore method was carried out for TAR RNA. In a final experiment, a structure-/ligand-based pharmacophore model for taspase1 was developed on the basis of a homology model of the enzyme. This model was used for prospective screening for taspase1 inhibitors. The influence of different similarity metrics (Euc: Euclidean distance, Manh: Manhattan distance, Tani: Tanimoto similarity) and different scaling methods (unscaled; scaling1: scaling of all values by the number of atoms; scaling2: scaling of the values of a pair of pharmacophore points by the sum of all pharmacophore points with the same pharmacophore types) on similarity searching with CATS3D was investigated in retrospective virtual screening experiments. For this purpose, 12 different classes of receptors and enzymes from the COBRA database of annotated ligands from the recent scientific literature were used. Scaling2, a new development for CATS3D, showed the best average performance in combination with all three similarity metrics (enrichment factor ef (1%): Manh = 11.8 ± 4.3; Euc = 11.9 ± 4.6; Tani = 12.8 ± 5.1). The combination of scaling2 with the Tanimoto similarity coefficient gave the best results. In combination with the other scalings, the Manhattan distance gave the best results (ef (1%): unscaled: Manh = 9.6 ± 4.0; Euc = 8.1 ± 3.5; Tani = 8.3 ± 3.8; scaling1: Manh = 10.3 ± 4.1; Euc = 8.8 ± 3.6; Tani = 9.1 ± 3.8). 
Because CATS3D similarity searching does not depend on superimposing individual molecules, it might also be somewhat independent of the 3D conformation at hand. Such independence would be of interest as a way to avoid the time-consuming calculation of multiple conformations. To test this hypothesis, co-crystal structures of ligands from 11 classes of receptors and enzymes were selected to serve as query structures for virtual screening of the COBRA database. Several versions of the COBRA database with different numbers of conformations were calculated. Even with a single conformation per molecule, a marked enrichment of active molecules was observed on average (ef (1%) = 6.0 ± 6.5). This observation also held for classes of molecules with many rotatable bonds (e.g. HIV protease: 19.3 ± 6.2 rotatable bonds in COBRA, ef (1%) = 12.2 ± 11.8). On average, using the maximum number of conformations (37 conformations per molecule on average) yielded only a 1.1-fold improvement. According to CATS3D similarity, the inactive molecules became more similar to the references to the same degree as the active molecules. For comparison, using multiple instead of single conformations gave a 1.8-fold improvement in the RMSD to the crystal-structure conformations (single conformations: 1.7 ± 0.7 Å; maximum number of conformations: 1.0 ± 0.5 Å). To assess the performance of CATS3D and SURFCATS in virtual screening and in scaffold hopping, these descriptors were compared with CATS and the MACCS keys, a fingerprint based on exact chemical substructures. For the retrospective analysis, 10 classes of receptors and enzymes were selected from the COBRA database. 
According to the mean enrichment factors, MACCS gave the best results (ef (1%): MACCS = 17.4 ± 6.4; CATS = 14.6 ± 5.4; CATS3D = 13.9 ± 4.9; SURFCATS = 12.2 ± 5.5). The classes in which MACCS achieved the best results had a lower mean fraction of distinct scaffolds relative to the number of molecules (0.44 ± 0.13) than the classes in which CATS was best (0.65 ± 0.13). CATS3D was the best method in only one class, with an intermediate fraction of scaffolds (0.55). SURFCATS was not better than all other methods for any class. These results indicate that methods such as CATS and CATS3D are better suited to finding new scaffolds. It was further shown that the methods complement one another, i.e. each method retrieved scaffolds that none of the other methods found. A prospective application of CATS3D was carried out in the search for new allosteric modulators of the metabotropic glutamate receptor 5 (mGluR5). Seven known allosteric mGluR5 antagonists with sub-micromolar IC50 values were used as references. The virtual screening was performed on the 20,000 molecules of the Asinex database (194,563 molecules) predicted to be most drug-like by an artificial neural network. Eight of the 29 hits from the virtual screening showed Ki values below 50 µM in a binding assay. The majority of the ligands showed only low selectivity (maximum > 4.2-fold) over mGluR1, the receptor most similar to mGluR5. One of the ligands showed a better Ki for mGluR1 than for mGluR5 (mGluR5: Ki > 100 µM, mGluR1: Ki = 14 µM). All molecules found had scaffolds different from those of the reference molecules. 
It was shown that the compiled library was regarded by the MACCS keys as different from the reference structures, but was still regarded as isofunctional by CATS and CATS3D. Artificial neural networks (ANNs) offer an alternative to similarity searching in virtual screening, with the advantage that implicit knowledge contained in a series of ligands can be integrated into a model through a learning procedure. A combination of ANNs for compiling a focused yet diverse molecule library was used prospectively in the search for mGluR5 antagonists. Ensembles of ANNs were trained on CATS3D representations to predict “mGluR5-likeness” and “mGluR5/mGluR1 selectivity”. This yielded Matthews correlation coefficients between 0.88 and 0.92 and between 0.88 and 0.91, respectively. The best 8,403 hits (the intersection of the best hits from both predictions) from a virtual screening of the Enamine database (ca. 1,000,000 molecules) formed the focused library. This library was then further analyzed with self-or

    First-principles molecular structure search with a genetic algorithm

    The identification of low-energy conformers for a given molecule is a fundamental problem in computational chemistry and cheminformatics. We assess here a conformer search that employs a genetic algorithm for sampling the low-energy segment of the conformation space of molecules. The algorithm is designed to work with first-principles methods, facilitated by the incorporation of local optimization and blacklisting conformers to prevent repeated evaluations of very similar solutions. The aim of the search is not only to find the global minimum, but to predict all conformers within an energy window above the global minimum. The performance of the search strategy is: (i) evaluated for a reference data set extracted from a database with amino acid dipeptide conformers obtained by an extensive combined force field and first-principles search and (ii) compared to the performance of a systematic search and a random conformer generator for the example of a drug-like ligand with 43 atoms, 8 rotatable bonds and 1 cis/trans bond
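The search strategy described in this abstract (genetic operators, local optimization of each candidate, a blacklist against re-evaluating near-duplicates, and reporting every conformer inside an energy window) can be sketched on a toy problem. Everything concrete here is an assumption for illustration: conformers are reduced to tuples of torsion angles, `toy_energy` stands in for a first-principles energy call, `local_optimize` is a crude coordinate descent rather than a geometry relaxation, and all parameter values are arbitrary.

```python
import math
import random


def toy_energy(torsions):
    """Stand-in energy in arbitrary units; a real search would call a first-principles code."""
    return sum(1.0 + math.cos(math.radians(3 * t)) + 0.3 * math.cos(math.radians(t))
               for t in torsions)


def local_optimize(torsions, step=5.0, max_iter=200):
    """Greedy coordinate descent on the torsions (placeholder for a geometry relaxation)."""
    current = list(torsions)
    for _ in range(max_iter):
        improved = False
        for i in range(len(current)):
            for delta in (-step, step):
                trial = current[:]
                trial[i] = (trial[i] + delta) % 360.0
                if toy_energy(trial) < toy_energy(current) - 1e-12:
                    current, improved = trial, True
        if not improved:
            break
    return tuple(current)


def is_blacklisted(torsions, blacklist, tolerance=10.0):
    """True if every torsion lies within `tolerance` degrees of some stored conformer."""
    def angular_diff(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)
    return any(all(angular_diff(a, b) <= tolerance for a, b in zip(torsions, seen))
               for seen in blacklist)


def genetic_search(n_torsions=4, population_size=8, generations=20,
                   energy_window=1.0, seed=0):
    rng = random.Random(seed)
    blacklist = []  # start and relaxed geometries that were already evaluated
    pool = []       # (energy, relaxed conformer) pairs, pairwise distinct

    def evaluate(candidate):
        if is_blacklisted(candidate, blacklist):
            return  # skip the costly evaluation of a near-duplicate
        relaxed = local_optimize(candidate)
        blacklist.append(tuple(candidate))
        if not is_blacklisted(relaxed, [conformer for _, conformer in pool]):
            pool.append((toy_energy(relaxed), relaxed))
        blacklist.append(relaxed)

    for _ in range(population_size):
        evaluate(tuple(rng.uniform(0.0, 360.0) for _ in range(n_torsions)))
    for _ in range(generations):
        pool.sort()
        parents = [conformer for _, conformer in pool[:population_size]]
        mother, father = rng.choice(parents), rng.choice(parents)
        cut = rng.randrange(1, n_torsions)               # single-point crossover
        child = list(mother[:cut] + father[cut:])
        child[rng.randrange(n_torsions)] = rng.uniform(0.0, 360.0)  # mutation
        evaluate(tuple(child))

    pool.sort()
    lowest = pool[0][0]
    return [(e, c) for e, c in pool if e - lowest <= energy_window]


low_energy_conformers = genetic_search(seed=1)
```

The returned list is the analogue of "all conformers within an energy window above the global minimum", with the caveat that a stochastic search of this size only finds the lowest minimum it happened to sample, not necessarily the true global one.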

    Optimal assignment methods for ligand-based virtual screening

    Background: Ligand-based virtual screening experiments are an important task in the early drug discovery stage. An ambitious aim in each experiment is to disclose active structures based on new scaffolds. To perform these "scaffold-hoppings" for individual problems and targets, a plethora of different similarity methods based on diverse techniques have been published in recent years. The optimal assignment approach on molecular graphs, a successful method in the field of quantitative structure-activity relationships, has not been tested as a ligand-based virtual screening method so far. Results: We evaluated two already published and two new optimal assignment methods on various data sets. To emphasize the "scaffold-hopping" ability, we used the information from chemotype clustering analyses in our evaluation metrics. Comparisons with literature results show an improved early recognition performance and comparable results over the complete data set. A new method based on two different assignment steps shows an increased "scaffold-hopping" behavior together with a good early recognition performance. Conclusion: The presented methods show a good combination of chemotype discovery and enrichment of active structures. Additionally, the optimal assignment on molecular graphs has the advantage that the mappings can be investigated and interpreted, allowing precise modifications of internal parameters of the similarity measure for specific targets. All methods have low computation times, which makes them applicable to screening large data sets
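The core of an optimal assignment similarity is a one-to-one mapping between the atoms of two molecules that maximizes the summed atom-pair similarity. The sketch below illustrates that idea only. It is not any of the four evaluated methods: atoms are reduced to hypothetical `(element, degree)` tuples, the atom kernel is invented, and the assignment is found by exhaustive enumeration, which is viable only for a handful of atoms (practical implementations use a polynomial-time assignment solver).

```python
from itertools import permutations
from math import sqrt


def atom_similarity(a, b):
    """Toy atom kernel: 1.0 for same element and degree, 0.5 for same element only."""
    if a == b:
        return 1.0
    if a[0] == b[0]:
        return 0.5
    return 0.0


def optimal_assignment_similarity(mol_a, mol_b):
    """Best one-to-one atom mapping score, normalized by the self-similarities."""
    if not mol_a or not mol_b:
        return 0.0
    small, large = sorted((mol_a, mol_b), key=len)
    # Enumerate every injective mapping of the smaller molecule into the larger one.
    best = max(
        sum(atom_similarity(atom, large[j]) for atom, j in zip(small, mapping))
        for mapping in permutations(range(len(large)), len(small))
    )
    self_a = sum(atom_similarity(atom, atom) for atom in mol_a)
    self_b = sum(atom_similarity(atom, atom) for atom in mol_b)
    return best / sqrt(self_a * self_b)


# Heavy atoms only, written as (element, number of heavy-atom neighbours).
ethanol = [("C", 1), ("C", 2), ("O", 1)]
methanol = [("C", 1), ("O", 1)]
similarity = optimal_assignment_similarity(ethanol, methanol)
```

Because the winning mapping is explicit, it can be inspected atom by atom, which is the interpretability advantage the abstract points to.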

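    Identification of compounds with a suitable property profile from databases of chemical structures (Sobiva omaduste profiiliga ühendite tuvastamine keemiliste struktuuride andmekogudest)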

    The adoption of digital databases of chemical compounds brings with it the need to find molecules with suitable properties in them by computational means. The problem is of particular interest in the pharmaceutical industry, where replacing time- and resource-intensive experiments with computation allows considerable savings. Although the limited power of present-day computational methods means that the entire drug design process cannot be moved into computers from start to finish in the near future, the situation is different for large data collections. A computational method that works within a known statistical error, discarding some suitable compounds and mistakenly counting some others as active, still ends up enriching the data set appreciably in compounds of interest. Computational tools can therefore be applied successfully to the simpler and less demanding stages of drug development, such as finding lead compounds or drug candidates. This activity is known as virtual screening, and for the present work several directions were selected from this broad and rapidly developing field and their capability and effectiveness were examined in a number of projects. The work resulted in computational models for estimating the anti-HIV-protease activity and cytotoxicity of certain types of compounds; a new screening method; potential ligands for HIV protease and reverse transcriptase; and a data set pre-processed with pharmacokinetic filters, a convenient starting point for further work.
    The implementation of digital chemical compound libraries creates the need to find compounds in them that fit a desired profile. The problem is of particular interest in drug design, where replacing resource-intensive experiments with computational methods would result in significant savings in time and cost. 
Although the limitations of current computational methods mean that the whole drug development process cannot be transferred to computers in the foreseeable future, the situation is different for large molecular databases. An in silico method working within a known error margin is still capable of significantly enriching a data set in attractive compounds. This allows computational methods to be used in the less stringent steps of drug development, such as finding lead compounds or drug candidates. This approach is known as virtual screening, and today it is a vast and promising research area comprising several paradigms and numerous individual methods. The present thesis takes a closer look at some of them and evaluates their performance in the course of several projects. The results of the thesis include computational models to estimate the HIV protease inhibition activity and cytotoxicity of certain types of compounds; a few prospective ligands for HIV protease and reverse transcriptase; a pre-filtered data set of compounds, a convenient starting point for subsequent projects; and a new virtual screening method

    Inferring Protein-Protein Interactions (PPIs) Based on Computational Methods


    Predicting sumoylation sites using support vector machines based on various sequence features, conformational flexibility and disorder

    Background: Sumoylation, which is a reversible and dynamic post-translational modification, is one of the vital processes in a cell. Before a protein matures to perform its function, sumoylation may alter its localization, interactions, and possibly structural conformation. Aberrations in protein sumoylation have been linked with a variety of disorders and developmental anomalies. Experimental approaches to the identification of sumoylation sites may not be effective because of the dynamic nature of sumoylation and the laborious, costly experiments involved. Therefore, computational approaches may guide experimental identification of sumoylation sites and provide insights for further understanding of the sumoylation mechanism. Results: In this paper, the effectiveness of using various sequence properties in predicting sumoylation sites was investigated with statistical analyses and a machine learning approach employing support vector machines. These sequence properties were derived from windows of size 7 and included position-specific amino acid composition, hydrophobicity, estimated sub-window volumes, predicted disorder, and conformational flexibility. 5-fold cross-validation results on experimentally identified sumoylation sites revealed that our method successfully predicts sumoylation sites with a Matthews correlation coefficient, sensitivity, specificity, and accuracy equal to 0.66, 73%, 98%, and 97%, respectively. Additionally, we have shown that our method compares favorably to the existing prediction methods and a basic regular-expression scanner. Conclusions: By using support vector machines, a new, robust method for sumoylation site prediction was introduced. In addition, the possible effects of predicted conformational flexibility and disorder on sumoylation site recognition were explored computationally, for the first time to our knowledge, as an additional parameter that could aid in sumoylation site prediction
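The four figures quoted in this abstract are all functions of one confusion matrix, and a short sketch shows how they relate. The formulas are the standard definitions; the counts in the example are invented purely to illustrate an imbalanced data set and are not taken from the paper. The function assumes that both classes occur at least once, so the sensitivity and specificity denominators are non-zero.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)           # fraction of true sites recovered
    specificity = tn / (tn + fp)           # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts: 100 positive sites among 2,100 candidate windows.
metrics = classification_metrics(tp=73, fp=40, tn=1960, fn=27)
```

With few positives, accuracy is dominated by the negatives and stays high even when a sizeable share of true sites is missed, which is why a correlation coefficient is reported alongside it.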

    Enumeration, conformation sampling and population of libraries of peptide macrocycles for the search of chemotherapeutic cardioprotection agents

    Peptides are uniquely endowed with features that allow them to perturb previously difficult-to-drug biomolecular targets. Peptide macrocycles in particular have seen a flurry of recent interest due to their enhanced bioavailability, tunability and specificity. Although these properties make them attractive hit candidates in early-stage drug discovery, knowing which peptides to pursue is non-trivial because of the magnitude of the peptide sequence space. Computational screening approaches show promise in their ability to address the size of this search space but suffer from their inability to accurately interrogate the conformational landscape of peptide macrocycles. We developed an in silico compound enumerator that was tasked with populating a conformationally laden peptide virtual library. This library was then used in the search for cardio-protective agents that may be administered to reduce tissue damage during reperfusion after ischemia (heart attacks). Our enumerator successfully generated a library of 15.2 billion compounds, requiring the use of compression algorithms, conformational sampling protocols and management of aggregated compute resources in the context of a local cluster. In the absence of experimental biophysical data, we performed biased sampling during alchemical molecular dynamics simulations in order to observe cyclophilin-D perturbation by cyclosporine A and its mitochondria-targeted analogue. Reliable intermediate state averaging through a WHAM analysis of the biased dynamic pulling simulations confirmed that the cardio-protective activity of cyclosporine A was due to its mitochondrial targeting. Parallel-tempered solution molecular dynamics in combination with efficient clustering isolated the essential dynamics of a cyclic peptide scaffold. 
The rapid enumeration of skeletons from these essential dynamics gave rise to a conformation laden virtual library of all the 15.2 Billion unique cyclic peptides (given the limits on peptide sequence imposed). Analysis of this library showed the exact extent of physicochemical properties covered, relative to the bare scaffold precursor. Molecular docking of a subset of the virtual library against cyclophilin‐D showed significant improvements in affinity to the target (relative to cyclosporine A). The conformation laden virtual library, accessed by our methodology, provided derivatives that were able to make many interactions per peptide with the cyclophilin‐D target. Machine learning methods showed promise in the training of Support Vector Machines for synthetic feasibility prediction for this library. The synergy between enumeration and conformational sampling greatly improves the performance of this library during virtual screening, even when only a subset is used

    Rational Design of Small-Molecule Inhibitors of Protein-Protein Interactions: Application to the Oncogenic c-Myc/Max Interaction

    Protein-protein interactions (PPIs) constitute an emerging class of targets for pharmaceutical intervention pursued by both industry and academia. Despite their fundamental role in many biological processes and diseases such as cancer, PPIs are still largely underrepresented in today's drug discovery. This dissertation describes novel computational approaches developed to facilitate the discovery and design of small-molecule inhibitors of PPIs, using the oncogenic c-Myc/Max interaction as a case study. First, we critically review current approaches and limitations to the discovery of small-molecule inhibitors of PPIs and we provide examples from the literature. Second, we examine the role of protein flexibility in molecular recognition and binding, and we review recent advances in the application of Elastic Network Models (ENMs) to modeling the global conformational changes of proteins observed upon ligand binding. The agreement between predicted soft modes of motion and structural changes experimentally observed upon ligand binding supports the view that ligand binding is facilitated, if not enabled, by the intrinsic (pre-existing) motions thermally accessible to the protein in the unliganded form. Third, we develop a new method for generating models of the bioactive conformations of molecules in the absence of protein structure, by identifying a set of conformations (from different molecules) that are most mutually similar in terms of both their shape and chemical features. We show how to solve the problem using an Integer Linear Programming formulation of the maximum-edge weight clique problem. In addition, we present the application of the method to known c-Myc/Max inhibitors. Fourth, we propose an innovative methodology for molecular mimicry design. We show how the structure of the c-Myc/Max complex was exploited to design compounds that mimic the binding interactions that Max makes with the leucine zipper domain of c-Myc. In summary, the approaches described in this dissertation constitute important contributions to the fields of computational biology and computer-aided drug discovery, which combine biophysical insights and computational methods to expedite the discovery of novel inhibitors of PPIs.
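
The conformation-selection step described in the abstract above (pick one conformation per molecule so that the chosen set is as mutually similar as possible) can be illustrated on a toy example. The sketch below uses plain exhaustive search in place of the Integer Linear Programming formulation named in the abstract, so it is only practical for very small inputs. The function name `best_conformer_set`, the conformer labels and the similarity values are all hypothetical.

```python
from itertools import combinations, product


def best_conformer_set(conformers, similarity):
    """Choose one conformer per molecule maximizing the sum of pairwise similarities.

    conformers: dict mapping a molecule name to a list of conformer labels.
    similarity: function taking two conformer labels and returning a float.
    Brute force over every combination; exponential in the number of molecules.
    """
    best_choice, best_score = None, float("-inf")
    for choice in product(*conformers.values()):
        # Total edge weight of the clique formed by the chosen conformers.
        score = sum(similarity(a, b) for a, b in combinations(choice, 2))
        if score > best_score:
            best_choice, best_score = choice, score
    return best_choice, best_score


# Hypothetical conformers and pairwise similarities (e.g. a shape/feature overlap score).
conformers = {"mol1": ["a1", "a2"], "mol2": ["b1", "b2"], "mol3": ["c1"]}
table = {
    frozenset(("a1", "b1")): 0.2, frozenset(("a1", "b2")): 0.9,
    frozenset(("a2", "b1")): 0.5, frozenset(("a2", "b2")): 0.4,
    frozenset(("a1", "c1")): 0.8, frozenset(("a2", "c1")): 0.3,
    frozenset(("b1", "c1")): 0.6, frozenset(("b2", "c1")): 0.7,
}


def toy_similarity(a, b):
    return table[frozenset((a, b))]


choice, score = best_conformer_set(conformers, toy_similarity)
print(choice, round(score, 3))
```

Because the number of combinations grows as the product of the conformer counts, realistic conformer ensembles need an exact optimization formulation or a heuristic rather than this enumeration.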

    Recent Trends in In-silico Drug Discovery

    Drug design is a process in which new leads (potential drugs) with therapeutic benefit in a disease condition are discovered. With the development of various computational tools and the availability of databases holding information about the 3D structures of various molecules, drug discovery has become a comparatively faster process. The two major drug development methods are structure-based drug design and ligand-based drug design. Structure-based methods try to make predictions based on the three-dimensional structure of the target molecule. The major approach in structure-based drug design is molecular docking, a method based on several sampling algorithms and scoring functions. Docking can be performed in several ways depending on whether the ligand and receptor are treated as rigid or flexible. Hotspot grafting is another method of drug design. It is preferred when the structure of a complex between a native binding protein and the target protein is available and the hotspots on the interface are known. In the absence of information on the three-dimensional structure of the target molecule, ligand-based methods are used. Two common methods in ligand-based drug design are pharmacophore modelling and QSAR. Pharmacophore modelling captures only the essential features of an active ligand, whereas a QSAR model determines the effect of a given property on the activity of a ligand. Fragment-based drug design is a de novo approach that builds new lead compounds from fragments within the active site of the protein. All candidate leads obtained by the various drug design methods need to satisfy ADMET criteria before they can be developed as drugs. In-silico ADMET prediction tools have made ADMET profiling an easier and faster process. In this review, various software tools available for drug design and ADMET property prediction are also listed.