273 research outputs found

    Exploring the chemical space of protein-protein interaction inhibitors through machine learning

    Although protein-protein interactions (PPIs) have emerged as the basis of potential new therapeutic approaches, targeting intracellular PPIs with small-molecule inhibitors is conventionally considered highly challenging. Driven by increasing research efforts, success rates have increased significantly in recent years. In this study, we analyze the physicochemical properties of 9351 non-redundant inhibitors present in the iPPI-DB and TIMBAL databases to define a computational model for active compounds acting against PPI targets. Principal component analysis (PCA) and k-means clustering were used to identify plausible PPI targets in regions of interest in the active group in the chemical space between active and inactive iPPI compounds. Notably, the uniquely defined active group exhibited distinct differences in activity compared with other active compounds. These results demonstrate that active compounds with regions of interest in the chemical space may be expected to provide insights into potential PPI inhibitors for particular protein targets.
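The PCA and k-means step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the three descriptors (molecular weight, logP, hydrogen-bond acceptors), the six compound rows, the choice of two components, and the farthest-point seeding are all assumptions made for illustration, and the helper names (`standardize`, `pca_scores`, `kmeans`) are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def standardize(rows):
    """Scale each descriptor column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def leading_axis(rows, iterations=500):
    """Approximate the top principal axis by power iteration on X^T X."""
    dim = len(rows[0])
    axis = [1.0 / (j + 1) for j in range(dim)]  # arbitrary non-uniform start
    for _ in range(iterations):
        scores = [dot(axis, r) for r in rows]
        new = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(dim)]
        norm = math.sqrt(dot(new, new))
        axis = [v / norm for v in new]
    return axis


def pca_scores(rows, n_components=2):
    """Project standardized rows onto the first principal axes (deflation)."""
    data = standardize(rows)
    residual = [r[:] for r in data]
    axes = []
    for _ in range(n_components):
        axis = leading_axis(residual)
        axes.append(axis)
        # Remove the variance already explained by this axis.
        residual = [[v - dot(r, axis) * a for v, a in zip(r, axis)]
                    for r in residual]
    return [[dot(r, axis) for axis in axes] for r in data]


def kmeans(points, k=2, iterations=50):
    """Lloyd's algorithm with deterministic farthest-point seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i]))
                  for p in points]
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return labels


# Hypothetical descriptor rows: (molecular weight, logP, H-bond acceptors).
compounds = [
    (250.0, 1.5, 3), (270.0, 1.8, 4), (240.0, 1.2, 3),
    (520.0, 4.5, 8), (560.0, 5.0, 9), (540.0, 4.2, 7),
]
labels = kmeans(pca_scores(compounds), k=2)
print(labels)
```

In practice a library implementation would likely replace the hand-written routines; the sketch only shows the order of operations: standardize, project, then cluster in the reduced space.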

    Concepts to Interfere with Protein-Protein Complex Formations: Data Analysis, Structural Evidence and Strategies for Finding Small Molecule Modulators

    (1) Analyzing protein-protein interactions at the atomic level is critical for our understanding of the principles governing the interactions involved in protein-protein recognition. For this purpose descriptors explaining the nature of different protein-protein complexes are desirable. In this work, we introduce Epic Protein Interface Classification (EPIC) as a framework handling the preparation, processing, and analysis of protein-protein complexes for classification with machine learning algorithms. We applied four different machine learning algorithms: Support Vector Machines (SVM), C4.5 Decision Trees, K Nearest Neighbors (KNN), and Naïve Bayes (NB) algorithm in combination with three feature selection methods, Filter (Relief F), Wrapper, and Genetic Algorithms (GA) to extract discriminating features from the protein-protein complexes. To compare protein-protein complexes to each other, we represented the physicochemical characteristics of their interfaces in four different ways, using two different atomic contact vectors (ACVs), DrugScore pair potential vectors (DPV) and SFCscore descriptor vectors (SDV). We classified two different datasets: (A) 172 protein-protein complexes comprising 96 monomers, forming contacts enforced by the crystallographic packing environment (crystal contacts), and 76 biologically functional homodimer complexes; (B) 345 protein-protein complexes containing 147 permanent complexes and 198 transient complexes. We were able to classify up to 94.8% of the packing enforced/functional and up to 93.6% of the permanent/transient complexes correctly. Furthermore, we were able to extract relevant features from the different protein-protein complexes and introduce an approach for scoring the importance of the extracted features. 
(2) Since protein-protein interactions play a pivotal role in communication at the molecular level in virtually every biological system and process, the search for and design of modulators of such interactions is of utmost interest. In recent years many inhibitors of specific protein-protein interactions have been developed; however, in only a few cases are small, druglike molecules able to interfere with the complex formation of proteins. On the other hand, there are several small molecules known to modulate protein-protein interactions by stabilizing an already assembled complex. To achieve this, a ligand binds to a rim-exposed pocket at the interface of the interacting proteins. One example is the phytotoxin Fusicoccin, which stabilizes the interaction of plant H+-ATPase and 14-3-3 protein by nearly a factor of 100. To suggest alternative leads, we performed a virtual screening campaign to discover new molecules putatively stabilizing this complex. Furthermore, we screened a dataset of 198 transient recognition protein-protein complexes for rim-exposed cavities at their interfaces. We provide evidence for high similarity between such rim-exposed cavities and the usual ligand-accommodating active sites of enzymes. This analysis suggests that rim-exposed cavities at protein-protein interfaces are druggable targets. Therefore, stabilizing protein-protein interactions seems to be a promising alternative to the competitive inhibition of such interactions by small molecules.
(3) AffinDB is a database of affinity data for structurally resolved protein-ligand complexes from the PDB. It is freely accessible at http://www.agklebe.de/affinity. Affinity data are collected from the scientific literature, both from primary sources describing the original experimental work of affinity determination and from secondary references which report affinity values determined by others. AffinDB currently contains over 730 affinity entries covering more than 450 different protein-ligand complexes. Besides the affinity value, PDB summary information and additional data are provided, including the experimental conditions of the affinity measurement (if available in the corresponding reference); 2D drawing, SMILES code, and molecular weight of the ligand; links to other databases; and bibliographic information. AffinDB can be queried by PDB code or by any combination of affinity range, temperature and pH value of the measurement, ligand molecular weight, and publication data (author, journal, year). Search results can be saved as tabular reports in text files. The database is intended to be a valuable resource for researchers interested in biomolecular recognition and the development of tools for correlating structural data with affinities, as needed, for example, in structure-based drug design.

    Computational Approaches to Drug Profiling and Drug-Protein Interactions

    Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a long period of stagnation in drug approvals. Due to the extreme costs associated with introducing a drug to the market, locating and understanding the reasons for clinical failure is key to future productivity. As part of this PhD, three main contributions were made in this respect. First, the web platform LigNFam enables users to interactively explore similarity relationships between 'drug-like' molecules and the proteins they bind. Secondly, two deep-learning-based binding site comparison tools were developed, competing with the state of the art on benchmark datasets. The models can predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold relationships and has already been used in multiple projects, including integration into a virtual screening pipeline to increase the tractability of ultra-large screening experiments. Together, and with existing tools, the contributions made will aid in the understanding of drug-protein relationships, particularly in the fields of off-target prediction and drug repurposing, helping to design better drugs faster.

    Field-based Proteochemometric Models Derived from 3D Protein Structures: A Novel Approach to Visualize Affinity and Selectivity Features

    Designing drugs that are selective is crucial in pharmaceutical research to avoid unwanted side effects. To decipher selectivity of drug targets, computational approaches that utilize the sequence and structural information of the protein binding pockets are frequently exploited. In addition to methods that rely only on protein information, quantitative approaches such as proteochemometrics (PCM) use the combination of protein and ligand descriptions to derive quantitative relationships with binding affinity. PCM aims to explain cross-interactions between the different proteins and ligands, hence facilitating our understanding of selectivity. The main goal of this dissertation is to develop and apply field-based PCM to improve the understanding of relevant molecular interactions through visual illustrations. Field-based description that depends on the 3D structural information of proteins enhances visual interpretability of PCM models relative to the frequently used sequence-based descriptors for proteins. In these field-based PCM studies, knowledge-based fields that explain polarity and lipophilicity of the binding pockets and WaterMap-derived fields that elucidate the positions and energetics of water molecules are used together with the various 2D / 3D ligand descriptors to investigate the selectivity profiles of kinases and serine proteases. Field-based PCM is first applied to protein kinases, for which designing selective inhibitors has always been a challenge, owing to their highly similar ATP binding pockets. Our studies show that the method could be successfully applied to pinpoint the regions influencing the binding affinity and selectivity of kinases. As an extension of the initial studies conducted on a set of 50 kinases and 80 inhibitors, field-based PCM was used to build classification models on a large dataset (95 kinases and 1572 inhibitors) to distinguish active from inactive ligands. 
The prediction of the bioactivities of external test set compounds or kinases with accuracies over 80% (Matthews correlation coefficient, MCC: ~0.50) and area under the ROC curve (AUC) above 0.8, together with the visual inspection of the regions promoting activity, demonstrates the ability of field-based PCM to generate both predictive and visually interpretable models. Further, the application of this method to serine proteases provides an overview of the sub-pocket specificities, which is crucial for inhibitor design. Additionally, alignment-independent Zernike descriptors derived from fields were used in PCM models to study the influence of protein superimpositions on field comparisons and subsequent PCM modelling.
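The abstract reports accuracy alongside the Matthews correlation coefficient (MCC). A short sketch of how the two are computed from a binary confusion matrix may help in reading those figures together. The label vectors below are invented for illustration and are not data from the dissertation; the function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 1 (active) / 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when a marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up example: 4 actives, 6 inactives, two misclassifications.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
score = mcc(*counts)
print(counts, acc, round(score, 3))
```

Because MCC uses all four cells of the confusion matrix, it typically comes out lower than accuracy, which is consistent with a model reaching over 80% accuracy at an MCC near 0.5.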

    Data-Driven Rational Drug Design

    A vast amount of experimental data in structural biology has been generated, collected and accumulated in the last few decades. This rich dataset is an invaluable mine of knowledge, from which deep insights can be obtained and practical applications can be developed. To achieve that goal, we must be able to manage such "Big Data" in science and investigate them expertly. Molecular docking is a field that can prominently make use of the large structural biology dataset. As an important component of rational drug design, molecular docking is used to perform large-scale screening of putative associations between small organic molecules and their pharmacologically relevant protein targets. Given a small molecule (ligand), a molecular docking program simulates its interaction with the target protein, and reports the probable conformation of the protein-ligand complex and the relative binding affinity compared against other candidate ligands. This dissertation collects my contributions in several aspects of molecular docking. My early contribution focused on developing a novel metric to quantify the structural similarity between two protein-ligand complexes. Benchmarks show that my metric addressed several issues associated with the conventional metric. Furthermore, I extended the functionality of this metric to cross different systems, effectively utilizing the data at the proteome level. After developing the novel metric, I formulated a scoring function that can extract the biological information of the complex, integrate it with the physics components, and finally enhance the performance. Through collaboration, I implemented my model into an ultra-fast, adaptive program, which can take advantage of a range of modern parallel architectures and handle the demanding data processing tasks in large-scale molecular docking applications.

    Virtual compound screening and SAR analysis: method development and practical applications in the design of new serine and cysteine protease inhibitors

    Virtual screening is an important tool in drug discovery that uses different computational methods to screen chemical databases for the identification of possible drug candidates. Most virtual screening methodologies are knowledge-driven, where the availability of information on either the nature of the target binding pocket or the type of ligand that is expected to bind is essential. In this regard, the information contained in X-ray crystal structures of protein-ligand complexes provides a detailed insight into the interactions between the protein and the ligand and opens the opportunity for further understanding of drug action and structure-activity relationships (SARs) at the molecular level. Protein-ligand interaction information can be utilized to introduce target-specific interaction-based constraints in the design of focused combinatorial libraries. It can also be directly transformed into structural interaction fingerprints and can be applied in virtual screening to analyze docking studies or filter compounds. However, the integration of protein-ligand interaction information into two-dimensional compound similarity searching is not fully explored. Therefore, novel methods are still required to efficiently utilize protein-ligand interaction information in two-dimensional ligand similarity searching. Furthermore, application of protein-ligand interaction information in the interpretation of SARs at the ligand level needs further exploration. Thus, utilization of three-dimensional protein-ligand interaction information in virtual screening and SAR analysis was the major aim of this thesis. The thesis is presented in two major parts. In the first part, utilization of three-dimensional protein-ligand interaction information for the development of a new hybrid virtual screening method and analysis of the nature of SARs in analog series at the molecular level is presented.
The second part of the thesis is focused on the application of different virtual screening methods for the identification of new cysteine and membrane-bound serine protease inhibitors. In addition, molecular modeling studies were also applied to analyze the binding mode of structurally complex cyclic peptide inhibitors.
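The idea of turning protein-ligand interactions into fingerprints and using them to filter or rank compounds can be sketched as follows. This is a generic illustration, not the hybrid method developed in the thesis: the fingerprint is simply assumed to be a set of (residue, interaction type) pairs, the compound names and interaction lists are invented, and Tanimoto similarity is one common choice of comparison measure.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of interaction bits."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_similarity(query, library):
    """Sort library entries by similarity to the query, best match first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical reference fingerprint taken from a crystal-structure ligand.
query = {
    ("SER195", "hbond"), ("HIS57", "hbond"),
    ("ASP189", "ionic"), ("GLY216", "hbond"),
}

# Hypothetical fingerprints derived from docking poses of three compounds.
library = {
    "cpd_a": {("SER195", "hbond"), ("HIS57", "hbond"),
              ("ASP189", "ionic"), ("TRP215", "hydrophobic")},
    "cpd_b": {("SER195", "hbond"), ("TYR99", "hydrophobic"),
              ("GLN192", "hbond")},
    "cpd_c": set(query),
}

ranking = rank_by_similarity(query, library)
for name, score in ranking:
    print(name, round(score, 3))
```

A similarity threshold on such a ranking could then serve as a post-docking filter, keeping only poses that reproduce most of the reference interactions.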

    The Murine Accessory Olfactory Bulb as a Model Chemosensory System: Experimental and Computational Analysis of Chemosensory Representations

    A common challenge across sensory processing modalities is forming meaningful associations between the neural responses and the outside world. These neural representations of the world must then be integrated across different sensory systems contributing to each individual's perceptual experience. While there has been considerable study of sensory representations in the visual system of humans and multiple model organisms, other sensory domains, including olfaction, are less well understood. In this thesis, I set out to better understand the sensory representations of the mouse accessory olfactory system (AOS), a part of the olfactory system. The mouse AOS, our model chemosensory system, comprises peripheral vomeronasal sensory neurons (VSNs), the accessory olfactory bulb (AOB), and downstream effectors. Our work describes the neural representations of multiple sensory inputs in the AOS, specifically the representations of odorants in high-dimensional chemical sensory space in the AOB, and how these representations are shaped by interactions within the circuit. Given the complex nature of olfactory chemosensory representations, the features of our model system may give new perspectives on the neural representation of the outside world. In a neural representation of olfactory information, both the interactions between each receptor and odor compounds and the circuit-mediated interactions could potentially affect the neural representations of the outside world. The initial neural response comprises component interactions between each receptor and the odor; chemical signals must interact with physical receptors. However, chemosensory processing, such as olfaction, requires interpreting a large variety of potentially overlapping chemical cues from the environment with only a finite number of receptor types.
This means that each chemical cue does not necessarily activate only one receptor type or region of the circuit, but rather the cue is likely to be represented by multiple receptor and odor component interactions. Also, the component parts of odors may be processed differently when presented in isolation versus in a more complex mixture, thus allowing the response to a particular odor to vary with chemical context. Moreover, once these component representations exist, interactions within the neural circuit may further shape these responses. For example, one might expect component parts of a complex odor to specifically inhibit other component parts. In the case of the accessory olfactory system this inhibition could be at the receptor level or at the level of the sensory representation in the accessory olfactory bulb (AOB). In Chapter 3, I describe the overall organization of chemosensory representations in the accessory olfactory bulb (AOB), which is found to be a modular map in which the primary associations of functional sensory responses are spatially organized relative to one another. I find these primary associations are condensations of the first order sensory neuron axon terminals, which form population response pooling structures called glomeruli. In these glomeruli, similar response types from those sensory neurons expressing one of the approximately 300 receptor types in the vomeronasal organ (VNO) co-converge. One purpose of converging inputs of neurons expressing the same receptor is likely to minimize noise, and I demonstrate that pooling of like receptor responses into glomeruli does increase neural signal relative to noise. However, I also observed a modular organization among and between glomeruli in which certain types or patterns of chemosensory responses are always spatially adjacent to one another, while others are much farther apart than would be expected by chance. 
I found this spatial modularity for both ethological stimuli (urine collected from conspecifics with widely divergent physiological endocrine status) and individual sulfated steroids. In Chapter 4, I explore the consequences of changing sensory context, specifically the presentation of multiple compounds, and the role that inhibition plays in the neural representation of the sensory stimuli. First, I tested whether the circuit responds differently to demands to represent a single odor than to demands to represent multiple odors by using odors that activate glomeruli both inside and outside of modules. I found that responses to mixtures rapidly diverge from the responses of individual component parts. Moreover, there was an effect of inhibition in modulating the response to preferred stimuli in all glomeruli. However, initial analysis of one type of pregnanolone responsive glomeruli demonstrated that the divergent response to mixtures in this type of glomerulus was not mediated by inhibition at the glomerular level, but was rather attributable to bottom-up effects from the interactions of multiple ligands with chemosensory receptors in the VNO. Nonetheless, I also demonstrated that in the AOB, the axon terminals of the same sensory neurons (glomeruli) are organized into modules that allow for feedback inhibition. Significant ionotropic glutamate receptor signal modulation was observed within modules, demonstrating that there are inhibition mediated effects in the representation of complex mixtures when glomeruli are co-locally arranged. Specifically, at both the level of the VSNs and also in AOB glomeruli, the response to allopregnanolone sulfate is inhibited by co-presentation with estradiol sulfate. This both significantly increases the relative representation of estradiol sulfate and shifts representation of allopregnanolone primarily within modules. 
These types of context-dependent interactions depend on the spatial organization described in Chapter 3 as well as mixture context, and have the potential to optimize the representation of some chemical cues in a context-specific manner.

    Computational strategies to include protein flexibility in Ligand Docking and Virtual Screening

    The dynamic character of proteins strongly influences biomolecular recognition mechanisms. With the development of the main models of ligand recognition (lock-and-key, induced fit, conformational selection theories), the role of protein plasticity has become increasingly relevant. In particular, major structural changes involving large deviations of protein backbones, and slight movements such as side chain rotations, are now carefully considered in drug discovery and development. It is of great interest to identify multiple protein conformations as a preliminary step in a screening campaign. Protein flexibility has been widely investigated, in terms of both local and global motions, in two diverse biological systems. On one side, Replica Exchange Molecular Dynamics has been exploited as an enhanced sampling method to collect multiple conformations of Lactate Dehydrogenase A (LDHA), an emerging anticancer target. The aim of this project was the development of an Ensemble-based Virtual Screening protocol, in order to find novel potent inhibitors. On the other side, a preliminary study concerning the local flexibility of Opioid Receptors has been carried out through the ALiBERO approach, an iterative method based on Elastic Network-Normal Mode Analysis and Monte Carlo sampling. Comparison of the Virtual Screening performance using single or multiple conformations confirmed that the inclusion of protein flexibility in screening protocols has a positive effect on the probability of early recognition of novel or known active compounds.