7 research outputs found

    Elucidating the druggability of the human proteome with eFindSite

    © 2019, Springer Nature Switzerland AG. Identifying the viability of protein targets is one of the preliminary steps of drug discovery. Determining the ability of a protein to bind drugs in order to modulate its function, termed druggability, requires a non-trivial amount of time and resources. The inability to properly measure druggability has accounted for a significant portion of failures in drug discovery. This problem is further exacerbated by the large sample space of proteins involved in human diseases. Because of these barriers, the druggability space within the human proteome remains unexplored, which has made it difficult to develop drugs for numerous diseases. Hence, we present a new feature developed in eFindSite that employs supervised machine learning to predict the druggability of a given protein. Benchmarking calculations against the Non-Redundant data set of Druggable and Less Druggable binding sites demonstrate that the AUC for druggability prediction with eFindSite is as high as 0.88. With eFindSite, we elucidated the human druggability space to be 10,191 proteins. Considering the disease space from the Open Targets Platform and excluding already known targets from the predicted data set reveals 2731 potentially novel therapeutic targets. eFindSite is freely available as a stand-alone software at https://github.com/michal-brylinski/efindsite
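As a rough illustration of the AUC metric quoted in the abstract above, the sketch below computes the area under the ROC curve as the fraction of (druggable, less druggable) pairs that a classifier ranks correctly. The function name `roc_auc`, the labels and the scores are hypothetical; this is not eFindSite code and does not reproduce its benchmark.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney U).

    labels: 1 for a druggable site, 0 for a less druggable site (assumed encoding).
    scores: predicted druggability scores, higher meaning more druggable.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: two druggable and two less druggable sites.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.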

    BionoiNet: Ligand-binding site classification with off-the-shelf deep neural network

    © The 2020 Author(s). Published by Oxford University Press. All rights reserved. Motivation: Fast and accurate classification of ligand-binding sites in proteins with respect to the class of binding molecules is invaluable not only to the automatic functional annotation of large datasets of protein structures but also to projects in protein evolution, protein engineering and drug development. Deep learning techniques, which have already been successfully applied to address challenging problems across various fields, are inherently suitable to classify ligand-binding pockets. Our goal is to demonstrate that off-the-shelf deep learning models can be employed with minimum development effort to recognize nucleotide- and heme-binding sites with an accuracy comparable to that of highly specialized, voxel-based methods. Results: We developed BionoiNet, a new deep learning-based framework implementing a popular ResNet model for image classification. BionoiNet first transforms the molecular structures of ligand-binding sites to 2D Voronoi diagrams, which are then used as the input to a pretrained convolutional neural network classifier. The ResNet model generalizes well to unseen data, achieving an accuracy of 85.6% for nucleotide- and 91.3% for heme-binding pockets. BionoiNet also computes significance scores of pocket atoms, called BionoiScores, to provide meaningful insights into their interactions with ligand molecules. BionoiNet is a lightweight alternative to computationally expensive 3D architectures.

    A structural biology community assessment of AlphaFold2 applications

    Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods for protein structure predictions have reached the accuracy of experimentally determined models. Although this has been independently verified, the implementation of these methods across structural-biology applications remains to be tested. Here, we evaluate the use of AlphaFold2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site predictions; modeling of interactions; and modeling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modeled when compared with homology modeling, identifying structural features rarely seen in the Protein Data Bank. AF2-based predictions of protein disorder and complexes surpass dedicated tools, and AF2 models can be used across diverse applications equally well compared with experimentally determined structures, when the confidence metrics are critically considered. In summary, we find that these advances are likely to have a transformative impact in structural biology and broader life-science research

    A review on machine learning approaches and trends in drug discovery

    Abstract: Drug discovery aims at finding new compounds with specific chemical properties for the treatment of diseases. In recent years, computer science has become an important component of this search, driven by the rapid growth and democratization of machine learning techniques. With the objectives set by the Precision Medicine initiative and the new challenges they generate, it is necessary to establish robust, standard and reproducible computational methodologies. Currently, predictive models based on Machine Learning have gained great importance in the step prior to preclinical studies, where they drastically reduce the costs and research times involved in the discovery of new drugs. This review article focuses on how these new methodologies have been used in recent years of research. Analyzing the state of the art in this field gives an idea of where cheminformatics will develop in the short term, the limitations it presents and the positive results it has achieved. The review focuses mainly on the methods used to model the molecular data, as well as the biological problems addressed and the Machine Learning algorithms used for drug discovery in recent years. Funding: Instituto de Salud Carlos III (PI17/01826, PI17/01561); Xunta de Galicia (Ref. ED431D 2017/16, Ref. ED431D 2017/23, Ref. ED431C 2018/4)

    Development and application of novel bioinformatics tools for protein function prediction

    Pearson Correlation Coefficient and provides a value between -1 and 1, with -1 being a total negative correlation, 0 no correlation and 1 a total positive correlation, based on the observed and predicted ligand-binding site residues. Scores of 0.40 to 0.69 are strong positive relationships and scores of 0.70 and higher are very strong positive relationships. The downside of MCC is that it does not take into consideration the overall 3D structure of the protein model. Therefore, BDT, which is also scored from -1 to 1, will also be utilised because it takes the 3D structure into consideration. Both MCC and BDT can only be calculated when an observed (actual) structure with bound ligands is available to compare against the predicted structure, which is why MCC and BDT are objective measures of ligand-binding site prediction. The average MCC and BDT scores from CASP11 were 0.42 and 0.51, respectively. CASP12 saw the prediction of ligands for low annotation level proteins with no known ligands, demonstrating the potential use of FunFOLD3 in novel protein prediction. The average MCC and BDT scores from CASP13 were 0.47 and 0.53, respectively. CAFA3 showed that FunFOLDQ can be used in the prediction of GO terms; however, further refinements are needed to increase the specificity of the term predictions. The development option this thesis has explored is the use of docking (predicting the preferred orientation of interacting partners) with AutoDock Vina to improve the accuracy of the ligand-binding residues predicted by FunFOLD3, as the problem with TBM methods can be that predicted ligand(s) from a similar template will be forced to fit within the ligand-binding pocket. With docking, however, the aim is to predict the preferred orientation of the ligand within the ligand-binding space. Utilisation of docking has also added to the novelty of this research, as different grid box calculations around the ligand-binding space were explored, with varying degrees of success for each grid box calculation. 
Two CASP targets illustrate the improvement in MCC and BDT scores following docking. For CASP11 target T0783 (2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase), the MCC and BDT scores by FunFOLD3 were 0.17 and 0.21, respectively; following docking, they increased to 0.63 and 0.45, respectively. CASP13 target T1016 (alpha-ribazole-5'-P phosphatase) had MCC and BDT scores of 0.556 and 0.646 by FunFOLD3, respectively; following docking, they increased to 0.85 and 0.91, respectively. Lastly, CASP_Commons, a community-wide experiment to find the consensus structures, explored the role of FunFOLD3 in predicting ligands and ligand-binding sites for the novel proteins and protein domains of SARS-CoV-2. The protein domains were non-structural proteins 2, 4 and 6, open reading frames 3a, 6, 7b, 8 and 10, membrane protein and papain-like protease. FunFOLD3 predicted ligands for ten of the protein domains, of which there were a total of 32 targets due to domains being split into smaller residues and subsequent rounds of 3D modelling improvement. Increased understanding of protein structures can provide further insight into a protein’s function, particularly if ligands are bound and identified. An example in this thesis is the prediction of chlorophyll A for non-structural protein 4 (nsp4). Chlorophyll A, like haemoglobin, contains a porphyrin ring, and templates related to nsp4 show a role in blood clotting. Therefore, whilst chlorophyll A might not be the exact ligand, similarities between haemoglobin and chlorophyll A can clearly be determined and assist in understanding the role of nsp4 in the pathology of COVID-19. Identification of GO terms can provide a more detailed understanding of the function or functions of proteins and, for proteins with limited annotation information, can assist with comprehending their role. 
This thesis has focused on improving and developing a function prediction method, FunFOLD3, to better understand the role and function of proteins. The new FunFOLD3 method, which utilises docking, will be integrated into the McGuffin group prediction servers and will be benchmarked in subsequent CASP competitions to critically assess its performance.
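
The MCC described in the abstract above can be sketched as follows: observed and predicted ligand-binding residues are treated as sets, compared against all residues in the protein, and scored with the standard Matthews correlation coefficient formula. The function name `binding_site_mcc` and the residue numbers are hypothetical; this is not FunFOLD3 code and the CASP scores quoted above are not reproduced here.

```python
import math


def binding_site_mcc(observed, predicted, all_residues):
    """Matthews correlation coefficient for ligand-binding residue prediction.

    observed: residue numbers in contact with the ligand in the actual structure.
    predicted: residue numbers predicted to bind the ligand.
    all_residues: every residue number in the protein.
    """
    observed, predicted = set(observed), set(predicted)
    universe = set(all_residues)

    tp = len(observed & predicted)              # binding, predicted binding
    fp = len(predicted - observed)              # not binding, predicted binding
    fn = len(observed - predicted)              # binding, missed
    tn = len(universe - observed - predicted)   # not binding, not predicted

    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical 10-residue protein: 3 of 4 binding residues are recovered.
example_mcc = binding_site_mcc({2, 3, 4, 5}, {3, 4, 5, 6}, range(1, 11))
```

In this toy case there are 3 true positives, 1 false positive, 1 false negative and 5 true negatives, so the score is (3·5 − 1·1)/√(4·4·6·6) = 14/24.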