    Computational Approaches to Drug Profiling and Drug-Protein Interactions

    Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour, and high attrition rates have led to a long period of stagnation in drug approvals. Because introducing a drug to the market is extremely costly, locating and understanding the reasons for clinical failure is key to future productivity. As part of this PhD, three main contributions were made in this respect. First, the web platform LigNFam enables users to interactively explore similarity relationships between ‘drug-like’ molecules and the proteins they bind. Second, two deep-learning-based binding site comparison tools were developed that perform competitively with state-of-the-art methods on benchmark datasets; the models can predict off-target interactions and identify potential candidates for target-based drug repurposing. Finally, the open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold relationships; it has already been used in multiple projects, including integration into a virtual screening pipeline to increase the tractability of ultra-large screening experiments. Together with existing tools, these contributions will aid the understanding of drug-protein relationships, particularly for off-target prediction and drug repurposing, helping to design better drugs faster.
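    The abstract does not state which similarity measure LigNFam uses. As a minimal sketch, assuming each molecule is represented by a binary fingerprint stored as the set of its "on" bit indices, the widely used Tanimoto coefficient (shared bits divided by all bits set in either fingerprint) can be computed for every pair and thresholded to obtain the edges of a molecule similarity network. The molecule names, bit sets, and threshold below are hypothetical.

```python
from itertools import combinations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bit indices."""
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(a & b) / len(union)


def similarity_edges(fps: dict[str, set[int]], threshold: float) -> list[tuple[str, str, float]]:
    """Return (mol_u, mol_v, similarity) for every pair whose similarity reaches the threshold."""
    edges = []
    for u, v in combinations(sorted(fps), 2):
        s = tanimoto(fps[u], fps[v])
        if s >= threshold:
            edges.append((u, v, s))
    return edges


# Hypothetical fingerprints, for illustration only.
fingerprints = {"mol_a": {1, 2, 3}, "mol_b": {2, 3, 4}, "mol_c": {7, 8}}
print(similarity_edges(fingerprints, threshold=0.5))
# [('mol_a', 'mol_b', 0.5)]
```

    In practice the fingerprints would come from a cheminformatics toolkit (for example, circular fingerprints computed with RDKit) rather than hand-written bit sets.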

    Machine Learning Approaches for the Prioritisation of Cardiovascular Disease Genes Following Genome-wide Association Study

    Genome-wide association studies (GWAS) have revealed thousands of genetic loci and have become a valuable method for unravelling the complex biology of many diseases. Although GWAS have grown in size and improved in study design, identifying truly causal signals and disentangling them from other markers correlated through linkage disequilibrium (LD) remains challenging. This has severely limited the value of GWAS findings and brought the method's value into question: although thousands of disease susceptibility loci have been reported, the causal variants and genes at these loci remain elusive. Post-GWAS analysis aims to dissect the heterogeneity of variant and gene signals. In recent years, machine learning (ML) models have been developed for post-GWAS prioritisation, ranging from logistic regression to more complex ensemble models such as random forests and gradient boosting, as well as deep learning models (i.e., neural networks). When combined with functional validation, these methods have yielded important translational insights, providing a strong evidence-based approach to direct post-GWAS research. However, ML approaches are still in their infancy across biological applications, and as they continue to evolve, their robustness for GWAS prioritisation needs to be evaluated. Here, I investigate the landscape of ML across selected models, input features, risk of bias, and model performance, with a focus on building a prioritisation framework that is applied to blood pressure GWAS results and tested by re-application to blood lipid traits.
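    A common way to handle the LD problem described above, before any ML prioritisation, is LD clumping. The sketch below is illustrative and is not the framework developed in this work: it estimates LD as the squared Pearson correlation between unphased genotype dosage vectors (an approximation of haplotype-based r²) and greedily groups variants around the most significant remaining lead variant. The variant IDs, dosages, p-values, and the r² threshold are hypothetical.

```python
from math import sqrt


def ld_r2(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2 per individual)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # monomorphic variant: correlation undefined, treat as no LD
    r = cov / sqrt(vx * vy)
    return r * r


def clump(pvalues: dict[str, float], dosages: dict[str, list[float]],
          r2_threshold: float = 0.5) -> dict[str, list[str]]:
    """Greedy LD clumping: the most significant remaining variant becomes a lead and
    absorbs every remaining variant in LD with it (r2 >= threshold)."""
    remaining = sorted(pvalues, key=pvalues.get)  # ascending p-value
    clumps = {}
    while remaining:
        lead = remaining.pop(0)
        members = [v for v in remaining if ld_r2(dosages[lead], dosages[v]) >= r2_threshold]
        remaining = [v for v in remaining if v not in members]
        clumps[lead] = members
    return clumps


# Hypothetical data: 6 individuals, 3 variants.
dosages = {
    "var1": [0, 1, 2, 0, 1, 2],
    "var2": [0, 1, 2, 0, 1, 1],  # strongly correlated with var1
    "var3": [2, 0, 1, 1, 0, 2],  # uncorrelated with var1
}
pvalues = {"var1": 1e-10, "var2": 1e-8, "var3": 1e-6}
print(clump(pvalues, dosages))
# {'var1': ['var2'], 'var3': []}
```

    The lead variants, together with their LD proxies, would then form the candidate set that downstream ML models score and rank.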

    Knowledge-augmented Graph Machine Learning for Drug Discovery: A Survey from Precision to Interpretability

    The integration of Artificial Intelligence (AI) into drug discovery has been a growing area of interdisciplinary scientific research. However, conventional AI models are heavily limited in handling complex biomedical structures (such as 2D or 3D protein and molecule structures) and in providing interpretations for their outputs, which hinders their practical application. Recently, Graph Machine Learning (GML) has gained considerable attention for its exceptional ability to model graph-structured biomedical data and investigate their properties and functional relationships. Despite extensive efforts, GML methods still have several deficiencies, such as a limited ability to handle sparse supervision, a lack of interpretability in learning and inference, and ineffective use of relevant domain knowledge. In response, recent studies have proposed integrating external biomedical knowledge into the GML pipeline to enable more precise and interpretable drug discovery with limited training instances. However, a systematic definition of this burgeoning research direction has yet to be established. This survey presents a comprehensive overview of long-standing drug discovery principles, introduces the foundational concepts and cutting-edge techniques for graph-structured data and knowledge databases, and formally summarises Knowledge-augmented Graph Machine Learning (KaGML) for drug discovery. Related KaGML works, collected using a carefully designed search methodology, are thoroughly reviewed and organised into four categories according to a newly defined taxonomy. To facilitate research in this rapidly emerging field, we also share collected practical resources that are valuable for intelligent drug discovery and provide an in-depth discussion of potential avenues for future advancements.
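    In a GML model a molecule is treated as a graph whose nodes (atoms) carry feature vectors that are repeatedly updated from their neighbours. The sketch below shows one round of mean-aggregation message passing, plus one simple, assumed way of injecting external knowledge: concatenating a per-node knowledge vector to each node's features before aggregation. The KaGML methods covered by the survey use a range of more elaborate strategies, and the feature values here are hypothetical.

```python
def augment(features: dict[int, list[float]], knowledge: dict[int, list[float]]) -> dict[int, list[float]]:
    """Concatenate an external (hypothetical) knowledge vector to each node's own features."""
    return {v: features[v] + knowledge[v] for v in features}


def message_pass(adj: dict[int, list[int]], h: dict[int, list[float]]) -> dict[int, list[float]]:
    """One round of message passing: h_v' = h_v + element-wise mean of neighbour features."""
    out = {}
    for v, nbrs in adj.items():
        dim = len(h[v])
        if nbrs:
            mean_msg = [sum(h[u][k] for u in nbrs) / len(nbrs) for k in range(dim)]
        else:
            mean_msg = [0.0] * dim  # isolated node receives no messages
        out[v] = [a + b for a, b in zip(h[v], mean_msg)]
    return out


# Toy 3-atom chain 0-1-2 with one scalar feature per atom (hypothetical values).
adj = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [1.0], 1: [2.0], 2: [3.0]}
knowledge = {0: [0.5], 1: [0.5], 2: [0.5]}  # hypothetical knowledge-derived feature
print(message_pass(adj, augment(features, knowledge)))
# {0: [3.0, 1.0], 1: [4.0, 1.0], 2: [5.0, 1.0]}
```

    A trainable layer would additionally apply a learned weight matrix and a nonlinearity after aggregation; both are omitted here to keep the neighbourhood update visible.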

    Computational method development for drug discovery

    Protein-small molecule interactions play a central role in many aspects of the structural and functional organization of the cell and are therefore integral to drug discovery. X-ray crystallography provides the most comprehensive structural characterization of small molecule binding sites, but direct experimental analysis is often time-consuming and challenging. Computational methods are therefore needed that can predict binding site locations on unbound structures with accuracy close to that of X-ray crystallography. This thesis details four projects: the development of a fragment benchmark set, the evaluation of allosteric sites in G Protein-Coupled Receptors (GPCRs), computational modeling of binding pocket dynamics, and the development of an Application Programming Interface (API) framework for High-Performance Computing (HPC) centers. The first project provides a benchmark set for testing hot spot identification methods, with an emphasis on application to fragment-based drug discovery. Using the solvent mapping server FTMap, which finds small molecule binding hot spots on proteins, we compared our benchmark set to an existing benchmark set constructed using a different method. The second project details the effort to identify allosteric binding sites on GPCRs. We demonstrate that FTMap successfully identifies structurally determined allosteric sites in both bound crystal structures and unbound structures. The project was further expanded to evaluate the conservation of allosteric sites across different classes, families, and types of GPCRs. The third project provides a structure-based analysis of cryptic site opening. Cryptic sites are pockets that form in ligand-bound proteins but are not observed in unbound protein structures. Through analysis of crystal structures supplemented by molecular dynamics (MD) simulations with enhanced sampling techniques, it was shown that cryptic sites can be grouped into three types: 1) “genuine” cryptic sites, which do not form without ligand binding; 2) spontaneously forming cryptic sites; and 3) cryptic sites whose opening is affected by mutations or off-site ligand binding. The fourth project presents an API framework for increasing the accessibility of HPC resources.
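    The three-way grouping of cryptic sites can be expressed as a simple decision rule. The sketch below is hypothetical and is not the thesis's protocol: it assumes pocket volumes have already been measured in each MD frame (by some pocket-detection tool) for the unbound protein and, optionally, for a perturbed system (a mutant, or the protein with an off-site ligand bound). It calls a pocket "open" in a frame when its volume reaches an assumed cutoff; the cutoff and the minimum fraction of open frames are illustrative parameters.

```python
from enum import Enum
from typing import Optional


class CrypticSiteType(Enum):
    GENUINE = 1        # does not form without ligand binding
    SPONTANEOUS = 2    # forms spontaneously in the unbound protein
    PERTURBATION = 3   # opening affected by mutations or off-site ligand binding


def open_fraction(volumes: list[float], open_volume: float) -> float:
    """Fraction of MD frames in which the pocket volume reaches the 'open' cutoff."""
    if not volumes:
        return 0.0
    return sum(v >= open_volume for v in volumes) / len(volumes)


def classify_cryptic_site(
    apo_volumes: list[float],
    perturbed_volumes: Optional[list[float]] = None,
    open_volume: float = 150.0,   # assumed cutoff (e.g. in cubic angstroms)
    min_fraction: float = 0.05,   # assumed minimum fraction of open frames
) -> CrypticSiteType:
    """Classify a cryptic site from pocket volumes sampled in unbound and perturbed simulations."""
    if open_fraction(apo_volumes, open_volume) >= min_fraction:
        return CrypticSiteType.SPONTANEOUS
    if perturbed_volumes and open_fraction(perturbed_volumes, open_volume) >= min_fraction:
        return CrypticSiteType.PERTURBATION
    return CrypticSiteType.GENUINE


# Hypothetical per-frame pocket volumes.
print(classify_cryptic_site([40, 60, 200, 180]))           # CrypticSiteType.SPONTANEOUS
print(classify_cryptic_site([40, 60, 80], [70, 160, 170]))  # CrypticSiteType.PERTURBATION
print(classify_cryptic_site([40, 60, 80], [50, 60]))        # CrypticSiteType.GENUINE
```

    Real analyses would also need to confirm that the opened pocket matches the ligand-bound conformation, not just its volume; that check is left out of this sketch.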

    GGL-PPI: Geometric Graph Learning to Predict Mutation-Induced Binding Free Energy Changes

    Protein-protein interactions (PPIs) are critical for various biological processes, and understanding their dynamics is essential for decoding molecular mechanisms and advancing fields such as cancer research and drug discovery. Mutations in PPIs can disrupt protein binding affinity, leading to functional changes and disease. Predicting the impact of mutations on binding affinity is valuable but experimentally challenging. Computational methods, including physics-based and machine learning-based approaches, have been developed to address this challenge. Machine learning-based methods, fueled by extensive PPI datasets such as AB-Bind, PINT, and SKEMPI, have shown promise in predicting binding affinity changes. However, making accurate predictions and generalizing these models across different datasets remain challenging. Geometric graph learning, which combines graph theory and machine learning, has emerged as a powerful approach for capturing the structural features of biomolecules. We present GGL-PPI, a novel method that integrates geometric graph learning and machine learning to predict mutation-induced binding free energy changes. GGL-PPI uses atom-level graph coloring and multi-scale weighted colored geometric subgraphs to extract informative features, and demonstrates superior performance on three validation datasets: AB-Bind, SKEMPI 1.0, and SKEMPI 2.0. Evaluation on a blind test set highlights the unbiased predictions of GGL-PPI for both direct and reverse mutations. These findings underscore the potential of GGL-PPI for accurately predicting binding free energy changes, contributing to our understanding of PPIs and aiding drug design efforts.
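    The abstract does not give GGL-PPI's exact feature definitions. As a simplified sketch, assume a weighted colored subgraph in which atoms are "colored" by element type and each edge between an atom near the mutation site and an atom of the binding partner is weighted by a generalized exponential kernel exp(-(r/η)^κ). Summing edge weights then gives one feature per element pair, and repeating this for several scales η makes the features multi-scale. The kernel form, scales, cutoff, and element set are assumptions made for illustration.

```python
from math import dist, exp

Atom = tuple[str, tuple[float, float, float]]  # (element symbol, xyz coordinates)


def kernel(r: float, eta: float, kappa: float = 2.0) -> float:
    """Generalized exponential kernel exp(-(r/eta)^kappa): edge weight decays with distance."""
    return exp(-((r / eta) ** kappa))


def colored_pair_feature(site: list[Atom], partner: list[Atom], elem_a: str, elem_b: str,
                         eta: float, cutoff: float = 12.0) -> float:
    """Sum of kernel-weighted edges between elem_a atoms near the mutation site and
    elem_b atoms of the binding partner, within a distance cutoff."""
    total = 0.0
    for ea, pa in site:
        if ea != elem_a:
            continue
        for eb, pb in partner:
            if eb != elem_b:
                continue
            r = dist(pa, pb)
            if r <= cutoff:
                total += kernel(r, eta)
    return total


def multiscale_features(site: list[Atom], partner: list[Atom],
                        elements=("C", "N", "O"), etas=(1.0, 2.0, 4.0)) -> list[float]:
    """Feature vector over all element pairs and kernel scales (length len(elements)**2 * len(etas))."""
    return [colored_pair_feature(site, partner, a, b, eta)
            for a in elements for b in elements for eta in etas]


# Hypothetical two-atom example: a carbon at the mutation site, a nitrogen on the partner, 2 Å apart.
site = [("C", (0.0, 0.0, 0.0))]
partner = [("N", (2.0, 0.0, 0.0))]
features = multiscale_features(site, partner)
print(len(features), features[4])  # index 4 is the (C, N, eta=2.0) entry
```

    Features computed for the wild-type and mutant complexes could then be combined (for example, by taking their difference) and passed to a regression model trained on experimental ΔΔG values; this pairing is likewise an assumption rather than the published pipeline.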

    Systems biology of degenerative diseases

    Machine Learning for Kinase Drug Discovery

    Cancer is one of the major public health issues, causing several million deaths every year. Although anti-cancer drugs have been developed and are administered globally, mild to severe side effects are known to occur during treatment. Computer-aided drug discovery has become a cornerstone for finding treatments for existing as well as emerging diseases. Computational methods aim not only to speed up the drug design process but also to reduce time-consuming, costly experiments and in vivo animal testing. In this context, especially over the last decade, deep learning has come to play a prominent role in the prediction of molecular activity, properties, and toxicity. However, major challenges remain when applying deep learning models in drug discovery, including data scarcity for physicochemical tasks, the difficulty of interpreting predictions made by deep neural networks, and the need for open-source, robust workflows to ensure reproducibility and reusability. In this thesis, after reviewing the state of the art in deep learning applied to virtual screening, we address these challenges as follows. To tackle data scarcity in deep learning applied to small molecules, we developed data augmentation techniques based on the SMILES encoding. This linear string notation enumerates the atoms of a compound by following a path along the molecular graph, so multiple SMILES for a single compound can be obtained by traversing the graph along different paths. We applied the developed augmentation techniques to three different deep learning models, including convolutional and recurrent neural networks, and to four property and activity data sets. The results show that augmentation improves model accuracy regardless of the deep learning model and the data set size. Moreover, we estimated model uncertainty by applying augmentation at inference time.
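    The idea that one compound has many valid SMILES can be sketched without a cheminformatics toolkit for the simplest case. The code below assumes an acyclic molecule with only single bonds and organic-subset atom symbols (no rings, charges, or stereochemistry). It enumerates SMILES by starting a depth-first traversal at every atom and trying every ordering of neighbours, writing all but the last branch in parentheses. It then illustrates test-time augmentation: averaging a model's predictions over the SMILES variants and using their standard deviation as an uncertainty estimate. `toy_model` is a hypothetical stand-in for a trained network; real work would generate randomized SMILES with a toolkit such as RDKit.

```python
from itertools import permutations, product
from statistics import mean, stdev
from typing import Optional


def enumerate_smiles(atoms: list[str], bonds: list[tuple[int, int]]) -> set[str]:
    """All SMILES for an acyclic, single-bonded molecule, from every start atom and neighbour order."""
    graph = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        graph[a].append(b)
        graph[b].append(a)

    def dfs(node: int, parent: Optional[int]) -> set[str]:
        children = [n for n in graph[node] if n != parent]
        if not children:
            return {atoms[node]}
        strings = set()
        for order in permutations(children):
            # Each child subtree may itself be written in several ways.
            for parts in product(*(dfs(c, node) for c in order)):
                # All branches except the last are enclosed in parentheses.
                branches = "".join(f"({p})" for p in parts[:-1])
                strings.add(atoms[node] + branches + parts[-1])
        return strings

    return set().union(*(dfs(start, None) for start in graph))


def predict_with_uncertainty(model, smiles_variants) -> tuple[float, float]:
    """Test-time augmentation: mean prediction over SMILES variants and their spread."""
    preds = [model(s) for s in sorted(smiles_variants)]
    spread = stdev(preds) if len(preds) > 1 else 0.0
    return mean(preds), spread


ethanol = enumerate_smiles(["C", "C", "O"], [(0, 1), (1, 2)])
print(sorted(ethanol))
# ['C(C)O', 'C(O)C', 'CCO', 'OCC']


def toy_model(smiles: str) -> float:
    """Hypothetical stand-in for a trained model: it simply returns the string length."""
    return float(len(smiles))


print(predict_with_uncertainty(toy_model, ethanol))
```

    A small spread across variants suggests the model's output does not depend on how the molecule happens to be written, which is the intuition behind treating it as a confidence signal.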
    We further showed that the more confident the model is in a prediction, the smaller its error, implying that confident predictions can be trusted to be close to the target value. The software and its documentation, which allow predictions to be made for novel compounds, have been made freely available. Blindly trusting algorithmic predictions may have serious consequences in healthcare. In this context, a better understanding of how a neural network classifies a compound based on its input features is highly beneficial, as it helps to de-risk and optimize compounds. In this research project, we decomposed the inner layers of a deep neural network to identify the toxic substructures (toxicophores) of a compound that led to its toxicity classification. Using molecular fingerprints (vectors that indicate the presence or absence of particular atomic environments), we were able to map a toxicity score to each of these substructures. Moreover, we developed a method to visualize the toxicophores within a compound in 2D, the so-called cytotoxicity maps, which could help medicinal chemists identify ways to modify molecules to eliminate toxicity. Not only does the deep learning model reach state-of-the-art results, but the identified toxicophores also confirm known toxic substructures and suggest new potential candidates. To speed up the drug discovery process, access to robust and modular workflows is extremely advantageous. In this context, the fully open-source TeachOpenCADD project was developed. Significant tasks in both cheminformatics and bioinformatics are implemented in a pedagogical fashion, allowing the material to be used for teaching as well as a starting point for novel research. Within this framework, a dedicated pipeline focuses on kinases, a protein family known to be involved in diseases such as cancer. The aim is to gain insights into off-targets, i.e. proteins that are unintentionally affected by a compound and that can cause adverse effects during treatment. Four measures of kinase similarity are implemented, based on sequence and structural information, protein-ligand interactions, and ligand profiling data. The workflow clusters a set of kinases, and the clusters can be further analyzed to understand the off-target effects of inhibitors. The results show that analyzing kinases from several perspectives is crucial for off-target prediction and for gaining a global view of the kinome. These novel methods can be exploited in the discovery of new drugs, particularly for diseases involving the dysregulation of kinases, such as cancer.
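    Combining several kinase similarity measures and clustering the result can be sketched as follows. This is not TeachOpenCADD's implementation: it assumes each measure (for example sequence-, structure-, interaction-, or ligand-profile-based) has already been converted into a kinase-by-kinase distance matrix scaled to [0, 1], averages the matrices with equal weights, and forms single-linkage clusters at a distance threshold using union-find. The kinase labels, distances, and threshold are hypothetical.

```python
def combine(matrices: list[list[list[float]]]) -> list[list[float]]:
    """Equal-weight average of several square distance matrices of the same size."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)] for i in range(n)]


def single_linkage_clusters(dist: list[list[float]], threshold: float) -> list[list[int]]:
    """Group items connected by any chain of pairwise distances <= threshold."""
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


kinases = ["kinase_A", "kinase_B", "kinase_C"]  # hypothetical labels
sequence_dist = [[0.0, 0.2, 0.9], [0.2, 0.0, 0.8], [0.9, 0.8, 0.0]]
structure_dist = [[0.0, 0.4, 0.7], [0.4, 0.0, 0.9], [0.7, 0.9, 0.0]]
combined = combine([sequence_dist, structure_dist])
clusters = single_linkage_clusters(combined, threshold=0.5)
print([[kinases[i] for i in c] for c in clusters])
# [['kinase_A', 'kinase_B'], ['kinase_C']]
```

    Kinases that fall in the same cluster as a drug's intended target would be natural candidates to check for off-target activity; equal weighting is only one choice, and the measures could instead be inspected separately to see where they disagree.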