12,355 research outputs found

    Computational approaches to virtual screening in human central nervous system therapeutic targets

    In the past several years of drug design, advanced high-throughput synthetic and analytical chemical technologies have continuously produced large numbers of compounds. These large collections of chemical structures have resulted in many public and commercial molecular databases. The availability of larger data sets has thus provided the opportunity to develop new knowledge mining or virtual screening (VS) methods. This research work is motivated by the fact that one of the main interests in the modern drug discovery process is the development of new methods to predict compounds with large therapeutic profiles (multi-targeting activity), which is essential for the discovery of novel drug candidates against complex multifactorial diseases like central nervous system (CNS) disorders. This work aims to advance VS approaches by providing a deeper understanding of the relationship between chemical structure and pharmacological properties and by designing new fast and robust tools for drug design against different targets/pathways. To accomplish the defined goals, the first challenge is dealing with large data sets of diverse molecular structures to derive a correlation between structure and activity. To this end, an extendable and customizable fully automated in-silico Quantitative Structure-Activity Relationship (QSAR) modeling framework was developed in the first phase of this work. QSAR models are computationally fast and powerful tools for screening huge databases of compounds to determine the biological properties of chemical molecules based on their chemical structure. The generated framework reliably implemented a full QSAR modeling pipeline from data preparation to model building and validation.
The main distinctive features of the designed framework include (a) efficient data curation, (b) prior estimation of data modelability, and (c) an optimized variable selection methodology that was able to identify the most biologically relevant features responsible for compound activity. Since the underlying principle in QSAR modeling is the assumption that the structures of molecules are mainly responsible for their pharmacological activity, the accuracy of different structural representation approaches in decoding molecular structural information largely influences model predictability. Therefore, to find the best approach in QSAR modeling, a comparative analysis of two main categories of molecular representations, descriptor-based (vector space) and distance-based (metric space) methods, was carried out. Results obtained from five QSAR data sets showed that the distance-based method was superior at capturing the more relevant structural elements for the accurate characterization of molecular properties in highly diverse data sets (remote chemical space regions). This finding contributed to the development of a novel tool for molecular space visualization to increase the understanding of structure-activity relationships (SAR) in drug discovery projects by exploring the diversity of large heterogeneous chemical data. In the proposed visual approach, four nonlinear dimensionality reduction (DR) methods were tested to represent molecules in lower dimensionality (2D projected space), on which a non-parametric 2D kernel density estimation (KDE) was applied to map the most likely activity regions (activity surfaces). The analysis of the produced probabilistic surfaces of molecular activities (PSMAs) from the four datasets showed that these maps have both descriptive and predictive power and can thus be used as a spatial classification model, a tool to perform VS using only the structural similarity of molecules.
The above QSAR modeling approach was complemented with molecular docking, an approach that predicts the best mode of drug-target interaction. Both approaches were integrated to develop a rational and reusable polypharmacology-based VS pipeline with an improved hit identification rate. To validate the developed pipeline, a dual-targeting drug design model against Parkinson’s disease (PD) was derived to identify novel inhibitors for improving the motor functions of PD patients by enhancing the bioavailability of dopamine and avoiding neurotoxicity. The proposed approach can easily be extended to more complex multi-targeting disease models containing several targets and anti-/off-targets to achieve increased efficacy and reduced toxicity in multifactorial diseases like CNS disorders and cancer. This thesis addresses several issues in cheminformatics methods (e.g., molecular structure representation, machine learning, and molecular similarity analysis) to improve and design new computational approaches used in chemical data mining. Moreover, an integrative drug design pipeline is developed to improve the polypharmacology-based VS approach. The presented methodology can identify the most promising multi-targeting candidates for experimental validation of drug-target networks at the systems biology level in the drug discovery process.
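
    The KDE-based activity surface described in this abstract can be illustrated with a minimal sketch. It assumes molecules have already been projected to 2D coordinates by some DR method, and it turns class-wise Gaussian KDEs into a probability of activity via Bayes' rule. The abstract does not say how the thesis combines the densities, so that step, the bandwidth, the function names, and the toy coordinates are all assumptions made for illustration.

```python
import math

def gaussian_kde_2d(points, bandwidth):
    """Return a function estimating the density at (x, y) from 2D points."""
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            d2 = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return density

def activity_surface(actives, inactives, bandwidth=0.5):
    """Return P(active | x, y) built from one KDE per class (assumed scheme)."""
    kde_active = gaussian_kde_2d(actives, bandwidth)
    kde_inactive = gaussian_kde_2d(inactives, bandwidth)
    prior_active = len(actives) / (len(actives) + len(inactives))

    def prob(x, y):
        a = prior_active * kde_active(x, y)
        i = (1.0 - prior_active) * kde_inactive(x, y)
        # Far from all training molecules both densities underflow to zero;
        # fall back to 0.5 to signal "no information".
        return a / (a + i) if (a + i) > 0.0 else 0.5

    return prob

# Hypothetical 2D projections of active and inactive molecules.
actives = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3)]
inactives = [(3.0, 3.0), (2.8, 3.1), (3.2, 2.7)]
psma = activity_surface(actives, inactives)
```

    Used as a spatial classifier, a new molecule would be projected into the same 2D space and assigned to the active class when `psma(x, y)` exceeds a chosen cutoff.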

    Machine Learning Methodologies for Interpretable Compound Activity Predictions

    Machine learning (ML) models have gained attention for mining the pharmaceutical data that are currently generated at unprecedented rates and for potentially accelerating the discovery of new drugs. The advent of deep learning (DL) has also raised expectations in pharmaceutical research. A central task in drug discovery is the initial search for compounds with desired biological activity. ML algorithms are able to find patterns in compound structures that are related to bioactivity, the so-called structure-activity relationships (SARs). ML-based predictions can complement biological testing to prioritize further experiments. Moreover, insights into model decisions are highly desired for further validation and identification of activity-relevant substructures. However, the interpretation of complex ML models remains essentially prohibitive. This thesis focuses on ML-based predictions of compound activity against multiple biological targets. Single-target and multi-target models are generated for relevant tasks including the prediction of profiling matrices from screening data and the discrimination between weak and strong inhibitors for more than a hundred kinases. Moreover, the relative performance of distinct modeling strategies is systematically analyzed under varying training conditions, and practical guidelines are reported. Since explainable model decisions are a clear requirement for the utility of ML bioactivity models in pharmaceutical research, methods for the interpretation and intuitive visualization of activity predictions from any ML or DL model are introduced. Taken together, this dissertation presents contributions that advance the application and rationalization of ML models for biological activity and SAR predictions.

    Toxicity prediction using multi-disciplinary data integration and novel computational approaches

    Current predictive tools used for human health assessment of potential chemical hazards rely primarily on either chemical structural information (i.e., cheminformatics) or bioassay data (i.e., bioinformatics). Emerging data sources such as chemical libraries, high throughput assays and health databases offer new possibilities for evaluating chemical toxicity as an integrated system and overcome the limited predictivity of current fragmented efforts; yet, few studies have combined the new data streams. This dissertation tested the hypothesis that integrative computational toxicology approaches drawing upon diverse data sources would improve the prediction and interpretation of chemically induced diseases. First, chemical structures and toxicogenomics data were used to predict hepatotoxicity. Compared with conventional cheminformatics or toxicogenomics models, interpretation was enriched by the chemical and biological insights even though prediction accuracy did not improve. This motivated the second project that developed a novel integrative method, chemical-biological read-across (CBRA), that led to predictive and interpretable models amenable to visualization. CBRA was consistently among the most accurate models on four chemical-biological data sets. It highlighted chemical and biological features for interpretation and the visualizations aided transparency. Third, we developed an integrative workflow that interfaced cheminformatics prediction with pharmacoepidemiology validation using a case study of Stevens Johnson Syndrome (SJS), an adverse drug reaction (ADR) of major public health concern. Cheminformatics models first predicted potential SJS inducers and non-inducers, prioritizing them for subsequent pharmacoepidemiology evaluation, which then confirmed that predicted non-inducers were statistically associated with fewer SJS occurrences. 
By combining cheminformatics' ability to predict SJS as soon as drug structures are known with pharmacoepidemiology's statistical rigor, we have provided a universal scheme for more effective study of SJS and other ADRs. Overall, this work demonstrated that integrative approaches could deliver more predictive and interpretable models. These models can then reliably prioritize high-risk chemicals for further testing, allowing optimization of testing resources. A broader implication of this research is the growing role we envision for integrative methods that will take advantage of the various emerging data sources. Doctor of Philosophy

    Development and Extension of Cheminformatics Techniques for Integration of Diverse Data to Enhance Drug Discovery

    The scientific community has fallen headlong into the age of data. With the crop of information available to scientists growing at an exponential pace, tools to harvest this data and process it into knowledge are needed. This blanket statement is nowhere more true than in drug discovery today. The increasing quantities of bioactivity and protein crystallographic data provide key information capable of improving the state of virtual screening. The CoLiBRI methodology attempts to learn from the large knowledge base of protein-ligand interactions to discover a comprehensive model capable of filtering large libraries very quickly using only a protein structure. This modeling procedure has been greatly expanded to encompass a wide range of descriptor techniques and to use advanced statistical methods of multidimensional mapping. The growth of virtual screening methods (including CoLiBRI) has provided a plethora of options to cheminformaticians with little guidance on their strengths and weaknesses. This oversight in methodology benchmarking should be addressed to reduce the time and effort wasted applying subpar screening protocols. To address this issue, we developed a benchmark dataset that will enable a flood of methodology experimentation and validation. The recent generation of gene expression data and cancer cell growth inhibition data enables identification of signatures of cellular resistance. These signatures can be used as validated prognostic markers to guide patient management, thereby fueling the personalization of cancer treatment. From the available data, we have derived hypothetical biomarkers of multidrug resistance and a flood of links between gene expression and chemical-specific resistance that require experimental validation. The increasing capabilities of cheminformatics techniques require dissemination to the public to produce the greatest impact.
We have therefore developed a web portal providing cheminformatics software and models to fuel public drug discovery efforts.

    Augmenting Structure/Function Relationship Analysis with Deep Learning for the Classification of Psychoactive Drug Activity at Class A G Protein-Coupled Receptors

    G protein-coupled receptors (GPCRs) initiate intracellular signaling pathways via interaction with external stimuli. [1-5] Despite sharing similar structure and cellular mechanism, GPCRs participate in a uniquely broad range of physiological functions. [6] Due to the size and functional diversity of the GPCR family, these receptors are a major focus for pharmacological applications. [1,7] Current state-of-the-art pharmacology and toxicology research strategies rely on computational methods to efficiently design highly selective, low-toxicity compounds. [9,10] GPCR-targeting therapeutics are associated with low selectivity, resulting in increased risk of adverse effects and toxicity. Psychoactive drugs that are active at Class A GPCRs used in the treatment of schizophrenia and other psychiatric disorders display promiscuous binding behavior linked to chronic toxicity and high-risk adverse effects. [16-18] We hypothesized that, using a combination of physicochemical feature engineering with a feedforward neural network, predictive models can be trained for these specific GPCR subgroups that are more efficient and accurate than current state-of-the-art methods. We combined normal mode analysis with deep learning to create a novel framework for the prediction of Class A GPCR/psychoactive drug interaction activities. Our deep learning classifier results in high classification accuracy (5-HT F1-score = 0.78; DRD F1-score = 0.93) and achieves a 45% reduction in model training time when structure-based feature selection is applied via guidance from an anisotropic network model (ANM). Additionally, we demonstrate the interpretability and application potential of our framework via evaluation of highly clinically relevant Class A GPCR/psychoactive drug interactions guided by our ANM results and deep learning predictions.
Our model offers an increased range of applicability as compared to other methods due to accessible data compatibility requirements and low model complexity. While this model can be applied to a multitude of clinical applications, we have presented strong evidence for the impact of machine learning in the development of novel psychiatric therapeutics with improved safety and tolerability.

    12th Annual Research Week--Event Proceedings

    12th Annual Research Week: A Celebration of Student Research

    QSAR models for the (eco-)toxicological characterization and prioritization of emerging pollutants: case studies and potential applications within REACH.

    Under the European REACH regulation (Registration, Evaluation, Authorisation and Restriction of Chemical substances - (EC) No 1907/2006), there is an urgent need to acquire a large amount of information necessary to assess and manage the potential risk of thousands of industrial chemicals. Meanwhile, REACH aims at reducing animal testing by promoting the intelligent and integrated use of alternative methods, such as in vitro testing and in silico techniques. Among these methods, models based on quantitative structure-activity relationships (QSAR) are useful tools to fill data gaps and to support the hazard and risk assessment of chemicals. The present thesis was performed in the context of the CADASTER Project (CAse studies on the Development and Application of in-Silico Techniques for Environmental hazard and Risk assessment), which aims to integrate in-silico models (e.g. QSARs) in risk assessment procedures, by showing how to increase the use of non-testing information for regulatory decision-making under REACH. The aim of this thesis was the development of QSAR/QSPR models for the characterization of the (eco-)toxicological profile and environmental behaviour of chemical substances of emerging concern. The attention was focused on four classes of compounds studied within the CADASTER project, i.e. brominated flame retardants (BFRs), fragrances, perfluorinated compounds (PFCs) and (benzo)-triazoles (B-TAZs), for which only a limited amount of experimental data is currently available, especially for the basic endpoints required in regulation for hazard and risk assessment. Through several case studies, the present thesis showed how QSAR models can be applied to optimize experimental testing as well as to provide useful information for the safety assessment of chemicals and to support decision-making.
In the first case study, simple multiple linear regression (MLR) and classification models were developed ad hoc for BFRs and PFCs to predict specific endpoints related to endocrine disrupting (ED) potential (e.g. dioxin-like activity, estrogenic and androgenic receptor binding, interference with thyroxin transport and estradiol metabolism). The analysis of the modelling molecular descriptors made it possible to highlight some structural features and important structural alerts responsible for increasing specific ED activities. The developed models were applied to screen over 200 BFRs and 33 PFCs without experimental data, and to prioritize the most hazardous chemicals (on the basis of ED potency profile), which were then suggested to other CADASTER partners in order to focus the experimental testing. In the second case study, MLR models were developed, specifically for B-TAZs, for the prediction of three key endpoints required in regulation to assess aquatic toxicity, i.e. acute toxicity in algae (EC50 72h Pseudokirchneriella subcapitata), daphnids (EC50 48h Daphnia magna) and fish (LC50 96h Oncorhynchus mykiss). In this case too, the developed QSARs were applied for screening purposes. Among over 350 B-TAZs lacking experimental data, 20 compounds, which were predicted as toxic (EC(LC)50 ≤ 10 mg/L) or very toxic (EC(LC)50 ≤ 1 mg/L) to the three aquatic species, were prioritized for further experimental testing. Finally, in the third case study, classification QSPR models were developed for the prediction of ready biodegradability of fragrance materials. Ready biodegradation is among the basic endpoints required for the assessment of environmental persistence of chemicals. When compared with some existing models commonly used for predicting biodegradation, the QSPRs proposed here showed higher classification accuracy for fragrance materials. This comparison highlighted the importance of using local models when dealing with specific classes of chemicals.
All the proposed QSARs were developed on the basis of the OECD principles for QSAR acceptability for regulatory purposes, paying particular attention to the external validation procedure and to the statistical definition of the applicability domain of the models. QSAR models based on molecular descriptors generated by both commercial (DRAGON) and freely available (PaDEL-Descriptor, QSPR-Thesaurus) software have been proposed. The use of free tools allows for a wider applicability of the QSAR models proposed here. In conclusion, the QSAR models developed within this thesis are useful tools to support hazard and risk assessment of specific classes of emerging pollutants, and show how non-testing information can be used for regulatory decisions, thus minimizing costs and time and saving animal lives. Beyond their use for regulatory purposes, the QSARs proposed here can find application in the rational design of new, safer compounds that are potentially less hazardous for human health and the environment.
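
    The screening step in the second case study can be sketched as a simple threshold filter. The sketch assumes the cut-offs quoted in the abstract (10 mg/L for toxic, 1 mg/L for very toxic) and reads "toxic or very toxic to the three aquatic species" as requiring all three predictions to fall at or below 10 mg/L. The compound names and predicted values are hypothetical, and the function names are illustrative rather than taken from the thesis.

```python
VERY_TOXIC_MG_PER_L = 1.0   # EC(LC)50 at or below this: very toxic
TOXIC_MG_PER_L = 10.0       # EC(LC)50 at or below this: toxic

def toxicity_class(ec50_mg_per_l):
    """Map a predicted EC(LC)50 in mg/L to a toxicity label."""
    if ec50_mg_per_l <= VERY_TOXIC_MG_PER_L:
        return "very toxic"
    if ec50_mg_per_l <= TOXIC_MG_PER_L:
        return "toxic"
    return "not classified"

def prioritize(predictions):
    """Return compounds predicted toxic or very toxic to every species."""
    selected = []
    for name, by_species in predictions.items():
        labels = [toxicity_class(value) for value in by_species.values()]
        if all(label != "not classified" for label in labels):
            selected.append(name)
    return selected

# Hypothetical QSAR predictions (mg/L) for two made-up compounds.
predictions = {
    "BTAZ-A": {"algae": 0.8, "daphnia": 4.2, "fish": 9.5},
    "BTAZ-B": {"algae": 15.0, "daphnia": 3.0, "fish": 2.0},
}
shortlist = prioritize(predictions)
```

    Here only `BTAZ-A` is shortlisted, because `BTAZ-B` has a predicted algal EC50 above 10 mg/L.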

    In silico strategies to study polypharmacology of G-protein-coupled receptors

    The development of drugs that simultaneously target multiple receptors in a rational way (i.e., 'magic shotguns') is regarded as a promising approach for drug discovery to treat complex, multi-factorial and multi-pathogenic diseases. My major goal is to develop and employ different computational approaches towards the rational design of drugs with selective polypharmacology towards guanine nucleotide-binding protein (G-protein)-coupled receptors (GPCRs) to treat central nervous system diseases. Our methodologies rely on the advances in chemocentric informatics and chemogenomics to generate experimentally testable hypotheses that are derived by fusing independent lines of evidence. We posit that such a hypothesis fusion approach allows us to improve the overall success rates of in silico lead identification efforts. We have developed an integrated computational approach that combines Quantitative Structure-Activity Relationship (QSAR) modeling, model-based virtual screening (VS), gene expression analysis and mining of the biological literature for drug discovery. The dissertation research described herein is focused on: (1) the development of robust data-driven QSAR models of single-target GPCR datasets that will amount to the compendium of GPCR predictors: the GPCR QSARome; (2) the development of robust data-driven QSAR models for families of GPCRs and other trans-membrane molecular targets (e.g., sigma receptors) and the application of models as virtual screening tools for the quick prioritization of compounds for biological testing across receptor families; (3) the development of novel integrative chemocentric informatics approaches to predict receptor-mediated clinical effects of chemicals.
Results indicated that our computational efforts to establish a compendium of computational predictors and devise an integrative chemocentric informatics approach to study polypharmacology in silico will eventually lead to useful and reliable tools aimed at identifying and enriching chemical libraries with compounds that have the desired activities for more than one molecular target of interest.

    Development and application of QSAR models for mechanisms related to endocrine disruption.
