2,046 research outputs found

    Visual and computational analysis of structure-activity relationships in high-throughput screening data

    Get PDF
    Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. This paper reviews recent work in visualisation and data mining that can be used to develop structure-activity relationships from such chemical/biological datasets

    Cheminformatics Tools to Explore the Chemical Space of Peptides and Natural Products

    Get PDF
    Cheminformatics facilitates the analysis, storage, and collection of large quantities of chemical data, such as molecular structures and molecules' properties and biological activity, and it has revolutionized medicinal chemistry for small molecules. However, its application to larger molecules is still underrepresented. This thesis work attempts to fill this gap and extend the cheminformatics approach towards large molecules and peptides. This thesis is divided into two parts. The first part presents the implementation and application of two new molecular descriptors: macromolecule extended atom pair fingerprint (MXFP) and MinHashed atom pair fingerprint of radius 2 (MAP4). MXFP is an atom pair fingerprint suitable for large molecules, and here, it is used to explore the chemical space of non-Lipinski molecules within the widely used PubChem and ChEMBL databases. MAP4 is a MinHashed hybrid of substructure and atom pair fingerprints suitable for encoding small and large molecules. MAP4 is first benchmarked against commonly used atom pairs and substructure fingerprints, and then it is used to investigate the chemical space of microbial and plants natural products with the aid of machine learning and chemical space mapping. The second part of the thesis focuses on peptides, and it is introduced by a review chapter on approaches to discover novel peptide structures and describing the known peptide chemical space. Then, a genetic algorithm that uses MXFP in its fitness function is described and challenged to generate peptide analogs of peptidic or non-peptidic queries. Finally, supervised and unsupervised machine learning is used to generate novel antimicrobial and non-hemolytic peptide sequences

    Revealing cytotoxic substructures in molecules using deep learning

    Get PDF
    In drug development, late stage toxicity issues of a compound are the main cause of failure in clinical trials. In silico meth ods are therefore of high importance to guide the early design process to reduce time, costs and animal testing. Technical advances and the ever growing amount of available toxicity data enabled machine learning, especially neural networks, to impact the feld of predictive toxicology. In this study, cytotoxicity prediction, one of the earliest handles in drug discovery, is investigated using a deep learning approach trained on a highly consistent in-house data set of over 34,000 compounds with a share of less than 5% of cytotoxic molecules. The model reached a balanced accuracy of over 70%, similar to previ ously reported studies using Random Forest. Albeit yielding good results, neural networks are often described as a black box lacking deeper mechanistic understanding of the underlying model. To overcome this absence of interpretability, a Deep Taylor Decomposition method is investigated to identify substructures that may be responsible for the cytotoxic efects, the so-called toxicophores. Furthermore, this study introduces cytotoxicity maps which provide a visual structural interpretation of the relevance of these substructures. Using this approach could be helpful in drug development to predict the potential toxicity of a compound as well as to generate new insights into the toxic mechanism. Moreover, it could also help to de-risk and optimize compounds

    Computational approaches to virtual screening in human central nervous system therapeutic targets

    Get PDF
    In the past several years of drug design, advanced high-throughput synthetic and analytical chemical technologies are continuously producing a large number of compounds. These large collections of chemical structures have resulted in many public and commercial molecular databases. Thus, the availability of larger data sets provided the opportunity for developing new knowledge mining or virtual screening (VS) methods. Therefore, this research work is motivated by the fact that one of the main interests in the modern drug discovery process is the development of new methods to predict compounds with large therapeutic profiles (multi-targeting activity), which is essential for the discovery of novel drug candidates against complex multifactorial diseases like central nervous system (CNS) disorders. This work aims to advance VS approaches by providing a deeper understanding of the relationship between chemical structure and pharmacological properties and design new fast and robust tools for drug designing against different targets/pathways. To accomplish the defined goals, the first challenge is dealing with big data set of diverse molecular structures to derive a correlation between structures and activity. However, an extendable and a customizable fully automated in-silico Quantitative-Structure Activity Relationship (QSAR) modeling framework was developed in the first phase of this work. QSAR models are computationally fast and powerful tool to screen huge databases of compounds to determine the biological properties of chemical molecules based on their chemical structure. The generated framework reliably implemented a full QSAR modeling pipeline from data preparation to model building and validation. The main distinctive features of the designed framework include a)efficient data curation b) prior estimation of data modelability and, c)an-optimized variable selection methodology that was able to identify the most biologically relevant features responsible for compound activity. Since the underlying principle in QSAR modeling is the assumption that the structures of molecules are mainly responsible for their pharmacological activity, the accuracy of different structural representation approaches to decode molecular structural information largely influence model predictability. However, to find the best approach in QSAR modeling, a comparative analysis of two main categories of molecular representations that included descriptor-based (vector space) and distance-based (metric space) methods was carried out. Results obtained from five QSAR data sets showed that distance-based method was superior to capture the more relevant structural elements for the accurate characterization of molecular properties in highly diverse data sets (remote chemical space regions). This finding further assisted to the development of a novel tool for molecular space visualization to increase the understanding of structure-activity relationships (SAR) in drug discovery projects by exploring the diversity of large heterogeneous chemical data. In the proposed visual approach, four nonlinear DR methods were tested to represent molecules lower dimensionality (2D projected space) on which a non-parametric 2D kernel density estimation (KDE) was applied to map the most likely activity regions (activity surfaces). The analysis of the produced probabilistic surface of molecular activities (PSMAs) from the four datasets showed that these maps have both descriptive and predictive power, thus can be used as a spatial classification model, a tool to perform VS using only structural similarity of molecules. The above QSAR modeling approach was complemented with molecular docking, an approach that predicts the best mode of drug-target interaction. Both approaches were integrated to develop a rational and re-usable polypharmacology-based VS pipeline with improved hits identification rate. For the validation of the developed pipeline, a dual-targeting drug designing model against Parkinson’s disease (PD) was derived to identify novel inhibitors for improving the motor functions of PD patients by enhancing the bioavailability of dopamine and avoiding neurotoxicity. The proposed approach can easily be extended to more complex multi-targeting disease models containing several targets and anti/offtargets to achieve increased efficacy and reduced toxicity in multifactorial diseases like CNS disorders and cancer. This thesis addresses several issues of cheminformatics methods (e.g., molecular structures representation, machine learning, and molecular similarity analysis) to improve and design new computational approaches used in chemical data mining. Moreover, an integrative drug-designing pipeline is designed to improve polypharmacology-based VS approach. This presented methodology can identify the most promising multi-targeting candidates for experimental validation of drug-targets network at the systems biology level in the drug discovery process
    • …
    corecore