97 research outputs found

    The development of a knowledge base for basic active structures: an example case of dopamine agonists

    Background: Chemical compounds affecting a bioactivity can usually be classified into several groups, each of which shares a characteristic substructure. We call these substructures "basic active structures" or BASs. The extraction of BASs is challenging when the database of compounds contains a variety of skeletons. Data mining technology, associated with the work of chemists, has enabled the systematic elaboration of BASs. Results: This paper presents a BAS knowledge base, BASiC, which currently covers 46 activities and is available on the Internet. We use the dopamine agonists D1, D2, and Dauto as examples and illustrate the process of BAS extraction. The resulting BASs were reasonably interpreted after proposing a few template structures. Conclusions: The knowledge base is useful for drug design. Proposed BASs and their supporting structures in the knowledge base will facilitate the development of new template structures for other activities, and will be useful in the design of new lead compounds via reasonable interpretations of active structures.

    Human-machine scientific discovery

    Humanity is facing existential, societal challenges related to food security, ecosystem conservation, antimicrobial resistance, etc., and Artificial Intelligence (AI) is already playing an important role in tackling these new challenges. Most current AI approaches are limited when it comes to ‘knowledge transfer’ with humans, i.e. it is difficult to incorporate existing human knowledge, and the output knowledge is not human-comprehensible. In this chapter we demonstrate how a combination of comprehensible machine learning, text-mining and domain knowledge could enhance human-machine collaboration for the purpose of automated scientific discovery, where humans and computers jointly develop and evaluate scientific theories. As a case study, we describe a combination of logic-based machine learning (which included human-encoded ecological background knowledge) and text-mining from scientific publications (to verify machine-learned hypotheses) for the purpose of automated discovery of ecological interaction networks (food-webs) to detect change in agricultural ecosystems using the Farm Scale Evaluations (FSEs) of genetically modified herbicide-tolerant (GMHT) crops dataset. The results included novel food-web hypotheses, some confirmed by subsequent experimental studies (e.g. DNA analysis) and published in scientific journals. These machine-learned food-webs were also used as the basis of a recent study revealing resilience of agro-ecosystems to changes in farming management using GMHT crops.

    In Silico-Guided Design of Novel-Scaffold Therapeutics Targeting the Dopamine D3 Receptor

    Computational methods in drug discovery reduce research time and costs, and can only now be applied to certain psychiatric conditions due to recent breakthroughs in determining the 3D structures of relevant drug receptors in the brain. A new computational technique, de novo fragment-based drug design (DFDD), was evaluated employing a dopamine D3 receptor (D3R) crystal structure. Three DFDD approaches (scaffold replacement, ligand building, and MedChem Transformations) were assessed in replacing structural portions of eticlopride, a D2/D3R-specific antagonist, to generate compounds with novel drug scaffolds. Pharmacological characterization of the compounds determined their binding affinities at target brain receptors. Analogs of scaffold replacement-generated compounds displayed moderate D3R affinity, suggesting that this DFDD method could be an important drug design tool. The findings support the addition of in silico approaches to conventional drug discovery, toward creation of new therapeutics for depression, anxiety, schizophrenia, addiction and other disorders of the central nervous system.

    A novel hybrid ultrafast shape descriptor method for use in virtual screening.

    BACKGROUND: We have introduced a new Hybrid descriptor composed of the MACCS key descriptor encoding topological information and Ballester and Richards' Ultrafast Shape Recognition (USR) descriptor. The latter is calculated from the moments of the distribution of the interatomic distances, and in this work we also included higher moments than in the original implementation. RESULTS: The performance of this Hybrid descriptor is assessed using Random Forest and a dataset of 116,476 molecules. Our dataset includes 5,245 molecules in ten classes from the 2005 World Anti-Doping Agency (WADA) dataset and 111,231 molecules from the National Cancer Institute (NCI) database. In a 10-fold Monte Carlo cross-validation this dataset was partitioned into three distinct parts for training, optimisation of an internal threshold that we introduced, and validation of the resulting model. The standard errors obtained were used to assess statistical significance of observed improvements in performance of our new descriptor. CONCLUSION: The Hybrid descriptor was compared to the MACCS key descriptor, USR with the first three (USR), four (UF4) and five (UF5) moments, and a combination of MACCS with USR (three moments). The MACCS key descriptor was not combined with UF5, due to similar performance of UF5 and UF4. Superior performance in terms of all figures of merit was found for the MACCS/UF4 Hybrid descriptor with respect to all other descriptors examined. These figures of merit include recall in the top 1% and top 5% of the ranked validation sets, precision, F-measure, area under the Receiver Operating Characteristic curve and Matthews Correlation Coefficient.
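
The idea of a shape descriptor built from moments of interatomic distance distributions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it follows the commonly described USR recipe of four reference points (centroid, atom closest to the centroid, atom farthest from the centroid, atom farthest from that one), and it uses the mean followed by plain central moments. The function names, the `n_moments` parameter, and the moment normalisation are assumptions made for this sketch; published implementations differ in how they scale the higher moments.

```python
import math


def _moments(values, n_moments):
    """Return the mean followed by the central moments of order 2..n_moments."""
    n = len(values)
    mean = sum(values) / n
    result = [mean]
    for order in range(2, n_moments + 1):
        result.append(sum((v - mean) ** order for v in values) / n)
    return result


def usr_descriptor(coords, n_moments=3):
    """Build a USR-style descriptor from a list of (x, y, z) atom coordinates.

    With n_moments=3 the result has 4 * 3 = 12 values; 4 or 5 moments give
    16 or 20 values (the "higher moments" variant, as assumed here).
    """
    n = len(coords)
    # Reference point 1: molecular centroid.
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    dist_to_centroid = [math.dist(c, centroid) for c in coords]
    # Reference points 2 and 3: atoms closest to and farthest from the centroid.
    closest = coords[dist_to_centroid.index(min(dist_to_centroid))]
    farthest = coords[dist_to_centroid.index(max(dist_to_centroid))]
    # Reference point 4: atom farthest from reference point 3.
    dist_to_farthest = [math.dist(c, farthest) for c in coords]
    far_from_far = coords[dist_to_farthest.index(max(dist_to_farthest))]

    descriptor = []
    for ref in (centroid, closest, farthest, far_from_far):
        distances = [math.dist(c, ref) for c in coords]
        descriptor.extend(_moments(distances, n_moments))
    return descriptor


def usr_similarity(a, b):
    """Inverse of (1 + mean absolute difference); 1.0 means identical descriptors."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))
```

Because only distances enter the calculation, the descriptor does not change when the molecule is translated or rotated, which is why no alignment step is needed before comparing two molecules.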

    Formal Concept Analysis for the Interpretation of Relational Learning applied on 3D Protein-Binding Sites

    Inductive Logic Programming (ILP) is a powerful learning method which allows an expressive representation of the data and produces explicit knowledge. However, ILP systems suffer from a major drawback as they return a single theory based on heuristic user choices of various parameters, thus ignoring potentially relevant rules. Accordingly, we propose an original approach based on Formal Concept Analysis (FCA) for effective interpretation of the resulting theories, with the possibility of adding domain knowledge. Our approach is applied to the characterization of three-dimensional (3D) protein-binding sites, which are the protein portions on which interactions with other proteins take place. In this context, we define a relational and logical representation of 3D patches and formalize the problem as a concept learning problem using ILP. We report here the results we obtained on a particular category of protein-binding sites, namely phosphorylation sites, using ILP followed by FCA-based interpretation.
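
For readers unfamiliar with Formal Concept Analysis, the following Python sketch enumerates the formal concepts of a small binary context. It is a brute-force illustration only: the abstract does not say which objects and attributes the authors use or which algorithm they run, so the function name `formal_concepts` and the toy context in the usage note are hypothetical. The enumeration is exponential in the number of objects and is suitable only for tiny examples.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts of a binary context.

    `context` maps each object to the set of attributes it has. A formal
    concept is a pair (extent, intent) where the intent is exactly the set
    of attributes shared by every object in the extent, and the extent is
    exactly the set of objects that have every attribute in the intent.
    """
    objects = list(context)
    all_attributes = frozenset().union(*context.values()) if context else frozenset()

    def intent_of(objs):
        # Attributes common to all given objects (all attributes if none given).
        if not objs:
            return all_attributes
        return frozenset.intersection(*(frozenset(context[o]) for o in objs))

    def extent_of(attrs):
        # Objects that carry every attribute in `attrs`.
        return frozenset(o for o in objects if attrs <= context[o])

    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = intent_of(subset)
            # Closing the subset yields the extent of a genuine concept.
            concepts.add((extent_of(intent), intent))
    return concepts
```

As a usage example with made-up data, `formal_concepts({"p1": {"a", "b"}, "p2": {"b", "c"}, "p3": {"b"}})` returns a set of (extent, intent) pairs; ordering those pairs by extent inclusion gives the concept lattice that FCA-based interpretation navigates.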

    Computational Approaches to Drug Profiling and Drug-Protein Interactions

    Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a long period of stagnation in drug approvals. Due to the extreme costs associated with introducing a drug to the market, locating and understanding the reasons for clinical failure is key to future productivity. As part of this PhD, three main contributions were made in this respect. First, the web platform LigNFam enables users to interactively explore similarity relationships between ‘drug-like’ molecules and the proteins they bind. Secondly, two deep-learning-based binding site comparison tools were developed, competing with the state of the art on benchmark datasets. The models have the ability to predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold relationships and has already been used in multiple projects, including integration into a virtual screening pipeline to increase the tractability of ultra-large screening experiments. Together, and with existing tools, the contributions made will aid in the understanding of drug-protein relationships, particularly in the fields of off-target prediction and drug repurposing, helping to design better drugs faster.

    Open Babel: An open chemical toolbox

    Background: A frequent problem in computational modeling is the interconversion of chemical structures between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and de facto standards have arisen (for example, SMILES format), the need to interconvert formats is a continuing problem due to the multitude of different application areas for chemistry data, differences in the data stored by different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-neutral formats. Results: We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection, to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses both in terms of software products and scientific research, including applications far beyond simple format interconversion. Conclusions: Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities from conformer searching and 2D depiction, to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. It is freely available under an open-source license fro