4,707 research outputs found

    Detection of IUPAC and IUPAC-like chemical names

    Motivation: Chemical compounds such as small signal molecules and other biologically active substances are an important entity class in life science publications and patents. Several representations and nomenclatures for chemicals exist, including SMILES, InChI, IUPAC names, and trivial names. Only SMILES and InChI allow a direct structure search, but in biomedical texts trivial names and IUPAC-like names are used more frequently. While trivial names can be found with a dictionary-based approach and thereby mapped to their corresponding structures, it is not possible to enumerate all IUPAC names. In this work, we present a new machine learning approach based on conditional random fields (CRF) to find mentions of IUPAC and IUPAC-like names in scientific text, together with its evaluation and the conversion rate achieved with available name-to-structure tools.

    Translation of Organic Compound to 2D Graphical Representation using SDT

    The IUPAC (International Union of Pure and Applied Chemistry) standard is used to describe the structure and characteristics of chemical compounds. This paper describes the translation of an IUPAC name into a two-dimensional structure of the substance, composed of graphical entities. The OpenGL graphical language is used to generate the graphical structure of the IUPAC name. Chemical names are a common way of communicating chemical structure data. Basic graphical entities are used to generate the 2D graphical structure of an IUPAC name from an Intermediate Graphical Language. A file is generated by analyzing the IUPAC name; it contains the OpenGL graphical functions that display the graphical entities. The translation is achieved using a syntax-directed translation (SDT) scheme with associated semantic actions, which yields the 2D graphical representation of the IUPAC name. This paper proposes a methodology for achieving this translation. DOI: 10.17762/ijritcc2321-8169.150512

    Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining

    Background. Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated what impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. Results. We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80 k names was only one third to one quarter the size of Chemlist at around 300 k. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. Conclusions. We conclude the following: (1) The ChemSpider dictionary achieved the best precision but the Chemlist dictionary had a higher recall and the best F-score; (2) Rule-based filtering and disambiguation is necessary to achieve a high precision for both the automatically generated and the manually curated dictionary. ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web at http://www.biosemantics.org/chemlist
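
The abstract above reports precision and recall for both dictionaries but no F-score values. As a rough check on conclusion (1), the sketch below recomputes them, assuming the F-score meant is the balanced F1 (the harmonic mean of precision and recall); the abstract does not say which variant was used. The function name `f_score` and the `reported` table are illustrative, and the inputs are the rounded figures quoted in the abstract.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; with beta=1 this is the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall) pairs as quoted in the abstract, before and after
# filtering and disambiguation. Assumption: F1 is the intended F-score.
reported = {
    ("ChemSpider", "before"): (0.43, 0.19),
    ("ChemSpider", "after"): (0.87, 0.19),
    ("Chemlist", "before"): (0.20, 0.47),
    ("Chemlist", "after"): (0.67, 0.40),
}

scores = {key: f_score(p, r) for key, (p, r) in reported.items()}

for (name, stage), f1 in scores.items():
    print(f"{name:10s} {stage:6s} F1 = {f1:.2f}")
```

Under this assumption, after filtering and disambiguation the Chemlist dictionary comes out at an F1 of about 0.50 against about 0.31 for ChemSpider, which is consistent with the stated conclusion that Chemlist had the best F-score despite its lower precision.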

    Scientific Language Modeling: A Quantitative Review of Large Language Models in Molecular Science

    Efficient molecular modeling and design are crucial for the discovery and exploration of novel molecules, and the incorporation of deep learning methods has revolutionized this field. In particular, large language models (LLMs) offer a fresh approach to tackle scientific problems from a natural language processing (NLP) perspective, introducing a research paradigm called scientific language modeling (SLM). However, two key issues remain: how to quantify the match between model and data modalities and how to identify the knowledge-learning preferences of models. To address these challenges, we propose a multi-modal benchmark, named ChEBI-20-MM, and perform 1263 experiments to assess the model's compatibility with data modalities and knowledge acquisition. Through the modal transition probability matrix, we provide insights into the most suitable modalities for tasks. Furthermore, we introduce a statistically interpretable approach to discover context-specific knowledge mapping by localized feature filtering. Our pioneering analysis offers an exploration of the learning mechanism and paves the way for advancing SLM in molecular science

    Terminology for analytical capillary electromigration techniques (IUPAC Recommendations 2003)

    This paper presents terms and definitions for capillary electromigration techniques for separation, qualitative and quantitative analysis and physico-chemical characterization. Names and descriptions for such techniques (e.g., capillary electrophoresis and capillary electrochromatography) as well as terms for the phenomenon of electroosmotic flow are included

    Molecular Identification from AFM Images Using the IUPAC Nomenclature and Attribute Multimodal Recurrent Neural Networks

    Spectroscopic methods like nuclear magnetic resonance, mass spectrometry, X-ray diffraction, and UV/visible spectroscopies applied to molecular ensembles have so far been the workhorse for molecular identification. Here, we propose a radically different chemical characterization approach, based on the ability of noncontact atomic force microscopy with metal tips functionalized with a CO molecule at the tip apex (referred to as HR-AFM) to resolve the internal structure of individual molecules. Our work demonstrates that a stack of constant-height HR-AFM images carries enough chemical information for a complete identification (structure and composition) of quasiplanar organic molecules, and that this information can be retrieved using machine learning techniques that are able to disentangle the contribution of chemical composition, bond topology, and internal torsion of the molecule to the HR-AFM contrast. In particular, we exploit multimodal recurrent neural networks (M-RNN) that combine convolutional neural networks for image analysis and recurrent neural networks to deal with language processing, to formulate the molecular identification as an image captioning problem. The algorithm is trained using a data set which contains almost 700,000 molecules and 165 million theoretical AFM images to produce as final output the IUPAC name of the imaged molecule. Our extensive test with theoretical images and a few experimental ones shows the potential of deep learning algorithms in the automatic identification of molecular compounds by AFM. This achievement supports the development of on-surface synthesis and overcomes some limitations of spectroscopic methods in traditional solution-based synthesis. We would like to acknowledge support from the Comunidad de Madrid Industrial Doctorate programme 2017 under reference number IND2017/IND7793 and from Quasar Science Resources S.L. P.P. and R.P. acknowledge support from the Spanish Ministry of Science and Innovation, through project PID2020-115864RB-I00 and the “María de Maeztu” Programme for Units of Excellence in R&D (CEX2018-000805-M). C.R.-M. acknowledges financial support by the Ramón y Cajal program of the Spanish Ministry of Science and Innovation (ref. RYC2021-031176-I). Computer time provided by the Red Española de Supercomputación (RES) at the Finisterrae II Supercomputer is also acknowledged.

    The psychonauts' world of cognitive enhancers

    © 2020 Napoletano, Schifano, Corkery, Guirguis, Arillotta, Zangani and Vento. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY - https://creativecommons.org/licenses/by/4.0/). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. Background: There is growing availability of novel psychoactive substances (NPS), including cognitive enhancers (CEs) which can be used in the treatment of certain mental health disorders. Whilst treating cognitive deficit symptoms in neuropsychiatric or neurodegenerative disorders using CEs might have significant benefits for patients, the increasing recreational use of these substances by healthy individuals raises many clinical, medico-legal and ethical issues. Moreover, it has become very challenging for clinicians to keep up-to-date with CEs currently available as comprehensive official lists do not exist. Methods: Using a web crawler (NPSfinder®), the present study aimed at assessing psychonaut fora/platforms to better understand the online situation regarding CEs. We compared NPSfinder® entries with those from the European Monitoring Centre for Drugs and Drug Addiction (EMCDDA), and from the United Nations Office on Drugs and Crime (UNODC) NPS databases, up to spring 2019. Any substance that was identified by NPSfinder® was considered a CE if it was either described as having nootropic abilities by psychonauts or if it was listed among the known CEs by Froestl and colleagues. Results: A total of 142 unique CEs were identified by NPSfinder®. They were divided into 10 categories, including plants/herbs/products (29%), prescribed drugs (17%), image and performance enhancing drugs (IPEDs) (15%), psychostimulants (15%), miscellaneous (8%), phenethylamines (6%), GABAergic drugs (5%), cannabimimetics (4%), tryptamine derivatives (0.5%) and piperazine derivatives (0.5%). A total of 105 chemically different substances were uniquely identified by NPSfinder®. Only one CE was uniquely identified by the EMCDDA; no CE was uniquely identified by the UNODC. Conclusions: These results show that NPSfinder® is helpful as part of an Early Warning System, which could update clinicians with the growing numbers and types of nootropics in the increasingly difficult-to-follow internet world. Improving clinicians’ knowledge of NPS could promote more effective prevention and harm reduction measures in clinical settings. Peer reviewed.

    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field. A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.