4,707 research outputs found
Detection of IUPAC and IUPAC-like chemical names
Motivation: Chemical compounds such as small signal molecules or other biologically active chemical substances are an important entity class in life science publications and patents. Several representations and nomenclatures for chemicals exist, such as SMILES, InChI, IUPAC, or trivial names. Only SMILES and InChI names allow a direct structure search, but in biomedical texts trivial names and IUPAC-like names are used more frequently. While trivial names can be found with a dictionary-based approach and thereby mapped to their corresponding structures, it is not possible to enumerate all IUPAC names. In this work, we present a new machine learning approach based on conditional random fields (CRF) to find mentions of IUPAC and IUPAC-like names in scientific text, together with its evaluation and the conversion rate achieved with available name-to-structure tools.
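As a rough illustration of how a CRF-based tagger might see IUPAC-like tokens, the sketch below builds one feature dictionary per token. The feature names, the locant regular expression, and the example sentence are assumptions made for illustration; the abstract does not describe the authors' actual feature set.

```python
import re

# Hypothetical pattern for a leading locant such as "2,4-" (an assumption,
# not taken from the paper).
LOCANT_PREFIX = re.compile(r"^\d+(,\d+)*-")


def token_features(tokens, i):
    """Return an illustrative feature dict for tokens[i] in a sequence tagger."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        "has_bracket": any(c in "()[]" for c in tok),
        "suffix3": tok[-3:].lower(),
        "locant_prefix": bool(LOCANT_PREFIX.match(tok)),
        # Neighbouring tokens give the model local context.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


sentence = "Treatment with 2,4-dichlorophenoxyacetic acid reduced growth".split()
features = [token_features(sentence, i) for i in range(len(sentence))]
```

A list of such dictionaries per sentence is the kind of input that common CRF toolkits accept; the labels (for example, begin/inside/outside of a chemical name) would come from an annotated corpus.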
Translation of Organic Compound to 2D Graphical Representation using SDT
The IUPAC (International Union of Pure and Applied Chemistry) standard is used to describe the structure and characteristics of chemical compounds. This paper describes the translation of an IUPAC name into a two-dimensional structure of the compound that consists of graphical entities. The OpenGL graphical language is used to generate the graphical structure of the IUPAC name. Chemical names are a common way of communicating chemical structure information. Basic graphical entities are used to generate the 2D graphical structure of an IUPAC name from an Intermediate Graphical Language. A file is generated by analyzing the IUPAC name; this file contains the OpenGL graphical functions that display the graphical entities. The translation is achieved using a Syntax-Directed Translation scheme by associating semantic actions. This gives the final 2D graphical representation of the IUPAC name. This paper proposes a methodology for achieving this translation.
DOI: 10.17762/ijritcc2321-8169.150512
Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining
Background. Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated what impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. Results. We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80 k names was only a third to a quarter the size of Chemlist at around 300 k. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. Conclusions. We conclude the following: (1) The ChemSpider dictionary achieved the best precision but the Chemlist dictionary had a higher recall and the best F-score; (2) Rule-based filtering and disambiguation are necessary to achieve a high precision for both the automatically generated and the manually curated dictionary. ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web at http://www.biosemantics.org/chemlist
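The F-score conclusion can be checked from the reported precision and recall values. The sketch below assumes the balanced F-score (F1, the harmonic mean of precision and recall); the abstract does not state which F-measure was used, and the function and variable names are illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the abstract, before and after
# filtering and disambiguation.
reported = {
    ("ChemSpider", "before"): (0.43, 0.19),
    ("ChemSpider", "after"): (0.87, 0.19),
    ("Chemlist", "before"): (0.20, 0.47),
    ("Chemlist", "after"): (0.67, 0.40),
}

scores = {key: f_score(p, r) for key, (p, r) in reported.items()}

for (name, stage), value in scores.items():
    print(f"{name:10s} {stage:6s} F1 = {value:.2f}")
```

Under this assumption, the filtered Chemlist dictionary reaches an F1 of about 0.50 against about 0.31 for the filtered ChemSpider dictionary, which is consistent with the stated conclusion.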
Scientific Language Modeling: A Quantitative Review of Large Language Models in Molecular Science
Efficient molecular modeling and design are crucial for the discovery and
exploration of novel molecules, and the incorporation of deep learning methods
has revolutionized this field. In particular, large language models (LLMs)
offer a fresh approach to tackle scientific problems from a natural language
processing (NLP) perspective, introducing a research paradigm called scientific
language modeling (SLM). However, two key issues remain: how to quantify the
match between model and data modalities and how to identify the
knowledge-learning preferences of models. To address these challenges, we
propose a multi-modal benchmark, named ChEBI-20-MM, and perform 1263
experiments to assess the model's compatibility with data modalities and
knowledge acquisition. Through the modal transition probability matrix, we
provide insights into the most suitable modalities for tasks. Furthermore, we
introduce a statistically interpretable approach to discover context-specific
knowledge mapping by localized feature filtering. Our pioneering analysis
offers an exploration of the learning mechanism and paves the way for advancing
SLM in molecular science.
Terminology for analytical capillary electromigration techniques (IUPAC Recommendations 2003)
This paper presents terms and definitions for capillary electromigration
techniques for separation, qualitative and quantitative analysis and physico-chemical
characterization. Names and descriptions for such techniques (e.g., capillary
electrophoresis and capillary electrochromatography) as well as terms for the phenomenon
of electroosmotic flow are included.
Molecular Identification from AFM Images Using the IUPAC Nomenclature and Attribute Multimodal Recurrent Neural Networks
Spectroscopic methods like nuclear magnetic
resonance, mass spectrometry, X-ray diffraction, and UV/visible
spectroscopies applied to molecular ensembles have so far been
the workhorse for molecular identification. Here, we propose a
radically different chemical characterization approach, based on the
ability of noncontact atomic force microscopy with metal tips
functionalized with a CO molecule at the tip apex (referred to as HR-AFM) to resolve the internal structure of individual molecules. Our
work demonstrates that a stack of constant-height HR-AFM
images carries enough chemical information for a complete
identification (structure and composition) of quasiplanar organic
molecules, and that this information can be retrieved using
machine learning techniques that are able to disentangle the contribution of chemical composition, bond topology, and internal
torsion of the molecule to the HR-AFM contrast. In particular, we exploit multimodal recurrent neural networks (M-RNN) that
combine convolutional neural networks for image analysis and recurrent neural networks to deal with language processing, to
formulate the molecular identification as an image captioning problem. The algorithm is trained using a data set which contains
almost 700,000 molecules and 165 million theoretical AFM images to produce as final output the IUPAC name of the imaged
molecule. Our extensive test with theoretical images and a few experimental ones shows the potential of deep learning algorithms in
the automatic identification of molecular compounds by AFM. This achievement supports the development of on-surface synthesis
and overcomes some limitations of spectroscopic methods in traditional solution-based synthesis. We would like to acknowledge
support from the Comunidad de Madrid Industrial Doctorate
programme 2017 under reference number IND2017/IND7793 and from Quasar Science Resources S.L. P.P. and R.P.
acknowledge support from the Spanish Ministry of Science and
Innovation, through project PID2020-115864RB-I00 and the
“María de Maeztu” Programme for Units of Excellence in R&D
(CEX2018-000805-M). C.R.-M. acknowledges financial support by the Ramón y Cajal program of the Spanish Ministry of
Science and Innovation (ref. RYC2021-031176-I). Computer
time provided by the Red Española de Supercomputación
(RES) at the Finisterrae II Supercomputer is also acknowledged.
The psychonauts' world of cognitive enhancers
© 2020 Napoletano, Schifano, Corkery, Guirguis, Arillotta, Zangani and Vento. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY - https://creativecommons.org/licenses/by/4.0/). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. Background: There is growing availability of novel psychoactive substances (NPS), including cognitive enhancers (CEs) which can be used in the treatment of certain mental health disorders. Whilst treating cognitive deficit symptoms in neuropsychiatric or neurodegenerative disorders using CEs might have significant benefits for patients, the increasing recreational use of these substances by healthy individuals raises many clinical, medico-legal and ethical issues. Moreover, it has become very challenging for clinicians to keep up-to-date with CEs currently available as comprehensive official lists do not exist. Methods: Using a web crawler (NPSfinder®), the present study aimed at assessing psychonaut fora/platforms to better understand the online situation regarding CEs. We compared NPSfinder® entries with those from the European Monitoring Centre for Drugs and Drug Addiction (EMCDDA), and from the United Nations Office on Drugs and Crime (UNODC) NPS databases, up to spring 2019. Any substance that was identified by NPSfinder® was considered a CE if it was either described as having nootropic abilities by psychonauts or if it was listed among the known CEs by Froestl and colleagues. Results: A total of 142 unique CEs were identified by NPSfinder®. They were divided into 10 categories, including plants/herbs/products (29%), prescribed drugs (17%), image and performance enhancing drugs (IPEDs) (15%), psychostimulants (15%), miscellaneous (8%), phenethylamines (6%), GABAergic drugs (5%), cannabimimetics (4%), tryptamine derivatives (0.5%) and piperazine derivatives (0.5%). A total of 105 chemically different substances were uniquely identified by NPSfinder®. Only one CE was uniquely identified by the EMCDDA; no CE was uniquely identified by the UNODC. Conclusions: These results show that NPSfinder® is helpful as part of an Early Warning System, which could update clinicians with the growing numbers and types of nootropics in the increasingly difficult-to-follow internet world. Improving clinicians’ knowledge of NPS could promote more effective prevention and harm reduction measures in clinical settings.
Information retrieval and text mining technologies for chemistry
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field. A.V. and M.K. acknowledge funding from the European
Community’s Horizon 2020 Program (project reference:
654021 - OpenMinted). M.K. additionally acknowledges the
Encomienda MINETAD-CNIO as part of the Plan for the
Advancement of Language Technology. O.R. and J.O. thank
the Foundation for Applied Medical Research (FIMA),
University of Navarra (Pamplona, Spain). This work was
partially funded by Consellería
de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic
funding of UID/BIO/04469/2013 unit and COMPETE 2020
(POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi
for useful feedback and discussions during the preparation of
the manuscript.