Detection of IUPAC and IUPAC-like chemical names

By Roman Klinger, Corinna Kolářik, Juliane Fluck, Martin Hofmann-Apitius and Christoph M. Friedrich


Motivation: Chemical compounds such as small signal molecules and other biologically active chemical substances are an important entity class in life science publications and patents. Several representations and nomenclatures exist for chemicals, including SMILES, InChI, IUPAC and trivial names. Only SMILES and InChI allow a direct structure search, but in biomedical texts trivial names and IUPAC-like names are used more frequently. While trivial names can be found with a dictionary-based approach and thereby mapped to their corresponding structures, it is not possible to enumerate all IUPAC names. In this work, we present a new machine learning approach based on conditional random fields (CRF) to find mentions of IUPAC and IUPAC-like names in scientific text, together with its evaluation and the conversion rate achieved with available name-to-structure tools.

Topics: ISMB 2008 Conference Proceedings, 19–23 July 2008, Toronto
Publisher: Oxford University Press
Provided by: PubMed Central

