Detection of IUPAC and IUPAC-like chemical names

Author: C. Kolarik
C. M. Friedrich
Eller
Guzikowski
J. Fluck
M. Hofmann-Apitius
R. Klinger
Steinbeck
Wishart
Publication venue: Oxford University Press
Publication date: 01/01/2008

Motivation: Chemical compounds such as small signal molecules and other biologically active substances are an important entity class in life science publications and patents. Several representations and nomenclatures for chemicals exist, including SMILES, InChI, IUPAC names, and trivial names. Only SMILES and InChI allow a direct structure search, but biomedical texts more frequently use trivial names and IUPAC-like names. Trivial names can be found with a dictionary-based approach and thereby mapped to their corresponding structures, but it is not possible to enumerate all IUPAC names. In this work, we present a new machine learning approach based on conditional random fields (CRF) to find mentions of IUPAC and IUPAC-like names in scientific text, together with its evaluation and the conversion rate achieved with available name-to-structure tools.

Publications at Bielefeld University

NASARI: a novel approach to a Semantically-Aware Representation of items

Author: Camacho Collados José
Navigli Roberto
Pilehvar Mohammed Taher
Publication date: 01/01/2015

The semantic representation of individual word senses and concepts is of fundamental importance to several applications in Natural Language Processing. To date, concept modeling techniques have in the main based their representation either on lexicographic resources, such as WordNet, or on encyclopedic resources, such as Wikipedia. We propose a vector representation technique that combines the complementary knowledge of both these types of resource. Thanks to its use of explicit semantics combined with a novel cluster-based dimensionality reduction and an effective weighting scheme, our representation attains state-of-the-art performance on multiple datasets in two standard benchmarks: word similarity and sense clustering. We are releasing our vector representations at http://lcl.uniroma1.it/nasari/

CiteSeerX

Archivio della ricerca- Università di Roma La Sapienza

The iDAI.publication: extracting and linking information in the publications of the German Archaeological Institute (DAI)

Author: Mambrini Francesco
Publication venue: OpenEdition
Publication date: 01/01/2018

We present the results of our attempt to use NLP tools to identify named entities in the publications of the Deutsches Archäologisches Institut (DAI) and to link the identified locations to entries in the iDAI.gazetteer. Our case study focuses on articles written in German and published in the journal Chiron between 1971 and 2014. We describe the annotation pipeline, which starts from the digitized texts published in the new portal of the DAI. We evaluate the performance of geoparsing and NER and test an approach to improve the accuracy of the latter.

OpenEdition