249 research outputs found

    Detection of IUPAC and IUPAC-like chemical names

    Motivation: Chemical compounds such as small signal molecules or other biologically active chemical substances are an important entity class in life science publications and patents. Several representations and nomenclatures for chemicals exist, including SMILES, InChI, IUPAC and trivial names. Only SMILES and InChI names allow a direct structure search, but in biomedical texts trivial names and IUPAC-like names are used more frequently. While trivial names can be found with a dictionary-based approach and thereby mapped to their corresponding structures, it is not possible to enumerate all IUPAC names. In this work, we present a new machine learning approach based on conditional random fields (CRF) to find mentions of IUPAC and IUPAC-like names in scientific text, together with its evaluation and the conversion rate achieved with available name-to-structure tools.

    NASARI: a novel approach to a Semantically-Aware Representation of items

    The semantic representation of individual word senses and concepts is of fundamental importance to several applications in Natural Language Processing. To date, concept modeling techniques have in the main based their representation either on lexicographic resources, such as WordNet, or on encyclopedic resources, such as Wikipedia. We propose a vector representation technique that combines the complementary knowledge of both these types of resource. Thanks to its use of explicit semantics combined with a novel cluster-based dimensionality reduction and an effective weighting scheme, our representation attains state-of-the-art performance on multiple datasets in two standard benchmarks: word similarity and sense clustering. We are releasing our vector representations at http://lcl.uniroma1.it/nasari/

    The iDAI.publication: extracting and linking information in the publications of the German Archaeological Institute (DAI)

    We present the results of our attempt to use NLP tools to identify named entities in the publications of the Deutsches Archäologisches Institute (DAI) and link the identified locations to entries in the iDAI.gazetteer. Our case study focuses on articles written in German and published in the journal Chiron between 1971 and 2014. We describe the annotation pipeline, which starts from the digitized texts published in the new portal of the DAI. We evaluate the performance of geoparsing and NER and test an approach to improve the accuracy of the latter.