437 research outputs found

    Synapse at CAp 2017 NER challenge: Fasttext CRF

    We present our system for the CAp 2017 NER challenge, which addresses named entity recognition on French tweets. Our system leverages unsupervised learning on a larger dataset of French tweets to learn features feeding a CRF model. It was ranked first without using any gazetteer or structured external data, with an F-measure of 58.89%. To the best of our knowledge, it is the first system to use fasttext embeddings (which include subword representations) and an embedding-based sentence representation for NER.

    Towards the ontology-based approach for factual information matching

    Factual information is information based on or relating to facts. The reliability of automatically extracted facts is the main problem in processing factual information. The fact retrieval system remains one of the most effective tools for identifying information for decision-making. In this work, we explore how natural language processing methods and a problem-domain ontology can help to automatically check for contradictions and mismatches in facts.

    An automatic part-of-speech tagger for Middle Low German

    Syntactically annotated corpora are highly important for enabling large-scale diachronic and diatopic language research. Such corpora have recently been developed for a variety of historical languages, or are still under development. One of those under development is the fully tagged and parsed Corpus of Historical Low German (CHLG), which is aimed at facilitating research into the highly under-researched diachronic syntax of Low German. The present paper reports on a crucial step in creating the corpus, viz. the creation of a part-of-speech tagger for Middle Low German (MLG). Having been transmitted in several non-standardised written varieties, MLG poses a challenge to standard POS taggers, which usually rely on normalized spelling. We outline the major issues faced in the creation of the tagger and present our solutions to them.

    Dutch named entity recognition using ensemble classifiers
