309 research outputs found

    Overview of the EVALITA 2016 Part of speech on twitter for Italian task

    Get PDF
    The increasing interest for the extraction of various forms of knowledge from micro-blogs and social media makes crucial the development of resources and tools that can be used for automatically deal with them. PoSTWITA contributes to the advancement of the state-of-the-art for Italian language by: (a) enriching the community with a previously not existing col- lection of data extracted from Twitter and annotated with grammatical categories, to be used as a benchmark for system evaluation; (b) supporting the adaptation of Part of Speech tagging systems to this particular text domain

    When silver glitters more than gold: Bootstrapping an Italian part-of-speech tagger for Twitter

    Get PDF
    We bootstrap a state-of-the-art part-of-speech tagger to tag Italian Twitter data, in the context of the Evalita 2016 PoSTWITA shared task. We show that training the tagger on native Twitter data enriched with little amounts of specifically selected gold data and additional silver-labelled data scraped from Facebook, yields better results than using large amounts of manually annotated data from a mix of genres.Comment: Proceedings of the 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2016
    corecore