Overview of the EVALITA 2016 Part of speech on twitter for Italian task
The increasing interest in extracting various forms of knowledge from micro-blogs and social media makes it crucial to develop resources and tools that can process them automatically. PoSTWITA contributes to the advancement of the state of the art for the Italian language by: (a) enriching the community with a previously unavailable collection of data extracted from Twitter and annotated with grammatical categories, to be used as a benchmark for system evaluation; (b) supporting the adaptation of Part of Speech tagging systems to this particular text domain.
When silver glitters more than gold: Bootstrapping an Italian part-of-speech tagger for Twitter
We bootstrap a state-of-the-art part-of-speech tagger to tag Italian Twitter
data, in the context of the Evalita 2016 PoSTWITA shared task. We show that
training the tagger on native Twitter data enriched with small amounts of
specifically selected gold data and additional silver-labelled data scraped
from Facebook yields better results than using large amounts of manually
annotated data from a mix of genres.
Comment: Proceedings of the 5th Evaluation Campaign of Natural Language
Processing and Speech Tools for Italian (EVALITA 2016)
- …