Edition 1.1 of the PARSEME Shared Task on automatic identification of verbal multiword expressions
A Pilot Study on Arabic Multi-Genre Corpus Diacritization Annotation
Arabic script writing is typically underspecified for short vowels and other markup, referred to as diacritics. Apart from the lexical ambiguity found in words, similar to that exhibited in other languages, the lack of diacritics in written Arabic script adds another layer of ambiguity which is an artifact of the orthography. Diacritization of written text has a significant impact on Arabic NLP applications. In this paper, we present a pilot study on building a diacritized multi-genre corpus in Arabic. We annotate a sample of nondiacritized words extracted from five text genres. We explore different annotation strategies: Basic, where we present only the bare undiacritized forms to the annotators; Intermediate (Basic forms + their POS tags); and Advanced (automatically diacritized words). We present the impact of the annotation strategy on annotation quality. Moreover, we study different diacritization schemes in the process.
Overview for the First Shared Task on Language Identification in Code-Switched Data
We present an overview of the first shared task on language identification on code-switched data. The shared task included code-switched data from four lan
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multi-word expressions. We present the annotation methodology, focusing on changes from last year's shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation settings. Corpora were created for 20 languages, which are also briefly discussed. We report organizational principles behind the shared task and the evaluation metrics employed for ranking. The 17 participating systems, their methods and obtained results are also presented and analysed.