16 research outputs found

Edition 1.1 of the PARSEME Shared Task on automatic identification of verbal multiword expressions

Università degli Studi di Napoli L'Orientale: CINECA IRIS

A Pilot Study on Arabic Multi-Genre Corpus Diacritization Annotation

Author: Abdelati Hawwari (5297429)
Houda Bouamor (3885832)
Kemal Oflazer (3888664)
Mahmoud Ghoneim (5297435)
Mona Diab (5297432)
Ossama Obeid (5294366)
Wajdi Zaghouani (5297402)
Publication date: 01/06/2018

Arabic script writing is typically underspecified for short vowels and other markup, referred to as diacritics. Apart from the lexical ambiguity found in words, similar to that exhibited in other languages, the lack of diacritics in written Arabic script adds another layer of ambiguity which is an artifact of the orthography. Diacritization of written text has a significant impact on Arabic NLP applications. In this paper, we present a pilot study on building a diacritized multi-genre corpus in Arabic. We annotate a sample of nondiacritized words extracted from five text genres. We explore different annotation strategies: Basic, where we present only the bare undiacritized forms to the annotators; Intermediate (Basic forms + their POS tags); and Advanced (automatically diacritized words). We present the impact of the annotation strategy on annotation quality. Moreover, we study different diacritization schemes in the process.

FigShare

Overview for the First Shared Task on Language Identification in Code-Switched Data

Author: Abdelati Hawwari
Alison Chang
Elizabeth Blair
Fahad Alghamdi
Julia Hirschberg
Mahmoud Ghoneim
Mona Diab
Pascale Fung
Steven Bethard
Suraj Maharjan
Thamar Solorio
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2014

We present an overview of the first shared task on language identification on code-switched data. The shared task included code-switched data from four lan

CiteSeerX

Crossref

Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions

Author: Bhatia Archna
Buljan Maja
Candito Marie
Cordeiro Silvio
Gantar Polona
Giouli Voula
Güngör Tunga
Hawwari Abdelati
Iñurrieta Uxoa
Kovalevskaitė Jolanta
Krek Simon
Lichte Timm
Liebeskind Chaya
Mititelu Verginica
Monti Johanna
Parra Escartín Carla
Qasemizadeh Behrang
Ramisch Carlos
Ramisch Renata
Savary Agata
Schneider Nathan
Stoyanova Ivelina
Vaidya Ashwini
Vincze Veronika
Walsh Abigail
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 25/08/2018

This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multi-word expressions. We present the annotation methodology, focusing on changes from last year's shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation settings. Corpora were created for 20 languages, which are also briefly discussed. We report organizational principles behind the shared task and the evaluation metrics employed for ranking. The 17 participating systems, their methods and obtained results are also presented and analysed.

HAL AMU