
18 research outputs found

UD Annotatrix: An Annotation Tool For Universal Dependencies

Author: Sheyanova M.
Tyers F. M.
Washington Jonathan North
Publication venue: 'Transformative Works and Cultures'
Publication date: 01/01/2017

In this paper we introduce UD Annotatrix, a tool for manual annotation of Universal Dependencies. The tool is designed around the needs of the Universal Dependencies (UD) community: it can operate fully offline and is freely available under the GNU GPL licence. We provide some background to the tool, an overview of its development, and a description of how it works. We compare it with other widely used UD annotation tools, describe some features unique to UD Annotatrix, and finally outline some avenues for future work and provide a few concluding remarks.


Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2018

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

CoNLL 2017 Shared Task : Multilingual Parsing from Raw Text to Universal Dependencies

Author: Attia Mohammed
Badmaeva Elena
Banerjee Esha
Burchardt Aljoscha
Cinková Silvie
de Marneffe Marie-Catherine
dePaiva Valeria
Droganova Kira
Elkahky Ali
Fernández Alcalde Héctor
Ginter Filip
Gökırmak Memduh
Habash Nizar
Hajič Jan
Hajič jr., Jan
Harris Kim
Hlaváčová Jaroslava
Kanayama Hiroshi
Kanerva Jenna
Kayadelen Tolga
Kettnerová Václava
Kirchner Jesse
Kwak Sookyoung
Lando Tatiana
Lertpradit Saran
Leung Herman
Li Josie
Luotolahti Juhani
Macketanz Vivien
Mandl Michael
Manning Christopher D.
Manurung Ruli
Marheinecke Katrin
Martínez Alonso Héctor
Mendonça Gustavo
Missilä Anna
Nedoluzhko Anna
Nitisaroj Rattima
Nivre Joakim
Ojala Stina
Petrov Slav
Pitler Emily
Popel Martin
Potthast Martin
Pyysalo Sampo
Reddy Siva
Rehm Georg
Sanguinetti Manuela
Schuster Sebastian
Shimada Atsuko
Simi Maria
Stella Antonio
Straka Milan
Strnadová Jana
Sulubacak Umut
Taji Dima
Tyers Francis
Urešová Zdeňka
Uszkoreit Hans
Yu Zhuoran
Zeman Daniel
Çöltekin Çağrı
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2017

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, one of two tasks was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe data preparation, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems. Peer reviewed

Crossref

Archivio della Ricerca - Università di Pisa

Biblio at Institute of Formal and Applied Linguistics

Helsingin yliopiston digitaalinen arkisto

Institutional Research Information System University of Turin

Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

Author: Attia M
Badmaeva E
Banerjee E
Burchardt A
Cinková S
Droganova K
Elkahky A
Fernandez Alcalde H
Ginter F
Gökırmak M
Habash N
Hajič J
Hajič jr. J
Harris K
Hlaváčová J
Kanayama H
Kanerva J
Kayadelen T
Kettnerová V
Kirchner J
Kwak S
Lando T
Lertpradit S
Leung H
Li J
Luotolahti J
Macketanz V
Mandl M
Manning C
Manurung R
Marheinecke K
Marneffe M
Martínez Alonso H
Mendonça G
Missilä A
Nedoluzhko A
Nitisaroj R
Nivre J
Ojala S
Paiva V
Petrov S
Pitler E
Popel M
Potthast M
Pyysalo S
Reddy S
Rehm G
Sanguinetti M
Schuster S
Shimada A
Simi M
Stella A
Straka M
Strnadova J
Taji D
Tyers F
Urešová Z
Uszkoreit H
Yu Z
Zeman D
Publication venue: Vancouver, Canada
Publication date: 28/10/2022

UTUPub

The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection

Author: Cotterell Ryan
Heinz Jeffrey
Hulden Mans
Kirov Christo
Malaviya Chaitanya
McCarthy Arya D.
Mielke Sabrina J.
Nicolai Garrett
Silfverberg Miikka
Vylomova Ekaterina
Wolf-Sonkin Lawrence
Wu Shijie
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019

The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages. The first task evolves past years' inflection tasks by examining transfer of morphological inflection knowledge from a high-resource language to a low-resource language. This year also presents a new second challenge on lemmatization and morphological feature analysis in context. All submissions featured a neural component and built on either this year's strong baselines or highly ranked systems from previous years' shared tasks. Every participating team improved in accuracy over the baselines for the inflection task (though not Levenshtein distance), and every team in the contextual analysis task improved on both state-of-the-art neural and non-neural baselines. Comment: Presented at SIGMORPHON 201

arXiv.org e-Print Archive

Crossref

When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models

Author: Anastasopoulos Antonios
Muller Benjamin
Sagot Benoît
Seddah Djamé
Publication venue: HAL CCSD
Publication date: 06/06/2021

Transfer learning based on pretraining language models on a large amount of raw data has become a new norm to reach state-of-the-art performance in NLP. Still, it remains unclear how this approach should be applied for unseen languages that are not covered by any available large-scale multilingual language model and for which only a small amount of raw data is generally available. In this work, by comparing multilingual and monolingual models, we show that such models behave in multiple ways on unseen languages. Some languages greatly benefit from transfer learning and behave similarly to closely related high-resource languages whereas others apparently do not. Focusing on the latter, we show that this failure to transfer is largely related to the impact of the script used to write such languages. We show that transliterating those languages significantly improves the potential of large-scale multilingual language models on downstream tasks. This result provides a promising direction towards making these massively multilingual models useful for a new set of unseen languages.

INRIA a CCSD electronic archive server

Massive Choice, Ample Tasks (MaChAmp): A Toolkit for Multi-task Learning in NLP

Author: Plank Barbara
Ramponi Alan
Sharaf Ibrahim
van der Goot Rob
Üstün Ahmet
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2021

Transfer learning, particularly approaches that combine multi-task learning with pre-trained contextualized embeddings and fine-tuning, has advanced the field of Natural Language Processing tremendously in recent years. In this paper we present MaChAmp, a toolkit for easy fine-tuning of contextualized embeddings in multi-task settings. The benefits of MaChAmp are its flexible configuration options and its support for a variety of natural language processing tasks in a uniform toolkit, from text classification and sequence labeling to dependency parsing, masked language modeling, and text generation. Comment: https://machamp-nlp.github.io

arXiv.org e-Print Archive

Archivio della ricerca - Fondazione Bruno Kessler

The IT University of Copenhagen's Repository

SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages

Author: Aiton Grant
Ambridge Ben
Ataman Duygu
Ate Yustinus Ghanggo
Barta Botond
Bayyr-ool Aziyana
Bernardy Jean-Philippe
Chodroff Eleanor
Coler Matt
Cotterell Ryan
Ek Adam
El-Khaissi Charbel
Ganieva Sofya
Gasser Michael
Goldman Omer
Habash Nizar
Hatcher Richard J.
Hulden Mans
Ivanova Sardana
Khalifa Salam
Kieraś Witold
Klyachko Elena
Krizhanovsky Andrew
Krizhanovsky Natalia
Kumar Ritesh
Lakatos Dorina
Lane William
Leonard Brian
Liu Zoey
Mielke Sabrina J.
Montoya Samame Jaime Rafael
Nicolai Garett
Nuriah Zahroh
Oncevay Arturo
Pimentel Tiago
Plugaryov Matvey
Ponti Edoardo M.
Prud'hommeaux Emily
Raj Mohit
Ratan Shyam
Ryskina Maria
Salchak Aelita
Salehi Ali
Shcherbakov Andrey
Sheifer Karina
Silva Villegas Gema Celeste
Stoehr Niklas
Straughn Christopher
Suhardijanto Totok
Szolnok Gábor
Tyers Francis M.
Vania Clara
Vylomova Ekaterina
Washington Jonathan
Woliński Marcin
Wu Shijie
Yarowsky David
Ács Judit
Publication venue: The Association for Computational Linguistics
Publication date: 01/08/2021

This year's iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features. In terms of the task, we enrich UniMorph with new data for 32 languages from 13 language families, with most of them being under-resourced: Kunwinjku, Classical Syriac, Arabic (Modern Standard, Egyptian, Gulf), Hebrew, Amharic, Aymara, Magahi, Braj, Kurdish (Central, Northern, Southern), Polish, Karelian, Livvi, Ludic, Veps, Võro, Evenki, Xibe, Tuvan, Sakha, Turkish, Indonesian, Kodi, Seneca, Asháninka, Yanesha, Chukchi, Itelmen, Eibela. We evaluate six systems on the new data and conduct an extensive error analysis of the systems' predictions. Transformer-based models generally demonstrate superior performance on the majority of languages, achieving >90% accuracy on 65% of them. The languages on which systems yielded low accuracy are mainly under-resourced, with a limited amount of data. Most errors made by the systems are due to allomorphy, honorificity, and form variation. In addition, we observe that systems especially struggle to inflect multiword lemmas. The systems also produce misspelled forms or end up in repetitive loops (e.g., RNN-based models). Finally, we report a large drop in systems' performance on previously unseen lemmas. Peer reviewed

Edinburgh Research Explorer

Helsingin yliopiston digitaalinen arkisto