Transfer Learning for Low-Resource Part-of-Speech Tagging

Authors: Verma Neha; Zhou Jeffrey
Publication venue: EliScholar – A Digital Platform for Scholarly Publishing at Yale
Publication date: 21/08/2021

Neural network approaches to Part-of-Speech (POS) tagging, like other supervised neural network tasks, benefit from larger quantities of labeled data. However, in the case of low-resource languages, additional methods are necessary to improve the performance of POS taggers. In this paper, we explore transfer learning approaches to improve POS tagging in Afrikaans using a neural network. We investigate the effect of transferring network weights that were originally trained for POS tagging in Dutch. We also test the use of pretrained word embeddings in our POS tagger, both independently and in conjunction with the transferred weights from a Dutch POS tagger. We find a marginal increase in performance due to transfer learning with the Dutch POS tagger, and a significant increase due to the use of either unaligned or aligned pretrained embeddings. Notably, there is little difference in performance between unaligned and aligned embeddings, even when utilizing cross-lingual transfer learning.

Yale University

Wh-copying, phases, and successive cyclicity

Author: Felser Claudia
Publication venue: Essex Research Reports in Linguistics
Publication date: 01/01/2001

University of Essex Research Repository

Marrying Universal Dependencies and Universal Morphology

Authors: Cotterell Ryan; Hulden Mans; McCarthy Arya D.; Silfverberg Miikka; Yarowsky David
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2018

The Universal Dependencies (UD) and Universal Morphology (UniMorph) projects each present schemata for annotating the morphosyntactic details of language. Each project also provides corpora of annotated text in many languages: UD at the token level and UniMorph at the type level. As each corpus is built by different annotators, language-specific decisions hinder the goal of universal schemata. With compatibility of tags, each project's annotations could be used to validate the other's. Additionally, the availability of both type- and token-level resources would be a boon to tasks such as parsing and homograph disambiguation. To ease this interoperability, we present a deterministic mapping from Universal Dependencies v2 features into the UniMorph schema. We validate our approach by lookup in the UniMorph corpora and find a macro-average of 64.13% recall. We also note incompatibilities due to paucity of data on either side. Finally, we present a critical evaluation of the foundations, strengths, and weaknesses of the two annotation projects.

Comment: UDW1
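
As a rough illustration of what a deterministic feature mapping validated by lexicon lookup and scored with macro-averaged recall could look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the small `UD_TO_UNIMORPH` table is a hand-picked subset rather than the paper's mapping, the function names are hypothetical, and the recall definition (share of looked-up tokens whose converted tag bundle appears in the lexicon entry for that form) is only one plausible reading of the abstract.

```python
from statistics import mean

# Illustrative subset of a UD-feature -> UniMorph-tag table.
# Assumption: chosen by hand for this sketch; not the paper's full mapping.
UD_TO_UNIMORPH = {
    ("Case", "Nom"): "NOM",
    ("Case", "Acc"): "ACC",
    ("Number", "Sing"): "SG",
    ("Number", "Plur"): "PL",
    ("Tense", "Past"): "PST",
    ("Tense", "Pres"): "PRS",
}


def convert_feats(ud_feats):
    """Convert a UD FEATS string such as 'Case=Nom|Number=Sing'
    into a frozenset of UniMorph tags; unmapped features are skipped."""
    if ud_feats in ("", "_"):
        return frozenset()
    tags = set()
    for pair in ud_feats.split("|"):
        name, _, value = pair.partition("=")
        tag = UD_TO_UNIMORPH.get((name, value))
        if tag is not None:
            tags.add(tag)
    return frozenset(tags)


def recall(tokens, lexicon):
    """tokens: iterable of (form, ud_feats) pairs from a token-level corpus.
    lexicon: dict mapping a form to a set of frozenset tag bundles
    (a stand-in for a type-level resource).
    Forms missing from the lexicon are not counted."""
    checked = hits = 0
    for form, feats in tokens:
        if form not in lexicon:
            continue
        checked += 1
        if convert_feats(feats) in lexicon[form]:
            hits += 1
    return hits / checked if checked else 0.0


def macro_average(per_language_recall):
    """Unweighted mean over languages, so each language counts equally."""
    return mean(per_language_recall.values())
```

Macro-averaging gives every language the same weight regardless of corpus size, which is why a single figure such as the reported 64.13% can hide large per-language differences.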

arXiv.org e-Print Archive

Crossref

An improved neural network model for joint POS tagging and dependency parsing

Authors: Nguyen Dat Quoc; Verspoor Karin
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2018

We propose a novel neural network model for joint part-of-speech (POS) tagging and dependency parsing. Our model extends the well-known BIST graph-based dependency parser (Kiperwasser and Goldberg, 2016) by incorporating a BiLSTM-based tagging component to produce automatically predicted POS tags for the parser. On the benchmark English Penn treebank, our model obtains strong UAS and LAS scores of 94.51% and 92.87%, respectively, producing 1.5+% absolute improvements over the BIST graph-based parser, and also obtaining a state-of-the-art POS tagging accuracy of 97.97%. Furthermore, experimental results on parsing 61 "big" Universal Dependencies treebanks from raw texts show that our model outperforms the baseline UDPipe (Straka and Straková, 2017) with a 0.8% higher average POS tagging score and a 3.6% higher average LAS score. In addition, with our model, we also obtain state-of-the-art downstream task scores for biomedical event extraction and opinion analysis applications. Our code is available together with all pre-trained models at: https://github.com/datquocnguyen/jPTDP

Comment: 11 pages; In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, to appear
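
For readers unfamiliar with the UAS and LAS figures quoted in this abstract, the following Python sketch shows how the two scores are conventionally computed from gold and predicted parses. It is a generic illustration, not code from the jPTDP repository; the function name and the `(head, label)` token representation are assumptions of this sketch, and details such as punctuation handling, which vary between evaluation setups, are ignored.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for two aligned token sequences.

    gold, pred: lists of (head_index, dependency_label) pairs, one per token.
    UAS counts tokens whose predicted head is correct;
    LAS additionally requires the dependency label to match.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must be aligned")
    if not gold:
        return 0.0, 0.0
    head_ok = 0
    head_and_label_ok = 0
    for (g_head, g_label), (p_head, p_label) in zip(gold, pred):
        if g_head == p_head:
            head_ok += 1
            if g_label == p_label:
                head_and_label_ok += 1
    n = len(gold)
    return head_ok / n, head_and_label_ok / n
```

Because a labeled match requires a correct head, LAS can never exceed UAS on the same data, which is consistent with the 94.51% and 92.87% figures reported above.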

arXiv.org e-Print Archive

Crossref