

A non-projective greedy dependency parser with bidirectional LSTMs

Author: Gómez-Rodríguez Carlos
Vilares David
Publication date: 01/01/2017

The LyS-FASTPARSE team presents BIST-COVINGTON, a neural implementation of the Covington (2001) algorithm for non-projective dependency parsing. The bidirectional LSTM approach of Kiperwasser and Goldberg (2016) is used to train a greedy parser with a dynamic oracle to mitigate error propagation. The model participated in the CoNLL 2017 UD Shared Task. Despite using no ensemble methods and relying on the baseline segmentation and PoS tagging, the parser obtained good results on both macro-average LAS and UAS in the big treebanks category (55 languages), ranking 7th out of 33 teams. In the all treebanks category we ranked 16th on LAS and 12th on UAS. The gap between the all and big categories is mainly due to poor performance on four parallel PUD treebanks, suggesting that some 'suffixed' treebanks (e.g. Spanish-AnCora) perform poorly in cross-treebank settings, which does not occur with the corresponding 'unsuffixed' treebank (e.g. Spanish). By changing that, we obtain the 11th best LAS among all runs (official and unofficial). The code is available at https://github.com/CoNLL-UD-2017/LyS-FASTPARSE
Comment: 12 pages, 2 figures, 5 tables

arXiv.org e-Print Archive

Crossref
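
The Covington (2001) strategy named in the BIST-COVINGTON abstract above compares each new word j with every earlier word i, from right to left, and may add an arc in either direction, which is what makes crossing (non-projective) arcs possible. Below is a minimal, non-neural sketch of the list-based transition formulation of Covington's algorithm (Left-Arc, Right-Arc, No-Arc, Shift). It is an illustration under stated assumptions, not the team's code: the `decide` callback is a hypothetical stand-in for the BiLSTM classifier, no dynamic oracle is shown, and attaching leftover words to the root is an assumed post-processing step that the abstract does not specify.

```python
from typing import Callable, Dict, List

# Hypothetical interface: decide(i, j, heads) returns one of these transition names.
# In a neural parser, a BiLSTM-based classifier would play this role.
LEFT_ARC, RIGHT_ARC, NO_ARC, SHIFT = "LEFT_ARC", "RIGHT_ARC", "NO_ARC", "SHIFT"
Decide = Callable[[int, int, Dict[int, int]], str]


def reaches(heads: Dict[int, int], start: int, target: int) -> bool:
    """True if target is start itself or one of its ancestors in the partial tree."""
    node = start
    while True:
        if node == target:
            return True
        if node not in heads:
            return False
        node = heads[node]


def covington_parse(n: int, decide: Decide) -> Dict[int, int]:
    """Greedy list-based Covington parse of words 1..n (0 is an artificial root)."""
    heads: Dict[int, int] = {}
    lambda1: List[int] = [0]  # earlier words still to be compared with j
    lambda2: List[int] = []   # earlier words already compared with j
    buffer: List[int] = list(range(1, n + 1))
    while buffer:
        j = buffer[0]
        action = SHIFT if not lambda1 else decide(lambda1[-1], j, heads)
        if action != SHIFT:
            i = lambda1[-1]
            # Left-Arc (j -> i): i is not the root, has no head, and j is not a descendant of i.
            if action == LEFT_ARC and i != 0 and i not in heads and not reaches(heads, j, i):
                heads[i] = j
            # Right-Arc (i -> j): j has no head and i is not a descendant of j.
            elif action == RIGHT_ARC and j not in heads and not reaches(heads, i, j):
                heads[j] = i
            # Every non-Shift action (including an illegal arc, treated as No-Arc)
            # moves i from lambda1 to lambda2.
            lambda2.insert(0, lambda1.pop())
        else:
            # Shift: restore the full list of processed words and append j.
            lambda1 = lambda1 + lambda2 + [j]
            lambda2 = []
            buffer.pop(0)
    # Assumed post-processing: the algorithm does not guarantee a single tree,
    # so any word left without a head is attached to the root.
    for w in range(1, n + 1):
        heads.setdefault(w, 0)
    return heads


def static_oracle(gold: Dict[int, int]) -> Decide:
    """Hypothetical helper that replays a known gold tree (useful only for testing)."""
    def decide(i: int, j: int, heads: Dict[int, int]) -> str:
        if gold.get(i) == j:
            return LEFT_ARC
        if gold.get(j) == i:
            return RIGHT_ARC
        return NO_ARC
    return decide


# Non-projective toy tree: arcs 1 -> 3 and 4 -> 2 cross.
gold = {1: 0, 2: 4, 3: 1, 4: 1}
print(covington_parse(4, static_oracle(gold)) == gold)  # True
```

Because each new word is compared with every earlier word, the parser makes O(n²) decisions in the worst case; this quadratic cost is the price of allowing arcs that a projective stack-based system could not build.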

CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

Author: Attia Mohammed
Badmaeva Elena
Banerjee Esha
Burchardt Aljoscha
Cinková Silvie
de Marneffe Marie-Catherine
de Paiva Valeria
Droganova Kira
Elkahky Ali
Fernández Alcalde Héctor
Ginter Filip
Gökırmak Memduh
Habash Nizar
Hajič Jan
Hajič jr., Jan
Harris Kim
Hlaváčová Jaroslava
Kanayama Hiroshi
Kanerva Jenna
Kayadelen Tolga
Kettnerová Václava
Kirchner Jesse
Kwak Sookyoung
Lando Tatiana
Lertpradit Saran
Leung Herman
Li Josie
Luotolahti Juhani
Macketanz Vivien
Mandl Michael
Manning Christopher D.
Manurung Ruli
Marheinecke Katrin
Martínez Alonso Héctor
Mendonça Gustavo
Missilä Anna
Nedoluzhko Anna
Nitisaroj Rattima
Nivre Joakim
Ojala Stina
Petrov Slav
Pitler Emily
Popel Martin
Potthast Martin
Pyysalo Sampo
Reddy Siva
Rehm Georg
Sanguinetti Manuela
Schuster Sebastian
Shimada Atsuko
Simi Maria
Stella Antonio
Straka Milan
Strnadová Jana
Sulubacak Umut
Taji Dima
Tyers Francis
Urešová Zdeňka
Uszkoreit Hans
Yu Zhuoran
Zeman Daniel
Çöltekin Çağrı
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2017

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, one of two tasks was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on the input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe data preparation, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.
Peer reviewed

Crossref

Archivio della Ricerca - Università di Pisa

Biblio at Institute of Formal and Applied Linguistics

Helsingin yliopiston digitaalinen arkisto

Institutional Research Information System University of Turin

UD Annotatrix: An Annotation Tool For Universal Dependencies

Author: Sheyanova M.
Tyers F. M.
Washington Jonathan North
Publication venue: 'Transformative Works and Cultures'
Publication date: 01/01/2017

In this paper we introduce UD Annotatrix, a tool for the manual annotation of Universal Dependencies. The tool has been designed to be tailored to the needs of the Universal Dependencies (UD) community, including the ability to operate in fully offline mode, and it is freely available under the GNU GPL licence. In this paper, we provide some background to the tool, an overview of its development, and an explanation of how it works. We compare it with other widely used tools for Universal Dependencies annotation, describe some features unique to UD Annotatrix, and finally outline some avenues for future work and provide a few concluding remarks.

Works

Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

Author: Attia M
Badmaeva E
Banerjee E
Burchardt A
Cinková S
Droganova K
Elkahky A
Fernandez Alcalde H
Ginter F
Gökırmak M
Habash N
Hajič J
Hajič jr. J
Harris K
Hlaváčová J
Kanayama H
Kanerva J
Kayadelen T
Kettnerová V
Kirchner J
Kwak S
Lando T
Lertpradit S
Leung H
Li J
Luotolahti J
Macketanz V
Mandl M
Manning C
Manurung R
Marheinecke K
de Marneffe M
Martínez Alonso H
Mendonça G
Missilä A
Nedoluzhko A
Nitisaroj R
Nivre J
Ojala S
de Paiva V
Petrov S
Pitler E
Popel M
Potthast M
Pyysalo S
Reddy S
Rehm G
Sanguinetti M
Schuster S
Shimada A
Simi M
Stella A
Straka M
Strnadová J
Taji D
Tyers F
Urešová Z
Uszkoreit H
Yu Z
Zeman D
Publication venue: Vancouver, Canada
Publication date: 28/10/2022

UTUPub

An improved neural network model for joint POS tagging and dependency parsing

Author: Nguyen Dat Quoc
Verspoor Karin
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2018

We propose a novel neural network model for joint part-of-speech (POS) tagging and dependency parsing. Our model extends the well-known BIST graph-based dependency parser (Kiperwasser and Goldberg, 2016) by incorporating a BiLSTM-based tagging component that produces automatically predicted POS tags for the parser. On the benchmark English Penn treebank, our model obtains strong UAS and LAS scores of 94.51% and 92.87%, respectively, absolute improvements of more than 1.5% over the BIST graph-based parser, and also a state-of-the-art POS tagging accuracy of 97.97%. Furthermore, experimental results on parsing 61 "big" Universal Dependencies treebanks from raw text show that our model outperforms the UDPipe baseline (Straka and Straková, 2017), with a 0.8% higher average POS tagging score and a 3.6% higher average LAS score. In addition, our model yields state-of-the-art downstream task scores for biomedical event extraction and opinion analysis applications. Our code is available together with all pre-trained models at: https://github.com/datquocnguyen/jPTDP
Comment: 11 pages; in Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, to appear

arXiv.org e-Print Archive

Crossref
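
The UAS and LAS figures quoted in these abstracts follow the usual definitions: UAS is the percentage of words assigned the correct head, and LAS the percentage assigned both the correct head and the correct dependency label; a macro-average over treebanks is the unweighted mean of per-treebank scores. The sketch below is a simplified illustration under one loud assumption: gold and predicted words are already aligned one-to-one, so it ignores the tokenization alignment that the official shared-task evaluation must perform when parsing from raw text. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# One word's analysis: (head index, dependency label); head 0 denotes the root.
Arc = Tuple[int, str]


def attachment_scores(gold: Sequence[Arc], pred: Sequence[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages, assuming one-to-one aligned words."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))
    correct_arcs = sum(g == p for g, p in zip(gold, pred))  # head and label both match
    return 100.0 * correct_heads / len(gold), 100.0 * correct_arcs / len(gold)


def macro_average(per_treebank: List[float]) -> float:
    """Unweighted mean over treebanks, so small and large treebanks count equally."""
    return sum(per_treebank) / len(per_treebank)


# Toy sentence of three words: only the label of word 3 is wrong.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj")]
uas, las = attachment_scores(gold, pred)
print(uas, round(las, 2))  # 100.0 66.67
print(macro_average([80.0, 60.0]))  # 70.0
```

Since LAS requires the label to match as well as the head, it can never exceed UAS on the same data, which is consistent with the paired scores reported above.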


Cross-Lingual Transfer of Natural Language Processing Systems

Author: Rasooli Mohammad Sadegh
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019

Accurate natural language processing systems rely heavily on annotated datasets. In the absence of such datasets, transfer methods can help develop a model by transferring annotations from one or more rich-resource languages to the target language of interest. These methods generally fall into two approaches: 1) annotation projection from translation data, also known as parallel data, using supervised models in rich-resource languages, and 2) direct model transfer from annotated datasets in rich-resource languages. In this thesis, we demonstrate different methods for transferring dependency parsers and sentiment analysis systems. We propose an annotation projection method that performs well in scenarios where a large amount of in-domain parallel data is available. We also propose a method that combines annotation projection and direct transfer and can leverage a minimal amount of information from a small out-of-domain parallel dataset to develop highly accurate transfer models. Furthermore, we propose an unsupervised syntactic reordering model to improve the accuracy of dependency parser transfer for non-European languages. Finally, we conduct a diverse set of experiments on transferring sentiment analysis systems in different data settings. A summary of our contributions is as follows:
* We develop accurate dependency parsers using parallel text in an annotation projection framework. We make use of the fact that the density of word alignments is a valuable indicator of reliability in annotation projection.
* We develop accurate dependency parsers in the absence of a large amount of parallel data. We use the Bible data, which is orders of magnitude smaller than a conventional parallel dataset, to provide minimal cues for creating cross-lingual word representations. Our model is also capable of boosting the performance of annotation projection with a large amount of parallel data. Our model develops cross-lingual word representations that go beyond traditional delexicalized direct transfer methods. Moreover, we propose a simple but effective word translation approach that brings explicit lexical features from the target language into our direct transfer method.
* We develop different syntactic reordering models that can change the source treebanks in rich-resource languages, thus avoiding learning a wrong model for an unrelated language. Our experimental results show substantial improvements on non-European languages.
* We develop transfer methods for sentiment analysis in different data availability scenarios. We show that we can leverage cross-lingual word embeddings to create accurate sentiment analysis systems in the absence of annotated data in the target language of interest.
We believe that the novelties we introduce in this thesis indicate the usefulness of transfer methods. This is appealing in practice, especially since we suggest eliminating the need to annotate new datasets for low-resource languages, which are expensive, if not impossible, to obtain.

Columbia University Academic Commons

Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages

Author: de Vries Wietse
Nissim Malvina
Wieling Martijn
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2022

Proceedings - University of Groningen