UDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging
We present our contribution to the SIGMORPHON 2019 Shared Task:
Crosslinguality and Context in Morphology, Task 2: contextual morphological
analysis and lemmatization.
We submitted a modification of UDPipe 2.0, one of the best-performing systems
of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal
Dependencies, and the overall winner of The 2018 Shared Task on Extrinsic
Parser Evaluation.
As our first improvement, we use pretrained contextualized embeddings
(BERT) as additional inputs to the network; secondly, we use individual
morphological features as regularization; and finally, we merge selected
corpora of the same language.
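The first improvement can be sketched as follows: the frozen contextualized embedding of each token is simply concatenated with the trainable word embedding before entering the network. This is a minimal illustration, not the authors' implementation; the dimensions and randomly initialized vectors are hypothetical stand-ins (in practice the 768-d vector would come from a pretrained BERT model).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a trainable word embedding (256-d) and a frozen
# contextualized embedding from a BERT-like model (768-d).
WORD_DIM, BERT_DIM = 256, 768

def build_input(word_emb: np.ndarray, bert_emb: np.ndarray) -> np.ndarray:
    """Concatenate the trainable word embedding with the frozen
    contextualized embedding; the result feeds the tagging network."""
    return np.concatenate([word_emb, bert_emb], axis=-1)

# One token's representations (random here purely for illustration).
word_emb = rng.standard_normal(WORD_DIM)
bert_emb = rng.standard_normal(BERT_DIM)  # would come from a pretrained model

x = build_input(word_emb, bert_emb)
print(x.shape)  # (1024,)
```

Because the contextualized embeddings are used only as additional inputs, the rest of the architecture is unchanged; the network merely sees a wider input vector per token.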
In the lemmatization task, our system exceeds all other submitted systems by a
wide margin, with a lemmatization accuracy of 95.78 (the second best was 95.00,
the third 94.46). In morphological analysis, our system placed a close second:
our morphological analysis accuracy was 93.19, against the winning system's 93.23.
Comment: Accepted by SIGMORPHON 2019: 16th SIGMORPHON Workshop on
Computational Research in Phonetics, Phonology, and Morphology
UDPipe at EvaLatin 2020: Contextualized Embeddings and Treebank Embeddings
We present our contribution to the EvaLatin shared task, which is the first
evaluation campaign devoted to the evaluation of NLP tools for Latin. We
submitted a system based on UDPipe 2.0, one of the winners of the CoNLL 2018
Shared Task, The 2018 Shared Task on Extrinsic Parser Evaluation, and the
SIGMORPHON 2019 Shared Task. Our system places first by a wide margin both in
lemmatization and POS tagging in the open modality, where additional supervised
data is allowed, in which case we utilize all Universal Dependency Latin
treebanks. In the closed modality, where only the EvaLatin training data is
allowed, our system achieves the best performance in lemmatization and in the
classical subtask of POS tagging, while reaching second place in the cross-genre
and cross-time settings. In the ablation experiments, we also evaluate the
influence of BERT and XLM-RoBERTa contextualized embeddings, and of the
treebank embeddings of the different flavors of Latin treebanks.
Comment: Accepted at EvaLatin 2020, LREC (Proceedings of Language Resources
and Evaluation, Marseille, France)
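A treebank embedding of the kind ablated above can be sketched as a learned vector per source treebank, appended to every token representation so that one shared model can still mark each sentence's corpus of origin. This is a minimal illustration under assumed dimensions, not the paper's implementation; the embedding table would be trainable in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: three Universal Dependencies Latin treebanks, each with
# its own learned embedding row (trainable in a real model, random here).
TREEBANKS = ["ITTB", "PROIEL", "Perseus"]
TB_DIM, TOKEN_DIM = 32, 512

tb_table = rng.standard_normal((len(TREEBANKS), TB_DIM))

def add_treebank_embedding(tokens: np.ndarray, treebank: str) -> np.ndarray:
    """Append the treebank's embedding to each token vector of a sentence."""
    tb_vec = tb_table[TREEBANKS.index(treebank)]
    tb_tiled = np.tile(tb_vec, (tokens.shape[0], 1))
    return np.concatenate([tokens, tb_tiled], axis=-1)

sentence = rng.standard_normal((7, TOKEN_DIM))  # a 7-token sentence
out = add_treebank_embedding(sentence, "PROIEL")
print(out.shape)  # (7, 544)
```

Sharing all other parameters while varying only this per-treebank vector lets the model exploit all Latin data jointly while still adapting its predictions to each treebank's annotation flavor.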
ÚFAL MRPipe at MRP 2019: UDPipe Goes Semantic in the Meaning Representation Parsing Shared Task
We present a system description of our contribution to the CoNLL 2019 shared
task, Cross-Framework Meaning Representation Parsing (MRP 2019). The proposed
architecture is our first attempt at a semantic parsing extension of
UDPipe 2.0, a lemmatization, POS tagging, and dependency parsing pipeline.
For the MRP 2019, which features five formally and linguistically different
approaches to meaning representation (DM, PSD, EDS, UCCA and AMR), we propose a
uniform, language and framework agnostic graph-to-graph neural network
architecture. Without any knowledge about the graph structure, and specifically
without any linguistically or framework motivated features, our system
implicitly models the meaning representation graphs.
After fixing a human error (we had used an earlier, incorrect version of the
provided test set analyses), our submission would score third in the competition
evaluation. The source code of our system is available at
https://github.com/ufal/mrpipe-conll2019