UDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging
We present our contribution to the SIGMORPHON 2019 Shared Task:
Crosslinguality and Context in Morphology, Task 2: contextual morphological
analysis and lemmatization.
We submitted a modification of UDPipe 2.0, one of the best-performing systems
of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal
Dependencies, and the overall winner of The 2018 Shared Task on Extrinsic
Parser Evaluation.
As our first improvement, we use pretrained contextualized embeddings
(BERT) as additional inputs to the network; secondly, we use individual
morphological features as regularization; and finally, we merge selected
corpora of the same language.
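The first improvement can be sketched as follows: the frozen contextualized embedding of each token is simply concatenated with the trainable word embedding before entering the network. This is a minimal illustration, not the authors' implementation; the dimensions and randomly initialized vectors are hypothetical stand-ins (in practice the 768-d vector would come from a pretrained BERT model).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a trainable word embedding (256-d) and a frozen
# contextualized embedding from a BERT-like model (768-d).
WORD_DIM, BERT_DIM = 256, 768

def build_input(word_emb: np.ndarray, bert_emb: np.ndarray) -> np.ndarray:
    """Concatenate the trainable word embedding with the frozen
    contextualized embedding; the result feeds the tagging network."""
    return np.concatenate([word_emb, bert_emb], axis=-1)

# One token's representations (random here purely for illustration).
word_emb = rng.standard_normal(WORD_DIM)
bert_emb = rng.standard_normal(BERT_DIM)  # would come from a pretrained model

x = build_input(word_emb, bert_emb)
print(x.shape)  # (1024,)
```

Because the contextualized embeddings are used only as additional inputs, the rest of the architecture is unchanged; the network merely sees a wider input vector per token.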
In the lemmatization task, our system exceeds all other submitted systems by a
wide margin, with a lemmatization accuracy of 95.78 (the second best was 95.00,
the third 94.46). In morphological analysis, our system placed a close second:
our morphological analysis accuracy was 93.19, against the winning system's 93.23.
Comment: Accepted by SIGMORPHON 2019: 16th SIGMORPHON Workshop on
Computational Research in Phonetics, Phonology, and Morphology
UDPipe at EvaLatin 2020: Contextualized Embeddings and Treebank Embeddings
We present our contribution to the EvaLatin shared task, which is the first
evaluation campaign devoted to the evaluation of NLP tools for Latin. We
submitted a system based on UDPipe 2.0, one of the winners of the CoNLL 2018
Shared Task, The 2018 Shared Task on Extrinsic Parser Evaluation, and the
SIGMORPHON 2019 Shared Task. Our system places first by a wide margin both in
lemmatization and POS tagging in the open modality, where additional supervised
data is allowed, in which case we utilize all Universal Dependency Latin
treebanks. In the closed modality, where only the EvaLatin training data is
allowed, our system achieves the best performance in lemmatization and in the
classical subtask of POS tagging, while reaching second place in the cross-genre
and cross-time settings. In the ablation experiments, we also evaluate the
influence of BERT and XLM-RoBERTa contextualized embeddings, and of the
treebank embeddings of the different flavors of Latin treebanks.
Comment: Accepted at EvaLatin 2020, LREC (Proceedings of Language Resources
and Evaluation, Marseille, France)
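A treebank embedding of the kind ablated above can be sketched as a learned vector per source treebank, appended to every token representation so that one shared model can still mark each sentence's corpus of origin. This is a minimal illustration under assumed dimensions, not the paper's implementation; the embedding table would be trainable in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: three Universal Dependencies Latin treebanks, each with
# its own learned embedding row (trainable in a real model, random here).
TREEBANKS = ["ITTB", "PROIEL", "Perseus"]
TB_DIM, TOKEN_DIM = 32, 512

tb_table = rng.standard_normal((len(TREEBANKS), TB_DIM))

def add_treebank_embedding(tokens: np.ndarray, treebank: str) -> np.ndarray:
    """Append the treebank's embedding to each token vector of a sentence."""
    tb_vec = tb_table[TREEBANKS.index(treebank)]
    tb_tiled = np.tile(tb_vec, (tokens.shape[0], 1))
    return np.concatenate([tokens, tb_tiled], axis=-1)

sentence = rng.standard_normal((7, TOKEN_DIM))  # a 7-token sentence
out = add_treebank_embedding(sentence, "PROIEL")
print(out.shape)  # (7, 544)
```

Sharing all other parameters while varying only this per-treebank vector lets the model exploit all Latin data jointly while still adapting its predictions to each treebank's annotation flavor.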
ÚFAL MRPipe at MRP 2019: UDPipe Goes Semantic in the Meaning Representation Parsing Shared Task
We present a system description of our contribution to the CoNLL 2019 shared
task, Cross-Framework Meaning Representation Parsing (MRP 2019). The proposed
architecture is our first attempt at a semantic parsing extension of
UDPipe 2.0, a lemmatization, POS tagging, and dependency parsing pipeline.
For the MRP 2019, which features five formally and linguistically different
approaches to meaning representation (DM, PSD, EDS, UCCA and AMR), we propose a
uniform, language and framework agnostic graph-to-graph neural network
architecture. Without any knowledge about the graph structure, and specifically
without any linguistically or framework motivated features, our system
implicitly models the meaning representation graphs.
After fixing a human error (we had used an earlier, incorrect version of the
provided test set analyses), our submission would score third in the competition
evaluation. The source code of our system is available at
https://github.com/ufal/mrpipe-conll2019