The SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion
In this paper, we describe the findings of the SIGMORPHON 2020 shared task on
unsupervised morphological paradigm completion (SIGMORPHON 2020 Task 2), a
novel task in the field of inflectional morphology. Participants were asked to
submit systems which take raw text and a list of lemmas as input, and output
all inflected forms, i.e., the entire morphological paradigm, of each lemma. In
order to simulate a realistic use case, we first released data for five
development languages. However, systems were officially evaluated on nine
surprise languages, which were only revealed a few days before the submission
deadline. We provided a modular baseline system, which is a pipeline of four
components. Three teams submitted a total of seven systems, but, surprisingly,
none of the submitted systems was able to improve over the baseline on average
over all nine test languages. Only on three languages did a submitted system
obtain the best results.
This shows that unsupervised morphological paradigm completion is still largely
unsolved. We present an analysis here so that this shared task can ground
further research on the topic.
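The task's input/output contract described above can be sketched as follows. This is a toy illustration only: the `complete_paradigm` function, the lemma, and the hand-written paradigm are hypothetical stand-ins, not the shared task's actual data format or any submitted system.

```python
# Toy sketch of unsupervised morphological paradigm completion: given raw
# text and a list of lemmas, a system must output each lemma's full set of
# inflected forms. Here a hypothetical system covers one English lemma.
def complete_paradigm(lemma: str) -> set[str]:
    # Stand-in for an unsupervised system: a hand-written toy paradigm.
    toy_paradigms = {"walk": {"walk", "walks", "walked", "walking"}}
    return toy_paradigms.get(lemma, {lemma})

print(sorted(complete_paradigm("walk")))
# → ['walk', 'walked', 'walking', 'walks']
```

In the real task, of course, the paradigms are not given; systems must induce them from raw text alone, which is what makes the problem hard.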
The IMS-CUBoulder System for the SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion
In this paper, we present the systems of the University of Stuttgart IMS and
the University of Colorado Boulder (IMS-CUBoulder) for SIGMORPHON 2020 Task 2
on unsupervised morphological paradigm completion (Kann et al., 2020). The task
consists of generating the morphological paradigms of a set of lemmas, given
only the lemmas themselves and unlabeled text. Our proposed system is a
modified version of the baseline introduced together with the task. In
particular, we experiment with substituting the inflection generation component
with an LSTM sequence-to-sequence model and an LSTM pointer-generator network.
Our pointer-generator system obtains the best score of all seven submitted
systems on average over all languages, and it outperforms the official
baseline, which was best overall, on Bulgarian and Kannada.
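The copy mechanism that distinguishes a pointer-generator from a plain sequence-to-sequence model can be sketched as below. This follows the standard pointer-generator formulation (See et al., 2017), not the authors' actual implementation; all tensor values are toy numbers chosen for illustration.

```python
import numpy as np

# Sketch of the pointer-generator output distribution: a learned scalar
# p_gen interpolates between generating a character from the vocabulary
# and copying a character from the source lemma via the attention weights.
def pointer_generator_dist(p_gen, vocab_dist, attention, src_ids):
    """Mix the generation distribution with a copy distribution.

    p_gen:      scalar in [0, 1], probability of generating from the vocabulary
    vocab_dist: softmax over the output vocabulary, shape (vocab_size,)
    attention:  attention weights over source positions, shape (src_len,)
    src_ids:    vocabulary ids of the source characters, shape (src_len,)
    """
    final = p_gen * vocab_dist
    # Scatter-add the copy probability mass onto the source characters' ids.
    np.add.at(final, src_ids, (1.0 - p_gen) * attention)
    return final

# Toy example: 4-character vocabulary, 2-character source lemma.
vocab_dist = np.array([0.1, 0.2, 0.3, 0.4])
attention = np.array([0.5, 0.5])
src_ids = np.array([0, 2])
dist = pointer_generator_dist(0.6, vocab_dist, attention, src_ids)
print(dist)  # → [0.26 0.12 0.38 0.24], a valid distribution summing to 1
```

Copying is a natural fit for inflection generation, since most characters of an inflected form are typically carried over verbatim from the lemma.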