Yet Another Format of Universal Dependencies for Korean

Authors: Chen Yige, Jo Eunkyul Leah, Lim KyungTae, Park Jungyeul, Silfverberg Miikka, Tyers Francis M., Yao Yundong
Publication date: 20/09/2022

In this study, we propose a morpheme-based scheme for Korean dependency parsing and apply it to Universal Dependencies. We present the linguistic rationale that motivates the morpheme-based format and shows why it is necessary, and we develop scripts that automatically convert between the original format used by Universal Dependencies and the proposed morpheme-based format. We then test the effectiveness of the proposed format for Korean dependency parsing with both statistical and neural models, including UDPipe and Stanza, using our carefully constructed morpheme-based word embeddings for Korean. morphUD improves parsing results on all Korean UD treebanks, and we also present detailed error analyses.

Comment: COLING2022, Poste

arXiv.org e-Print Archive

K-UniMorph: Korean Universal Morphology and its Feature Schema

Authors: Jo Eunkyul Leah, Kim Kyuwon, Lim KyungTae, Park Chulwoo, Park Jungyeul, Wu Xihan
Publication date: 10/05/2023

We present a new Universal Morphology dataset for Korean. Korean has previously been underrepresented in work on morphological paradigms, which covers hundreds of diverse world languages. We therefore propose Universal Morphology paradigms for Korean that preserve its distinct characteristics. For our K-UniMorph dataset, we outline each grammatical criterion for the verbal endings in detail, clarify how to extract inflected forms, and demonstrate how we generate the morphological schemata. The dataset adopts the morphological feature schema of Sylak-Glassman et al. (2015) and Sylak-Glassman (2016) for Korean, and we extract inflected verb forms from the Sejong morphologically analyzed corpus, which is one of the largest annotated corpora for Korean. During data creation, our methodology also includes checking the correctness of the conversion from the Sejong corpus. Furthermore, we carry out the inflection task using three different Korean word forms: letters, syllables and morphemes. Finally, we discuss future perspectives on Korean morphological paradigms and the dataset.

Comment: Findings of the Association for Computational Linguistics: ACL 202

arXiv.org e-Print Archive