Data-Driven Morphological Analysis for Uralic Languages

Abstract

This paper describes an initial set of experiments in data-driven morphological analysis of Uralic languages. The paper differs from previous work in that our work covers both lemmatization and generating ambiguous analyses. While hand-crafted finite-state transducers represent the state of the art in morphological analysis for most Uralic languages, we believe that there is a place for data-driven approaches, especially with respect to making up for lack of completeness in the lexicon. We present results for nine Uralic languages that show that, at least for basic nominal morphology for six out of the nine languages, data-driven methods can achieve an F-score of over 90%, providing results that approach those of finite-state techniques. We also compare our system to an earlier approach to Finnish data-driven morphological analysis (Silfverberg and Hulden, 2018) and show that our system outperforms this baseline.

Peer reviewed
