48 research outputs found

    Machine Translation for English--Inuktitut with Segmentation, Data Acquisition and Pre-Training

    Translating to and from low-resource polysynthetic languages presents numerous challenges for NMT. We present the results of our systems for the English--Inuktitut language pair for the WMT 2020 translation tasks. We investigated the importance of correct morphological segmentation, whether or not adding data from a related language (Greenlandic) helps, and whether using contextual word embeddings improves translation. While each method showed some promise, the results are mixed.

    A Survey and Classification of Methods for (Mostly) Unsupervised Learning

    Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007. Editors: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit. University of Tartu, Tartu, 2007. ISBN 978-9985-4-0513-0 (online) ISBN 978-9985-4-0514-7 (CD-ROM) pp. 292-296

    Learning Morphological Productivity as Meaning-Form Mappings

    Child language acquisition is famously accurate despite the sparsity of linguistic input. In this paper, we introduce a cognitively motivated method for morphological acquisition with a special focus on verbal inflections. Using UniMorph annotations as an approximation of children’s semantic representation of verbal inflection, we use the Tolerance Principle to explicitly identify the formal processes of segmentation and mutation that productively encode the semantic relations (e.g., past tense) between stems and inflected forms. Using a child-directed corpus of verbal inflection forms, our model acquires the verbal inflection morphemes of Spanish and English as a list of explicit and linguistically interpretable rules of suffixation and stem change corresponding to sets of semantic features.