121 research outputs found

    Methods and algorithms for unsupervised learning of morphology

    This is an accepted manuscript of a chapter published by Springer in Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8403 in 2014, available online: https://doi.org/10.1007/978-3-642-54906-9_15 The accepted version of the publication may differ from the final published version. This paper is a survey of methods and algorithms for unsupervised learning of morphology. We describe the methods and algorithms used for morphological segmentation from a computational linguistics point of view. The survey covers segmentation methods based on MDL (minimum description length), MLE (maximum likelihood estimation), MAP (maximum a posteriori), and parametric and non-parametric Bayesian approaches. A review of the evaluation schemes for unsupervised morphological segmentation is also provided, along with a summary of results from the Morpho Challenge evaluations.

    CogALex-V Shared Task: HsH-Supervised – supervised similarity learning using entry wise product of context vectors

    The CogALex-V Shared Task provides two datasets that consist of pairs of words along with a classification of their semantic relation. The dataset for the first task distinguishes only between related and unrelated pairs, while the dataset for the second task distinguishes several types of semantic relations. A number of recent papers propose to construct a feature vector that represents a pair of words by applying a simple pairwise operation to all elements of the two words' feature vectors. Subsequently, the pairs can be classified by training any classification algorithm on these vectors. In the present paper we apply this method to the provided datasets. We see that the results are not better than those of the given simple baseline. We conclude that the results of the investigated method are strongly dependent on the type of data to which it is applied.
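
    The pair representation described in this abstract can be illustrated with a minimal sketch. It assumes the "simple pairwise operation" is the entry-wise product named in the title. The toy context vectors, the labels, and the nearest-centroid classifier are illustrative assumptions, not the authors' data or setup; the abstract only says that any classification algorithm can be trained on the pair vectors.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def pair_features(u: Sequence[float], v: Sequence[float]) -> Vector:
    """Represent a word pair by the entry-wise product of its two context vectors."""
    if len(u) != len(v):
        raise ValueError("context vectors must have the same length")
    return [a * b for a, b in zip(u, v)]


def train_centroids(examples: Sequence[Tuple[Vector, str]]) -> Dict[str, Vector]:
    """Average the pair vectors of each label (a stand-in for 'any classifier')."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(centroids: Dict[str, Vector], features: Sequence[float]) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid: Vector) -> float:
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical 3-dimensional context vectors, invented for this example.
vectors = {
    "cat": [1.0, 0.9, 0.0],
    "dog": [0.9, 1.0, 0.1],
    "car": [0.0, 0.1, 1.0],
    "bus": [0.1, 0.0, 0.9],
}

# Hypothetical training pairs in the style of the first task (related vs. unrelated).
training_pairs = [
    ("cat", "dog", "related"),
    ("car", "bus", "related"),
    ("cat", "car", "unrelated"),
    ("dog", "bus", "unrelated"),
]

examples = [
    (pair_features(vectors[a], vectors[b]), label) for a, b, label in training_pairs
]
centroids = train_centroids(examples)

prediction = classify(centroids, pair_features(vectors["cat"], vectors["bus"]))
print(prediction)
```

    The entry-wise product is symmetric, so the pair (cat, dog) gets the same feature vector as (dog, cat). That property may suit a related/unrelated decision better than directional relations, which is consistent with the abstract's caution that results depend on the type of data.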

    Assessing the usability of raw machine translation output: A user-centered study using eye tracking

    This paper reports on the results of a project that aimed to investigate the usability of raw machine-translated technical support documentation for a commercial online file storage service. Adopting a user-centred approach, we utilize the ISO/TR 16982 definition of usability (goal completion, satisfaction, effectiveness, and efficiency) and apply eye-tracking measures shown to be reliable indicators of cognitive effort, along with a post-task questionnaire. We investigated these measures for the original user documentation written in English and in four target languages: Spanish, French, German and Japanese, all of which were translated using a freely available online statistical machine translation engine. Using native speakers for each language, we found several significant differences between the source and MT output, a finding that indicates a difference in usability between well-formed content and raw machine-translated content. One target language in particular, Japanese, was found to have a considerably lower usability level when compared with the original English.

    Linguistic Structure in Statistical Machine Translation

    This thesis investigates the influence of linguistic structure in statistical machine translation. We develop a word reordering model based on syntactic parse trees and address the issues of pronouns and morphological agreement with a source discriminative word lexicon that predicts the translation of individual words using structural features. When used in phrase-based machine translation, the models improve the translation for language pairs with different word order and morphological variation.