244 research outputs found

    Investigating the effects of controlled language on the reading and comprehension of machine translated texts: A mixed-methods approach

    This study investigates whether the use of controlled language (CL) improves the readability and comprehension of technical support documentation produced by a statistical machine translation system. Readability is operationalised here as the extent to which a text can be easily read in terms of formal linguistic elements, while comprehensibility is defined as how easily a text's content can be understood by the reader. A biphasic mixed-methods triangulation approach is taken, in which a number of quantitative and qualitative evaluation methods are combined. These include: eye tracking, automatic evaluation metrics (AEMs), retrospective interviews, human evaluations, memory recall testing, and readability indices. A further aim of the research is to investigate what, if any, correlations exist between the various metrics used, and to explore the cognitive framework of the evaluation process. The research finds that the use of CL input results in significantly higher scores for items recalled by participants, and for several of the eye tracking metrics: fixation count, fixation length, and regressions. However, the findings show slight, insignificant increases for readability indices and human evaluations, and slight, insignificant decreases for AEMs. Several significant correlations between the above metrics are identified, as well as predictors of readability and comprehensibility.
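    The abstract above uses readability indices as one of its evaluation methods. As a minimal illustration of what such an index computes, the sketch below implements the classic Flesch Reading Ease formula with a naive vowel-group syllable counter; the tokenisation and syllable heuristics are assumptions for demonstration, not the study's actual instruments.

```python
import re

def count_syllables(word):
    # Naive proxy: count contiguous vowel groups (illustrative only).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    # Flesch Reading Ease: higher scores indicate easier text.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

# Short, monosyllabic sentences score near the top of the scale.
print(flesch_reading_ease("The cat sat on the mat."))
```

    Real readability tooling uses dictionary-based syllabification and more careful sentence splitting; the formula itself, however, is standard.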

    Linguistic Structure in Statistical Machine Translation

    This thesis investigates the influence of linguistic structure in statistical machine translation. We develop a word reordering model based on syntactic parse trees and address the issues of pronouns and morphological agreement with a source discriminative word lexicon that predicts the translation of individual words using structural features. When used in phrase-based machine translation, the models improve translation quality for language pairs with different word order and morphological variation.

    Syllabification and parameter optimisation in Zulu to English machine translation

    We present a series of experiments involving the machine translation of Zulu to English using a well-known statistical software system. Due to morphological complexity and the relative scarcity of resources, the case of Zulu is challenging. Against a selection of baseline models, we show that a relatively naive approach of dividing Zulu words into syllables leads to a surprising improvement. We further improve on this model through manual configuration changes. Our best model significantly outperforms the baseline models (BLEU measure, at p < 0.001) even when they are optimised to a similar degree, only falling short of the well-known Morfessor morphological analyser, which makes use of relatively sophisticated algorithms. These experiments suggest that even a simple optimisation procedure can improve the quality of this approach to a significant degree. This is promising particularly because it improves on a mostly language-independent approach, at least within the same language family. Our work also drives the point home that sub-lexical alignment for Zulu is crucial for improved translation quality. Academy of African Languages and Science (AALS)
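    The "naive" syllabification the abstract describes can be sketched very simply: Zulu syllables are predominantly open (consonant cluster followed by a vowel), so splitting after each vowel group is a first approximation. The regex below is an illustrative assumption, not the paper's implementation; it ignores clicks, digraphs such as "hl" and "ng", and syllabic nasals.

```python
import re

def syllabify(word):
    # Split a Zulu word into approximate open syllables:
    # zero or more consonants followed by one or more vowels,
    # plus any trailing consonants as a final chunk.
    return re.findall(r"[^aeiou]*[aeiou]+|[^aeiou]+$", word.lower())

# e.g. "sawubona" (hello) splits into sa-wu-bo-na
print(syllabify("sawubona"))
```

    Even this crude segmentation shrinks the vocabulary and eases word alignment, which is the intuition behind the reported improvement.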

    Data analytics 2016: proceedings of the fifth international conference on data analytics


    Performance-oriented dependency parsing

    In the last decade, many dependency parsers have been developed. This book describes the motivation for the development of yet another parser - MDParser. The state of the art is presented and the deficits of current developments are discussed. The main problem of current parsers is that the task of dependency parsing is treated independently of what happens before and after it. However, in practice parsing is rarely done for the sake of parsing itself, but rather in order to use the results in a follow-up application. Additionally, current parsers are accuracy-oriented and focus only on the quality of the results, neglecting other important properties, especially efficiency. The evaluation of some NLP technologies is sometimes as difficult as the task itself. For dependency parsing this was long thought not to be the case; however, some recent works show that the current evaluation possibilities are limited. This book proposes a methodology to account for the weaknesses and combine the strengths of the current approaches. Finally, MDParser is evaluated against other state-of-the-art parsers. The results show that it is the fastest parser currently available and that it is able to process plain text, which other parsers usually cannot. Its results are slightly behind the top accuracies in the field; however, it is demonstrated that this is not decisive for applications.


    Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference

    No abstract available
