    Machine translation evaluation resources and methods: a survey

    We present a survey of Machine Translation (MT) evaluation that covers both manual and automatic evaluation methods. Traditional human evaluation criteria mainly include intelligibility, fidelity, fluency, adequacy, comprehension, and informativeness. More advanced human assessments include task-oriented measures, post-editing, segment ranking, and extended criteria. We classify automatic evaluation methods into two categories: lexical similarity and the application of linguistic features. Lexical similarity methods include edit distance, precision, recall, F-measure, and word order. Linguistic features can be divided into syntactic and semantic features. Syntactic features include part-of-speech tags, phrase types, and sentence structures; semantic features include named entities, synonyms, textual entailment, paraphrase, semantic roles, and language models. Deep learning models for evaluation have been proposed only very recently. We also introduce methods for evaluating the MT evaluation measures themselves, including different correlation scores, as well as the recent quality estimation (QE) tasks for MT. This paper differs from existing works \cite{GALEprogram2009, EuroMatrixProject2007} in several respects: it introduces recent developments in MT evaluation measures, offers a different classification of measures from manual to automatic, covers recent QE tasks for MT, and organizes the content concisely.
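    To make the lexical similarity category concrete, here is a minimal sketch (not taken from the survey) of clipped unigram precision, recall and F-measure, plus a word-level edit distance, computed between one hypothesis and one reference. Whitespace tokenisation and the function names `unigram_prf` and `word_edit_distance` are assumptions for illustration; real metrics such as BLEU, TER or METEOR add n-grams, normalisation, multiple references and other refinements.

```python
from collections import Counter


def unigram_prf(hypothesis: str, reference: str) -> tuple[float, float, float]:
    """Clipped unigram precision, recall and F1 (F-measure with beta = 1)."""
    hyp = hypothesis.split()
    ref = reference.split()
    # Counter intersection clips each word to its count in the other sentence.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    precision = overlap / len(hyp) if hyp else 0.0
    recall = overlap / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def word_edit_distance(hypothesis: str, reference: str) -> int:
    """Levenshtein distance over words (insertions, deletions, substitutions)."""
    hyp = hypothesis.split()
    ref = reference.split()
    # prev[j] is the distance between the current hypothesis prefix and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr[j] = min(prev[j] + 1,         # delete h
                          curr[j - 1] + 1,     # insert r
                          prev[j - 1] + cost)  # substitute, or match at no cost
        prev = curr
    return prev[-1]


hyp = "the cat sat on mat"
ref = "the cat sat on the mat"
print(unigram_prf(hyp, ref))
print(word_edit_distance(hyp, ref))
```

    Here the hypothesis misses one "the", so precision stays perfect while recall and the edit distance register the omission.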

    Structure-semantics interplay in complex networks and its effects on the predictability of similarity in texts

    There are different ways to define similarity for grouping similar texts into clusters, as the concept of similarity may depend on the purpose of the task. For instance, in topic extraction similar texts are those within the same semantic field, whereas in author recognition stylistic features should be considered. In this study, we introduce ways to classify texts employing concepts of complex networks, which may be able to capture syntactic, semantic and even pragmatic features. The interplay between the various metrics of the complex networks is analyzed with three applications, namely identification of machine translation (MT) systems, evaluation of the quality of machine-translated texts, and authorship recognition. We show that topological features of the networks representing texts can enhance the ability to identify MT systems in particular cases. For evaluating the quality of MT texts, on the other hand, high correlation was obtained with methods capable of capturing semantics. This was expected because the gold standards used are themselves based on word co-occurrence. Nevertheless, the Katz similarity, which combines semantics and structure in the comparison of texts, achieved the highest correlation with the NIST measurement, indicating that in some cases the combination of both approaches can improve the ability to quantify quality in MT. In authorship recognition, the topological features were again relevant in some contexts, though for the books and authors analyzed good results were obtained with semantic features as well. Because hybrid approaches encompassing semantic and topological features have not been extensively used, we believe that the methodology proposed here may considerably enhance text classification, as it combines well-established strategies.
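    The Katz similarity mentioned above is, in its standard network-science form, a weighted count of all walks between two nodes, with each additional step damped by a factor beta. The sketch below builds a simple word-adjacency network from a text (one node per word type, an edge between words that appear next to each other) and computes a truncated Katz index between node pairs. The network construction, the damping value and the truncation length are assumptions for illustration; the abstract does not say exactly how the study turns node-level scores into a comparison between texts.

```python
def adjacency_network(text: str) -> tuple[list[str], list[list[int]]]:
    """Undirected word-adjacency network: nodes are word types, and an edge
    links two words that appear next to each other at least once."""
    tokens = text.lower().split()
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    adj = [[0] * len(vocab) for _ in vocab]
    for left, right in zip(tokens, tokens[1:]):
        i, j = index[left], index[right]
        if i != j:  # ignore self-loops from immediately repeated words
            adj[i][j] = adj[j][i] = 1
    return vocab, adj


def matmul(x: list[list[float]], y: list[list[float]]) -> list[list[float]]:
    """Plain-Python matrix product, to keep the sketch dependency-free."""
    cols = len(y[0]) if y else 0
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(cols)]
            for i in range(len(x))]


def katz_similarity(adj: list[list[int]], beta: float = 0.1,
                    max_len: int = 5) -> list[list[float]]:
    """Truncated Katz index: S = sum over l = 1..max_len of beta**l * A**l.
    S[i][j] counts walks of every length from node i to node j, with longer
    walks contributing less."""
    n = len(adj)
    power = [row[:] for row in adj]  # A**1
    score = [[beta * power[i][j] for j in range(n)] for i in range(n)]
    for length in range(2, max_len + 1):
        power = matmul(power, adj)  # A**length
        weight = beta ** length
        for i in range(n):
            for j in range(n):
                score[i][j] += weight * power[i][j]
    return score


vocab, adj = adjacency_network("the cat saw the dog")
scores = katz_similarity(adj)
print(vocab)
print(scores[vocab.index("cat")][vocab.index("dog")])
```

    Truncating the series at `max_len` sidesteps a convergence issue: the untruncated sum only converges when beta is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix.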

    Learning Parse and Translation Decisions From Examples With Rich Context

    We present a knowledge- and context-based system for parsing and translating natural language and evaluate it on sentences from the Wall Street Journal. Applying machine learning techniques, the system uses parse action examples acquired under supervision to generate a deterministic shift-reduce parser in the form of a decision structure. It relies heavily on context, as encoded in features that describe the morphological, syntactic, semantic and other aspects of a given parse state.
    Comment: 8 pages, LaTeX, 3 postscript figures, uses aclap.st
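    The deterministic shift-reduce loop described above can be sketched as follows. At each step a decision function reads features of the current parse state and returns a single action (shift or reduce), so no backtracking is needed. In the original system that decision function is learned from supervised parse-action examples. Here `toy_decision` is a hand-written, hypothetical stand-in, and its three context features and tiny tag set are far smaller than the morphological, syntactic and semantic features the abstract describes.

```python
# Leaves are (tag, word) tuples; internal nodes are (label, child, child, ...).


def features(stack: list, buffer: list) -> dict:
    """Context features of the parse state (a tiny subset of what a real system would use)."""
    return {
        "top": stack[-1][0] if stack else None,
        "second": stack[-2][0] if len(stack) > 1 else None,
        "next": buffer[0][0] if buffer else None,
    }


def toy_decision(f: dict) -> tuple:
    """Hypothetical hand-written stand-in for a learned decision structure."""
    if f["second"] == "DET" and f["top"] == "N":
        return ("REDUCE", "NP", 2)
    if f["second"] == "V" and f["top"] == "NP":
        return ("REDUCE", "VP", 2)
    if f["second"] == "NP" and f["top"] == "VP" and f["next"] is None:
        return ("REDUCE", "S", 2)
    if f["next"] is not None:
        return ("SHIFT",)
    return ("DONE",)


def parse(tagged: list, decide) -> list:
    """Deterministic shift-reduce parsing: exactly one action per step, no backtracking."""
    stack, buffer = [], list(tagged)
    while True:
        action = decide(features(stack, buffer))
        if action[0] == "SHIFT":
            stack.append(buffer.pop(0))
        elif action[0] == "REDUCE":
            _, label, arity = action
            children = stack[-arity:]
            del stack[-arity:]
            stack.append((label, *children))
        else:  # "DONE": no further action applies
            return stack


sentence = [("DET", "the"), ("N", "dog"), ("V", "saw"), ("DET", "a"), ("N", "cat")]
print(parse(sentence, toy_decision))
```

    Because `parse` takes the decision function as a parameter, a learned classifier over richer features could replace `toy_decision` without changing the parsing loop.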

    The role of syntax and semantics in machine translation and quality estimation of machine-translated user-generated content

    The availability of the Internet has led to a steady increase in the volume of online user-generated content, the majority of which is in English. Machine-translating this content into other languages can help disseminate the information it contains to a broader audience. However, reliably publishing these translations requires a prior estimate of their quality. This thesis is concerned with the statistical machine translation of Symantec's Norton forum content, focusing in particular on its quality estimation (QE) using syntactic and semantic information. We compare the output of phrase-based and syntax-based English-to-French and English-to-German machine translation (MT) systems automatically and manually, and find that the syntax-based methods do not necessarily handle grammar-related phenomena in translation better than the phrase-based methods. Although these systems generate sufficiently different outputs, the apparent lack of a systematic difference between these outputs impedes their utilisation in a combination framework. To investigate the role of syntax and semantics in quality estimation of machine translation, we create SymForum, a data set containing French machine translations of English sentences from Norton forum content, their post-edits, and their adequacy and fluency scores. We use syntax in quality estimation via tree kernels, hand-crafted features and their combination, and find it useful both alone and in combination with surface-driven features. Our analyses show that neither the accuracy of the syntactic parses used by these systems nor the parsing quality of the MT output affects QE performance. We also find that adding more structure to French Treebank parse trees can be useful for syntax-based QE. We use semantic role labelling (SRL) for our semantic-based QE experiments.
    We experiment with the limited resources that are available for French and find that a small manually annotated training set is substantially more useful than a much larger artificially created set. We use SRL in quality estimation via tree kernels, hand-crafted features and their combination. Additionally, we introduce PAM, a QE metric based on the predicate-argument structure match between source and target. We find that SRL quality, especially on the target side, is the major factor negatively affecting the performance of the semantic-based QE. Finally, we annotate English and French Norton forum sentences with their phrase structure syntax using an annotation strategy adapted for user-generated text. We find that user errors occur in only a small fraction of the data, but their correction does improve parsing performance. These treebanks (Foreebank) prove useful as supplementary training data for adapting the parsers to the forum text. The improved parses ultimately increase the performance of the semantic-based QE. However, a reliable semantic-based QE system requires further improvements in the quality of the underlying semantic role labelling.
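    As an illustration of the idea behind a predicate-argument match score, the sketch below compares the semantic frames of a source sentence and its translation. It is not the thesis's definition of PAM. It assumes each side's SRL output has been reduced to a mapping from predicate to a set of role labels, and that a word alignment (the hypothetical `alignment` dictionary) maps target predicates onto source predicates. The score is the F1 of matching (predicate, role) pairs, and `pam_like_score` is a hypothetical name.

```python
def pam_like_score(source_frames: dict, target_frames: dict, alignment: dict) -> float:
    """F1 over (predicate, role) pairs after projecting target predicates onto source ones.

    source_frames / target_frames: predicate -> set of role labels, e.g. {"A0", "A1"}.
    alignment: hypothetical target-predicate -> source-predicate mapping.
    """
    source_pairs = {(pred, role) for pred, roles in source_frames.items() for role in roles}
    # Unaligned target predicates cannot match, but their roles still count against precision.
    projected = {
        (alignment[pred], role)
        for pred, roles in target_frames.items()
        if pred in alignment
        for role in roles
    }
    target_total = sum(len(roles) for roles in target_frames.values())
    matched = len(source_pairs & projected)
    if matched == 0:
        return 0.0
    precision = matched / target_total
    recall = matched / len(source_pairs)
    return 2 * precision * recall / (precision + recall)


source = {"install": {"A0", "A1"}, "restart": {"A1"}}
target = {"installer": {"A0", "A1"}, "redémarrer": {"A0"}}
alignment = {"installer": "install", "redémarrer": "restart"}
print(pam_like_score(source, target, alignment))
```

    In this toy pair the translation keeps both arguments of "install" but labels the argument of "restart" with the wrong role, so two of three pairs match on each side.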

    On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference

    We propose a process for investigating the extent to which sentence representations arising from neural machine translation (NMT) systems encode distinct semantic phenomena. We use these representations as features to train a natural language inference (NLI) classifier based on datasets recast from existing semantic annotations. In applying this process to a representative NMT system, we find that its encoder appears most suited to supporting inferences at the syntax-semantics interface, as compared to anaphora resolution requiring world knowledge. We conclude with a discussion of the merits and potential deficiencies of the existing process, and of how it may be improved and extended into a broader framework for evaluating semantic coverage.
    Comment: To be presented at NAACL 2018 - 11 pages
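    The probing process described above can be sketched as a classifier trained on frozen sentence representations. The NMT encoder is never updated, so the probe's accuracy is read as evidence of what the representations encode. In this minimal sketch the "encoder" outputs are tiny hand-made vectors. The pair features [u; v; |u - v|; u * v] are a common convention, not a detail confirmed by the abstract, and the classifier is plain logistic regression trained by stochastic gradient descent.

```python
import math


def pair_features(u: list, v: list) -> list:
    """Classifier input for a premise/hypothesis pair: [u; v; |u - v|; u * v] (assumed convention)."""
    return (list(u) + list(v)
            + [abs(a - b) for a, b in zip(u, v)]
            + [a * b for a, b in zip(u, v)])


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(pairs, labels, epochs=500, lr=0.5):
    """Logistic-regression probe (1 = entailed, 0 = not entailed).
    Only the probe's weights are learned; the sentence vectors stay fixed."""
    xs = [pair_features(u, v) for u, v in pairs]
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # derivative of the log-loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict(w, b, u, v) -> int:
    x = pair_features(u, v)
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)


# Hypothetical frozen "encoder" outputs for four premise/hypothesis pairs.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0]),
         ([1.0, 0.0], [0.0, 1.0]), ([0.0, 1.0], [1.0, 0.0])]
labels = [1, 1, 0, 0]
w, b = train_probe(pairs, labels)
print([predict(w, b, u, v) for u, v in pairs])
```

    In a real probing study the probe would be evaluated on held-out pairs for each semantic phenomenon, so that accuracy differences across phenomena reflect the representations rather than memorisation.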