3 research outputs found

    Investigating the Relationship between Classification Quality and SMT Performance in Discriminative Reordering Models

    Reordering is one of the most important factors affecting output quality in statistical machine translation (SMT). A considerable number of the approaches proposed to address the reordering problem are discriminative reordering models (DRMs). The core component of a DRM is a classifier that tries to predict the correct word order of the sentence. Unfortunately, the relationship between classification quality and ultimate SMT performance has not been investigated to date. Understanding this relationship will allow researchers to select the classifier that results in the best possible MT quality. It might be assumed that the relationship between classification quality and SMT performance is monotonic, i.e., that any improvement in classification performance will be reflected in overall SMT quality. In this paper, we show experimentally that this assumption does not always hold: an improvement in classification performance might actually degrade the quality of an SMT system, as measured by automatic MT evaluation metrics. However, we show that if the improvement in classification performance is large enough, SMT quality can be expected to improve as well. In addition, we show that there is a negative relationship between classification accuracy and SMT performance in imbalanced parallel corpora. For these corpora, we provide evidence that macro-averaged metrics such as the macro-averaged F-measure are better suited to evaluating the classifier than accuracy, the metric commonly used to date.
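
    The contrast between accuracy and macro-averaged F-measure on imbalanced data can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two orientation labels (`monotone`, `swap`), the 90/10 class split, and both sets of predictions are invented toy data, and the helper functions `accuracy` and `macro_f1` are hypothetical names.

```python
def accuracy(gold, pred):
    """Fraction of instances whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1, so rare classes count as much as frequent ones."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class is never seen or predicted.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy, invented data: 90 "monotone" and 10 "swap" reordering instances.
gold = ["monotone"] * 90 + ["swap"] * 10

# Hypothetical classifier A: always predicts the majority class.
pred_majority = ["monotone"] * 100

# Hypothetical classifier B: 80/90 monotone and 8/10 swap instances correct.
pred_balanced = ["monotone"] * 80 + ["swap"] * 10 + ["swap"] * 8 + ["monotone"] * 2

for name, pred in [("majority", pred_majority), ("balanced", pred_balanced)]:
    print(f"{name}: accuracy={accuracy(gold, pred):.3f}, macro-F1={macro_f1(gold, pred):.3f}")
```

    On this toy data the majority-class classifier has the higher accuracy while the second classifier has the higher macro-averaged F1, which is the kind of disagreement between the two metrics that imbalanced data can produce. The sketch says nothing about which classifier would yield better translations.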

    A Machine-Aided Approach to Generating Grammar Rules from Japanese Source Text for Use in Hybrid and Rule-based Machine Translation Systems

    Many automatic machine translation systems available today use a hybrid of pure statistical translation and rule-based grammatical translation. This is largely due to the shortcomings of each individual approach: a rule-based system requires linguistics experts to spend a large amount of time hand-coding grammar rules, and a statistical system requires large amounts of source text to generate accurate models. Automating a portion of the rule generation process could make the creation of grammar rules faster, more efficient, and less costly. Through statistical analysis of a bilingual corpus, common grammar rules can be inferred and exported to a hybrid system. The resulting rules then provide a base grammar for the system. This helps reduce the time needed for experts to hand-code grammar rules and makes a hybrid system more effective.

    A Unified Model for Soft Linguistic Reordering Constraints in Statistical Machine Translation

    This paper explores a simple and effective unified framework for incorporating soft linguistic reordering constraints into a hierarchical phrase-based translation system. The framework combines two models: 1) a syntactic reordering model that explores reorderings for context-free grammar rules; and 2) a semantic reordering model that focuses on the reordering of predicate-argument structures. We develop novel features based on both models and use them as soft constraints to guide the translation process. Experiments on Chinese-English translation show that the reordering approach can significantly improve a state-of-the-art hierarchical phrase-based translation system. However, the gain achieved by the semantic reordering model is limited in the presence of the syntactic reordering model, and we therefore provide a detailed analysis of the behavior differences between the two.