51 research outputs found

    An open source rule induction tool for transfer-based SMT

    In this paper we describe an open-source tool for automatic induction of transfer rules. Transfer rule induction is carried out on pairs of dependency structures and their node alignment to produce all rules consistent with the node alignment. We describe an efficient algorithm for rule induction and give a detailed description of how to use the tool.

    F-structure transfer-based statistical machine translation

    In this paper, we describe a statistical deep syntactic transfer decoder that is trained fully automatically on parsed bilingual corpora. Deep syntactic transfer rules are induced automatically from the f-structures of an LFG-parsed bitext corpus by automatically aligning local f-structures and inducing all rules consistent with the node alignment. Given an SL f-structure as input, the transfer decoder outputs the n-best TL f-structures by applying large numbers of transfer rules and searching for the best output, using a log-linear model to combine feature scores. The decoder includes a fully integrated dependency-based trigram language model. We include an experimental evaluation of the decoder using different parsing disambiguation resources for the German data, to compare how the system performs with different German training and test parses.
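
The log-linear combination of feature scores mentioned in this abstract can be illustrated with a minimal sketch. This is not the paper's decoder: the feature names, weights, and candidate scores below are hypothetical, and the real system searches over f-structures rather than a fixed candidate list.

```python
from typing import Dict, List, Tuple

def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature scores: sum_i lambda_i * h_i."""
    return sum(weights[name] * value for name, value in features.items())

def n_best(
    candidates: List[Tuple[str, Dict[str, float]]],
    weights: Dict[str, float],
    n: int,
) -> List[Tuple[float, str]]:
    """Return the n highest-scoring (score, label) pairs, best first."""
    scored = [(loglinear_score(feats, weights), label) for label, feats in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]

# Hypothetical feature weights (lambda_i); in practice these would be tuned.
WEIGHTS = {"transfer_rule_logprob": 1.0, "lm_logprob": 0.5}

# Hypothetical candidate outputs with made-up log-probability features.
CANDIDATES = [
    ("candidate_a", {"transfer_rule_logprob": -2.0, "lm_logprob": -4.0}),
    ("candidate_b", {"transfer_rule_logprob": -1.0, "lm_logprob": -8.0}),
    ("candidate_c", {"transfer_rule_logprob": -3.0, "lm_logprob": -1.0}),
]

TOP_TWO = n_best(CANDIDATES, WEIGHTS, 2)
print(TOP_TWO)
```

With these toy numbers, the candidate with the weakest transfer-rule score still ranks first because its language model score is much better, which is the kind of trade-off a weighted combination is meant to capture.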

    Myanmar Phrases Translation Model with Morphological Analysis for Statistical Myanmar to English Translation System


    Discriminative Reordering Models for Statistical Machine Translation

    We present discriminative reordering models for phrase-based statistical machine translation. The models are trained using the maximum entropy principle. We use several types of features: those based on words, on word classes, and on the local context. We evaluate the overall performance of the reordering models as well as the contribution of the individual feature types on a word-aligned corpus. Additionally, we show improved translation performance using these reordering models compared to a state-of-the-art baseline system.

    A tree-based approach for English-to-Turkish translation

    In this paper, we present our English-to-Turkish translation methodology, which adopts a tree-based approach. Our approach relies on tree analysis and the application of structural modification rules to derive the target-side (Turkish) trees from the source-side (English) ones. We also use morphological analysis to get candidate root words and apply tree-based rules to obtain the agglutinated target words. Compared to earlier work on English-to-Turkish translation using phrase-based models, we obtain higher BLEU scores in our current study. Our syntactic subtree permutation strategy, combined with a word replacement algorithm, provides a 67% relative improvement, from a baseline of 12.8 to 21.4 BLEU, averaged over 10-fold cross-validation. As future work, improvements in choosing the correct senses and structural rules are needed. This work was supported by TUBITAK project 116E104.
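
The relative improvement quoted in this abstract can be checked from the two BLEU scores it reports. The short sketch below only reproduces that arithmetic; the function name is an illustrative choice, not something from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, as a fraction of the baseline."""
    return (improved - baseline) / baseline

# BLEU scores as reported in the abstract.
gain = relative_improvement(12.8, 21.4)
print(f"{gain:.1%}")  # about 67%, matching the reported figure
```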

    Stochastic Modelling: From Pattern Classification to Speech Recognition and Language Translation

    This paper gives an overview of the stochastic modelling approach to machine translation. Starting with the Bayes decision rule as in pattern classification and speech recognition, we show how the resulting system architecture can be structured into three parts: the language model probability, the string translation model probability and the search procedure that generates the word sequence in the target language. We discuss the properties of the system components and report results on the translation of spoken dialogues in the VERBMOBIL project. The experience obtained in the VERBMOBIL project, in particular a large-scale end-to-end evaluation, showed that the stochastic modelling approach resulted in significantly lower error rates than three competing translation approaches: the sentence error rate was 29% in comparison with 52% to 62% for the other translation approaches.
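
The three-part structure described in this abstract corresponds to the usual form of the Bayes decision rule for translation. The notation below is an assumption (the abstract gives no symbols): f is the source-language word sequence and e a candidate target-language word sequence.

```latex
\hat{e} \;=\; \operatorname*{arg\,max}_{e} \Pr(e \mid f)
        \;=\; \operatorname*{arg\,max}_{e} \bigl\{ \Pr(e) \cdot \Pr(f \mid e) \bigr\}
```

Here Pr(e) plays the role of the language model probability, Pr(f | e) the string translation model probability, and the arg max over e the search procedure. The denominator Pr(f) is dropped because it does not depend on e.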

    Discovering Phrases in Machine Translation by Simulated Annealing

    In this paper, we propose a new phrase-based translation model based on inter-lingual triggers. The originality of our method is twofold. First, we identify common source phrases. Then we use inter-lingual triggers to retrieve their translations. Furthermore, we treat the extraction of phrase translations as an optimization problem: we use a simulated annealing algorithm to find the best phrase translations among all those determined by inter-lingual triggers. The best phrases are those which improve the translation quality in terms of BLEU score. Tests are carried out on the proceedings of the European Parliament corpora. Training uses a corpus of 596K parallel sentences (French-English), and testing uses a corpus of 1444 sentences. With only 8.1% of the identified source phrases occurring in the test corpus, our system outperforms the baseline model by almost 3 points.
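
As a rough illustration of treating phrase selection as an optimization problem, the sketch below runs a generic simulated annealing loop over a keep/drop decision for each candidate phrase pair. It is not the authors' algorithm. The objective is a hypothetical stand-in (a sum of made-up per-phrase gains) for the BLEU score that a real system would obtain by decoding a development set, and the cooling schedule and parameters are arbitrary choices.

```python
import math
import random
from typing import Callable, List, Tuple

def simulated_annealing(
    n_items: int,
    objective: Callable[[List[bool]], float],
    steps: int = 2000,
    t_start: float = 1.0,
    t_end: float = 0.01,
    seed: int = 0,
) -> Tuple[List[bool], float]:
    """Maximize `objective` over keep/drop selections of n_items candidates."""
    rng = random.Random(seed)
    current = [False] * n_items
    current_score = objective(current)
    best, best_score = current[:], current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Neighbour: flip the keep/drop decision of one random candidate.
        candidate = current[:]
        i = rng.randrange(n_items)
        candidate[i] = not candidate[i]
        candidate_score = objective(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with prob. exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current[:], current_score
    return best, best_score

# Hypothetical per-phrase contributions to translation quality (toy values).
TOY_GAINS = [0.4, -0.2, 0.1, -0.5, 0.3]

def toy_objective(selection: List[bool]) -> float:
    """Stand-in for a BLEU evaluation: sum of gains of the kept phrases."""
    return sum(gain for gain, keep in zip(TOY_GAINS, selection) if keep)

BEST_SELECTION, BEST_SCORE = simulated_annealing(len(TOY_GAINS), toy_objective)
print(BEST_SELECTION, BEST_SCORE)
```

In a realistic setting each call to the objective would be expensive, since it involves translating and scoring a corpus, so the number of steps and the neighbourhood move would need far more care than in this toy version.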

    Peningkatan Akurasi Penerjemah Bahasa Daerah dengan Optimasi Korpus Paralel (Improving the Accuracy of Regional Language Translation through Parallel Corpus Optimization)

    Statistical Machine Translation (SMT) quality is influenced by several factors. The most fundamental factor is the quantity of corpus data used as base material for building the translation and language models in SMT. Corpus quantity is a major factor in ensuring translation quality, but corpus quality cannot be ignored either. Manually checking the source and translation sentences in a parallel corpus is very difficult and requires a lot of resources. This paper reports experimental results for a strategy that improves the quality of Indonesian-Malay and Indonesian-Javanese corpora without having to examine and correct the sentences in the corpus. The filter used is a minimum value for each sentence, as scored by the Bilingual Evaluation Understudy (BLEU) method. Experimental results show that parallel corpus optimization can improve the accuracy of Indonesian-Malay translation by 6.97% and of Indonesian-Javanese translation by 5.55%.
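
One possible reading of the filtering step in this abstract is a threshold on a per-sentence BLEU score. The sketch below assumes that reading. The abstract does not say how each sentence pair is scored or what threshold is used, so the scores, the threshold, and the names here are all hypothetical.

```python
from typing import List, NamedTuple

class ScoredPair(NamedTuple):
    source: str
    target: str
    bleu: float  # hypothetical precomputed sentence-level score in [0, 1]

def filter_by_min_bleu(pairs: List[ScoredPair], min_bleu: float) -> List[ScoredPair]:
    """Keep only the sentence pairs whose score reaches the minimum value."""
    return [pair for pair in pairs if pair.bleu >= min_bleu]

# Placeholder corpus with made-up scores; real text and scores would come
# from the parallel corpus and whatever scoring setup the study used.
CORPUS = [
    ScoredPair("source sentence 1", "target sentence 1", 0.62),
    ScoredPair("source sentence 2", "target sentence 2", 0.08),
    ScoredPair("source sentence 3", "target sentence 3", 0.35),
]

FILTERED = filter_by_min_bleu(CORPUS, min_bleu=0.30)
print(len(FILTERED), "of", len(CORPUS), "pairs kept")
```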