51 research outputs found

An open source rule induction tool for transfer-based SMT

Author: Graham Yvette
van Genabith Josef
Publication venue: Charles University, Prague
Publication date: 01/01/2009

In this paper we describe an open source tool for automatic induction of transfer rules. Transfer rule induction is carried out on pairs of dependency structures and their node alignment to produce all rules consistent with the node alignment. We describe an efficient algorithm for rule induction and give a detailed description of how to use the tool.

Irish Universities

DCU Online Research Access Service

F-structure transfer-based statistical machine translation

Author: Bryl Anton
Graham Yvette
van Genabith Josef
Publication venue: CSLI Publications
Publication date: 01/01/2009

In this paper, we describe a statistical deep syntactic transfer decoder that is trained fully automatically on parsed bilingual corpora. Deep syntactic transfer rules are induced automatically from the f-structures of an LFG-parsed bitext corpus by automatically aligning local f-structures and inducing all rules consistent with the node alignment. Given an SL f-structure as input, the transfer decoder outputs the n-best TL f-structures by applying large numbers of transfer rules and searching for the best output, using a log-linear model to combine feature scores. The decoder includes a fully integrated dependency-based trigram language model. We include an experimental evaluation of the decoder using different parsing disambiguation resources for the German data, to compare how the system performs with different German training and test parses.
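
The log-linear combination of feature scores mentioned in this abstract can be illustrated with a minimal sketch. The feature names, weights, and candidate values below are hypothetical and are not taken from the paper; the sketch only shows how a weighted sum of feature scores can rank candidates into an n-best list.

```python
def log_linear_score(features, weights):
    """Weighted sum of feature values: the score of one candidate."""
    return sum(weights[name] * value for name, value in features.items())


def n_best(candidates, weights, n):
    """Return the n highest-scoring candidates, best first."""
    ranked = sorted(
        candidates,
        key=lambda c: log_linear_score(c["features"], weights),
        reverse=True,
    )
    return ranked[:n]


# Hypothetical feature names and log-probability values, for illustration only.
weights = {"rule_logprob": 1.0, "lm_logprob": 0.5}
candidates = [
    {"id": "A", "features": {"rule_logprob": -2.0, "lm_logprob": -4.0}},
    {"id": "B", "features": {"rule_logprob": -1.0, "lm_logprob": -7.0}},
    {"id": "C", "features": {"rule_logprob": -3.0, "lm_logprob": -1.0}},
]

best = n_best(candidates, weights, 2)
print([c["id"] for c in best])
```

In a real decoder the weights would be tuned on held-out data and the candidates would come from the search over transfer rules; neither step is shown here.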

Irish Universities

DCU Online Research Access Service

Myanmar Phrases Translation Model with Morphological Analysis for Statistical Myanmar to English Translation System

Author: Soe Khin Mar
Thein Ni Lar
Zin Thet Thet
Publication venue: Institute of Digital Enhancement of Cognitive Processing, Waseda University
Publication date: 01/01/2011

Waseda University Repository

Discriminative Reordering Models for Statistical Machine Translation

Author: Hermann Ney
Richard Zens
Publication date: 01/01/2006

We present discriminative reordering models for phrase-based statistical machine translation. The models are trained using the maximum entropy principle. We use several types of features: based on words, based on word classes, based on the local context. We evaluate the overall performance of the reordering models as well as the contribution of the individual feature types on a word-aligned corpus. Additionally, we show improved translation performance using these reordering models compared to a state-of-the-art baseline system.

CiteSeerX

Crossref

Publikationsserver der RWTH Aachen University

A tree-based approach for English-to-Turkish translation

Author: Avar Begüm
Bakay Özge
Yıldız Olcay Taner
Publication venue: The Scientific and Technological Research Council of Turkey
Publication date: 01/01/2019

In this paper, we present our English-to-Turkish translation methodology, which adopts a tree-based approach. Our approach relies on tree analysis and the application of structural modification rules to derive the target-side (Turkish) trees from the source-side (English) ones. We also use morphological analysis to get candidate root words and apply tree-based rules to obtain the agglutinated target words. Compared to earlier work on English-to-Turkish translation using phrase-based models, we have been able to obtain higher BLEU scores in our current study. Our syntactic subtree permutation strategy, combined with a word replacement algorithm, provides a 67% relative improvement, from a baseline of 12.8 to 21.4 BLEU, averaged over 10-fold cross-validation. As future work, improvements in choosing the correct senses and structural rules are needed. This work was supported by TUBITAK project 116E104.
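
As a quick arithmetic check of the reported figures, the relative improvement can be recomputed from the two BLEU scores given in the abstract. The helper name below is our own.

```python
def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


# BLEU scores quoted in the abstract: baseline 12.8, proposed system 21.4.
gain = relative_improvement(12.8, 21.4)
print(f"{gain:.1%}")
```

The result rounds to the 67% stated in the abstract.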

Isik University Academic Open Access

Stochastic Modelling: From Pattern Classification to Speech Recognition and Language Translation

Author: AJ Robinson
AP Dempster
B Efron
F Jelinek
H Bourlard
H Ney
L Breiman
LE Baum
LR Bahl
PF Brown
RO Duda
S Ortmanns
S Pietra Della
W Wahlster
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/1999

This paper gives an overview of the stochastic modelling approach to machine translation. Starting with the Bayes decision rule as in pattern classification and speech recognition, we show how the resulting system architecture can be structured into three parts: the language model probability, the string translation model probability and the search procedure that generates the word sequence in the target language. We discuss the properties of the system components and report results on the translation of spoken dialogues in the VERBMOBIL project. The experience obtained in the VERBMOBIL project, in particular a large-scale end-to-end evaluation, showed that the stochastic modelling approach resulted in significantly lower error rates than three competing translation approaches: the sentence error rate was 29% in comparison with 52% to 62% for the other translation approaches.
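
The three-part structure described in this abstract follows from the Bayes decision rule. The notation below is an assumption chosen for illustration and may differ from the paper's own: $f_1^J$ denotes the source word sequence and $e_1^I$ a candidate target word sequence.

```latex
\hat{e}_1^I
  = \operatorname*{arg\,max}_{e_1^I} \Pr(e_1^I \mid f_1^J)
  = \operatorname*{arg\,max}_{e_1^I} \bigl\{ \Pr(e_1^I) \cdot \Pr(f_1^J \mid e_1^I) \bigr\}
```

Here $\Pr(e_1^I)$ is the language model probability, $\Pr(f_1^J \mid e_1^I)$ is the string translation model probability, and the $\arg\max$ corresponds to the search procedure. The second equality holds because $\Pr(f_1^J)$ does not depend on $e_1^I$ and can be dropped from the maximization.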

CiteSeerX

Crossref

Discovering Phrases in Machine Translation by Simulated Annealing

Author: Langlois David
Lavecchia Caroline
Smaïli Kamel
Publication venue: HAL CCSD
Publication date: 22/09/2008

In this paper, we propose a new phrase-based translation model based on inter-lingual triggers. The originality of our method is twofold. First, we identify common source phrases. Then we use inter-lingual triggers to retrieve their translations. Furthermore, we treat the extraction of phrase translations as an optimization problem: we use a simulated annealing algorithm to find the best phrase translations among all those determined by inter-lingual triggers. The best phrases are those which improve the translation quality in terms of BLEU score. Tests are carried out on the proceedings of the European Parliament corpora. Training is done on a corpus containing 596K parallel sentences (French-English) and testing on a corpus of 1444 sentences. With only 8.1% of the identified source phrases occurring in the test corpus, our system outperforms the baseline model by almost 3 points.
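
The idea of selecting phrase translations by simulated annealing can be sketched generically as follows. This is not the authors' implementation: the objective here is a toy stand-in (a sum of hypothetical per-phrase gains) rather than a BLEU score, and the cooling schedule, step count, and all names are assumptions made for illustration.

```python
import math
import random


def anneal(items, score, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Search for a subset of `items` that maximizes `score` (higher is better)."""
    rng = random.Random(seed)
    current = set()
    current_score = score(current)
    best, best_score = set(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        # Neighbor move: toggle one randomly chosen item in or out.
        candidate = current ^ {rng.choice(items)}
        delta = score(candidate) - current_score
        # Always accept improvements; accept worse moves with prob. exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, current_score + delta
            if current_score > best_score:
                best, best_score = set(current), current_score
    return best, best_score


# Hypothetical per-phrase contributions to translation quality (toy values).
gains = {"a": 0.5, "b": -0.3, "c": 0.2, "d": -0.1}


def toy_score(subset):
    return sum(gains[p] for p in subset)


best, best_score = anneal(list(gains), toy_score)
print(sorted(best), round(best_score, 3))
```

In the setting described by the abstract, evaluating the objective would mean measuring translation quality with the candidate phrase set, which is far more expensive than this toy function.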

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Peningkatan Akurasi Penerjemah Bahasa Daerah dengan Optimasi Korpus Paralel [Improving the Accuracy of Regional Language Translators through Parallel Corpus Optimization]

Author: Herry Sujaini
Publication venue: Universitas Gadjah Mada
Publication date: 01/03/2018

Statistical Machine Translation (SMT) quality is influenced by several factors. The most fundamental factor is the quantity of the corpus used as base material for building the translation and language models in SMT. Corpus quantity is a major factor in ensuring translation quality, but corpus quality cannot be ignored either. Manually checking the source and translation sentences in a parallel corpus is very difficult and requires a lot of resources. This paper reports experimental results using a quality improvement strategy for Indonesian-Malay and Indonesian-Javanese corpora that does not require examining and correcting the sentences in the corpus. The filter used is a minimum value for each sentence, as scored by the Bilingual Evaluation Understudy (BLEU) method. Experimental results show that parallel corpus optimization can improve the accuracy of Indonesian-Malay translation by 6.97% and of Indonesian-Javanese translation by 5.55%.
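
A minimum-score filter of the kind described in this abstract might look like the sketch below. It assumes that each sentence pair already carries a sentence-level score; how the paper computes those scores is not stated here, and the threshold, placeholder sentences, and score values are hypothetical.

```python
def filter_corpus(scored_pairs, min_score):
    """Keep only the sentence pairs whose score reaches `min_score`."""
    return [(src, tgt) for src, tgt, score in scored_pairs if score >= min_score]


# Placeholder sentence pairs with hypothetical sentence-level scores.
scored_pairs = [
    ("source sentence 1", "target sentence 1", 0.62),
    ("source sentence 2", "target sentence 2", 0.08),
    ("source sentence 3", "target sentence 3", 0.35),
]

kept = filter_corpus(scored_pairs, min_score=0.30)
print(len(kept), "of", len(scored_pairs), "pairs kept")
```

The filtered pairs would then be used to retrain the translation and language models, trading corpus size for corpus quality.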

Directory of Open Access Journals