
Ngram-based statistical machine translation enhanced with multiple weighted reordering hypotheses

Author: José A. R. Fonollosa
José B. Mariño
Josep M. Crego
Marta R. Costa-jussà
Maxim Khalilov
Patrik Lambert
Rafael E. Banchs
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2007

This paper describes the 2007 Ngram-based statistical machine translation system developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) in Barcelona. It highlights and empirically compares the improvements and extensions made to the previous year's system. These mainly comprise a novel word ordering strategy based on (1) statistically monotonizing the training source corpus and (2) a novel reordering approach based on weighted reordering graphs. In addition, the system introduces a target language model based on statistical classes, a feature for out-of-domain units, and an improved optimization procedure. The paper provides details of this system's participation in the ACL 2007 Second Workshop on Statistical Machine Translation. Results are reported on three language pairs, namely from Spanish, French and German into English (and the other way round), for both the in-domain and out-of-domain tasks.

CiteSeerX

Crossref

How much hybridisation does machine translation need?

Author: Marta Ruiz Costa-jussà
Publication venue: 'Wiley'
Publication date: 01/01/2015

This is the peer reviewed version of the following article: [Costa-jussà, M. R. (2015), How much hybridization does machine translation need? J Assn Inf Sci Tec, 66: 2160–2165. doi:10.1002/asi.23517], which has been published in final form at [10.1002/asi.23517]. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.

Rule-based and corpus-based machine translation (MT) have coexisted for more than 20 years. Recently, boundaries between the two paradigms have narrowed and hybrid approaches are gaining interest from both academia and businesses. However, since hybrid approaches involve the multidisciplinary interaction of linguists, computer scientists, engineers, and information specialists, understandably a number of issues exist. While statistical methods currently dominate research work in MT, most commercial MT systems are technically hybrid systems. The research community should investigate the benefits and questions surrounding the hybridization of MT systems more actively. This paper discusses various issues related to hybrid MT including its origins, architectures, achievements, and frustrations experienced in the community. It can be said that both rule-based and corpus-based MT systems have benefited from hybridization when effectively integrated. In fact, many of the current rule/corpus-based MT approaches are already hybridized since they do include statistics/rules at some point.

Peer Reviewed. Postprint (author's final draft)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC