32,487 research outputs found
Using F-structures in machine translation evaluation
Despite a growing interest in automatic evaluation methods for Machine Translation (MT) quality, most existing automatic metrics are still limited to surface comparison of translation and reference strings. In this paper we
show how Lexical-Functional Grammar (LFG) labelled dependencies obtained from an automatic parse can be used to assess the quality of MT on a deeper linguistic level, giving as a result higher correlations with human judgements
A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them.Comment: 44 pages, to appear in Computational Linguistic
A syntactic skeleton for statistical machine translation
We present a method for improving statistical machine translation performance by using linguistically motivated syntactic information. Our algorithm recursively decomposes source language sentences into syntactically simpler and shorter chunks, and recomposes their translation to form target language sentences. This improves both the word order and lexical selection of the translation. We report statistically significant relative improvementsof 3.3% BLEU score in an experiment (English!Spanish) carried out on
an 800-sentence test set extracted from the Europarl corpus
TransBooster: boosting the performance of wide-coverage machine translation systems
We propose the design, implementation and evaluation of a novel and modular approach to boost the translation performance of existing, wide-coverage, freely available
machine translation systems based on reliable and fast automatic decomposition of the translation input and corresponding composition of translation output. We provide details of our method, and experimental results compared to the MT systems SYSTRAN and Logomedia. While many avenues for further experimentation remain, to date we fall just
behind the baseline systems on the full 800-sentence testset, but in certain cases our method causes the translation quality obtained via the MT systems to improve
Disambiguation strategies for data-oriented translation
The Data-Oriented Translation (DOT) model { originally proposed in (Poutsma, 1998, 2003) and based on Data-Oriented Parsing (DOP) (e.g. (Bod, Scha, & Sima'an, 2003)) { is best described as a hybrid model of
translation as it combines examples, linguistic information and a statistical translation model. Although theoretically interesting, it inherits the computational complexity associated with DOP. In this paper, we focus on
one computational challenge for this model: efficiently selecting the `best' translation to output. We present four different disambiguation strategies in terms of how they are implemented in our DOT system, along with experiments
which investigate how they compare in terms of accuracy and
efficiency
Software Usability:A Comparison Between Two Tree-Structured Data Transformation Languages
This paper presents the results of a software usability study, involving both subjective and objective evaluation. It compares a popular XML data transformation language (XSLT) and a general purpose rule-based tree manipulation language which addresses some of the XML and XSLT limitations. The benefits of the evaluation study are discussed
- âŠ