Predicting Perfect Quality Segments in MT Output with Fine-Tuned OpenAI LLM: Is it possible to capture editing distance patterns from historical data?
Translation Quality Estimation (TQE) is an essential step before deploying an
output translation into usage. TQE is also critical in assessing machine
translation (MT) and human translation (HT) quality without seeing the
reference translations. This work examines whether state-of-the-art large
language models (LLMs) can be fine-tuned for the TQE task, and how capable
they are at it. We take ChatGPT as one example and approach TQE as a binary
classification task. Using training corpora for eight language pairs (English
to Italian, German, French, Japanese, Dutch, Portuguese, Turkish, and
Chinese), our experimental results show that ChatGPT fine-tuned via its API
can achieve a relatively high score in predicting translation quality, i.e.
whether the translation needs to be edited. However, there is still much room
to improve model accuracy: under our experimental settings, it is 82.42% for
English-Italian and 83.69% for English-German. An English-Italian bilingual
abstract is available in the paper.
Comment: 8 pages, 11 figures, under review at ItalianNLP-202
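The binary-classification setup described in the abstract can be illustrated as a data-preparation step: each (source, translation) pair becomes one chat-style fine-tuning example carrying a binary quality label. This is a minimal sketch, not the paper's actual code; the prompt wording, label strings, and helper name are assumptions for illustration only.

```python
import json

def build_tqe_examples(records):
    """Format (source, translation, needs_edit) triples as chat-style
    fine-tuning examples with a binary label, one JSONL line each.
    Hypothetical helper; field layout follows the chat fine-tuning
    JSONL convention, labels are illustrative."""
    lines = []
    for src, mt, needs_edit in records:
        example = {
            "messages": [
                {"role": "system",
                 "content": "Decide whether the translation needs editing."},
                {"role": "user",
                 "content": f"Source: {src}\nTranslation: {mt}"},
                # Binary target: the model is trained to emit one of two labels.
                {"role": "assistant",
                 "content": "needs-edit" if needs_edit else "perfect"},
            ]
        }
        lines.append(json.dumps(example, ensure_ascii=False))
    return lines

# Toy English-Italian pairs; labels here are invented for the sketch.
records = [
    ("The cat sleeps.", "Il gatto dorme.", False),
    ("She left early.", "Lei ha lasciato presto il lavoro.", True),
]
jsonl = build_tqe_examples(records)
```

The resulting JSONL file would then be uploaded as the training file for a fine-tuning job; classification accuracy is read off by comparing the model's emitted label against the held-out edit-distance-derived label.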
Translation of Pronominal Anaphora between English and Spanish: Discrepancies and Evaluation
This paper evaluates the different tasks carried out in the translation of
pronominal anaphora in a machine translation (MT) system. The MT interlingua
approach named AGIR (Anaphora Generation with an Interlingua Representation)
improves upon other proposals presented to date because it is able to translate
intersentential anaphors, detect co-reference chains, and translate Spanish
zero pronouns into English---issues hardly considered by other systems. The
paper presents the resolution and evaluation of these anaphora problems in AGIR
with the use of different kinds of knowledge (lexical, morphological,
syntactic, and semantic). The translation of English and Spanish anaphoric
third-person personal pronouns (including Spanish zero pronouns) into the
target language has been evaluated on unrestricted corpora. We have obtained a
precision of 80.4% and 84.8% in the translation of Spanish and English
pronouns, respectively. Although we have only studied Spanish and English,
our approach can be easily extended to other languages such as Portuguese,
Italian, or Japanese.
A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them.
Comment: 44 pages, to appear in Computational Linguistics
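The survey's point about measuring the amount of reordering can be illustrated with a standard statistic from the reordering literature: Kendall's tau distance over the permutation induced by a word alignment, i.e. the count of word pairs whose relative order is swapped between source and target. A minimal sketch, assuming a 1-to-1 alignment represented as a permutation of source positions (function names are illustrative, not from the survey):

```python
def kendall_tau_distance(permutation):
    """Count discordant pairs: word pairs whose relative order in the
    target differs from their order in the source."""
    n = len(permutation)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if permutation[i] > permutation[j]
    )

def normalized_reordering(permutation):
    """Scale the discordant-pair count into [0, 1]: 0 for a monotone
    alignment, 1 for a fully inverted one."""
    n = len(permutation)
    max_pairs = n * (n - 1) // 2
    return kendall_tau_distance(permutation) / max_pairs if max_pairs else 0.0
```

A monotone alignment such as `[0, 1, 2, 3]` scores 0, while a fully inverted one such as `[3, 2, 1, 0]` scores 1.0 after normalization, which is the kind of quantitative signal the survey argues should be complemented by a qualitative view of *which* reordering phenomena occur.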