Chapter Bibliography
authored support system; contextual machine translation; controlled document authoring; controlled language; document structure; terminology management; translation technology; usability evaluatio
Evaluating MT for massive open online courses: a multifaceted comparison between PBSMT and NMT systems
This article reports a multifaceted comparison between statistical and neural machine translation (MT) systems that were developed for translation of data from Massive Open Online Courses (MOOCs). The study uses four language pairs: English to German, Greek, Portuguese, and Russian. Translation quality is evaluated using automatic metrics and human evaluation, carried out by professional translators. Results show that neural MT is preferred in side-by-side ranking, and is found to contain fewer overall errors. Results are less clear-cut for some error categories, and for temporal and technical post-editing effort. In addition, results are reported based on sentence length, showing advantages and disadvantages depending on the particular language pair and MT paradigm
Evaluating MT for massive open online courses
This article reports a multifaceted comparison between statistical and neural machine translation (MT) systems that were developed for translation of data from massive open online courses (MOOCs). The study uses four language pairs: English to German, Greek, Portuguese, and Russian. Translation quality is evaluated using automatic metrics and human evaluation, carried out by professional translators. Results show that neural MT is preferred in side-by-side ranking, and is found to contain fewer overall errors. Results are less clear-cut for some error categories, and for temporal and technical post-editing effort. In addition, results are reported based on sentence length, showing advantages and disadvantages depending on the particular language pair and MT paradigm
Findings of the 2017 Conference on Machine Translation
This paper presents the results of the WMT17 shared tasks, which included three machine translation (MT) tasks (news, biomedical, and multimodal), two evaluation tasks (metrics and run-time estimation of MT quality), an automatic post-editing task, a neural MT training task, and a bandit learning task
Article Segmentation in Digitised Newspapers
Digitisation projects preserve and make available vast quantities of historical text. Among these, newspapers are an invaluable resource for the study of human culture and history. Article segmentation identifies each region in a digitised newspaper page that contains an article. Digital humanities, information retrieval (IR), and natural language processing (NLP) applications over digitised archives improve access to text and allow automatic information extraction. The lack of article segmentation impedes these applications. We contribute a thorough review of the existing approaches to article segmentation. Our analysis reveals divergent interpretations of the task, and inconsistent and often ambiguously defined evaluation metrics, making comparisons between systems challenging. We solve these issues by contributing a detailed task definition that examines the nuances and intricacies of article segmentation that are not immediately apparent. We provide practical guidelines on handling borderline cases and devise a new evaluation framework that allows insightful comparison of existing and future approaches. Our review also reveals that the lack of large datasets hinders meaningful evaluation and limits machine learning approaches. We solve these problems by contributing a distant supervision method for generating large datasets for article segmentation. We manually annotate a portion of our dataset and show that our method produces article segmentations over characters nearly as well as costly human annotators. We reimplement the seminal textual approach to article segmentation (Aiello and Pegoretti, 2006) and show that it does not generalise well when evaluated on a large dataset. We contribute a framework for textual article segmentation that divides the task into two distinct phases: block representation and clustering. 
We propose several techniques for block representation and contribute a novel highly-compressed semantic representation called similarity embeddings. We evaluate and compare different clustering techniques, and innovatively apply label propagation (Zhu and Ghahramani, 2002) to spread headline labels to similar blocks. Our similarity embeddings and label propagation approach substantially outperforms Aiello and Pegoretti but still falls short of human performance. Exploring visual approaches to article segmentation, we reimplement and analyse the state-of-the-art Bansal et al. (2014) approach. We contribute an innovative 2D Markov model approach that captures reading order dependencies and reduces the structured labelling problem to a Markov chain that we decode with Viterbi (1967). Our approach substantially outperforms Bansal et al., achieves accuracy as good as human annotators, and establishes a new state of the art in article segmentation. Our task definition, evaluation framework, and distant supervision dataset will encourage progress in the task of article segmentation. Our state-of-the-art textual and visual approaches will allow sophisticated IR and NLP applications over digitised newspaper archives, supporting research in the digital humanities
An investigation of challenges in machine translation of literary texts: the case of the English–Chinese language pair
In the absence of a focus on literary text translation in studies of machine translation (MT), this study aims at investigating some challenges of this application of the technology. First, the most commonly used types of MT are reviewed in chronological order of their development, and, for the purpose of identifying challenges for MT in literary text translation, the challenges human translators face in literary text translation are linked to corresponding aspects of MT. In investigating the research questions of the challenges that MT systems face in literary text translation, and whether equivalence can be established by MT in literary text translation, a qualitative method is used. Areas such as the challenges for MT in the establishment of corpora, achieving equivalence, and realisation of creativity in literary texts are examined in order to reveal some of the potential contributing factors to the difficulties faced in literary text translation by MT. Through text analysis on chosen sample literary texts on three online MT platforms (Google Translate, DeepL and Youdao Translate), all based on highly advanced neural machine translation engines, this study offers a pragmatic view on some challenging areas in literary text translation using these widely acclaimed online platforms, and offers insights on potential research opportunities in studies of literary text translation using MT
Automatic Summarization
It has now been 50 years since the publication of Luhn's seminal paper on automatic summarization. During these years the practical need for automatic summarization has become increasingly urgent and numerous papers have been published on the topic. As a result, it has become harder to find a single reference that gives an overview of past efforts or a complete view of summarization tasks and necessary system components. This article attempts to fill this void by providing a comprehensive overview of research in summarization, including the more traditional efforts in sentence extraction as well as the most novel recent approaches for determining important content, for domain and genre specific summarization and for evaluation of summarization. We also discuss the challenges that remain open, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field