3,091 research outputs found
Referential translation machines for predicting translation quality and related statistics
We use referential translation machines (RTMs) for predicting translation performance. RTMs pioneer a language independent approach to all similarity tasks and remove the need to access any task or domain specific information or resource. We improve our RTM models with the ParFDA instance selection model (Bicici et al., 2015), with additional features for predicting the translation performance, and with improved learning models. We develop RTM models for each WMT15 QET (QET15) subtask and obtain improvements over QET14 results.
RTMs achieve top performance in QET15 ranking 1st in document- and sentence-level prediction tasks and 2nd in word-level prediction task
RTM-DCU: referential translation machines for semantic similarity
We use referential translation machines (RTMs) for predicting the semantic similarity of text.
RTMs are a computational model for identifying the
translation acts between any two data sets with respect to interpretants selected in the same domain,
which are effective when making monolingual and bilingual similarity judgments.
RTMs judge the quality or the semantic similarity of text by using retrieved relevant training data as interpretants for reaching shared semantics.
We derive features measuring the closeness of the test sentences to the training data via interpretants, the difficulty of translating them, and the presence of the acts of translation, which may ubiquitously be observed in communication.
RTMs provide a language independent approach to all similarity tasks and achieve top performance when predicting monolingual cross-level semantic similarity (Task 3) and good results in semantic relatedness and entailment (Task 1) and multilingual semantic textual similarity (STS) (Task 10). RTMs remove the need to access any task or domain specific information or resource
Referential translation machines for quality estimation
We introduce referential translation machines (RTM) for quality estimation of translation outputs. RTMs are a computational model for identifying the translation acts between any two data sets with respect to a reference corpus selected in the same domain, which can be used for estimating the quality of translation outputs, judging the semantic similarity between text, and evaluating the quality of student answers. RTMs achieve top performance in automatic, accurate, and language independent prediction of
sentence-level and word-level statistical machine translation (SMT) quality. RTMs remove the need to access any SMT system specific information or prior knowledge of the training data or models used when generating the translations. We develop novel techniques for solving all subtasks in the WMT13 quality estimation (QE) task (QET 2013) based on individual RTM models. Our results achieve improvements over last year’s QE task results (QET 2012), as well as our previous results, provide new features and techniques for QE, and rank 1st or 2nd in all of the subtasks
Structural Features for Predicting the Linguistic Quality of Text: Applications to Machine Translation, Automatic Summarization and Human-Authored Text
Sentence structure is considered to be an important component of the overall linguistic quality of text. Yet few empirical studies have sought to characterize how and to what extent structural features determine fluency and linguistic quality. We report the results of experiments on the predictive power of syntactic phrasing statistics and other structural features for these aspects of text. Manual assessments of sentence fluency for machine translation evaluation and text quality for summarization evaluation are used as gold-standard. We find that many structural features related to phrase length are weakly but significantly correlated with fluency and classifiers based on the entire suite of structural features can achieve high accuracy in pairwise comparison of sentence fluency and in distinguishing machine translations from human translations. We also test the hypothesis that the learned models capture general fluency properties applicable to human-authored text. The results from our experiments do not support the hypothesis. At the same time structural features and models based on them prove to be robust for automatic evaluation of the linguistic quality of multi-document summaries
Automatic Accuracy Prediction for AMR Parsing
Abstract Meaning Representation (AMR) represents sentences as directed,
acyclic and rooted graphs, aiming at capturing their meaning in a machine
readable format. AMR parsing converts natural language sentences into such
graphs. However, evaluating a parser on new data by means of comparison to
manually created AMR graphs is very costly. Also, we would like to be able to
detect parses of questionable quality, or preferring results of alternative
systems by selecting the ones for which we can assess good quality. We propose
AMR accuracy prediction as the task of predicting several metrics of
correctness for an automatically generated AMR parse - in absence of the
corresponding gold parse. We develop a neural end-to-end multi-output
regression model and perform three case studies: firstly, we evaluate the
model's capacity of predicting AMR parse accuracies and test whether it can
reliably assign high scores to gold parses. Secondly, we perform parse
selection based on predicted parse accuracies of candidate parses from
alternative systems, with the aim of improving overall results. Finally, we
predict system ranks for submissions from two AMR shared tasks on the basis of
their predicted parse accuracy averages. All experiments are carried out across
two different domains and show that our method is effective.Comment: accepted at *SEM 201
Self-Supervised Sketch-to-Image Synthesis
Imagining a colored realistic image from an arbitrarily drawn sketch is one
of the human capabilities that we eager machines to mimic. Unlike previous
methods that either requires the sketch-image pairs or utilize low-quantity
detected edges as sketches, we study the exemplar-based sketch-to-image (s2i)
synthesis task in a self-supervised learning manner, eliminating the necessity
of the paired sketch data. To this end, we first propose an unsupervised method
to efficiently synthesize line-sketches for general RGB-only datasets. With the
synthetic paired-data, we then present a self-supervised Auto-Encoder (AE) to
decouple the content/style features from sketches and RGB-images, and
synthesize images that are both content-faithful to the sketches and
style-consistent to the RGB-images. While prior works employ either the
cycle-consistence loss or dedicated attentional modules to enforce the
content/style fidelity, we show AE's superior performance with pure
self-supervisions. To further improve the synthesis quality in high resolution,
we also leverage an adversarial network to refine the details of synthetic
images. Extensive experiments on 1024*1024 resolution demonstrate a new
state-of-art-art performance of the proposed model on CelebA-HQ and Wiki-Art
datasets. Moreover, with the proposed sketch generator, the model shows a
promising performance on style mixing and style transfer, which require
synthesized images to be both style-consistent and semantically meaningful. Our
code is available on
https://github.com/odegeasslbc/Self-Supervised-Sketch-to-Image-Synthesis-PyTorch,
and please visit https://create.playform.io/my-projects?mode=sketch for an
online demo of our model.Comment: AAAI-202
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures adopted in which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them.Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118
pages, 8 figures, 1 tabl
Deep Learning: Our Miraculous Year 1990-1991
In 2020, we will celebrate that many of the basic ideas behind the deep
learning revolution were published three decades ago within fewer than 12
months in our "Annus Mirabilis" or "Miraculous Year" 1990-1991 at TU Munich.
Back then, few people were interested, but a quarter century later, neural
networks based on these ideas were on over 3 billion devices such as
smartphones, and used many billions of times per day, consuming a significant
fraction of the world's compute.Comment: 37 pages, 188 references, based on work of 4 Oct 201
Findings of the WMT 2018 shared task on quality estimation
© 2018 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence.
The published version can be accessed at the following link on the publisher’s website: http://dx.doi.org/10.18653/v1/W18-6451We report the results of the WMT18 shared task on Quality Estimation, i.e. the task of predicting the quality of the output of machine translation systems at various granularity levels: word, phrase, sentence and document. This year we include four language pairs, three text domains, and translations produced by both statistical and neural machine translation systems. Participating teams from ten institutions submitted a variety of systems to different task variants and language pairs.The data and annotations collected for Tasks 1, 2 and 3 was supported by the EC H2020 QT21 project (grant agreement no. 645452). The shared task organisation was also supported by the QT21 project, national funds through Fundacao para a Ciencia e Tecnologia (FCT), with references UID/CEC/50021/2013 and UID/EEA/50008/2013, and by the European Research Council (ERC StG DeepSPIN 758969). We would also like to thank Julie Beliao and the Unbabel Quality Team for coordinating the annotation of the dataset used in Task 4
- …