Improving Evaluation of English-Czech MT through Paraphrasing
In this paper, we present a method of improving the accuracy of machine translation
evaluation of Czech sentences. Given a reference sentence, our algorithm transforms it
by targeted paraphrasing into a new synthetic reference sentence that is closer in
wording to the machine translation output, but at the same time preserves the meaning of
the original reference sentence.
Grammatical correctness of the new reference sentence is ensured by applying Depfix to the
newly created paraphrases. Depfix is a system for post-editing English-to-Czech machine
translation outputs; we adjusted it to fix the errors in paraphrased sentences.
Because our source of paraphrases is noisy, we experimented with adding word alignment. However,
the alignment reduces the number of paraphrases found, and the best results were achieved
by a simple greedy method restricted to one-word paraphrases, thanks to their intensive filtering.
BLEU scores computed using these new reference sentences show significantly higher correlation
with human judgment than scores computed on the original reference sentences.
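The greedy one-word substitution described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paraphrase table, tokenization, and the function name `targeted_paraphrase` are assumptions, and the Depfix grammar-correction step is not modeled here.

```python
def targeted_paraphrase(reference, hypothesis, paraphrases):
    """Greedily replace reference words with one-word paraphrases that
    occur in the MT hypothesis, moving the reference closer in wording
    while (ideally) preserving its meaning."""
    hyp_words = set(hypothesis.split())
    new_ref = []
    for word in reference.split():
        if word in hyp_words:
            new_ref.append(word)  # already matches the MT output
            continue
        # greedily pick the first listed paraphrase the hypothesis uses
        candidates = paraphrases.get(word, [])
        match = next((p for p in candidates if p in hyp_words), word)
        new_ref.append(match)
    return " ".join(new_ref)

# toy example with a hypothetical paraphrase table
table = {"automobile": ["car", "vehicle"]}
print(targeted_paraphrase("the automobile stopped", "the car stopped", table))
# -> "the car stopped"
```

BLEU computed against such a synthetic reference then rewards the MT output for correct wording choices that the original reference happened not to use.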
DEPFIX: A System for Automatic Correction of Czech MT Outputs
We present an improved version of DEPFIX, a system for automatic rule-based post-processing of English-to-Czech MT outputs designed to increase their fluency. We enhanced the rule set used by the original DEPFIX system and measured
the performance of the individual rules.
We also modified the dependency parser of
McDonald et al. (2005) in two ways to adjust
it for the parsing of MT outputs. We show that
our system is able to improve the output quality of
state-of-the-art MT systems.
Measuring Memorization Effect in Word-Level Neural Networks Probing
Multiple studies have probed representations emerging in neural networks
trained for end-to-end NLP tasks and examined what word-level linguistic
information may be encoded in the representations. In classical probing, a
classifier is trained on the representations to extract the target linguistic
information. However, there is a threat of the classifier simply memorizing the
linguistic labels for individual words, instead of extracting the linguistic
abstractions from the representations, thus reporting false positive results.
While considerable efforts have been made to minimize the memorization problem,
the task of actually measuring the amount of memorization happening in the
classifier has been understudied so far. In our work, we propose a simple
general method for measuring the memorization effect, based on a symmetric
selection of comparable sets of test words seen versus unseen in training. Our
method can be used to explicitly quantify the amount of memorization happening
in a probing setup, so that an adequate setup can be chosen and the results of
the probing can be interpreted with a reliability estimate. We exemplify this
by showcasing our method on a case study of probing for part of speech in a
trained neural machine translation encoder.
Comment: Accepted to TSD 2020. Will be published in Springer LNC
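The seen-versus-unseen comparison at the heart of the proposed measure can be sketched as below. This is a simplified illustration under assumed names (`memorization_effect` and its arguments are not from the paper), and it omits the symmetric selection of comparable word sets, which the actual method requires.

```python
def memorization_effect(predictions, gold, word_types, train_types):
    """Accuracy gap between test tokens whose word type was seen during
    probe training and tokens whose type was unseen. A large gap suggests
    the probe memorized labels for individual words rather than extracting
    linguistic abstractions from the representations."""
    def acc(idxs):
        return sum(predictions[i] == gold[i] for i in idxs) / len(idxs)

    seen = [i for i, w in enumerate(word_types) if w in train_types]
    unseen = [i for i, w in enumerate(word_types) if w not in train_types]
    return acc(seen) - acc(unseen)

# toy POS-probing example: the probe is perfect on seen word types
# but only half right on unseen ones
gap = memorization_effect(
    predictions=["NOUN", "VERB", "NOUN", "ADJ"],
    gold=["NOUN", "VERB", "VERB", "ADJ"],
    word_types=["dog", "runs", "barks", "fast"],
    train_types={"dog", "runs"},
)
print(gap)  # -> 0.5
```

A gap near zero indicates the probe's accuracy generalizes beyond memorized word identities; a large positive gap calls the probing result into question.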
Universal Dependencies according to BERT: both more specific and more general
This work focuses on analyzing the form and extent of syntactic abstraction
captured by BERT by extracting labeled dependency trees from self-attentions.
Previous work showed that individual BERT heads tend to encode particular
dependency relation types. We extend these findings by explicitly comparing
BERT relations to Universal Dependencies (UD) annotations, showing that they
often do not match one-to-one.
We suggest a method for relation identification and syntactic tree
construction. Our approach produces significantly more consistent dependency
trees than previous work, showing that it better explains the syntactic
abstractions in BERT. At the same time, it can be successfully applied with
only a minimal amount of supervision and generalizes well across languages.
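The basic idea of reading dependency structure out of self-attention can be sketched as follows. This is only an illustrative baseline (each token's head is taken to be the position it attends to most strongly); the paper's actual method for relation identification and tree construction is more involved, and `attention_to_heads` is an assumed name.

```python
def attention_to_heads(attn):
    """attn[i][j] is the attention weight from token i to token j in one
    self-attention head. For each token, return the index of the position
    it attends to most strongly (excluding itself), interpreted as an
    unlabeled dependency head."""
    heads = []
    for i, row in enumerate(attn):
        best = max((j for j in range(len(row)) if j != i),
                   key=lambda j: row[j])
        heads.append(best)
    return heads

# toy 3-token attention matrix: tokens 0 and 2 attend to token 1,
# which in turn attends to token 0
attn = [
    [0.1, 0.8, 0.1],
    [0.3, 0.5, 0.2],
    [0.1, 0.7, 0.2],
]
print(attention_to_heads(attn))  # -> [1, 0, 1]
```

Comparing such extracted head assignments against UD annotations is what reveals the reported mismatch: individual heads capture syntax-like relations, but not in one-to-one correspondence with UD relation types.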
THEaiTRE: Artificial Intelligence Writes a Theatre Play
Summary: In this article, we introduce the THEaiTRE project, which aims to automatically generate the script of a theatre play. We look at how we do it, how well it has been going so far, and what problems we are running into.
MTMonkey: A Scalable Infrastructure for a Machine Translation Web Service
We present a web service which handles and distributes JSON-encoded HTTP
requests for machine translation (MT) among multiple machines running
an MT system, including text pre- and post-processing.
It is currently used to provide MT between several languages
for cross-lingual information retrieval in the Khresmoi project.
The software consists of an application server and remote workers which handle
text processing and communicate translation requests to MT
systems. The communication between the application server and the workers is
based on the XML-RPC protocol. We present
the overall design of the software and test results which document
speed and scalability of our solution.
Our software is licensed under the Apache 2.0 licence and is available for
download from the Lindat-Clarin repository and GitHub.
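The application-server-to-worker communication described above can be sketched with Python's standard XML-RPC modules. This is a hedged illustration only: the method name `process_task`, the request/reply fields, and the trivial "translation" stand-in are assumptions, not MTMonkey's actual API.

```python
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy
import threading

def process_task(task):
    """Pretend worker: pre-process the text, run a stand-in for the MT
    system, and return the result (here, uppercasing substitutes for
    actual translation)."""
    text = task["text"].strip()      # trivial pre-processing
    translated = text.upper()        # stand-in for calling the MT system
    return {"translation": translated,
            "sourceLang": task["sourceLang"],
            "targetLang": task["targetLang"]}

# start a worker on an ephemeral localhost port
server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_function(process_task)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# the application server side: forward one request to the worker
proxy = ServerProxy(f"http://localhost:{port}")
reply = proxy.process_task({"text": " hello ",
                            "sourceLang": "en",
                            "targetLang": "cs"})
print(reply["translation"])  # -> HELLO
```

In the real system, the application server additionally accepts JSON-encoded HTTP requests from clients and load-balances them across multiple such workers, each fronting an MT system.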