The MateCat tool
We present a new web-based CAT tool providing translators with a professional work environment, integrating translation memories, terminology bases, concordancers, and machine translation. The tool is developed entirely as open source software and has already been successfully deployed for business, research, and education. The MateCat tool is today probably the best available open source platform for investigating, integrating, and evaluating, under realistic conditions, the impact of new machine translation technology on human post-editing.
Comparing translator acceptability of TM and SMT outputs
This paper reports on an initial study that aims to understand whether the acceptability
of translation memory (TM) among translators when contrasted with machine translation (MT)
unacceptability is based on users' ability to optimise precision in match suggestions. Seven
translators were asked to rate whether 60 English-German translated segments were a usable basis
for a good target translation. 30 segments were from a domain-appropriate TM without a quality
threshold being set, and 30 segments were translated by a general domain statistical MT system.
Participants found the MT output more useful on average, with only TM fuzzy matches of over
90% considered more useful. This result suggests that, were the MT community able to provide an
accurate quality threshold to users, they would consider MT to be the more useful technology.
DNN adaptation by automatic quality estimation of ASR hypotheses
In this paper we propose to exploit the automatic Quality Estimation (QE) of
ASR hypotheses to perform the unsupervised adaptation of a deep neural network
modeling acoustic probabilities. Our hypothesis is that significant
improvements can be achieved by: i) automatically transcribing the evaluation
data we are currently trying to recognise, and ii) selecting from it a subset
of "good quality" instances based on the word error rate (WER) scores predicted
by a QE component. To validate this hypothesis, we run several experiments on
the evaluation data sets released for the CHiME-3 challenge. First, we operate
in oracle conditions in which manual transcriptions of the evaluation data are
available, thus allowing us to compute the "true" sentence WER. In this
scenario, we perform the adaptation with variable amounts of data, which are
characterised by different levels of quality. Then, we move to realistic
conditions in which the manual transcriptions of the evaluation data are not
available. In this case, the adaptation is performed on data selected according
to the WER scores "predicted" by a QE component. Our results indicate that: i)
QE predictions allow us to closely approximate the adaptation results obtained
in oracle conditions, and ii) the overall ASR performance based on the proposed
QE-driven adaptation method is significantly better than the strong, most
recent, CHiME-3 baseline.
Comment: Computer Speech & Language, December 201
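The selection step described above can be sketched in a few lines: given ASR hypotheses paired with sentence-level WER scores predicted by a QE component, keep only the "good quality" instances to form the unsupervised adaptation set. The function name and the 15% threshold are illustrative assumptions, not values taken from the paper.

```python
# Sketch of QE-driven data selection for unsupervised DNN adaptation:
# hypotheses whose predicted WER is low enough serve as pseudo-labels.
def select_adaptation_set(hypotheses, predicted_wer, max_wer=0.15):
    """Return (utterance_id, transcript) pairs whose predicted
    sentence-level WER is at or below max_wer."""
    selected = []
    for (utt_id, transcript), wer in zip(hypotheses, predicted_wer):
        if wer <= max_wer:
            selected.append((utt_id, transcript))
    return selected

hyps = [("utt1", "turn the lights on"),
        ("utt2", "turn the flights un"),
        ("utt3", "play some music")]
qe_scores = [0.05, 0.40, 0.10]  # predicted sentence-level WER per hypothesis

# utt1 and utt3 survive the quality filter; utt2 is discarded
print(select_adaptation_set(hyps, qe_scores))
```

In oracle conditions the same filter would simply use the "true" WER computed against manual transcriptions, which is how the paper measures how closely QE predictions approximate that upper bound.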
An Unsupervised Method for Automatic Translation Memory Cleaning.
We address the problem of automatically
cleaning a large-scale Translation Memory
(TM) in a fully unsupervised fashion,
i.e. without human-labelled data.
We approach the task by: i) designing
a set of features that capture the similarity
between two text segments in different
languages, ii) using them to induce reliable
training labels for a subset of the
translation units (TUs) contained in the
TM, and iii) using the automatically labelled
data to train an ensemble of binary classifiers.
We apply our method to clean a
test set composed of 1,000 TUs randomly
extracted from the English-Italian version
of MyMemory, the world's largest public
TM. Our results show competitive performance
not only against a strong baseline
that exploits machine translation, but also
against a state-of-the-art method that relies
on human-labelled data.
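The label-induction step (ii) can be sketched as follows. A single toy similarity feature (token-length ratio) stands in for the richer cross-lingual feature set the paper designs; only TUs the feature scores as confidently good or confidently bad receive a label, and the thresholds are illustrative assumptions.

```python
# Sketch of unsupervised label induction for TM cleaning: score each
# translation unit (TU) with a similarity feature and label only the
# confident cases; the labelled subset would then train an ensemble
# of binary classifiers (step iii).
def length_ratio(src, tgt):
    """Toy similarity feature: ratio of the two segments' token counts."""
    a, b = len(src.split()), len(tgt.split())
    return min(a, b) / max(a, b)

def induce_labels(tus, good_thr=0.8, bad_thr=0.3):
    """Label TUs as 1 (keep) or 0 (noise); skip uncertain ones."""
    labelled = []
    for src, tgt in tus:
        r = length_ratio(src, tgt)
        if r >= good_thr:
            labelled.append(((src, tgt), 1))
        elif r <= bad_thr:
            labelled.append(((src, tgt), 0))
    return labelled

tus = [("the cat sat", "il gatto sedeva"),       # plausible pair
       ("hello world", "x x x x x x x x x x"),   # length mismatch
       ("a b c", "p q r s")]                     # falls between thresholds
print(induce_labels(tus))  # third TU stays unlabelled
```

The auto-labelled pairs could then feed any standard ensemble of binary classifiers; the point of the design is that no human-labelled data enters the pipeline at any stage.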
Automatic Quality Estimation for ASR System Combination
Recognizer Output Voting Error Reduction (ROVER) has been widely used for
system combination in automatic speech recognition (ASR). In order to select
the most appropriate words to insert at each position in the output
transcriptions, some ROVER extensions rely on critical information such as
confidence scores and other ASR decoder features. This information, which is
not always available, highly depends on the decoding process and sometimes
tends to overestimate the real quality of the recognized words. In this paper
we propose a novel variant of ROVER that takes advantage of ASR quality
estimation (QE) for ranking the transcriptions at "segment level" instead of:
i) relying on confidence scores, or ii) feeding ROVER with randomly ordered
hypotheses. We first introduce an effective set of features to compensate for
the absence of ASR decoder information. Then, we apply QE techniques to perform
accurate hypothesis ranking at segment-level before starting the fusion
process. The evaluation is carried out on two different tasks, in which we
respectively combine hypotheses coming from independent ASR systems and
multi-microphone recordings. In both tasks, it is assumed that the ASR decoder
information is not available. The proposed approach significantly outperforms
standard ROVER and it is competitive with two strong oracles that exploit
prior knowledge about the real quality of the hypotheses to be combined.
Compared to standard ROVER, the absolute WER improvements in the two
evaluation scenarios range from 0.5% to 7.3%.
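The ranking-then-fusion idea can be sketched as below: hypotheses are ordered by predicted segment-level WER (best first), then fused by position-wise majority voting, with ties broken in favour of the better-ranked hypothesis. Real ROVER aligns hypotheses into a word transition network; the fixed-position vote here is a simplification that assumes equal-length hypotheses.

```python
from collections import Counter

# Sketch of QE-ranked fusion: order hypotheses by predicted WER, then
# take a position-wise majority vote, preferring better-ranked systems
# on ties (a stand-in for feeding ROVER QE-ordered hypotheses instead
# of confidence scores or a random order).
def qe_ranked_fusion(hypotheses, predicted_wer):
    ranked = [h for h, _ in sorted(zip(hypotheses, predicted_wer),
                                   key=lambda pair: pair[1])]
    words = [h.split() for h in ranked]
    fused = []
    for position in zip(*words):
        counts = Counter(position)
        best = max(counts.values())
        # earliest (best-ranked) word among the most frequent wins ties
        fused.append(next(w for w in position if counts[w] == best))
    return " ".join(fused)

hyps = ["turn the lights on", "turn a lights on", "turn the flights on"]
scores = [0.10, 0.30, 0.20]  # predicted WER per hypothesis
print(qe_ranked_fusion(hyps, scores))  # "turn the lights on"
```

Because the vote consults only the hypothesis ordering, no ASR decoder information (confidence scores, lattice features) is needed, which is the setting both evaluation tasks in the paper assume.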
Empowering NGOs in Countering Online Hate Messages
Studies on online hate speech have mostly focused on the automated detection
of harmful messages. Little attention has been devoted so far to the
development of effective strategies to fight hate speech, in particular through
the creation of counter-messages. While existing manual scrutiny and
intervention strategies are time-consuming and not scalable, advances in
natural language processing have the potential to provide a systematic approach
to hatred management. In this paper, we introduce a novel ICT platform that NGO
operators can use to monitor and analyze social media data, along with a
counter-narrative suggestion tool. Our platform aims at increasing the
efficiency and effectiveness of operators' activities against islamophobia. We
test the platform with more than one hundred NGO operators in three countries
through qualitative and quantitative evaluation. Results show that NGOs favor
the platform solution with the suggestion tool, and that the time required to
produce counter-narratives significantly decreases.
Comment: Preprint of the paper published in Online Social Networks and Media Journal (OSNEM).
Exploring responsible applications of Synthetic Data to advance Online Safety Research and Development
The use of synthetic data provides an opportunity to accelerate online safety
research and development efforts while showing potential for bias mitigation,
facilitating data storage and sharing, preserving privacy and reducing exposure
to harmful content. However, the responsible use of synthetic data requires
caution regarding anticipated risks and challenges. This short report explores
the potential applications of synthetic data to the domain of online safety,
and addresses the ethical challenges that effective use of the technology may
present.
Coping with the Subjectivity of Human Judgements in MT Quality Estimation
Supervised approaches to NLP tasks rely on high-quality data annotations, which typically result from expensive manual labelling procedures. For some tasks, however, the subjectivity of human judgements might reduce the usefulness of the annotation for real-world applications. In Machine Translation (MT) Quality Estimation (QE), for instance, using human-annotated data to train a binary classifier that discriminates between good (useful for a post-editor) and bad translations is not trivial. Focusing on this binary task, we show that subjective human judgements can be effectively replaced with an automatic annotation procedure. To this aim, we compare binary classifiers trained on different data: the human-annotated dataset from the 7th Workshop on Statistical Machine Translation (WMT-12), and an automatically labelled version of the same corpus. Our results show that human labels are less suitable for the task.
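One common way to label QE data automatically, offered here as an illustrative stand-in rather than the exact procedure the paper uses, is to derive each binary label from HTER: the word-level edit distance between the MT output and its human post-edit, normalised by post-edit length. Segments needing little editing count as "good". The 0.4 threshold is a convention assumed for the sketch, not a value from the abstract.

```python
# Sketch of automatic binary labelling for MT QE via an HTER-style
# score, replacing subjective good/bad human judgements.
def edit_distance(a, b):
    """Word-level Levenshtein distance via dynamic programming."""
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return dp[-1]

def auto_label(mt_output, post_edit, threshold=0.4):
    """Return 1 (good: little editing needed) if HTER <= threshold, else 0."""
    mt, pe = mt_output.split(), post_edit.split()
    hter = edit_distance(mt, pe) / len(pe)
    return 1 if hter <= threshold else 0

# one inserted word: HTER = 1/5 = 0.2, so the segment is labelled good
print(auto_label("the cat is black", "the cat is very black"))  # → 1
```

Labels produced this way are deterministic and reproducible, which is the property that makes automatic annotation attractive when human judgements disagree.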