The MateCat tool
We present a new web-based CAT tool providing translators with a professional work environment, integrating translation memories, terminology bases, concordancers, and machine translation. The tool is developed entirely as open source software and has already been successfully deployed for business, research, and education. The MateCat tool is today probably the best available open source platform for investigating, integrating, and evaluating, under realistic conditions, the impact of new machine translation technology on human post-editing.
Comparing translator acceptability of TM and SMT outputs
This paper reports on an initial study that aims to understand whether the acceptability
of translation memory (TM) among translators when contrasted with machine translation (MT)
unacceptability is based on users' ability to optimise precision in match suggestions. Seven
translators were asked to rate whether 60 English-German translated segments were a usable basis
for a good target translation. 30 segments were from a domain-appropriate TM without a quality
threshold being set, and 30 segments were translated by a general domain statistical MT system.
Participants found the MT output more useful on average, with only TM fuzzy matches of over
90% considered more useful. This result suggests that, were the MT community able to provide an
accurate quality threshold to users, they would consider MT to be the more useful technology.
DNN adaptation by automatic quality estimation of ASR hypotheses
In this paper we propose to exploit the automatic Quality Estimation (QE) of
ASR hypotheses to perform the unsupervised adaptation of a deep neural network
modeling acoustic probabilities. Our hypothesis is that significant
improvements can be achieved by: i) automatically transcribing the evaluation
data we are currently trying to recognise, and ii) selecting from it a subset
of "good quality" instances based on the word error rate (WER) scores predicted
by a QE component. To validate this hypothesis, we run several experiments on
the evaluation data sets released for the CHiME-3 challenge. First, we operate
in oracle conditions in which manual transcriptions of the evaluation data are
available, thus allowing us to compute the "true" sentence WER. In this
scenario, we perform the adaptation with variable amounts of data, which are
characterised by different levels of quality. Then, we move to realistic
conditions in which the manual transcriptions of the evaluation data are not
available. In this case, the adaptation is performed on data selected according
to the WER scores "predicted" by a QE component. Our results indicate that: i)
QE predictions allow us to closely approximate the adaptation results obtained
in oracle conditions, and ii) the overall ASR performance based on the proposed
QE-driven adaptation method is significantly better than the strong, most
recent, CHiME-3 baseline.
Comment: Computer Speech & Language, December 201
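The selection step described above can be sketched in a few lines: given ASR hypotheses paired with sentence-level WER scores predicted by a QE component, keep only the "good quality" instances to form the unsupervised adaptation set. The function name and the 15% threshold are illustrative assumptions, not values taken from the paper.

```python
# Sketch of QE-driven data selection for unsupervised DNN adaptation:
# hypotheses whose predicted WER is low enough serve as pseudo-labels.
def select_adaptation_set(hypotheses, predicted_wer, max_wer=0.15):
    """Return (utterance_id, transcript) pairs whose predicted
    sentence-level WER is at or below max_wer."""
    selected = []
    for (utt_id, transcript), wer in zip(hypotheses, predicted_wer):
        if wer <= max_wer:
            selected.append((utt_id, transcript))
    return selected

hyps = [("utt1", "turn the lights on"),
        ("utt2", "turn the flights un"),
        ("utt3", "play some music")]
qe_scores = [0.05, 0.40, 0.10]  # predicted sentence-level WER per hypothesis

# utt1 and utt3 survive the quality filter; utt2 is discarded
print(select_adaptation_set(hyps, qe_scores))
```

In oracle conditions the same filter would simply use the "true" WER computed against manual transcriptions, which is how the paper measures how closely QE predictions approximate that upper bound.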
An Unsupervised Method for Automatic Translation Memory Cleaning.
We address the problem of automatically
cleaning a large-scale Translation Memory
(TM) in a fully unsupervised fashion,
i.e. without human-labelled data.
We approach the task by: i) designing
a set of features that capture the similarity
between two text segments in different
languages, ii) using them to induce reliable
training labels for a subset of the
translation units (TUs) contained in the
TM, and iii) using the automatically labelled
data to train an ensemble of binary classifiers.
We apply our method to clean a
test set composed of 1,000 TUs randomly
extracted from the English-Italian version
of MyMemory, the world's largest public
TM. Our results show competitive performance
not only against a strong baseline
that exploits machine translation, but also
against a state-of-the-art method that relies
on human-labelled data.
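The label-induction step (ii) can be sketched as follows. A single toy similarity feature (token-length ratio) stands in for the richer cross-lingual feature set the paper designs; only TUs the feature scores as confidently good or confidently bad receive a label, and the thresholds are illustrative assumptions.

```python
# Sketch of unsupervised label induction for TM cleaning: score each
# translation unit (TU) with a similarity feature and label only the
# confident cases; the labelled subset would then train an ensemble
# of binary classifiers (step iii).
def length_ratio(src, tgt):
    """Toy similarity feature: ratio of the two segments' token counts."""
    a, b = len(src.split()), len(tgt.split())
    return min(a, b) / max(a, b)

def induce_labels(tus, good_thr=0.8, bad_thr=0.3):
    """Label TUs as 1 (keep) or 0 (noise); skip uncertain ones."""
    labelled = []
    for src, tgt in tus:
        r = length_ratio(src, tgt)
        if r >= good_thr:
            labelled.append(((src, tgt), 1))
        elif r <= bad_thr:
            labelled.append(((src, tgt), 0))
    return labelled

tus = [("the cat sat", "il gatto sedeva"),       # plausible pair
       ("hello world", "x x x x x x x x x x"),   # length mismatch
       ("a b c", "p q r s")]                     # falls between thresholds
print(induce_labels(tus))  # third TU stays unlabelled
```

The auto-labelled pairs could then feed any standard ensemble of binary classifiers; the point of the design is that no human-labelled data enters the pipeline at any stage.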
Automatic Quality Estimation for ASR System Combination
Recognizer Output Voting Error Reduction (ROVER) has been widely used for
system combination in automatic speech recognition (ASR). In order to select
the most appropriate words to insert at each position in the output
transcriptions, some ROVER extensions rely on critical information such as
confidence scores and other ASR decoder features. This information, which is
not always available, highly depends on the decoding process and sometimes
tends to overestimate the real quality of the recognized words. In this paper
we propose a novel variant of ROVER that takes advantage of ASR quality
estimation (QE) for ranking the transcriptions at "segment level" instead of:
i) relying on confidence scores, or ii) feeding ROVER with randomly ordered
hypotheses. We first introduce an effective set of features to compensate for
the absence of ASR decoder information. Then, we apply QE techniques to perform
accurate hypothesis ranking at segment-level before starting the fusion
process. The evaluation is carried out on two different tasks, in which we
respectively combine hypotheses coming from independent ASR systems and
multi-microphone recordings. In both tasks, it is assumed that the ASR decoder
information is not available. The proposed approach significantly outperforms
standard ROVER and it is competitive with two strong oracles that exploit
prior knowledge about the real quality of the hypotheses to be combined.
Compared to standard ROVER, the absolute WER improvements in the two
evaluation scenarios range from 0.5% to 7.3%.
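The ranking-then-fusion idea can be sketched as below: hypotheses are ordered by predicted segment-level WER (best first), then fused by position-wise majority voting, with ties broken in favour of the better-ranked hypothesis. Real ROVER aligns hypotheses into a word transition network; the fixed-position vote here is a simplification that assumes equal-length hypotheses.

```python
from collections import Counter

# Sketch of QE-ranked fusion: order hypotheses by predicted WER, then
# take a position-wise majority vote, preferring better-ranked systems
# on ties (a stand-in for feeding ROVER QE-ordered hypotheses instead
# of confidence scores or a random order).
def qe_ranked_fusion(hypotheses, predicted_wer):
    ranked = [h for h, _ in sorted(zip(hypotheses, predicted_wer),
                                   key=lambda pair: pair[1])]
    words = [h.split() for h in ranked]
    fused = []
    for position in zip(*words):
        counts = Counter(position)
        best = max(counts.values())
        # earliest (best-ranked) word among the most frequent wins ties
        fused.append(next(w for w in position if counts[w] == best))
    return " ".join(fused)

hyps = ["turn the lights on", "turn a lights on", "turn the flights on"]
scores = [0.10, 0.30, 0.20]  # predicted WER per hypothesis
print(qe_ranked_fusion(hyps, scores))  # "turn the lights on"
```

Because the vote consults only the hypothesis ordering, no ASR decoder information (confidence scores, lattice features) is needed, which is the setting both evaluation tasks in the paper assume.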
Empowering NGOs in Countering Online Hate Messages
Studies on online hate speech have mostly focused on the automated detection
of harmful messages. Little attention has been devoted so far to the
development of effective strategies to fight hate speech, in particular through
the creation of counter-messages. While existing manual scrutiny and
intervention strategies are time-consuming and not scalable, advances in
natural language processing have the potential to provide a systematic approach
to hatred management. In this paper, we introduce a novel ICT platform that NGO
operators can use to monitor and analyze social media data, along with a
counter-narrative suggestion tool. Our platform aims at increasing the
efficiency and effectiveness of operators' activities against islamophobia. We
test the platform with more than one hundred NGO operators in three countries
through qualitative and quantitative evaluation. Results show that NGOs favor
the platform solution with the suggestion tool, and that the time required to
produce counter-narratives significantly decreases.
Comment: Preprint of the paper published in Online Social Networks and Media Journal (OSNEM).
Exploring responsible applications of Synthetic Data to advance Online Safety Research and Development
The use of synthetic data provides an opportunity to accelerate online safety
research and development efforts while showing potential for bias mitigation,
facilitating data storage and sharing, preserving privacy and reducing exposure
to harmful content. However, the responsible use of synthetic data requires
caution regarding anticipated risks and challenges. This short report explores
the potential applications of synthetic data to the domain of online safety,
and addresses the ethical challenges that effective use of the technology may
present.
Coping with the Subjectivity of Human Judgements in MT Quality Estimation
Supervised approaches to NLP tasks rely on high-quality data annotations, which typically result from expensive manual labelling procedures. For some tasks, however, the subjectivity of human judgements might reduce the usefulness of the annotation for real-world applications. In Machine Translation (MT) Quality Estimation (QE), for instance, using human-annotated data to train a binary classifier that discriminates between good (useful for a post-editor) and bad translations is not trivial. Focusing on this binary task, we show that subjective human judgements can be effectively replaced with an automatic annotation procedure. To this aim, we compare binary classifiers trained on different data: the human-annotated dataset from the 7th Workshop on Statistical Machine Translation (WMT-12), and an automatically labelled version of the same corpus. Our results show that human labels are less suitable for the task.
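One common way to label QE data automatically, offered here as an illustrative stand-in rather than the exact procedure the paper uses, is to derive each binary label from HTER: the word-level edit distance between the MT output and its human post-edit, normalised by post-edit length. Segments needing little editing count as "good". The 0.4 threshold is a convention assumed for the sketch, not a value from the abstract.

```python
# Sketch of automatic binary labelling for MT QE via an HTER-style
# score, replacing subjective good/bad human judgements.
def edit_distance(a, b):
    """Word-level Levenshtein distance via dynamic programming."""
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return dp[-1]

def auto_label(mt_output, post_edit, threshold=0.4):
    """Return 1 (good: little editing needed) if HTER <= threshold, else 0."""
    mt, pe = mt_output.split(), post_edit.split()
    hter = edit_distance(mt, pe) / len(pe)
    return 1 if hter <= threshold else 0

# one inserted word: HTER = 1/5 = 0.2, so the segment is labelled good
print(auto_label("the cat is black", "the cat is very black"))  # → 1
```

Labels produced this way are deterministic and reproducible, which is the property that makes automatic annotation attractive when human judgements disagree.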