Evaluation of Automatic Video Captioning Using Direct Assessment
We present Direct Assessment, a method for manually assessing the quality of
automatically-generated captions for video. Evaluating the accuracy of video
captions is particularly difficult because for any given video clip there is no
definitive ground truth or correct answer against which to measure. Automatic
metrics for comparing automatic video captions against a manual caption such as
BLEU and METEOR, drawn from techniques used in evaluating machine translation,
were used in the TRECVid video captioning task in 2016 but these are shown to
have weaknesses. The work presented here brings human assessment into the
evaluation by crowdsourcing how well a caption describes a video. We
automatically degrade the quality of some sample captions, which are then assessed
manually; from this we rate the quality of the human assessors,
a factor we take into account in the evaluation. Using data from the TRECVid
video-to-text task in 2016, we show how our direct assessment method is
replicable and robust, and should scale to settings where many caption-generation
techniques are to be evaluated.
Comment: 26 pages, 8 figures
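The assessor-quality check described above — comparing an assessor's ratings of original captions against deliberately degraded versions of the same captions — can be sketched as follows. The function name, the 0–100 rating scale, and the thresholds are illustrative assumptions for this sketch, not the authors' exact procedure:

```python
from statistics import mean, stdev
from math import sqrt

def assessor_is_reliable(orig_scores, degr_scores, min_diff=5.0, min_t=2.0):
    """Rough reliability check: does this assessor score original
    captions consistently higher than their degraded counterparts?

    orig_scores / degr_scores: paired 0-100 ratings for the same
    captions in original and degraded form. Uses a paired t statistic
    as a crude filter; thresholds are illustrative, not from the paper.
    """
    diffs = [o - d for o, d in zip(orig_scores, degr_scores)]
    if len(diffs) < 2:
        return False
    d_mean = mean(diffs)
    d_sd = stdev(diffs)
    if d_sd == 0:
        # Perfectly consistent gap: judge on its size alone.
        return d_mean >= min_diff
    t = d_mean / (d_sd / sqrt(len(diffs)))
    return d_mean >= min_diff and t >= min_t

# A careful assessor rates originals clearly above degraded versions...
good = assessor_is_reliable([80, 75, 90, 85], [40, 35, 50, 45])
# ...while a random clicker shows no consistent gap.
bad = assessor_is_reliable([50, 60, 40, 55], [55, 45, 60, 50])
print(good, bad)  # → True False
```

Assessors failing such a check can then be down-weighted or excluded, which is the "factor we take into account" mentioned in the abstract.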
Machine translation: can it assist in professional translation of contracts?
The aim of this research project is to verify whether machine translation (MT)
technology can be utilized in the process of professional translation. The genre to be tested in this
study is a legal contract. It is a non-literary text, with a high rate of repeatable phrases, predictable
lexis, culture-bound terms and syntactically complex sentences (Šarčević 2000, Berezowski 2008).
The subject of this study is MT software available on the market that supports the English-Polish
language pair: Google MT and Microsoft MT. During the experiment, the process of post-editing
of MT raw output was recorded and then analysed in order to retrieve the following data:
(i) number of errors in MT raw output,
(ii) types of errors (syntactic, grammatical, lexical) and their frequency,
(iii) degree of fidelity to the original text (frequency of meaning omissions and meaning distortions),
(iv) time devoted to the editing process of the MT raw output.
The research results should help translators make an informed decision as to whether they
would like to incorporate MT into their work environment.
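The four measurements listed above amount to tallying annotated post-editing events. A minimal sketch of that tally follows; the record format and category labels are assumptions for illustration, not the study's actual annotation scheme:

```python
from collections import Counter

# Each record marks one issue found while post-editing MT raw output.
# The categories mirror points (i)-(iii) of the study design; the
# record format itself is invented for this sketch.
annotations = [
    {"type": "lexical"},
    {"type": "syntactic"},
    {"type": "lexical"},
    {"type": "grammatical"},
    {"type": "omission"},    # meaning omitted relative to the source
    {"type": "distortion"},  # meaning distorted relative to the source
]

error_types = {"syntactic", "grammatical", "lexical"}
counts = Counter(a["type"] for a in annotations)

total_errors = sum(counts[t] for t in error_types)    # (i) error count
frequencies = {t: counts[t] for t in error_types}     # (ii) per-type frequency
fidelity_issues = counts["omission"] + counts["distortion"]  # (iii) fidelity

print(total_errors, frequencies, fidelity_issues)
```

Point (iv), time spent post-editing, would come from timestamps in the recorded editing session rather than from the annotations themselves.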
GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation
Leaderboards have eased model development for many NLP datasets by
standardizing their evaluation and delegating it to an independent external
repository. Their adoption, however, is so far limited to tasks that can be
reliably evaluated in an automatic manner. This work introduces GENIE, an
extensible human evaluation leaderboard, which brings the ease of leaderboards
to text generation tasks. GENIE automatically posts leaderboard submissions to
crowdsourcing platforms asking human annotators to evaluate them on various
axes (e.g., correctness, conciseness, fluency) and compares their answers to
various automatic metrics. We introduce several datasets in English to GENIE,
representing four core challenges in text generation: machine translation,
summarization, commonsense reasoning, and machine comprehension. We provide
formal granular evaluation metrics and identify areas for future research. We
make GENIE publicly available and hope that it will spur progress in language
generation models as well as their automatic and manual evaluation.
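Comparing human answers with automatic metrics, as GENIE does, often reduces to a rank correlation between the two score lists over the same set of submissions. A stdlib-only Spearman sketch (assuming no tied scores; the sample numbers are invented):

```python
def spearman(xs, ys):
    """Spearman rank correlation, computed as Pearson on ranks.
    Assumes no ties, which keeps the sketch short and stdlib-only."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = rank
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Human fluency ratings vs. an automatic metric for five submissions.
human = [4.5, 3.0, 2.0, 4.0, 1.0]
metric = [0.8, 0.5, 0.3, 0.7, 0.1]
print(round(spearman(human, metric), 3))  # → 1.0 (identical rankings)
```

A correlation near 1 suggests the automatic metric ranks systems the same way human annotators do; values near 0 indicate the metric is uninformative for that axis.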