18 research outputs found

    Evaluation of Automatic Video Captioning Using Direct Assessment

    Full text link
    We present Direct Assessment, a method for manually assessing the quality of automatically-generated captions for video. Evaluating the accuracy of video captions is particularly difficult because for any given video clip there is no definitive ground truth or correct answer against which to measure. Automatic metrics for comparing automatic video captions against a manual caption, such as BLEU and METEOR, drawn from techniques used in evaluating machine translation, were used in the TRECVid video captioning task in 2016, but these are shown to have weaknesses. The work presented here brings human assessment into the evaluation by crowdsourcing how well a caption describes a video. We automatically degrade the quality of some sample captions, which are assessed manually, and from this we are able to rate the quality of the human assessors, a factor we take into account in the evaluation. Using data from the TRECVid video-to-text task in 2016, we show that our Direct Assessment method is replicable and robust and should scale to settings where there are many caption-generation techniques to be evaluated. Comment: 26 pages, 8 figures
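    The quality-control idea described above can be illustrated with a short sketch: assessors rate both genuine captions and deliberately degraded copies, and an assessor is retained only if they reliably score the degraded versions lower, after which their raw scores can be standardised. This is a minimal sketch under assumed details, not the paper's actual pipeline; the choice of significance test, the threshold, and all function names are assumptions.

```python
# Minimal sketch of assessor quality control in a Direct Assessment setup:
# keep an assessor only if they score degraded captions significantly lower
# than the originals, then z-normalise their raw ratings.
# The test choice and names are illustrative, not from the paper's code.
import numpy as np
from scipy.stats import wilcoxon

def assessor_is_reliable(original_scores, degraded_scores, alpha=0.05):
    """Paired one-sided test: does this assessor rate degraded captions lower?"""
    stat, p = wilcoxon(original_scores, degraded_scores, alternative="greater")
    return p < alpha

def standardise(scores):
    """Z-normalise one assessor's raw 0-100 ratings to remove scoring bias."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std(ddof=1)

# Illustrative data: one assessor's ratings of original captions and of the
# automatically degraded versions of the same captions (0-100 scale).
orig = [78, 65, 90, 55, 82, 70, 88, 60]
degr = [40, 30, 62, 20, 51, 35, 58, 33]

if assessor_is_reliable(orig, degr):
    print("keep assessor; standardised scores:", standardise(orig).round(2))
else:
    print("discard this assessor's judgements")
```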

    TŁUMACZENIE MASZYNOWE – CZY MOŻE WSPOMÓC PROFESJONALNY PRZEKŁAD UMÓW?

    Get PDF
    The aim of this research project is to verify whether machine translation (MT) technology can be utilized in the process of professional translation. The genre to be tested in this study is a legal contract. It is a non-literary text, with a high rate of repeatable phrases, predictable lexis, culture-bound terms and syntactically complex sentences (Šarčević 2000, Berezowski 2008). The subject of this study is MT software available on the market that supports the English-Polish language pair: Google MT and Microsoft MT. During the experiment, the process of post-editing of MT raw output was recorded and then analysed in order to retrieve the following data: (i) number of errors in MT raw output, (ii) types of errors (syntactic, grammatical, lexical) and their frequency, (iii) degree of fidelity to the original text (frequency of meaning omissions and meaning distortions), (iv) time devoted to the editing process of the MT raw output. The research results should help translators make an informed decision on whether they would like to invite MT into their work environment.

    This research project aims to establish whether the quality of machine translation is good enough for it to be used in the work of a professional legal translator. The study analysed contracts: functional texts characterised by highly repeatable expressions, phrases and terms, complex syntax, and terminological incongruity (Šarčević 2000, Berezowski 2008). The adopted research method consisted in recording the translation process carried out with the Google MT and Microsoft MT tools. The study yielded information on the usefulness of machine translation by determining: (i) the types of errors occurring in the machine-generated text, (ii) the frequency of those errors, (iii) fidelity to the content of the original (the number of omissions and distortions), and (iv) the time devoted to editing the machine-generated text. The research results should help translators make an informed decision on whether they would like to incorporate machine translation into their everyday work.

    Machine translation: can it assist in professional translation of contracts?

    Get PDF
    The aim of this research project is to verify whether machine translation (MT) technology can be utilized in the process of professional translation. The genre to be tested in this study is a legal contract. It is a non-literary text, with a high rate of repeatable phrases, predictable lexis, culture-bound terms and syntactically complex sentences (Šarčević 2000, Berezowski 2008). The subject of this study is MT software available on the market that supports the English-Polish language pair: Google MT and Microsoft MT. During the experiment, the process of post-editing of MT raw output was recorded and then analysed in order to retrieve the following data: (i) number of errors in MT raw output, (ii) types of errors (syntactic, grammatical, lexical) and their frequency, (iii) degree of fidelity to the original text (frequency of meaning omissions and meaning distortions), (iv) time devoted to the editing process of the MT raw output. The research results should help translators make an informed decision on whether they would like to invite MT into their work environment.
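    As a rough illustration of how the post-editing data described in this study could be tallied, the sketch below aggregates per-segment annotations into error counts by type and average editing time per MT engine. The record layout, field names, and figures are invented for illustration and are not taken from the study.

```python
# Hypothetical tally of post-editing annotations: error counts by category
# (syntactic, grammatical, lexical, omission, distortion) and editing time,
# grouped by MT engine. Not the study's actual data or tooling.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Segment:
    engine: str           # e.g. "Google MT" or "Microsoft MT"
    errors: list          # e.g. ["lexical", "syntactic", "omission"]
    edit_seconds: float   # time spent post-editing this segment

def summarise(segments):
    """Aggregate error frequencies and mean editing time per engine."""
    by_engine = {}
    for seg in segments:
        stats = by_engine.setdefault(
            seg.engine, {"errors": Counter(), "seconds": 0.0, "segments": 0})
        stats["errors"].update(seg.errors)
        stats["seconds"] += seg.edit_seconds
        stats["segments"] += 1
    return by_engine

data = [
    Segment("Google MT", ["lexical", "syntactic"], 95.0),
    Segment("Google MT", ["omission"], 40.0),
    Segment("Microsoft MT", ["lexical", "distortion", "grammatical"], 120.0),
]
for engine, stats in summarise(data).items():
    avg = stats["seconds"] / stats["segments"]
    print(engine, dict(stats["errors"]), f"avg edit time {avg:.1f}s")
```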

    GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation

    Full text link
    Leaderboards have eased model development for many NLP datasets by standardizing their evaluation and delegating it to an independent external repository. Their adoption, however, is so far limited to tasks that can be reliably evaluated in an automatic manner. This work introduces GENIE, an extensible human evaluation leaderboard, which brings the ease of leaderboards to text generation tasks. GENIE automatically posts leaderboard submissions to crowdsourcing platforms, asking human annotators to evaluate them on various axes (e.g., correctness, conciseness, fluency), and compares their answers to various automatic metrics. We introduce several datasets in English to GENIE, representing four core challenges in text generation: machine translation, summarization, commonsense reasoning, and machine comprehension. We provide formal granular evaluation metrics and identify areas for future research. We make GENIE publicly available and hope that it will spur progress in language generation models as well as their automatic and manual evaluation.
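    One simple way to picture the comparison GENIE describes between human judgements and automatic metrics is to correlate per-submission human ratings with metric scores. The sketch below does exactly that; the data are placeholders and the correlation functions are one plausible choice, not GENIE's actual implementation or API.

```python
# Correlating mean human ratings (e.g. on a fluency axis) with an automatic
# metric's scores for the same system outputs. Numbers are placeholders.
from scipy.stats import pearsonr, spearmanr

human  = [0.82, 0.47, 0.91, 0.63, 0.75, 0.55]   # mean crowdsourced ratings
metric = [0.34, 0.21, 0.40, 0.30, 0.29, 0.25]   # automatic metric scores

r, _ = pearsonr(human, metric)
rho, _ = spearmanr(human, metric)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```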