306 research outputs found

    A plea for more interactions between psycholinguistics and natural language processing research

    A new development in psycholinguistics is the use of regression analyses on tens of thousands of words, known as the megastudy approach. This development has led to the collection of processing times and subjective ratings (of age of acquisition, concreteness, valence, and arousal) for most of the existing words in English and Dutch. In addition, a crowdsourcing study in Dutch has yielded information about how well 52,000 lemmas are known. This information is likely to be of interest to NLP researchers and computational linguists. At the same time, large-scale measures of word characteristics developed in the latter traditions are likely to be pivotal in bringing the megastudy approach to the next level.
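The megastudy analysis described above amounts to regressing per-word processing times on word-level predictors. The sketch below illustrates the idea on synthetic data; the variable names, effect sizes, and predictors (log frequency, age of acquisition) are illustrative assumptions, not values from any actual megastudy.

```python
import numpy as np

# Illustrative megastudy-style regression on synthetic data: predict
# per-word processing times from log frequency and age of acquisition (AoA).
rng = np.random.default_rng(0)
n_words = 10_000
log_freq = rng.normal(3.0, 1.0, n_words)   # log word frequency
aoa = rng.normal(6.0, 2.0, n_words)        # rated age of acquisition (years)

# Simulated latencies: faster for frequent words, slower for late-acquired ones.
rt = 700 - 40 * log_freq + 15 * aoa + rng.normal(0, 50, n_words)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n_words), log_freq, aoa])
coef, *_ = np.linalg.lstsq(X, rt, rcond=None)
intercept, b_freq, b_aoa = coef
print(f"frequency effect: {b_freq:.1f} ms/log unit, AoA effect: {b_aoa:.1f} ms/year")
```

With tens of thousands of items, even small per-word effects are estimated precisely, which is the statistical payoff of the megastudy approach.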

    Towards the Global SentiWordNet


    Building sentiment lexicons based on recommending services for the Polish language

    Sentiment analysis has become a prominent area of research in computer science. It has numerous practical applications, e.g., evaluating customer satisfaction or identifying product promoters. Many methods employed in this task require language resources such as sentiment lexicons, which are unavailable for the Polish language. Such lexicons contain words annotated with their emotional polarization, but creating them manually is very tedious. This paper therefore addresses the issue and describes a new method of building sentiment lexicons automatically from recommending services. The resulting lexicons were then used in the task of sentiment classification.
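One simple way to induce a lexicon from a recommending service, sketched below, is to score each word by the average centred star rating of the reviews it appears in. The toy reviews and the centring-at-3 heuristic are illustrative assumptions, not the paper's actual method or data.

```python
from collections import defaultdict

# Toy rating-based lexicon induction: a word inherits the average
# (centred) star rating of the reviews containing it.
reviews = [
    ("great phone works perfectly", 5),
    ("terrible battery broke quickly", 1),
    ("great value works well", 4),
    ("terrible screen very disappointing", 2),
]

totals = defaultdict(float)
counts = defaultdict(int)
for text, stars in reviews:
    for word in text.split():
        totals[word] += stars - 3   # centre at the neutral rating
        counts[word] += 1

lexicon = {w: totals[w] / counts[w] for w in totals}
print(lexicon["great"], lexicon["terrible"])
```

Words that co-occur with high ratings end up positive and vice versa; at scale this yields a polarity lexicon without manual annotation.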

    Creating emoji lexica from unsupervised sentiment analysis of their descriptions

    Online media, such as blogs and social networking sites, generate massive volumes of unstructured data of great interest for analyzing the opinions and sentiments of individuals and organizations. Novel approaches beyond Natural Language Processing are necessary to quantify these opinions with polarity metrics. So far, the sentiment expressed by emojis has received little attention. The use of these symbols, however, has boomed in the past four years. About twenty billion are typed on Twitter nowadays, and new emojis keep appearing in each new Unicode version, making them increasingly relevant to sentiment analysis tasks. This has motivated us to propose a novel approach to predict the sentiments expressed by emojis in online textual messages, such as tweets, that does not require human effort to manually annotate data and saves valuable time for other analysis tasks. For this purpose, we automatically constructed a novel emoji sentiment lexicon using an unsupervised sentiment analysis system based on the definitions given by emoji creators in Emojipedia. Additionally, we automatically created lexicon variants by also considering the sentiment distribution of the informal texts accompanying emojis. All these lexica are evaluated and compared regarding the improvement obtained by including them in sentiment analysis of the annotated datasets provided by Kralj Novak, Smailovic, Sluban and Mozetic (2015). The results confirm the competitiveness of our approach.
    Agencia Estatal de Investigación | Ref. TEC2016-76465-C2-2-R; Xunta de Galicia | Ref. GRC2014/046; Xunta de Galicia | Ref. ED341D R2016/01
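The core idea of scoring an emoji from its textual definition can be sketched as follows. The two descriptions and the tiny word-polarity table are illustrative stand-ins for Emojipedia definitions and a full unsupervised sentiment system, not the paper's actual resources.

```python
# Toy emoji-lexicon construction: an emoji's polarity is the average
# polarity of the sentiment-bearing words in its description.
WORD_POLARITY = {"happy": 1.0, "joy": 1.0, "love": 1.0,
                 "sad": -1.0, "crying": -1.0, "angry": -1.0}

descriptions = {
    "😀": "a yellow face with a broad happy smile conveying joy",
    "😢": "a sad face with a single tear crying",
}

def description_sentiment(text: str) -> float:
    """Average polarity of the lexicon words found in the description."""
    hits = [WORD_POLARITY[w] for w in text.lower().split() if w in WORD_POLARITY]
    return sum(hits) / len(hits) if hits else 0.0

emoji_lexicon = {e: description_sentiment(d) for e, d in descriptions.items()}
```

Because the scores come from the definitions themselves, no manually annotated emoji data is needed, which is the labour-saving point the abstract emphasizes.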

    From Big to Small Without Losing It All: Text Augmentation with ChatGPT for Efficient Sentiment Analysis

    In the era of artificial intelligence, data is gold but costly to annotate. This paper demonstrates a solution to this dilemma using ChatGPT for text augmentation in sentiment analysis. We leverage ChatGPT's generative capabilities to create synthetic training data that significantly improves the performance of smaller models, making them competitive with, or even outperforming, their larger counterparts. This enables models to be both efficient and effective, reducing computational cost, inference time, and memory usage without compromising on quality. Our work marks a key advancement in the cost-effective development and deployment of robust sentiment analysis models.
    Comment: 10 pages, 9 figures, presented at ICDM Workshop: SENTIRE 202
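The augmentation pipeline the abstract describes has a simple shape: for each labelled example, generate label-preserving rewrites and add them to the training set. In the sketch below the LLM call is stubbed out with a synonym-swapping placeholder so it runs offline; the `paraphrase` function, synonym table, and toy dataset are assumptions, and in practice `paraphrase` would prompt ChatGPT (e.g., "Rewrite this review, preserving its sentiment").

```python
import random

SYNONYMS = {"good": ["great", "excellent"], "bad": ["awful", "poor"]}

def paraphrase(text: str, rng: random.Random) -> str:
    """Stand-in for an LLM call: swap words for label-preserving synonyms."""
    words = [rng.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in text.split()]
    return " ".join(words)

def augment(dataset, n_copies=2, seed=0):
    """Return the original labelled examples plus n_copies paraphrases each."""
    rng = random.Random(seed)
    out = list(dataset)
    for text, label in dataset:
        out.extend((paraphrase(text, rng), label) for _ in range(n_copies))
    return out

train = [("good movie", 1), ("bad plot", 0)]
augmented = augment(train)
```

The enlarged synthetic set is then used to train the smaller model; only the generation step changes when a real LLM replaces the stub.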

    Mapping wordnets from the perspective of inter-lingual equivalence

    This paper explores inter-lingual equivalence from the perspective of linking two large lexico-semantic databases, namely the Princeton WordNet of English and the plWordNet (pl. Słowosieć) of Polish. Wordnets are built as networks of lexico-semantic relations between words and their meanings, and constitute a type of monolingual dictionary cum thesaurus. The development of wordnets for different languages has given rise to many wordnet linking projects (e.g. EuroWordNet; Vossen, 2002). Regardless of the linking method used, these projects require defining rules for establishing equivalence links between wordnet building blocks, known as synsets (sets of synonymous lexical units, i.e., lemma-sense pairs). This paper analyses the set of inter-wordnet relations used in the mapping of plWordNet onto the Princeton WordNet and attempts to relate them to the equivalence taxonomies described in the specialist literature on bilingual lexicography and translation.
    The article analyses the phenomenon of inter-lingual equivalence from the perspective of linking two large wordnets: the Polish Słowosieć and the Princeton WordNet of English. Wordnets are relational lexico-semantic databases describing the network of lexico-semantic relations between words and their meanings, and thus constitute a kind of monolingual dictionary combined with a thesaurus. The development of wordnets for many of the world's languages subsequently led to their mutual linking, which required defining a methodology for establishing equivalence between their basic elements, i.e. synsets, which are sets of synonymous lexical units, i.e. lemma-sense-number pairs. We analyse the set of inter-wordnet relations used in the mapping between Słowosieć and the Princeton WordNet, relating them to the equivalence taxonomies postulated in the lexicographic and translation literature.
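The notion of an inter-wordnet equivalence link can be illustrated with a minimal sketch: propose a relation between a Polish and an English synset when their lemma sets overlap under a bilingual dictionary. The synset contents, the dictionary, and the two relation names below are toy assumptions, not the actual plWordNet-Princeton WordNet mapping rules.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Synset:
    sid: str
    lemmas: frozenset  # lemma-sense pairs collapsed to bare lemmas for brevity

# Toy Polish-to-English bilingual dictionary.
PL_EN = {"pies": {"dog"}, "suka": {"bitch"}, "kot": {"cat"}}

def link(pl: Synset, en: Synset) -> Optional[str]:
    """Propose an inter-wordnet relation from translated-lemma overlap."""
    translations = set().union(*(PL_EN.get(l, set()) for l in pl.lemmas))
    shared = translations & en.lemmas
    if not shared:
        return None
    # Full overlap suggests strict equivalence; partial overlap a weaker link.
    return "synonymy" if translations == en.lemmas else "partial_synonymy"

pl_dog = Synset("pl-1", frozenset({"pies"}))
en_dog = Synset("en-1", frozenset({"dog"}))
```

Real mapping projects use a richer taxonomy of relations (hyponymy-based links, near-equivalence, etc.), which is exactly the set of relations the paper analyses.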