Multilingual simultaneous sentence end and punctuation prediction

Batista, F.; Coheur, L.; Guerreiro, N. M.; Rei, R.

Authors: F. Batista, L. Coheur, N. M. Guerreiro, R. Rei
Publication date: 1 January 2021
Publisher: CEUR-WS

Abstract

This paper describes the model, and its corresponding setup, proposed by the Unbabel & INESC-ID team for the 1st Shared Task on Sentence End and Punctuation Prediction in NLG Text (SEPP-NLG 2021). The shared task covers four languages (English, German, French and Italian) and includes two subtasks: subtask 1, detecting the end of a sentence, and subtask 2, predicting a range of punctuation marks. Our team proposes a single multilingual and multitask model that produces suitable results for all the languages and subtasks involved. The results show that state-of-the-art performance can be achieved with one multilingual model covering both tasks and multiple languages. Using a single multilingual model for multiple languages is of particular importance, since training a different model for each language is a cumbersome and time-consuming process.
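The two subtasks can both be read as token-level labelling problems over unpunctuated, lowercased text. The sketch below is only an illustration of that framing, not the authors' model or preprocessing: it derives one sentence-end label and one punctuation label per token from punctuated text. The function name `text_to_labels` and the label sets `PUNCT_MARKS` and `SENTENCE_END` are assumptions made for this example, and the shared task's actual label inventory may differ.

```python
# Hypothetical label sets, chosen for illustration only.
PUNCT_MARKS = {".", ",", "?", ":", "-"}
SENTENCE_END = {".", "?"}


def text_to_labels(text):
    """Turn punctuated text into (token, sentence_end, punctuation) triples."""
    rows = []
    for raw in text.split():
        # Split off a single trailing punctuation mark, if there is one.
        mark = raw[-1] if raw[-1] in PUNCT_MARKS else "0"
        word = raw[:-1] if mark != "0" else raw
        if not word:
            continue  # skip stand-alone punctuation such as a lone dash
        # Subtask 1 label: 1 if the token closes a sentence, else 0.
        # Subtask 2 label: the mark that follows the token, or "0" for none.
        rows.append((word.lower(), 1 if mark in SENTENCE_END else 0, mark))
    return rows


if __name__ == "__main__":
    for row in text_to_labels("Hello, how are you? I am fine."):
        print(row)
```

Because both label columns are aligned to the same tokens, a single multitask model could in principle predict them jointly from one shared encoder, which is the kind of setup the abstract describes.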

Available Versions

Repositório Institucional do ISCTE-IUL

oai:repositorio.iscte-iul.pt:1...

Last time updated on 05/11/2022