research articlejournal article

Normalization of Arabic Dialects into Modern Standard Arabic using BERT and GPT-2

Abstract

We present an encoder-decored based model for normalization of Arabic dialects using both BERT and GPT-2 based models. Arabic is a language of many dialects that not only differ from the Modern Standard Arabic (MSA) in terms of pronunciation but also in terms of morphology, grammar and lexical choice. This diversity can be troublesome even to a native Arabic speaker let alone a computer. Several NLP tools work well for MSA and in some of the main dialects but fail to cover Arabic language as a whole. Based on our manual evaluation, our model normalizes sentences entirely correctly 46\% of the time and almost correctly 26\% of the time

Similar works

Full text

thumbnail-image

Episciences.org

redirect
Last time updated on 31/05/2024

This paper was published in Episciences.org.

Having an issue?

Is data on this page outdated, violates copyrights or anything else? Report the problem now and we will take corresponding actions after reviewing your request.

Licence: info:eu-repo/semantics/openAccess