Automatic identification methods on a corpus of twenty five fine-grained Arabic dialects

Abstract

International audienceThis research deals with Arabic dialect identification, a challenging issue related to Arabic NLP. Indeed, the increasing use of Arabic dialects in a written form especially in social media generates new needs in the area of Arabic dialect processing. For discriminating between dialects in a multi-dialect context, we use different approaches based on machine learning techniques. To this end, we explored several methods. We used a classification method based on symmetric Kullback-Leibler, and we experimented classical classification methods such as Naive Bayes Classifiers and more sophisticated methods like Word2Vec and Long Short-Term Memory neural network. We tested our approaches on a large database of 25 Arabic dialects in addition to MSA

    Similar works