Automatic Arabic Dialect Identification Systems for Written Texts: A Survey
Arabic dialect identification is a specific task of natural language
processing, aiming to automatically predict the Arabic dialect of a given text.
Arabic dialect identification is the first step in various natural language
processing applications such as machine translation, multilingual
text-to-speech synthesis, and cross-language text generation. Therefore, in the
last decade, interest has increased in addressing the problem of Arabic dialect
identification. In this paper, we present a comprehensive survey of Arabic
dialect identification research in written texts. We first define the problem
and its challenges. The survey then critically discusses
many aspects of the Arabic dialect identification task. We review the
traditional machine learning methods, deep learning architectures, and complex
learning approaches to Arabic dialect identification. We also detail the
features and feature representation techniques used to train the proposed
systems. Moreover, we illustrate the taxonomy of Arabic dialects studied in the
literature, the various levels of text processing at which Arabic dialect
identification is conducted (e.g., token, sentence, and document level), as
well as the available annotated resources, including evaluation benchmark
corpora. Open challenges and issues are discussed at the end of the survey.