3 research outputs found

    Maghrebi Arabic dialect processing: an overview

    Natural Language Processing (NLP) for Arabic dialects (AD) has grown considerably in recent years, and several works have been proposed that deal with all aspects of NLP. However, some AD varieties have received more attention and have a growing collection of resources. Other varieties, such as Maghrebi, still lag behind in that respect. Maghrebi Arabic is the family of Arabic dialects spoken in the Maghreb region (principally Algeria, Tunisia and Morocco). In this work we are interested in the dialects of these three countries. This paper presents a review of natural language processing for Maghrebi Arabic dialects.

    Automated sentence boundary detection in Modern Standard Arabic transcripts using deep neural networks

    The increased volume of Arabic data sources available on the Web has boosted the development of Natural Language Processing (NLP) tools across different tasks and applications. However, to take advantage of many of these applications, a prior segmentation task called Sentence Boundary Detection (SBD) is needed. In this paper we focus on SBD for Modern Standard Arabic (MSA) by comparing two approaches based on Deep Neural Networks (DNN), trained on out-of-domain and in-domain data with only lexical features (represented as character embeddings), in two scenarios based on a Convolutional Neural Network architecture and a Recurrent Neural Network architecture with an attention mechanism. Tuning on a large out-of-domain dataset with a smaller in-domain dataset improves performance in general. Our evaluations were based on IWSLT 2017 TED talk transcripts and showed similarities and differences depending on the SBD method. MSA carries certain complications given its rich and complex morphology. However, using only lexical features for Arabic SBD is an acceptable option when the source audio signal is not available and a certain level of language independence needs to be reached.

    Sentence boundary detection for transcribed Tunisian Arabic

    In this paper, we study the problem of detecting sentence boundaries in transcribed spoken Tunisian Arabic. We compare and contrast three different methods for detecting sentence boundaries in transcribed speech. The first method uses a set of handmade contextual patterns for identifying the limits of sentences. The second method aims to classify transcription words into four classes according to their position in a sentence. Both methods are based only on lexical and some prosodic information, such as silent and filled pauses. Finally, we develop two techniques for mixing the results of the two proposed methods. We show that a sentence boundary detection system can improve the accuracy of a POS tagger developed for tagging transcribed Tunisian Arabic.