85 research outputs found

    Maghrebi Arabic dialect processing: an overview

    Natural Language Processing for Arabic dialects (AD) has grown widely in recent years, with several works addressing all aspects of Natural Language Processing. However, some AD varieties have received more attention and have a growing collection of resources. Other varieties, such as Maghrebi, still lag behind in that respect. Maghrebi Arabic is the family of Arabic dialects spoken in the Maghreb region (principally Algeria, Tunisia and Morocco). In this work we are interested in these three varieties. This paper presents a review of natural language processing for Maghrebi Arabic dialects.

    A review of sentiment analysis research in Arabic language

    Sentiment analysis is a natural language processing task that has recently attracted increasing attention. However, sentiment analysis research has mainly been carried out for the English language. Although Arabic is becoming one of the most used languages on the Internet, only a few studies have focused on Arabic sentiment analysis so far. In this paper, we carry out an in-depth qualitative study of the most important research works in this context by presenting the limits and strengths of existing approaches. In particular, we survey both approaches that leverage machine translation or transfer learning to adapt English resources to Arabic and approaches that stem directly from the Arabic language.

    Atar: Attention-based LSTM for Arabizi transliteration

    A non-standard romanization of Arabic script, known as Arabizi, is widely used in Arabic online and SMS/chat communities. However, since state-of-the-art tools and applications for Arabic NLP expect Arabic to be written in Arabic script, handling content written in Arabizi requires special attention, either by building customized tools or by transliterating it into Arabic script. The latter approach is the more common one, and this work presents two significant contributions in this direction. The first is to collect and publicly release the first large-scale “Arabizi to Arabic script” parallel corpus focusing on the Jordanian dialect, consisting of more than 25k pairs carefully created and inspected by native speakers to ensure the highest quality. Second, we present Atar, an attention-based encoder-decoder model for Arabizi transliteration. Training and testing this model on our dataset yields impressive accuracy (79%) and BLEU score (88.49).

    Creating Parallel Arabic Dialect Corpus: Pitfalls to Avoid

    Creating parallel corpora is a difficult problem that many researchers try to address. In the context of under-resourced languages such as Arabic dialects, the problem is more complicated because of the nature of these spoken languages. In this paper, we share our experience of creating a parallel corpus that contains several dialects and Modern Standard Arabic (MSA). We attempt to highlight the most important choices we made and how well those choices worked.