87 research outputs found

    Code-choice on Twitter: How stance-taking and linguistic accommodation reflect the identity of polyglossic Egyptian users

    This study examines the online identity of polyglossic Egyptian users of Twitter. It is descriptive and exploratory, utilizing a qualitative design with some frequency counts that add descriptive data. Data were collected using a Discourse Completion Task (DCT) in which the participants were presented with a number of tweets and were asked to type another tweet in response to each. The findings from the study suggest that polyglossic Egyptians, those who are proficient in English as well as Arabic, exhibited an assertive identity on Twitter. This identity was constructed through the choice of code, the linguistic accommodation to the tweet authors, and the stance they took. Polyglossic Egyptians were found to use English more than any other code, followed by Arabizi, and then Arabic. They linguistically accommodated the tweet authors in their replies to some extent by choosing the same code in replying as that used in the original tweet. Further, using Du Bois’ (2007) stance triangle framework, it was also found that they expressed their (dis)alignment quite bluntly by taking an epistemic stance achieved through the use of boosters (very few hedges were used), sarcasm, simple present tense (to express an opinion as if stating a fact), and modals (to offer advice). By doing that, polyglossic Egyptians were found to be assertive in expressing their opinions, often showing themselves as informative, superior people who are guided by facts about topics rather than feelings.

    A review of sentiment analysis research in Arabic language

    Sentiment analysis is a task of natural language processing which has recently attracted increasing attention. However, sentiment analysis research has mainly been carried out for the English language. Although Arabic is ramping up as one of the most used languages on the Internet, only a few studies have focused on Arabic sentiment analysis so far. In this paper, we carry out an in-depth qualitative study of the most important research works in this context by presenting the limits and strengths of existing approaches. In particular, we survey both approaches that leverage machine translation or transfer learning to adapt English resources to Arabic and approaches that stem directly from the Arabic language.

    SenZi: A Sentiment Analysis Lexicon for the Latinised Arabic (Arabizi)

    Arabizi is an informal written form of dialectal Arabic transcribed in Latin alphanumeric characters. It has a proven popularity on chat platforms and social media, yet it suffers from a severe lack of natural language processing (NLP) resources. As such, texts written in Arabizi are often disregarded in sentiment analysis tasks for Arabic. In this paper we describe the creation of a sentiment lexicon for Arabizi that was enriched with word embeddings. The result is a new Arabizi lexicon consisting of 11.3K positive and 13.3K negative words. We evaluated this lexicon by classifying the sentiment of Arabizi tweets, achieving an F1-score of 0.72. We provide a detailed error analysis to present the challenges that impact the sentiment analysis of Arabizi.
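
    A minimal sketch of how a sentiment lexicon of this kind might be used to classify tweets and score the result with F1. The lexicon entries, the count-based decision rule, and the function names `classify` and `f1_score` are all assumptions made for illustration; the abstract does not state the authors' actual classification procedure.

```python
# Hypothetical toy lexicon: placeholder entries, NOT taken from SenZi.
POSITIVE = {"7elo", "mabrouk", "jamil"}
NEGATIVE = {"zift", "wi7esh", "ta3ban"}


def classify(text: str) -> str:
    """Label a text by counting lexicon hits (an assumed, simplistic rule)."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def f1_score(gold: list, pred: list, label: str) -> float:
    """F1 for one label: harmonic mean of precision and recall."""
    pairs = list(zip(gold, pred))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

    Whitespace tokenization is used only to keep the sketch short; Arabizi spelling varies widely, so a realistic system would need normalization before lexicon lookup.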

    An Experimental Study on Sentiment Classification of Moroccan dialect texts in the web

    With the rapid growth of the use of social media websites, obtaining users' feedback automatically has become a crucial task for evaluating their tendencies and behaviors online. Despite this great availability of information and the increasing number of Arabic users, only a few studies have addressed Arabic dialects. The purpose of this paper is to study the opinion and emotion expressed in real Moroccan texts, specifically YouTube comments, using some well-known and commonly used methods for sentiment analysis. In this paper, we present our work on classifying Moroccan dialect comments using Machine Learning (ML) models, based on our collected and manually annotated YouTube Moroccan dialect dataset. By employing many text preprocessing and data representation techniques, we aim to compare our classification results using the most commonly used supervised classifiers: k-nearest neighbors (KNN), Support Vector Machine (SVM), Naive Bayes (NB), and deep learning (DL) classifiers such as Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). Experiments were performed using both raw and preprocessed data to show the importance of the preprocessing. The experimental results show that DL models perform better on Moroccan dialect than classical approaches, and we achieved an accuracy of 90%. Comment: 13 pages, 5 tables, 2 figures

    Atar: Attention-based LSTM for Arabizi transliteration

    A non-standard romanization of Arabic script, known as Arabizi, is widely used in Arabic online and SMS/chat communities. However, since state-of-the-art tools and applications for Arabic NLP expect Arabic to be written in Arabic script, handling content written in Arabizi requires special attention, either by building customized tools or by transliterating it into Arabic script. The latter approach is the more common one, and this work presents two significant contributions in this direction. The first is to collect and publicly release the first large-scale “Arabizi to Arabic script” parallel corpus, focusing on the Jordanian dialect and consisting of more than 25k pairs carefully created and inspected by native speakers to ensure the highest quality. Second, we present Atar, an attention-based encoder-decoder model for Arabizi transliteration. Training and testing this model on our dataset yields impressive accuracy (79%) and BLEU score (88.49).

    SentiALG: Automated Corpus Annotation for Algerian Sentiment Analysis

    Data annotation is an important but time-consuming and costly procedure. To sort a text into two classes, the very first thing we need is a good annotation guideline, establishing what is required to qualify for each class. In the literature, the difficulties associated with appropriate data annotation have been underestimated. In this paper, we present a novel approach to automatically construct an annotated sentiment corpus for Algerian dialect (a Maghrebi Arabic dialect). The construction of this corpus is based on an Algerian sentiment lexicon that is also constructed automatically. The presented work deals with the two widely used scripts on Arabic social media: Arabic and Arabizi. The proposed approach automatically constructs a sentiment corpus containing 8000 messages (where 4000 are dedicated to Arabic and 4000 to Arabizi). The achieved F1-score is up to 72% and 78% for the Arabic and Arabizi test sets, respectively. Ongoing work is aimed at integrating a transliteration process for Arabizi messages to further improve the obtained results. Comment: To appear in the 9th International Conference on Brain Inspired Cognitive Systems (BICS 2018)
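
    A minimal sketch of how messages might be routed by script before being matched against an Arabic-script or Arabizi lexicon. The function name `detect_script` and the majority-count rule are assumptions for illustration, not the procedure described in the paper; the only fixed fact relied on is that the basic Unicode Arabic block spans U+0600 to U+06FF.

```python
def detect_script(message: str) -> str:
    """Guess the dominant script of a message by counting letters.

    Returns "arabic", "latin", or "other". A "latin" result only means
    Latin letters dominate; it cannot tell Arabizi apart from French or
    English, which would need a separate language-identification step.
    """
    # Characters in the basic Unicode Arabic block (U+0600..U+06FF).
    arabic = sum("\u0600" <= ch <= "\u06FF" for ch in message)
    # ASCII letters; digits such as 3 or 7 used in Arabizi are not counted.
    latin = sum(ch.isascii() and ch.isalpha() for ch in message)
    if arabic == 0 and latin == 0:
        return "other"
    return "arabic" if arabic >= latin else "latin"
```

    Messages that mix both scripts are assigned to whichever has more letters, with ties going to Arabic; a different tie-breaking rule may suit other corpora better.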