87 research outputs found

Recommended from our members

Sentiment Analysis for the Low-Resourced Latinised Arabic "Arabizi"

Author: Tobaili Taha
Publication date: 02/11/2020

The expansion of digital communication mediums from private mobile messaging into the public through social media presented an opportunity for data science research and industry to mine the generated big data for artificial information extraction. A popular information extraction task is sentiment analysis, which aims at extracting polarity opinions (positive, negative, or neutral) from written natural language. This science has helped organisations better understand the public’s opinion towards events, news, public figures, and products. However, sentiment analysis has advanced for the English language ahead of Arabic. While sentiment analysis for Arabic is developing in the literature of Natural Language Processing (NLP), a popular variety of Arabic, Arabizi, has been overlooked for sentiment analysis advancements. Arabizi is an informal transcription of spoken dialectal Arabic in Latin script used for social texting. It is known to be common among Arab youth, yet it is overlooked in efforts on Arabic sentiment analysis because of its linguistic complexities. Like Arabic, Arabizi is rich in inflectional morphology, but it is also codeswitched with English or French and distinctively transcribed without adhering to a standard orthography. The rich morphology, inconsistent orthography, and codeswitching compound one another and multiply the lexical sparsity of the language: each Arabizi word can be spelled in many ways, in addition to the mixing of other languages within the same textual context. The resulting high degree of lexical sparsity defies the very basis of sentiment analysis: the classification of positive and negative words. Arabizi also faces a severe shortage of the data resources required to set out any sentiment analysis approach. In this thesis, we tackle this gap by conducting research on sentiment analysis for Arabizi. 
We addressed the sparsity challenge by harvesting Arabizi data from multi-lingual social media text using deep learning to build Arabizi resources for sentiment analysis. We developed six new morphologically and orthographically rich Arabizi sentiment lexicons and set the baseline for Arabizi sentiment analysis on social media

Open Research Online (The Open University)

Code-choice on Twitter: How stance-taking and linguistic accommodation reflect the identity of polyglossic Egyptian users

Author: Mashhour Sahar
Publication venue: AUC Knowledge Fountain
Publication date: 01/06/2016

This study examines the online identity of polyglossic Egyptian users of Twitter. It is descriptive and exploratory utilizing a qualitative design with some frequency count which adds descriptive data. Data were collected using a Discourse Completion Task (DCT) where the participants were presented with a number of tweets and were asked to type another tweet in response to each. The findings from the study suggest that polyglossic Egyptians, those who are proficient in English as well as Arabic, exhibited an assertive identity on Twitter. This identity was constructed through the choice of code, the linguistic accommodation to the tweet authors, and the stance they took. Polyglossic Egyptians were found to use English more than any other code, followed by Arabizi, and then Arabic. They linguistically accommodated the tweet authors in their replies to some extent by choosing the same code in replying as that used in the original tweet. Further, and using Du Bois' (2007) stance triangle framework, it was also found that they expressed their (dis)alignment quite bluntly by taking an epistemic stance achieved through the use of boosters (very few hedges were used), sarcasm, simple present tense (to express an opinion as if stating a fact), and modals (to offer advice). By doing that, polyglossic Egyptians were found to be assertive in expressing their opinions, often showing themselves as informative, superior people who are guided by facts about topics rather than feelings

AUC Knowledge Fountain (American Univ. in Cairo)

A review of sentiment analysis research in Arabic language

Author: Cambria Erik
HajHmida Moez Ben
Oueslati Oumaima
Ounelli Habib
Publication venue: Elsevier BV
Publication date: 01/01/2020

Sentiment analysis is a task of natural language processing which has recently attracted increasing attention. However, sentiment analysis research has mainly been carried out for the English language. Although Arabic is ramping up as one of the most used languages on the Internet, only a few studies have focused on Arabic sentiment analysis so far. In this paper, we carry out an in-depth qualitative study of the most important research works in this context by presenting limits and strengths of existing approaches. In particular, we survey both approaches that leverage machine translation or transfer learning to adapt English resources to Arabic and approaches that stem directly from the Arabic language

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Writing Arabizi: Orthographic Variation In Romanized Lebanese Arabic on Twitter

Author: Sullivan Natalie
Publication date: 01/01/2017

How does technology influence the script in which a language is written? Over the past few decades, a new form of writing has emerged across the Arab world. Known as Arabizi, it is a type of Romanized Arabic that uses Latin characters instead of Arabic script. It is mainly used by youth in technology-related contexts such as social media and texting, and has made many older Arabic speakers fear that more standard forms of Arabic may be in danger because of its use. Prior work on Arabizi suggests that although it is used frequently on social media, its orthography is not yet standardized (Palfreyman and Khalil, 2003; Abdel-Ghaffar et al., 2011). Therefore, this thesis aimed to examine orthographic variation in Romanized Lebanese Arabic, which has rarely been studied as a Romanized dialect. It investigated how often Arabizi is used on Twitter in Lebanon and the extent of its orthographic variation. Using Twitter data collected from Beirut, tweets were analyzed to discover the most common orthographic variants in Arabizi for each Arabic letter, as well as the overall rate of Arabizi use. Results show that Arabizi was not used as frequently as hypothesized on Twitter, probably because of its low prestige and increased globalization. However, its consonants are relatively standardized, while its vowels show more variation. This thesis adds to the existing conversation about Romanized Arabic by presenting a detailed study of orthographic variation in Lebanese Arabic. The results could have useful implications for Arabic language ideology and technological endeavors, such as natural language processing or translation programs.

Texas ScholarWorks

SenZi: A Sentiment Analysis Lexicon for the Latinised Arabic (Arabizi)

Author: Alani Harith
Fernandez Miriam
Glavas Goran
Hajj Hazem
Sharafeddine Sanaa
Tobaili Taha
Publication date: 01/01/2019

Arabizi is an informal written form of dialectal Arabic transcribed in Latin alphanumeric characters. It has a proven popularity on chat platforms and social media, yet it suffers from a severe lack of natural language processing (NLP) resources. As such, texts written in Arabizi are often disregarded in sentiment analysis tasks for Arabic. In this paper we describe the creation of a sentiment lexicon for Arabizi that was enriched with word embeddings. The result is a new Arabizi lexicon consisting of 11.3K positive and 13.3K negative words. We evaluated this lexicon by classifying the sentiment of Arabizi tweets achieving an F1-score of 0.72. We provide a detailed error analysis to present the challenges that impact the sentiment analysis of Arabizi
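The lexicon-based evaluation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon entries below are toy examples chosen for illustration rather than taken from SenZi, and the names `classify` and `f1_score` are hypothetical. The sketch counts positive and negative lexicon hits per text and computes the F1-score for one label as the harmonic mean of precision and recall.

```python
from typing import Sequence

# Toy lexicon entries for illustration only (assumption: not taken from SenZi).
POSITIVE = {"7elo", "helo", "mabsout"}
NEGATIVE = {"bechi3", "za3lan"}


def classify(text: str) -> str:
    """Label a text by counting positive and negative lexicon matches."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def f1_score(gold: Sequence[str], predicted: Sequence[str], label: str) -> float:
    """F1 for a single label: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Exact token matching is the simplest possible design and it is where the spelling variation of Arabizi hurts most: a word written in a form that is absent from the lexicon contributes nothing to the score.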

Crossref

Open Research Online (The Open University)

MAnnheim DOCument Server

An Experimental Study on Sentiment Classification of Moroccan dialect texts in the web

Author: Hafidi Imad
Jbel Mouad
Metrane Abdulmutallib
Publication date: 28/03/2023

With the rapid growth of the use of social media websites, automatically obtaining users' feedback has become a crucial task for evaluating their tendencies and behaviors online. Despite this great availability of information and the increasing number of Arabic users, only a few studies have addressed Arabic dialects. The purpose of this paper is to study the opinion and emotion expressed in real Moroccan texts, specifically YouTube comments, using some well-known and commonly used methods for sentiment analysis. In this paper, we present our work on classifying Moroccan dialect comments using Machine Learning (ML) models, based on our collected and manually annotated YouTube Moroccan dialect dataset. By employing many text preprocessing and data representation techniques, we aim to compare our classification results using the most commonly used supervised classifiers: k-nearest neighbors (KNN), Support Vector Machine (SVM), Naive Bayes (NB), and deep learning (DL) classifiers such as Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). Experiments were performed using both raw and preprocessed data to show the importance of the preprocessing. The experimental results show that DL models perform better for Moroccan dialect than classical approaches, and we achieved an accuracy of 90%. Comment: 13 pages, 5 tables, 2 figure

arXiv.org e-Print Archive

Atar: Attention-based LSTM for Arabizi transliteration

Author: Abuammar Analle
Al-Ayyoub Mahmoud
Talafha Bashar
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/06/2021

A non-standard romanization of Arabic script, known as Arabizi, is widely used in Arabic online and SMS/chat communities. However, since state-of-the-art tools and applications for Arabic NLP expect Arabic to be written in Arabic script, handling content written in Arabizi requires special attention, either by building customized tools or by transliterating it into Arabic script. The latter approach is the more common one, and this work presents two significant contributions in this direction. First, we collect and publicly release the first large-scale “Arabizi to Arabic script” parallel corpus, focusing on the Jordanian dialect and consisting of more than 25k pairs carefully created and inspected by native speakers to ensure the highest quality. Second, we present Atar, an attention-based encoder-decoder model for Arabizi transliteration. Training and testing this model on our dataset yields impressive accuracy (79%) and BLEU score (88.49)

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

SentiALG: Automated Corpus Annotation for Algerian Sentiment Analysis

Author: AZ Khan
JH AlKhateeb
M Al-Ayyoub
M Mataoui
M Rushdi-Saleh
M Taboada
N Al-Twairesh
SM Mohammad
Publication date: 15/08/2018

Data annotation is an important but time-consuming and costly procedure. To sort a text into two classes, the very first thing we need is a good annotation guideline, establishing what is required to qualify for each class. In the literature, the difficulties associated with appropriate data annotation have been underestimated. In this paper, we present a novel approach to automatically construct an annotated sentiment corpus for Algerian dialect (a Maghrebi Arabic dialect). The construction of this corpus is based on an Algerian sentiment lexicon that is also constructed automatically. The presented work deals with the two widely used scripts on Arabic social media: Arabic and Arabizi. The proposed approach automatically constructs a sentiment corpus containing 8000 messages (where 4000 are dedicated to Arabic and 4000 to Arabizi). The achieved F1-score is up to 72% and 78% for the Arabic and Arabizi test sets, respectively. Ongoing work is aimed at integrating a transliteration process for Arabizi messages to further improve the obtained results. Comment: To appear in the 9th International Conference on Brain Inspired Cognitive Systems (BICS 2018)

arXiv.org e-Print Archive

Crossref