40 research outputs found

A Simple but Effective Approach to Improve Arabizi-to-English Statistical Machine Translation

Author: Bisazza A.
Monz C.
van der Wees M.
Publication venue: The COLING 2016 Organizing Committee
Publication date: 01/01/2016

International Migration, Integration and Social Cohesion online publications

The Effect of Arabism of Romanic Alphabets on the Development of 9th Grade English as a Foreign Language Students' Writing Skills at Secondary School Level

Author: Zuhair Ahmad
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 02/01/2016

This paper investigates the effect of Arabization of Romanic alphabets on the development of 9th Grade English as a Foreign Language students' composition writing skills at secondary school level. This experimental study includes 25 9th Grade secondary school students who are taught English as a foreign language at Al-Husainieh Secondary School for Boys. The findings indicate that students usually tend to write and compose English sentences by Romanizing Arabic letters. This may stem from several causes, such as weakness in writing, lack of awareness of specific aspects of sentence structure, and limited vocabulary, but even good students tend to use the Romanic alphabets in writing. The study recommends that students become familiar with the meanings of English words. Key Words: Arabization, Romanic, Alphabets, Writing

International Institute for Science, Technology and Education (IISTE): E-Journals

Atar: Attention-based LSTM for Arabizi transliteration

Author: Abuammar Analle
Al-Ayyoub Mahmoud
Talafha Bashar
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/06/2021

A non-standard romanization of Arabic script, known as Arabizi, is widely used in Arabic online and SMS/chat communities. However, since state-of-the-art tools and applications for Arabic NLP expect Arabic to be written in Arabic script, handling content written in Arabizi requires special attention, either by building customized tools or by transliterating it into Arabic script. The latter approach is the more common one, and this work presents two significant contributions in this direction. First, we collect and publicly release the first large-scale “Arabizi to Arabic script” parallel corpus focusing on the Jordanian dialect, consisting of more than 25k pairs carefully created and inspected by native speakers to ensure the highest quality. Second, we present Atar, an attention-based encoder-decoder model for Arabizi transliteration. Training and testing this model on our dataset yields impressive accuracy (79%) and BLEU score (88.49).

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

Multi-Task sequence prediction for Tunisian Arabizi multi-level annotation

Author: Elisa Gugliotta
Marco Dinarelli
Olivier Kraif
Publication date: 01/01/2020

In this paper we propose a multi-task sequence prediction system, based on recurrent neural networks, and use it to annotate a Tunisian Arabizi corpus on multiple levels. The annotations performed are text classification, tokenization, PoS tagging and encoding of Tunisian Arabizi into CODA* Arabic orthography. The system is trained to predict all the annotation levels in cascade, starting from Arabizi input. We evaluate the system on the TIGER German corpus, suitably converting the data to obtain a multi-task problem, in order to show the effectiveness of our neural architecture. We also show how we used the system to annotate a Tunisian Arabizi corpus, which was afterwards manually corrected and used to further evaluate sequence models on Tunisian data. Our system is developed for the Fairseq framework, which allows fast and easy use for any other sequence prediction problem.

Hal - Université Grenoble Alpes

Archivio della ricerca- Università di Roma La Sapienza

La fréquence de l’alternance codique dans les groupes WhatsApp des étudiants libanais

Author: Halawi Ayman
Hardane Jarjoura
Messarra Nasri
Publication venue: Canadian Association of Applied Linguistics / Association canadienne de linguistique appliquée
Publication date: 01/01/2022

Computer-mediated communication (CMC), and specifically the WhatsApp application, has led to innovative language practices in written communication. Among these practices is the high frequency of Code-Switching (CS), which is defined in this study as a switch from one written code to another within the same message. This quantitative study aims to automatically identify occurrences of Code-Switching in WhatsApp group chats. Over 14 months, we collected 168,219 messages from 30 WhatsApp groups. The study sample encompasses 1,482 bilingual students from 7 Lebanese universities. A computer tool, "DACA" (automatic detection of Code-Switching and Arabizi), has been developed to detect the frequency of this phenomenon resulting from language contact. The results show that the corpus contains 15,342 occurrences of CS, or 9.1% of the total number of messages. 70.5% of these CS occurrences are detected in messages in Arabizi, 17.9% in messages in English, 10.6% in messages in Arabic and 1% in messages in French. The results also reveal that CS in messages composed in Arabizi is quite often towards English (91.3% of the total number of these CS occurrences) and towards Arabizi in messages composed in English, with the same percentage.

University of New Brunswick: Centre for Digital Scholarship Journals

Érudit

SenZi: A Sentiment Analysis Lexicon for the Latinised Arabic (Arabizi)

Author: Alani Harith
Fernandez Miriam
Glavas Goran
Hajj Hazem
Sharafeddine Sanaa
Tobaili Taha
Publication date: 01/01/2019

Arabizi is an informal written form of dialectal Arabic transcribed in Latin alphanumeric characters. It has a proven popularity on chat platforms and social media, yet it suffers from a severe lack of natural language processing (NLP) resources. As such, texts written in Arabizi are often disregarded in sentiment analysis tasks for Arabic. In this paper we describe the creation of a sentiment lexicon for Arabizi that was enriched with word embeddings. The result is a new Arabizi lexicon consisting of 11.3K positive and 13.3K negative words. We evaluated this lexicon by classifying the sentiment of Arabizi tweets achieving an F1-score of 0.72. We provide a detailed error analysis to present the challenges that impact the sentiment analysis of Arabizi

Crossref

Open Research Online (The Open University)

MAnnheim DOCument Server

A review of sentiment analysis research in Arabic language

Author: Cambria Erik
HajHmida Moez Ben
Oueslati Oumaima
Ounelli Habib
Publication venue: Elsevier BV
Publication date: 01/01/2020

Sentiment analysis is a task of natural language processing which has recently attracted increasing attention. However, sentiment analysis research has mainly been carried out for the English language. Although Arabic is ramping up as one of the most used languages on the Internet, only a few studies have focused on Arabic sentiment analysis so far. In this paper, we carry out an in-depth qualitative study of the most important research works in this context by presenting limits and strengths of existing approaches. In particular, we survey both approaches that leverage machine translation or transfer learning to adapt English resources to Arabic and approaches that stem directly from the Arabic language

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Writing Arabizi: Orthographic Variation In Romanized Lebanese Arabic on Twitter

Author: Sullivan Natalie
Publication date: 01/01/2017

How does technology influence the script in which a language is written? Over the past few decades, a new form of writing has emerged across the Arab world. Known as Arabizi, it is a type of Romanized Arabic that uses Latin characters instead of Arabic script. It is mainly used by youth in technology-related contexts such as social media and texting, and has made many older Arabic speakers fear that more standard forms of Arabic may be in danger because of its use. Prior work on Arabizi suggests that although it is used frequently on social media, its orthography is not yet standardized (Palfreyman and Khalil, 2003; Abdel-Ghaffar et al., 2011). Therefore, this thesis aimed to examine orthographic variation in Romanized Lebanese Arabic, which has rarely been studied as a Romanized dialect. It examined how often Arabizi is used on Twitter in Lebanon and the extent of its orthographic variation. Using Twitter data collected from Beirut, tweets were analyzed to discover the most common orthographic variants in Arabizi for each Arabic letter, as well as the overall rate of Arabizi use. Results show that Arabizi was not used as frequently as hypothesized on Twitter, probably because of its low prestige and increased globalization. However, its consonants are relatively standardized, while its vowels show more variation. This thesis adds to the existing conversation about Romanized Arabic by presenting a detailed study of orthographic variation in Lebanese Arabic. The results could have useful implications for Arabic language ideology and technological endeavors, such as natural language processing or translation programs.

Texas ScholarWorks