40 research outputs found

    The Effect of Arabism of Romanic Alphabets on the Development of 9th Grade English as a Foreign Language Students' Writing Skills at Secondary School Level

    This paper investigates the effect of Arabization of Romanic alphabets on the development of 9th Grade English as a Foreign Language students' composition writing skills at secondary school level. This experimental study includes 25 secondary school students in the 9th Grade, in which English is taught as a foreign language, at Al-Husainieh Secondary School for boys. The findings indicate that students usually tend to write and compose English sentences through Romanizing Arabic letters. This may be related to different reasons, such as weakness in writing, lack of awareness of specific aspects of sentence structure, and a limited vocabulary, but even good students tend to use the Romanic alphabets in writing. The study recommends that students should be familiar with the meaning of English words. Key Words: Arabization, Romanic, Alphabets, Writing

    Atar: Attention-based LSTM for Arabizi transliteration

    A non-standard romanization of Arabic script, known as Arabizi, is widely used in Arabic online and SMS/chat communities. However, since state-of-the-art tools and applications for Arabic NLP expect Arabic to be written in Arabic script, handling content written in Arabizi requires special attention, either by building customized tools or by transliterating it into Arabic script. The latter approach is the more common one, and this work presents two significant contributions in this direction. The first is to collect and publicly release the first large-scale "Arabizi to Arabic script" parallel corpus, focusing on the Jordanian dialect and consisting of more than 25k pairs carefully created and inspected by native speakers to ensure the highest quality. Second, we present Atar, an attention-based encoder-decoder model for Arabizi transliteration. Training and testing this model on our dataset yields impressive accuracy (79%) and BLEU score (88.49).

    Multi-Task sequence prediction for Tunisian Arabizi multi-level annotation

    In this paper we propose a multi-task sequence prediction system, based on recurrent neural networks, used to annotate a Tunisian Arabizi corpus on multiple levels. The annotations performed are text classification, tokenization, PoS tagging, and encoding of Tunisian Arabizi into CODA* Arabic orthography. The system is trained to predict all the annotation levels in cascade, starting from Arabizi input. We evaluate the system on the TIGER German corpus, suitably converting the data into a multi-task problem, in order to show the effectiveness of our neural architecture. We also show how we used the system to annotate a Tunisian Arabizi corpus, which was afterwards manually corrected and used to further evaluate sequence models on Tunisian data. Our system is developed for the Fairseq framework, which allows fast and easy use for any other sequence prediction problem.

    La fréquence de l'alternance codique dans les groupes WhatsApp des étudiants libanais

    Computer-mediated communication (CMC), and specifically the WhatsApp application, has led to innovative language practices in written communication. Among these practices is the high frequency of Code-Switching (CS), defined in this study as a switch from one written code to another within the same message. This quantitative study aims to automatically identify occurrences of Code-Switching in WhatsApp group chats. Over 14 months, we collected 168,219 messages from 30 WhatsApp groups. The study sample comprises 1,482 bilingual students from 7 Lebanese universities. A computer tool, "DACA" (automatic detection of Code-Switching and Arabizi), was developed to detect the frequency of this phenomenon, which results from language contact. The results show that the corpus contains 15,342 occurrences of CS, or 9.1% of the total number of messages. Of these CS occurrences, 70.5% are detected in messages in Arabizi, 17.9% in messages in English, 10.6% in messages in Arabic, and 1% in messages in French. The results also reveal that CS in messages composed in Arabizi is quite often towards English (91.3% of the total number of these CS occurrences), and towards Arabizi in messages composed in English, with the same percentage.

    SenZi: A Sentiment Analysis Lexicon for the Latinised Arabic (Arabizi)

    Arabizi is an informal written form of dialectal Arabic transcribed in Latin alphanumeric characters. It has a proven popularity on chat platforms and social media, yet it suffers from a severe lack of natural language processing (NLP) resources. As such, texts written in Arabizi are often disregarded in sentiment analysis tasks for Arabic. In this paper we describe the creation of a sentiment lexicon for Arabizi that was enriched with word embeddings. The result is a new Arabizi lexicon consisting of 11.3K positive and 13.3K negative words. We evaluated this lexicon by classifying the sentiment of Arabizi tweets achieving an F1-score of 0.72. We provide a detailed error analysis to present the challenges that impact the sentiment analysis of Arabizi

    A review of sentiment analysis research in Arabic language

    Sentiment analysis is a task of natural language processing which has recently attracted increasing attention. However, sentiment analysis research has mainly been carried out for the English language. Although Arabic is ramping up as one of the most used languages on the Internet, only a few studies have focused on Arabic sentiment analysis so far. In this paper, we carry out an in-depth qualitative study of the most important research works in this context by presenting limits and strengths of existing approaches. In particular, we survey both approaches that leverage machine translation or transfer learning to adapt English resources to Arabic and approaches that stem directly from the Arabic language