17 research outputs found

    IDAT@FIRE2019: Overview of the Track on Irony Detection in Arabic Tweets

    Full text link
    This overview paper describes the first shared task on irony detection for the Arabic language. The task consists of binary classification of tweets as ironic or not, using a dataset of 5,030 Arabic tweets about political issues and events related to the Middle East and the Maghreb. Tweets in our dataset are written in Modern Standard Arabic as well as in several Arabic language varieties, including Egyptian, Gulf, Levantine and Maghrebi dialects. Eighteen teams registered for the task, of which ten submitted runs. The participants' methods ranged from feature-based models to neural networks, using either classical machine learning techniques or ensemble methods. The best performing system achieved an F-score of 0.844, showing that classical feature-based models outperformed the neural ones.
    This publication was made possible by NPRP grant 9-175-1-033 from the Qatar National Research Fund (a member of Qatar Foundation). The findings achieved herein are solely the responsibility of the last author. The work of Paolo Rosso was also partially funded by Generalitat Valenciana under grant PROMETEO/2019/121.
    Ghanem, B.; Karoui, J.; Benamara, F.; Moriceau, V.; Rosso, P. (2019). IDAT@FIRE2019: Overview of the Track on Irony Detection in Arabic Tweets. CEUR-WS.org, pp. 380-390. http://hdl.handle.net/10251/180744
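
    As a rough illustration of what a classical feature-based approach to this binary task can look like, here is a minimal sketch (not any participant's actual system): character n-gram TF-IDF features fed to a linear SVM, scored with cross-validated F1. The tweet texts and labels are assumed to be loaded from the IDAT data elsewhere.

    # Minimal sketch of a classical feature-based irony classifier; illustrative only.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    def evaluate_baseline(tweets, labels):
        """Cross-validated F1 of a TF-IDF + linear SVM irony classifier.

        tweets: list of Arabic tweet strings; labels: 1 = ironic, 0 = not ironic.
        """
        model = make_pipeline(
            # Character n-grams are fairly robust to dialectal spelling variation.
            TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5), min_df=2),
            LinearSVC(C=1.0),
        )
        return cross_val_score(model, tweets, labels, cv=5, scoring="f1").mean()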

    RGCL at IDAT: deep learning models for irony detection in Arabic language

    Get PDF
    This article describes the system submitted by the RGCL team to the IDAT 2019 Shared Task: Irony Detection in Arabic Tweets. The system detects irony in Arabic tweets using deep learning. The paper evaluates the performance of several deep learning models, as well as how text cleaning and text pre-processing influence the accuracy of the system. Several runs were submitted. The highest F1 score achieved by one of the submissions was 0.818, ranking the RGCL team 4th out of 10 teams in the final results. Overall, we present a system that uses minimal pre-processing but is capable of achieving competitive results.
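
    Since the abstract emphasises minimal pre-processing, the sketch below shows what light-weight cleaning of Arabic tweets might look like (URL and mention removal, diacritic stripping, alef normalisation). This is a hypothetical illustration, not the RGCL team's actual pipeline.

    import re

    URL_RE     = re.compile(r"https?://\S+")
    MENTION_RE = re.compile(r"@\w+")
    DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")  # harakat + tatweel

    def clean_tweet(text: str) -> str:
        """Light-weight cleaning of an Arabic tweet (illustrative only)."""
        text = URL_RE.sub(" ", text)
        text = MENTION_RE.sub(" ", text)
        text = DIACRITICS.sub("", text)                          # strip short vowels
        text = re.sub(r"[\u0622\u0623\u0625]", "\u0627", text)   # normalise alef forms
        return re.sub(r"\s+", " ", text).strip()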

    A review of sentiment analysis research in Arabic language

    Full text link
    Sentiment analysis is a natural language processing task that has recently attracted increasing attention. However, sentiment analysis research has mainly been carried out for the English language. Although Arabic is becoming one of the most widely used languages on the Internet, only a few studies have focused on Arabic sentiment analysis so far. In this paper, we carry out an in-depth qualitative study of the most important research works in this context, presenting the limits and strengths of existing approaches. In particular, we survey both approaches that leverage machine translation or transfer learning to adapt English resources to Arabic and approaches that stem directly from the Arabic language.

    GPTAraEval: A Comprehensive Evaluation of ChatGPT on Arabic NLP

    Full text link
    The recent emergence of ChatGPT has brought a revolutionary change to the landscape of NLP. Although ChatGPT has consistently shown impressive performance on English benchmarks, its exact capabilities in most other languages remain largely unknown. To better understand ChatGPT's capabilities in Arabic, we present a large-scale evaluation of the model on a broad range of Arabic NLP tasks. Namely, we evaluate ChatGPT on 32 diverse natural language understanding and generation tasks over 60 different datasets. To the best of our knowledge, our work offers the first performance analysis of ChatGPT on Arabic NLP at such a massive scale. Our results show that, despite its success on English benchmarks, ChatGPT evaluated in-context (few-shot) is consistently outperformed by much smaller dedicated models finetuned on Arabic. These results suggest that there is significant room for improvement for instruction-tuned LLMs such as ChatGPT.
    Comment: Work in progress.
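
    For context, a few-shot in-context evaluation of the kind the paper describes generally reduces to building a prompt from a handful of labelled demonstrations and mapping the model's free-form reply back onto the task's label set. The sketch below assumes a `chat` callable wrapping whatever ChatGPT API client is used; it is an illustration, not the paper's actual evaluation harness.

    from sklearn.metrics import f1_score

    def build_prompt(demos, query_text):
        """demos: list of (text, label) pairs shown in-context before the query."""
        lines = ["Classify the following Arabic text. Answer with one label only."]
        for demo_text, demo_label in demos:
            lines.append(f"Text: {demo_text}\nLabel: {demo_label}")
        lines.append(f"Text: {query_text}\nLabel:")
        return "\n\n".join(lines)

    def evaluate_few_shot(chat, demos, test_texts, test_labels, label_set):
        """chat: callable taking a prompt string and returning the model's text reply."""
        preds = []
        for text in test_texts:
            reply = chat(build_prompt(demos, text)).strip()
            # Map the free-form reply back onto the task's label set.
            preds.append(next((lab for lab in label_set if lab in reply), label_set[0]))
        return f1_score(test_labels, preds, average="macro")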

    Overview of the CLEF 2022 JOKER Task 3: Pun Translation from English into French

    Get PDF
    The translation of puns is one of the most challenging issues for translators and has therefore become an intensively studied phenomenon in the field of translation studies. Translation technology aims to partially or even totally automate the translation process, but relatively little attention has been paid to the use of computers for the translation of wordplay. The CLEF 2022 JOKER track aims to build a multilingual corpus of wordplay and evaluation metrics in order to advance the automation of creative-language translation. This paper provides an overview of the track’s Pilot Task 3, where the goal is to translate entire phrases containing wordplay (particularly puns). We describe the data collection, the task setup, the evaluation procedure, and the participants’ results. We also cover a side product of our project: a homogeneous monolingual corpus for wordplay detection in French.

    ORCA: A Challenging Benchmark for Arabic Language Understanding

    Full text link
    Due to their crucial role in NLP, several benchmarks have been proposed to evaluate pretrained language models. In spite of these efforts, no public benchmark of a diverse nature currently exists for the evaluation of Arabic. This makes it challenging to measure progress for both Arabic and multilingual language models. The challenge is compounded by the fact that any benchmark targeting Arabic needs to take into account that Arabic is not a single language but rather a collection of languages and varieties. In this work, we introduce ORCA, a publicly available benchmark for Arabic language understanding evaluation. ORCA is carefully constructed to cover diverse Arabic varieties and a wide range of challenging Arabic understanding tasks, exploiting 60 different datasets across seven NLU task clusters. To measure current progress in Arabic NLU, we use ORCA to offer a comprehensive comparison of 18 multilingual and Arabic language models. We also provide a public leaderboard with a unified single-number evaluation metric (ORCA score) to facilitate future research.
    Comment: All authors contributed equally. Accepted at ACL 2023, Toronto, Canada.
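
    The exact definition of the ORCA score is given in the paper; as a hedged illustration of how such a single-number leaderboard aggregate can be computed, the sketch below macro-averages dataset scores within each task cluster and then averages across clusters. This aggregation scheme is an assumption for illustration only.

    from statistics import mean

    def single_number_score(results):
        """results: {cluster_name: {dataset_name: metric_value}} -> one score.

        Macro-average: mean over clusters of the mean dataset score per cluster,
        so clusters with many datasets do not dominate the aggregate.
        """
        return mean(mean(datasets.values()) for datasets in results.values())

    example = {
        "sentiment": {"dataset_a": 0.81, "dataset_b": 0.77},  # cluster mean 0.79
        "nli": {"dataset_c": 0.66},                           # cluster mean 0.66
    }
    print(round(single_number_score(example), 3))  # 0.725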