5 research outputs found
Atar: Attention-based LSTM for Arabizi transliteration
A non-standard romanization of Arabic script, known as Arabizi, is widely used in Arabic online and SMS/chat communities. However, since state-of-the-art tools and applications for Arabic NLP expect Arabic to be written in Arabic script, handling content written in Arabizi requires special attention, either by building customized tools or by transliterating it into Arabic script. The latter approach is the more common one, and this work presents two significant contributions in this direction. The first is the collection and public release of the first large-scale “Arabizi to Arabic script” parallel corpus, focusing on the Jordanian dialect and consisting of more than 25k pairs carefully created and inspected by native speakers to ensure the highest quality. Second, we present Atar, an attention-based encoder-decoder model for Arabizi transliteration. Training and testing this model on our dataset yields an impressive accuracy of 79% and a BLEU score of 88.49.
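To illustrate why transliteration is harder than a lookup table, here is a minimal sketch (not the Atar model) of a naive rule-based mapper using a few common Arabizi conventions, such as the digit "3" standing for the letter ʿayn. The `CHAR_MAP` entries below reflect widely used conventions, but real Arabizi is ambiguous and context-dependent, which is what motivates a learned encoder-decoder approach.

```python
# A naive, greedy character-level Arabizi-to-Arabic mapper (illustrative
# sketch only; real transliteration needs context, hence seq2seq models).
CHAR_MAP = {
    "sh": "ش",  # shiin (multi-character unit, must match before "s"/"h")
    "3": "ع",   # 'ayn
    "7": "ح",   # haa
    "5": "خ",   # khaa
    "2": "ء",   # hamza
}

def naive_transliterate(word: str) -> str:
    """Greedy longest-match lookup; unmapped characters pass through."""
    out, i = [], 0
    while i < len(word):
        two = word[i:i + 2]
        if two in CHAR_MAP:          # try two-character units first
            out.append(CHAR_MAP[two])
            i += 2
        elif word[i] in CHAR_MAP:    # then single characters
            out.append(CHAR_MAP[word[i]])
            i += 1
        else:                        # leave anything unmapped as-is
            out.append(word[i])
            i += 1
    return "".join(out)
```

Because most Latin letters map to several Arabic letters depending on the surrounding word, a rule table like this leaves much of the input untouched or wrong; the attention-based model learns these context-sensitive mappings from the parallel corpus instead.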
VoxArabica: A Robust Dialect-Aware Arabic Speech Recognition System
Arabic is a complex language with many varieties and dialects spoken by over
450 million people around the world. This linguistic diversity and variation
makes it challenging to build a robust and generalized ASR system for
Arabic. In this work, we address this gap by developing and demoing a system,
dubbed VoxArabica, for dialect identification (DID) as well as automatic speech
recognition (ASR) of Arabic. We train a wide range of models such as HuBERT
(DID), Whisper, and XLS-R (ASR) in a supervised setting for Arabic DID and ASR
tasks. Our DID models are trained to identify 17 different dialects in addition
to MSA. We finetune our ASR models on MSA, Egyptian, Moroccan, and mixed data.
Additionally, for the remaining dialects in ASR, we provide the option to
choose various models such as Whisper and MMS in a zero-shot setting. We
integrate these models into a single web interface with diverse features such
as audio recording, file upload, model selection, and the option to raise flags
for incorrect outputs. Overall, we believe VoxArabica will be useful for a wide
range of audiences concerned with Arabic research. Our system is currently
running at https://cdce-206-12-100-168.ngrok.io/.
Comment: Accepted at the ArabicNLP conference co-located with EMNLP'23. The
first three authors contributed equally.
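The abstract above describes a two-stage design: dialect identification first, then routing to a fine-tuned ASR model where one exists (MSA, Egyptian, Moroccan) and to a zero-shot model (Whisper or MMS) otherwise. The sketch below is hypothetical dispatch logic mirroring that description; the function and model names are assumptions for illustration, not VoxArabica's actual code.

```python
# Hypothetical routing logic mirroring the design described in the
# VoxArabica abstract: use a fine-tuned model when the detected dialect
# has one, otherwise fall back to a user-selected zero-shot model.
FINETUNED_DIALECTS = {"MSA", "Egyptian", "Moroccan"}

def select_asr_model(dialect: str, zero_shot_choice: str = "Whisper") -> str:
    """Return an identifier for the ASR model to run on an utterance."""
    if dialect in FINETUNED_DIALECTS:
        return f"finetuned-{dialect}"
    # Remaining dialects are served zero-shot (e.g. Whisper or MMS).
    return f"zero-shot-{zero_shot_choice}"
```

In the web interface this choice is exposed to the user via model selection, so the fallback model is configurable rather than fixed.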
WojoodNER 2023: The First Arabic Named Entity Recognition Shared Task
We present WojoodNER-2023, the first Arabic Named Entity Recognition (NER)
Shared Task. The primary focus of WojoodNER-2023 is on Arabic NER, offering
novel NER datasets (i.e., Wojood) and the definition of subtasks designed to
facilitate meaningful comparisons between different NER approaches.
WojoodNER-2023 encompassed two Subtasks: FlatNER and NestedNER. A total of 45
unique teams registered for this shared task, with 11 of them actively
participating in the test phase. Specifically, 11 teams participated in
FlatNER, while others tackled NestedNER. The winning teams achieved F1
scores of 91.96 and 93.73 in FlatNER and NestedNER, respectively.
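The F1 scores reported above (on a 0-100 scale) are the harmonic mean of precision and recall, as standard in NER shared tasks. A minimal sketch of the computation from entity-level true positives, false positives, and false negatives:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, in [0, 1].

    tp/fp/fn are entity-level counts: an entity typically counts as a
    true positive only if both its span and its type match the gold
    annotation exactly.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For nested NER the same formula applies, but the counts are taken over all gold entities, including those nested inside other entities.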