Arabic Phoneme Learning Challenges for Madurese Students and the Solutions
This article discusses the challenges Madurese students at INSTIKA Madura face in pronouncing Arabic phonemes. Correct phoneme pronunciation is a fundamental principle of Arabic; without it, the language cannot be understood. The problems of phoneme pronunciation were investigated and solutions were proposed based on an analysis of the underlying factors. A qualitative descriptive design with a case-study approach was used. Data were collected through interviews with lecturers and students, direct observation of in-class learning, and documentation of lecturer notes. Data analysis followed the interactive model of Miles, Huberman, and Saldana. Validity was ensured through persistent observation, triangulation, and expert review. The findings revealed problems with Arabic phonemes, categorized as Akhtha’ al-Harakat, Akhtha’ al-Ibdal, Akhtha’ al-Hadzf, and Akhtha’ al-Tahrif. Contributing factors include language problems (characteristics of the first and second languages) and non-language problems (student characteristics, lecturer competence, learning strategies, lesson materials, and learning facilities). The proposed solutions include error and comparative analysis for the language problems, and motivation, diagnosis, cooperative learning, detailed examples, pronunciation exercises, and adequate facilities for the non-language problems. This research provides a comprehensive study of the challenges of pronouncing Arabic phonemes at INSTIKA Madura: specific error types and the underlying factors affecting pronunciation were identified, and practical solutions addressing both language and non-language aspects were proposed to improve students' pronunciation skills. These findings offer valuable insights for educators, curriculum developers, and language instructors, facilitating targeted interventions and effective teaching strategies for students struggling with Arabic phonetics.
NADI 2023: The Fourth Nuanced Arabic Dialect Identification Shared Task
We describe the findings of the fourth Nuanced Arabic Dialect Identification
Shared Task (NADI 2023). The objective of NADI is to help advance
state-of-the-art Arabic NLP by creating opportunities for teams of researchers
to collaboratively compete under standardized conditions. It does so with a
focus on Arabic dialects, offering novel datasets and defining subtasks that
allow for meaningful comparisons between different approaches. NADI 2023
targeted both dialect identification (Subtask 1) and dialect-to-MSA machine
translation (Subtask 2 and Subtask 3). A total of 58 unique teams registered
for the shared task, of whom 18 teams have participated (with 76 valid
submissions during test phase). Among these, 16 teams participated in Subtask
1, 5 participated in Subtask 2, and 3 participated in Subtask 3. The winning
teams achieved 87.27 F1 on Subtask 1, 14.76 BLEU on Subtask 2, and 21.10 BLEU
on Subtask 3,
respectively. Results show that all three subtasks remain challenging, thereby
motivating future work in this area. We describe the methods employed by the
participating teams and briefly offer an outlook for NADI.
Machine Translation from Standard German to Alemannic Dialects
Machine translation has been researched using deep neural networks in recent years. These networks require large amounts of data to learn abstract representations of the input, stored in continuous vectors. Dialect translation has become more important since the advent of social media; in particular, when dialect speakers and standard-language speakers no longer understand each other, machine translation is of rising concern. Dialect translation is typically a low-resource setting facing data-scarcity problems. Additionally, spelling inconsistencies due to varying pronunciations and the lack of spelling rules complicate translation. This paper presents the best-performing approaches to handling these problems for Alemannic dialects. The results show that back-translation and conditioning on dialectal manifestations achieve the most remarkable enhancement over the baseline. Using back-translation, a significant gain of +4.5 BLEU over the strong transformer baseline of 37.3 BLEU points is accomplished. Differentiating between several Alemannic dialects instead of treating Alemannic as one dialect leads to substantial improvements: multi-dialectal translation surpasses the baseline on the dialectal test sets. However, training individual models outperforms the multi-dialectal approach, with improvements ranging from 7.5 to 10.6 BLEU points over the baseline, depending on the dialect.
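The back-translation technique the abstract credits can be sketched in a few lines: a reverse model (standard language to dialect) generates synthetic dialect sources for monolingual standard-language sentences, enlarging the parallel corpus. The toy word mapping below is an invented stand-in for a trained reverse model, not the paper's actual system.

```python
def reverse_translate(sentence):
    """Stand-in for a trained standard->dialect model (toy word mapping)."""
    toy_lexicon = {"ich": "i", "nicht": "nid", "klein": "chli"}
    return " ".join(toy_lexicon.get(w, w) for w in sentence.split())

def back_translate(monolingual_targets):
    """Pair each monolingual target sentence with a synthetic source,
    yielding (synthetic dialect source, genuine standard target) pairs."""
    return [(reverse_translate(t), t) for t in monolingual_targets]

mono = ["ich bin nicht klein"]
synthetic = back_translate(mono)
# -> [("i bin nid chli", "ich bin nicht klein")]
```

The genuine text always sits on the target side, so the forward model learns to produce fluent standard-language output even from noisy synthetic sources.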
TARJAMAT: Evaluation of Bard and ChatGPT on Machine Translation of Ten Arabic Varieties
Despite the purported multilingual proficiency of instruction-finetuned large
language models (LLMs) such as ChatGPT and Bard, the linguistic inclusivity of
these models remains insufficiently explored. Considering this constraint, we
present a thorough assessment of Bard and ChatGPT (encompassing both GPT-3.5
and GPT-4) regarding their machine translation proficiencies across ten
varieties of Arabic. Our evaluation covers diverse Arabic varieties such as
Classical Arabic (CA), Modern Standard Arabic (MSA), and several country-level
dialectal variants. Our analysis indicates that LLMs may encounter challenges
with dialects for which minimal public datasets exist, but on average are
better translators of dialects than existing commercial systems. On CA and MSA,
instruction-tuned LLMs, however, trail behind commercial systems such as Google
Translate. Finally, we undertake a human-centric study to scrutinize the
efficacy of the relatively recent model, Bard, in following human instructions
during translation tasks. Our analysis reveals a circumscribed capability of
Bard in aligning with human instructions in translation contexts. Collectively,
our findings underscore that prevailing LLMs remain far from inclusive, with
only limited ability to cater for the linguistic and cultural intricacies of
diverse communities.
Unsupervised Neural Dialect Translation with Commonality and Diversity Modeling
As a special machine translation task, dialect translation has two main
characteristics: 1) lack of parallel training corpus; and 2) possessing similar
grammar between two sides of the translation. In this paper, we investigate how
to exploit the commonality and diversity between dialects thus to build
unsupervised translation models merely accessing to monolingual data.
Specifically, we leverage pivot-private embedding, layer coordination, as well
as parameter sharing to sufficiently model commonality and diversity among
source and target, ranging from lexical, through syntactic, to semantic levels.
In order to examine the effectiveness of the proposed models, we collect 20
million monolingual sentences for each of Mandarin and Cantonese, which are
the official language and the most widely used dialect in China,
respectively. Experimental results reveal that our methods outperform
rule-based simplified-traditional Chinese conversion and conventional
unsupervised translation models by over 12 BLEU points.
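The pivot-private embedding idea can be illustrated with a toy lookup: tokens shared by the two sides consult one common ("pivot") table, while dialect-specific tokens get private tables per side. The token sets and table contents here are invented for illustration and are not the paper's actual parameterization.

```python
shared_vocab = {"我", "你"}                 # characters common to both sides
pivot = {w: f"shared[{w}]" for w in shared_vocab}
private_yue = {"唔": "yue[唔]"}             # Cantonese-only negation token
private_cmn = {"不": "cmn[不]"}             # Mandarin-only negation token

def embed(token, side):
    """Route a token to the pivot table if shared, else to its side's private table."""
    if token in pivot:
        return pivot[token]
    return (private_yue if side == "yue" else private_cmn)[token]
```

Shared tokens thus map to identical representations on both sides (modeling commonality), while dialect-specific tokens keep distinct parameters (modeling diversity).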
Supervised Adaptation of Sequence-to-Sequence Speech Recognition Systems using Batch-Weighting
When training speech recognition systems, one often faces the situation that sufficient amounts of training data for the language in question are available, but only small amounts of data for the domain in question. This problem is even bigger for end-to-end speech recognition systems, which only accept transcribed speech as training data, which is harder and more expensive to obtain than text data. In this paper we present experiments in adapting end-to-end speech recognition systems with a method called batch-weighting, which we contrast against regular fine-tuning, i.e., continuing to train existing neural speech recognition models on adaptation data. We perform experiments using these techniques in adapting to topic, accent, and vocabulary, showing that batch-weighting consistently outperforms fine-tuning. In order to show the generalization capabilities of batch-weighting, we perform experiments in several languages, i.e., Arabic, English, and German. Due to its relatively small computational requirements, batch-weighting is a suitable technique for supervised life-long learning during the lifetime of a speech recognition system, e.g., from user corrections.
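The contrast with fine-tuning can be sketched as follows: instead of training on the general corpus first and the adaptation data afterwards, batch-weighting mixes both sources inside every batch at a fixed ratio. The function names, the batch size, and the 3:1 ratio below are illustrative assumptions, not the paper's exact recipe.

```python
import random

def weighted_batches(general_data, domain_data, batch_size=4,
                     domain_frac=0.25, seed=0):
    """Yield batches in which a fixed fraction of examples comes from the
    adaptation domain, so the model sees both distributions throughout training."""
    rng = random.Random(seed)
    n_domain = max(1, int(batch_size * domain_frac))
    n_general = batch_size - n_domain
    while True:
        batch = rng.sample(general_data, n_general) + rng.sample(domain_data, n_domain)
        rng.shuffle(batch)
        yield batch

general = [f"general-{i}" for i in range(100)]
domain = [f"domain-{i}" for i in range(10)]
batch = next(weighted_batches(general, domain))
# at these settings, every batch of 4 contains exactly 1 in-domain utterance
```

Because the general data never disappears from the batches, this scheme avoids the catastrophic forgetting that plain sequential fine-tuning risks.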
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
Peer reviewed
The 1st International Electronic Conference on Algorithms
This book presents 22 of the accepted presentations at the 1st International Electronic Conference on Algorithms, which was held completely online from September 27 to October 10, 2021. It contains 16 proceedings papers as well as 6 extended abstracts. The works presented in the book cover a wide range of fields dealing with the development of algorithms. Many of the contributions are related to machine learning, in particular deep learning. Another main focus among the contributions is on problems dealing with graphs and networks, e.g., in connection with evacuation-planning problems.