Arabic Phoneme Learning Challenges for Madurese Students and the Solutions
This article discusses the challenges Madurese students at INSTIKA Madura face in pronouncing Arabic phonemes. Correct phoneme pronunciation is a fundamental principle of Arabic; without it, the language cannot be understood. The problems of phoneme pronunciation were investigated and solutions were proposed based on an analysis of the underlying factors. A qualitative descriptive design with a case-study approach was used. Data were collected through interviews with lecturers and students, direct observation of in-class learning, and documentation of lecturer notes. Data analysis followed the interactive model of Miles, Huberman, and Saldana. Validity was ensured through persistent observation, triangulation, and expert review. The findings revealed problems with Arabic phonemes, categorized as Akhtha’ al-Harakat, Akhtha’ al-Ibdal, Akhtha’ al-Hadzf, and Akhtha’ al-Tahrif. Contributing factors include language problems (characteristics of the first and second languages) and non-language problems (student characteristics, lecturer competence, learning strategies, lesson materials, and learning facilities). The proposed solutions include error and comparative analysis for the language problems, and motivation, diagnosis, cooperative learning, detailed examples, pronunciation exercises, and adequate facilities for the non-language problems. This research provides a comprehensive study of the challenges of pronouncing Arabic phonemes at INSTIKA Madura: specific error types and the underlying factors affecting pronunciation were identified, and practical solutions addressing both language and non-language aspects were proposed to improve students' pronunciation skills. These findings offer valuable insights for educators, curriculum developers, and language instructors, facilitating targeted interventions and effective teaching strategies for students struggling with Arabic phonetics.
NADI 2023: The Fourth Nuanced Arabic Dialect Identification Shared Task
We describe the findings of the fourth Nuanced Arabic Dialect Identification
Shared Task (NADI 2023). The objective of NADI is to help advance
state-of-the-art Arabic NLP by creating opportunities for teams of researchers
to collaboratively compete under standardized conditions. It does so with a
focus on Arabic dialects, offering novel datasets and defining subtasks that
allow for meaningful comparisons between different approaches. NADI 2023
targeted both dialect identification (Subtask 1) and dialect-to-MSA machine
translation (Subtask 2 and Subtask 3). A total of 58 unique teams registered
for the shared task, of whom 18 teams have participated (with 76 valid
submissions during test phase). Among these, 16 teams participated in Subtask
1, 5 participated in Subtask 2, and 3 participated in Subtask 3. The winning
teams achieved 87.27 F1 on Subtask 1, 14.76 BLEU on Subtask 2, and 21.10 BLEU
on Subtask 3,
respectively. Results show that all three subtasks remain challenging, thereby
motivating future work in this area. We describe the methods employed by the
participating teams and briefly offer an outlook for NADI.
Machine Translation from Standard German to Alemannic Dialects
Machine translation has been researched using deep neural networks in recent years. These networks require large amounts of data to learn abstract representations of the input, stored in continuous vectors. Dialect translation has become more important since the advent of social media; in particular, when dialect speakers and standard-language speakers no longer understand each other, machine translation is of rising concern. Dialect translation is typically a low-resource setting facing data-scarcity problems. Additionally, spelling inconsistencies due to varying pronunciations and the lack of spelling rules complicate translation. This paper presents the best-performing approaches to handling these problems for Alemannic dialects. The results show that back-translation and conditioning on dialectal manifestations achieve the most remarkable enhancement over the baseline. Using back-translation, a significant gain of +4.5 BLEU over the strong transformer baseline of 37.3 BLEU points is accomplished. Differentiating between several Alemannic dialects instead of treating Alemannic as one dialect leads to substantial improvements: multi-dialectal translation surpasses the baseline on the dialectal test sets. However, training individual models outperforms the multi-dialectal approach, with improvements ranging from 7.5 to 10.6 BLEU points over the baseline, depending on the dialect.
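The back-translation technique the abstract credits can be sketched in a few lines: a reverse model (standard language to dialect) generates synthetic dialect sources for monolingual standard-language sentences, enlarging the parallel corpus. The toy word mapping below is an invented stand-in for a trained reverse model, not the paper's actual system.

```python
def reverse_translate(sentence):
    """Stand-in for a trained standard->dialect model (toy word mapping)."""
    toy_lexicon = {"ich": "i", "nicht": "nid", "klein": "chli"}
    return " ".join(toy_lexicon.get(w, w) for w in sentence.split())

def back_translate(monolingual_targets):
    """Pair each monolingual target sentence with a synthetic source,
    yielding (synthetic dialect source, genuine standard target) pairs."""
    return [(reverse_translate(t), t) for t in monolingual_targets]

mono = ["ich bin nicht klein"]
synthetic = back_translate(mono)
# -> [("i bin nid chli", "ich bin nicht klein")]
```

The genuine text always sits on the target side, so the forward model learns to produce fluent standard-language output even from noisy synthetic sources.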
TARJAMAT: Evaluation of Bard and ChatGPT on Machine Translation of Ten Arabic Varieties
Despite the purported multilingual proficiency of instruction-finetuned large
language models (LLMs) such as ChatGPT and Bard, the linguistic inclusivity of
these models remains insufficiently explored. Considering this constraint, we
present a thorough assessment of Bard and ChatGPT (encompassing both GPT-3.5
and GPT-4) regarding their machine translation proficiencies across ten
varieties of Arabic. Our evaluation covers diverse Arabic varieties such as
Classical Arabic (CA), Modern Standard Arabic (MSA), and several country-level
dialectal variants. Our analysis indicates that LLMs may encounter challenges
with dialects for which minimal public datasets exist, but on average are
better translators of dialects than existing commercial systems. On CA and MSA,
instruction-tuned LLMs, however, trail behind commercial systems such as Google
Translate. Finally, we undertake a human-centric study to scrutinize the
efficacy of the relatively recent model, Bard, in following human instructions
during translation tasks. Our analysis reveals a circumscribed capability of
Bard in aligning with human instructions in translation contexts. Collectively,
our findings underscore that prevailing LLMs remain far from inclusive, with
only limited ability to cater for the linguistic and cultural intricacies of
diverse communities.
Unsupervised Neural Dialect Translation with Commonality and Diversity Modeling
As a special machine translation task, dialect translation has two main
characteristics: 1) lack of parallel training corpus; and 2) possessing similar
grammar between two sides of the translation. In this paper, we investigate how
to exploit the commonality and diversity between dialects thus to build
unsupervised translation models merely accessing to monolingual data.
Specifically, we leverage pivot-private embedding, layer coordination, as well
as parameter sharing to sufficiently model commonality and diversity among
source and target, ranging from lexical, through syntactic, to semantic levels.
In order to examine the effectiveness of the proposed models, we collect 20
million monolingual sentences for each of Mandarin and Cantonese, which are
the official language and the most widely used dialect in China,
respectively. Experimental results reveal that our methods outperform
rule-based simplified-traditional Chinese conversion and conventional
unsupervised translation models by over 12 BLEU points.
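The pivot-private embedding idea can be illustrated with a toy lookup: tokens shared by the two sides consult one common ("pivot") table, while dialect-specific tokens get private tables per side. The token sets and table contents here are invented for illustration and are not the paper's actual parameterization.

```python
shared_vocab = {"我", "你"}                 # characters common to both sides
pivot = {w: f"shared[{w}]" for w in shared_vocab}
private_yue = {"唔": "yue[唔]"}             # Cantonese-only negation token
private_cmn = {"不": "cmn[不]"}             # Mandarin-only negation token

def embed(token, side):
    """Route a token to the pivot table if shared, else to its side's private table."""
    if token in pivot:
        return pivot[token]
    return (private_yue if side == "yue" else private_cmn)[token]
```

Shared tokens thus map to identical representations on both sides (modeling commonality), while dialect-specific tokens keep distinct parameters (modeling diversity).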
Supervised Adaptation of Sequence-to-Sequence Speech Recognition Systems using Batch-Weighting
When training speech recognition systems, one often faces the situation that sufficient amounts of training data for the language in question are available, but only small amounts of data for the domain in question. This problem is even bigger for end-to-end speech recognition systems, which only accept transcribed speech as training data, which is harder and more expensive to obtain than text data. In this paper we present experiments in adapting end-to-end speech recognition systems with a method called batch-weighting, which we contrast against regular fine-tuning, i.e., continuing to train existing neural speech recognition models on adaptation data. We perform experiments using these techniques in adapting to topic, accent, and vocabulary, showing that batch-weighting consistently outperforms fine-tuning. In order to show the generalization capabilities of batch-weighting, we perform experiments in several languages, i.e., Arabic, English, and German. Due to its relatively small computational requirements, batch-weighting is a suitable technique for supervised life-long learning during the lifetime of a speech recognition system, e.g., from user corrections.
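The contrast with fine-tuning can be sketched as follows: instead of training on the general corpus first and the adaptation data afterwards, batch-weighting mixes both sources inside every batch at a fixed ratio. The function names, the batch size, and the 3:1 ratio below are illustrative assumptions, not the paper's exact recipe.

```python
import random

def weighted_batches(general_data, domain_data, batch_size=4,
                     domain_frac=0.25, seed=0):
    """Yield batches in which a fixed fraction of examples comes from the
    adaptation domain, so the model sees both distributions throughout training."""
    rng = random.Random(seed)
    n_domain = max(1, int(batch_size * domain_frac))
    n_general = batch_size - n_domain
    while True:
        batch = rng.sample(general_data, n_general) + rng.sample(domain_data, n_domain)
        rng.shuffle(batch)
        yield batch

general = [f"general-{i}" for i in range(100)]
domain = [f"domain-{i}" for i in range(10)]
batch = next(weighted_batches(general, domain))
# at these settings, every batch of 4 contains exactly 1 in-domain utterance
```

Because the general data never disappears from the batches, this scheme avoids the catastrophic forgetting that plain sequential fine-tuning risks.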
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
Peer reviewed
The 1st International Electronic Conference on Algorithms
This book presents 22 of the accepted presentations at the 1st International Electronic Conference on Algorithms, which was held completely online from September 27 to October 10, 2021. It contains 16 proceedings papers as well as 6 extended abstracts. The works presented in the book cover a wide range of fields dealing with the development of algorithms. Many of the contributions are related to machine learning, in particular deep learning. Another main focus among the contributions is on problems dealing with graphs and networks, e.g., in connection with evacuation-planning problems.