18 research outputs found

    Arabic Phoneme Learning Challenges for Madurese Students and the Solutions

    Get PDF
    This article discusses the challenges faced by students at INSTIKA Madura in pronouncing Arabic phonemes. Phoneme pronunciation is a fundamental principle of Arabic: without correct phoneme pronunciation, the language cannot be understood. The problem of phoneme pronunciation was investigated and solutions were derived from an analysis of the underlying factors. A qualitative descriptive research design was used with a case study approach. Data collection methods included interviews with lecturers and students, direct observation of in-class learning, and documentation of lecturer notes. Data analysis followed the interactive model of Miles, Huberman, and Saldana. Validity was ensured through persistence of observation, triangulation, and expert review. The findings revealed problems with Arabic phonemes, categorized as Akhtha’ al-Harakat, Akhtha’ al-Ibdal, Akhtha’ al-Hadzf, and Akhtha’ al-Tahrif. Contributing factors include language problems (characteristics of the first and second languages) and non-language problems (student characteristics, lecturer competence, learning strategies, lesson materials, and learning facilities). The proposed solutions include error and comparative analysis for the language problems, and motivation, diagnosis, cooperative learning, detailed examples, pronunciation exercises, and adequate facilities for the non-language problems. In sum, this research provides a comprehensive study of the challenges of pronouncing Arabic phonemes at INSTIKA Madura: specific error types and the underlying factors that affect pronunciation were identified, and practical solutions addressing both language and non-language aspects were proposed to improve students' pronunciation skills. These findings offer valuable insights for educators, curriculum developers, and language instructors, facilitating targeted interventions and effective teaching strategies for students struggling with Arabic phonetics.

    NADI 2023: The Fourth Nuanced Arabic Dialect Identification Shared Task

    Full text link
    We describe the findings of the fourth Nuanced Arabic Dialect Identification Shared Task (NADI 2023). The objective of NADI is to help advance state-of-the-art Arabic NLP by creating opportunities for teams of researchers to collaboratively compete under standardized conditions. It does so with a focus on Arabic dialects, offering novel datasets and defining subtasks that allow for meaningful comparisons between different approaches. NADI 2023 targeted both dialect identification (Subtask 1) and dialect-to-MSA machine translation (Subtask 2 and Subtask 3). A total of 58 unique teams registered for the shared task, of whom 18 participated (with 76 valid submissions during the test phase). Among these, 16 teams participated in Subtask 1, 5 in Subtask 2, and 3 in Subtask 3. The winning teams achieved 87.27 F1 on Subtask 1, 14.76 BLEU on Subtask 2, and 21.10 BLEU on Subtask 3. Results show that all three subtasks remain challenging, thereby motivating future work in this area. We describe the methods employed by the participating teams and briefly offer an outlook for NADI.
    Comment: arXiv admin note: text overlap with arXiv:2210.0958
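
    Since the abstract quotes F1 for dialect identification and BLEU for dialect-to-MSA translation, a minimal scoring sketch may help make the evaluation concrete. The snippet below assumes scikit-learn and sacrebleu are installed; the helper names and the macro-averaging choice are illustrative assumptions, not the official NADI scoring script.

        # Hypothetical scoring helpers for NADI-style subtasks (not the
        # official evaluation code): macro F1 for dialect identification,
        # corpus BLEU for dialect-to-MSA translation.
        from sklearn.metrics import f1_score
        import sacrebleu

        def score_dialect_id(gold, pred):
            """Subtask-1-style score: macro-averaged F1 over dialect labels."""
            return 100 * f1_score(gold, pred, average="macro")

        def score_translation(hypotheses, references):
            """Subtask-2/3-style score: corpus-level BLEU."""
            return sacrebleu.corpus_bleu(hypotheses, [references]).score

        print(score_dialect_id(["egy", "mar", "egy"], ["egy", "egy", "egy"]))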

    Machine Translation from Standard German to Alemannic Dialects

    Get PDF
    Machine translation has been researched using deep neural networks in recent years. These networks require large amounts of data to learn abstract representations of the input, stored in continuous vectors. Dialect translation has become more important since the advent of social media; in particular, when dialect speakers and standard-language speakers no longer understand each other, machine translation is of rising concern. Dialect translation is typically a low-resource setting facing data-scarcity problems. Additionally, spelling inconsistencies due to varying pronunciations and the lack of spelling rules complicate translation. This paper presents the best-performing approaches to handling these problems for Alemannic dialects. The results show that back-translation and conditioning on dialectal manifestations achieve the most remarkable enhancement over the baseline: using back-translation, a significant gain of +4.5 BLEU points over the strong transformer baseline of 37.3 is accomplished. Differentiating between several Alemannic dialects instead of treating Alemannic as one dialect leads to substantial improvements: multi-dialectal translation surpasses the baseline on the dialectal test sets. However, training individual models outperforms the multi-dialectal approach, with improvements ranging from 7.5 to 10.6 BLEU points over the baseline depending on the dialect.
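
    The two best-performing techniques named above lend themselves to a short illustration. The sketch below shows dialect conditioning as a source-side tag and back-translation as synthetic pair generation; the tag inventory and the reverse_model interface are assumptions for illustration, not the paper's implementation.

        # Illustrative sketch (hypothetical tag names and model interface):
        # one multi-dialectal model is conditioned on the target dialect by
        # prepending a tag token to the Standard German source sentence.
        DIALECT_TAGS = {"dialect_a": "<2DA>", "dialect_b": "<2DB>"}

        def tag_source(sentence: str, dialect: str) -> str:
            """Prepend a dialect token so one model serves several dialects."""
            return f"{DIALECT_TAGS[dialect]} {sentence}"

        # Back-translation: a reverse (dialect-to-standard) model turns
        # monolingual dialect text into synthetic parallel training pairs.
        def back_translate(dialect_sentences, reverse_model):
            return [(reverse_model.translate(s), s) for s in dialect_sentences]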

    TARJAMAT: Evaluation of Bard and ChatGPT on Machine Translation of Ten Arabic Varieties

    Full text link
    Despite the purported multilingual proficiency of instruction-finetuned large language models (LLMs) such as ChatGPT and Bard, the linguistic inclusivity of these models remains insufficiently explored. Considering this constraint, we present a thorough assessment of Bard and ChatGPT (encompassing both GPT-3.5 and GPT-4) regarding their machine translation proficiencies across ten varieties of Arabic. Our evaluation covers diverse Arabic varieties such as Classical Arabic (CA), Modern Standard Arabic (MSA), and several country-level dialectal variants. Our analysis indicates that LLMs may encounter challenges with dialects for which minimal public datasets exist, but on average they are better translators of dialects than existing commercial systems. On CA and MSA, however, instruction-tuned LLMs trail behind commercial systems such as Google Translate. Finally, we undertake a human-centric study to scrutinize the efficacy of the relatively recent model, Bard, in following human instructions during translation tasks. Our analysis reveals a circumscribed capability of Bard in aligning with human instructions in translation contexts. Collectively, our findings underscore that prevailing LLMs remain far from inclusive, with only limited ability to cater for the linguistic and cultural intricacies of diverse communities.
    Comment: ArabicNLP 202
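
    As a rough illustration of the evaluation setup described above, the sketch below wraps a generic chat model behind an llm() callable and collects its translations into MSA for later BLEU scoring. The prompt wording and the callable interface are assumptions, not the authors' protocol.

        # Hypothetical prompt-based evaluation harness; llm is any callable
        # that maps a prompt string to a model response string.
        def translate_prompt(text: str, variety: str) -> str:
            return (f"Translate the following {variety} sentence into "
                    f"Modern Standard Arabic. Return only the translation.\n\n"
                    f"{text}")

        def collect_translations(llm, test_pairs, variety):
            """Pair each model output with its reference for later scoring."""
            return [(llm(translate_prompt(src, variety)), ref)
                    for src, ref in test_pairs]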

    Unsupervised Neural Dialect Translation with Commonality and Diversity Modeling

    Full text link
    As a special machine translation task, dialect translation has two main characteristics: 1) a lack of parallel training corpora; and 2) similar grammar on the two sides of the translation. In this paper, we investigate how to exploit the commonality and diversity between dialects to build unsupervised translation models using only monolingual data. Specifically, we leverage pivot-private embeddings, layer coordination, and parameter sharing to sufficiently model commonality and diversity between source and target, ranging from the lexical, through the syntactic, to the semantic level. To examine the effectiveness of the proposed models, we collect 20 million monolingual sentences for each of Mandarin and Cantonese, which are the official language and the most widely used dialect in China, respectively. Experimental results reveal that our methods outperform rule-based simplified-traditional Chinese conversion and conventional unsupervised translation models by over 12 BLEU points.
    Comment: AAAI 202
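
    A small sketch may clarify the pivot-private embedding idea: a shared (pivot) table captures what the two dialects have in common, while a per-dialect private table captures their divergences. This is an illustrative PyTorch reading of the abstract, assuming the two representations are summed; the paper's actual composition scheme and dimensions may differ.

        # Illustrative pivot-private embedding: shared table + per-dialect
        # private tables, summed per token (assumed composition, not the
        # paper's exact design).
        import torch
        import torch.nn as nn

        class PivotPrivateEmbedding(nn.Module):
            def __init__(self, vocab_size: int, d_model: int, n_langs: int = 2):
                super().__init__()
                self.pivot = nn.Embedding(vocab_size, d_model)  # shared across dialects
                self.private = nn.ModuleList(
                    nn.Embedding(vocab_size, d_model) for _ in range(n_langs)
                )  # one table per dialect

            def forward(self, token_ids: torch.Tensor, lang: int) -> torch.Tensor:
                return self.pivot(token_ids) + self.private[lang](token_ids)

        emb = PivotPrivateEmbedding(vocab_size=32000, d_model=512)
        ids = torch.tensor([[5, 17, 42]])
        print(emb(ids, lang=0).shape)  # torch.Size([1, 3, 512])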

    Supervised Adaptation of Sequence-to-Sequence Speech Recognition Systems using Batch-Weighting

    Get PDF
    When training speech recognition systems, one often faces the situation that sufficient amounts of training data are available for the language in question but only small amounts for the domain in question. This problem is even bigger for end-to-end speech recognition systems, which accept only transcribed speech as training data; such data is harder and more expensive to obtain than text data. In this paper we present experiments in adapting end-to-end speech recognition systems with a method called batch-weighting, which we contrast against regular fine-tuning, i.e., continuing to train existing neural speech recognition models on adaptation data. We perform experiments using these techniques in adapting to topic, accent, and vocabulary, showing that batch-weighting consistently outperforms fine-tuning. To show the generalization capabilities of batch-weighting, we perform experiments in several languages, i.e., Arabic, English, and German. Due to its relatively small computational requirements, batch-weighting is a suitable technique for supervised lifelong learning during the lifetime of a speech recognition system, e.g., from user corrections.
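
    A minimal sketch of the batch-weighting idea, in contrast to plain fine-tuning: each training batch deliberately mixes general-domain and adaptation examples at a fixed ratio, so the model sees in-domain data without drifting away from the general distribution. The 70/30 split and the iterator interface are illustrative assumptions, not the paper's configuration.

        # An illustrative batch-weighting loop (not the paper's code): every
        # batch mixes general-corpus examples with adaptation examples at a
        # fixed ratio, rather than fine-tuning on adaptation data alone.
        import random

        def batch_weighted_batches(general_data, adapt_data, batch_size=32,
                                   adapt_fraction=0.3):
            """Yield batches weighted toward the small target domain."""
            n_adapt = int(batch_size * adapt_fraction)
            while True:
                batch = random.sample(general_data, batch_size - n_adapt)
                # The adaptation set is small, so sample it with replacement.
                batch += random.choices(adapt_data, k=n_adapt)
                random.shuffle(batch)
                yield batch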

    Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

    Get PDF
    Peer reviewed

    The 1st International Electronic Conference on Algorithms

    Get PDF
    This book presents 22 of the accepted presentations at the 1st International Electronic Conference on Algorithms, which was held completely online from September 27 to October 10, 2021. It contains 16 proceedings papers as well as 6 extended abstracts. The works presented in the book cover a wide range of fields dealing with the development of algorithms. Many of the contributions are related to machine learning, in particular deep learning. Another main focus among the contributions is on problems dealing with graphs and networks, e.g., in connection with evacuation planning problems.