6 research outputs found

    Towards spoken dialect identification of Irish

    The Irish language is rich in its diversity of dialects and accents. This compounds the difficulty of creating a speech recognition system for the low-resource language, as such a system must contend with a high degree of variability with limited corpora. A recent study investigating dialect bias in Irish ASR found that balanced training corpora gave rise to unequal dialect performance, with performance for the Ulster dialect being consistently worse than for the Connacht or Munster dialects. Motivated by this, the present experiments investigate spoken dialect identification of Irish, with a view to incorporating such a system into the speech recognition pipeline. Two acoustic classification models are tested, XLS-R and ECAPA-TDNN, in conjunction with a text-based classifier using a pretrained Irish-language BERT model. The ECAPA-TDNN, particularly a model pretrained for language identification on the VoxLingua107 dataset, performed best overall, with an accuracy of 73%. This was further improved to 76% by fusing the model's outputs with the text-based model. The Ulster dialect was most accurately identified, with an accuracy of 94%; however, the model struggled to disambiguate between the Connacht and Munster dialects, suggesting a more nuanced approach may be necessary to robustly distinguish between the dialects of Irish. Comment: Accepted to Interspeech 2023 Workshop of the 2nd Annual Meeting of the Special Interest Group of Under-resourced Languages Workshop, Dublin (SiGUL
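
    The abstract does not say how the acoustic and text-based outputs were fused. As a minimal sketch, assuming a simple late fusion in which each model emits one probability per dialect and the two are combined by a weighted average, the idea might look like the following. The function names, the `weight` parameter, and the example probabilities are hypothetical and are not taken from the paper.

```python
# Hypothetical late-fusion sketch: the paper's actual fusion method is not
# described in the abstract, so a weighted average is assumed here.
DIALECTS = ["Ulster", "Connacht", "Munster"]


def fuse_posteriors(acoustic, text, weight=0.5):
    """Combine two per-dialect probability lists by weighted averaging.

    `weight` is the share given to the acoustic model (assumed parameter).
    """
    if len(acoustic) != len(text):
        raise ValueError("both models must score the same set of dialects")
    fused = [weight * a + (1.0 - weight) * t for a, t in zip(acoustic, text)]
    total = sum(fused)
    # Renormalise so the fused scores still sum to 1.
    return [p / total for p in fused]


def predict_dialect(acoustic, text, weight=0.5):
    """Return the dialect label with the highest fused probability."""
    fused = fuse_posteriors(acoustic, text, weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return DIALECTS[best]


if __name__ == "__main__":
    # Made-up scores: the acoustic model leans Connacht, the text model Munster.
    acoustic_scores = [0.20, 0.45, 0.35]
    text_scores = [0.10, 0.30, 0.60]
    print(predict_dialect(acoustic_scores, text_scores, weight=0.5))
```

    With equal weights the text model's stronger preference decides the outcome; setting `weight=1.0` reduces the sketch to the acoustic model alone.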

    Cross-lingual Emotion Detection

    Emotion detection is of great importance for understanding humans. Constructing annotated datasets to train automated models can be expensive. We explore the efficacy of cross-lingual approaches that would use data from a source language to build models for emotion detection in a target language. We compare three approaches, namely: i) using inherently multilingual models; ii) translating training data into the target language; and iii) using an automatically tagged parallel corpus. In our study, we consider English as the source language with Arabic and Spanish as target languages. We study the effectiveness of different classification models such as BERT and SVMs trained with different features. Our BERT-based monolingual models that are trained on target language data surpass state-of-the-art (SOTA) by 4% and 5% absolute Jaccard score for Arabic and Spanish respectively. Next, we show that using cross-lingual approaches with English data alone, we can achieve more than 90% and 80% relative effectiveness of the Arabic and Spanish BERT models respectively. Lastly, we use LIME to interpret the differences between models.
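
    To make the two metrics in this abstract concrete, here is a minimal sketch assuming the usual multi-label definition of the Jaccard score (per-example intersection over union of label sets, averaged over examples) and taking "relative effectiveness" to mean the cross-lingual score divided by the monolingual score. The function names and the example labels are hypothetical, and scoring an empty gold and predicted set as 1.0 is a convention assumed here, not something the abstract states.

```python
def jaccard_score(gold, predicted):
    """Mean per-example Jaccard similarity between gold and predicted label sets."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    total = 0.0
    for g, p in zip(gold, predicted):
        g, p = set(g), set(p)
        union = g | p
        # Assumed convention: two empty label sets count as a perfect match.
        total += len(g & p) / len(union) if union else 1.0
    return total / len(gold)


def relative_effectiveness(cross_lingual_score, monolingual_score):
    """Cross-lingual score as a fraction of the monolingual model's score."""
    return cross_lingual_score / monolingual_score


if __name__ == "__main__":
    # Made-up labels for two examples.
    gold = [{"joy", "love"}, {"anger"}]
    predicted = [{"joy"}, {"anger", "fear"}]
    print(jaccard_score(gold, predicted))
    print(relative_effectiveness(0.45, 0.50))
```

    Under these definitions, a cross-lingual model scoring 0.45 against a monolingual model scoring 0.50 reaches 90% relative effectiveness.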

    Contemporary Arabic Idiomatic Expressions and Methods of Teaching (Al-Ta’birat al-Ishthilahiyyah al-‘Arabiyyah al-Mu‘asharah wa Thara’iqu Ta’limiha)

    This study addresses the difficulty of teaching modern Arabic idioms to students of the Babussalam As-Sunnah Institute in Depok, even though proficiency in idioms plays an important role in acquiring the Arabic language fully. The study aimed to identify the difficulties of teaching these idioms and to clarify the proposed methods for teaching them. The research used a qualitative, descriptive approach, an analysis used to collect metadata, and was conducted at the Babussalam Islamic Boarding School, an educational institution under the auspices of the Babussalam As-Sunnah Foundation, Cimanggis, Depok, West Java. From the point of view of the Arabic teachers involved in the study, the difficulties stem from the following main aspects: the educational environment, the educational curriculum in its broadest sense, learners, and teachers. The proposed methods for teaching modern Arabic idioms from the book are: the inductive method, the standard method, the informative (or repeating, or speech) method, and the dialogue method.