Towards spoken dialect identification of Irish
The Irish language is rich in its diversity of dialects and accents. This
compounds the difficulty of creating a speech recognition system for the
low-resource language, as such a system must contend with a high degree of
variability with limited corpora. A recent study investigating dialect bias in
Irish ASR found that balanced training corpora gave rise to unequal dialect
performance, with performance for the Ulster dialect being consistently worse
than for the Connacht or Munster dialects. Motivated by this, the present
experiments investigate spoken dialect identification of Irish, with a view to
incorporating such a system into the speech recognition pipeline. Two acoustic
classification models are tested, XLS-R and ECAPA-TDNN, in conjunction with a
text-based classifier using a pretrained Irish-language BERT model. The
ECAPA-TDNN, particularly a model pretrained for language identification on the
VoxLingua107 dataset, performed best overall, with an accuracy of 73%. This was
further improved to 76% by fusing the model's outputs with the text-based
model. The Ulster dialect was most accurately identified, with an accuracy of
94%; however, the model struggled to disambiguate between the Connacht and
Munster dialects, suggesting a more nuanced approach may be necessary to
robustly distinguish between the dialects of Irish.
Comment: Accepted to Interspeech 2023 Workshop of the 2nd Annual Meeting of
the Special Interest Group of Under-resourced Languages Workshop, Dublin
(SiGUL
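The abstract above reports that fusing the acoustic model's outputs with the text-based model raised accuracy from 73% to 76%, but it does not say how the outputs were fused. The following is only a minimal sketch of one common option, a weighted average of per-dialect posterior probabilities. The function names, the `weight` parameter, and the example probabilities are all hypothetical and are not taken from the paper.

```python
# Illustrative late fusion of two dialect classifiers.
# Assumption: each model outputs one probability per dialect, in this order.
DIALECTS = ["Ulster", "Connacht", "Munster"]


def fuse_posteriors(acoustic, text, weight=0.5):
    """Return a weighted average of two probability vectors, renormalised.

    `weight` is the share given to the acoustic model (hypothetical value).
    """
    if len(acoustic) != len(text):
        raise ValueError("both models must score the same set of dialects")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    fused = [weight * a + (1.0 - weight) * t for a, t in zip(acoustic, text)]
    total = sum(fused)
    return [p / total for p in fused]


def predict_dialect(acoustic, text, weight=0.5):
    """Pick the dialect with the highest fused probability."""
    fused = fuse_posteriors(acoustic, text, weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return DIALECTS[best]


# Made-up scores: the acoustic model leans Connacht, the text model Munster.
acoustic_probs = [0.20, 0.45, 0.35]
text_probs = [0.10, 0.30, 0.60]
fused_label = predict_dialect(acoustic_probs, text_probs)
```

With equal weights the text model's stronger Munster score outweighs the acoustic model's narrower Connacht preference, which shows how a second modality can change a decision between two dialects that are acoustically easy to confuse.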
Cross-lingual Emotion Detection
Emotion detection is of great importance for understanding humans.
Constructing annotated datasets to train automated models can be expensive. We
explore the efficacy of cross-lingual approaches that would use data from a
source language to build models for emotion detection in a target language. We
compare three approaches, namely: i) using inherently multilingual models; ii)
translating training data into the target language; and iii) using an
automatically tagged parallel corpus. In our study, we consider English as the
source language with Arabic and Spanish as target languages. We study the
effectiveness of different classification models such as BERT and SVMs trained
with different features. Our BERT-based monolingual models that are trained on
target language data surpass state-of-the-art (SOTA) by 4% and 5% absolute
Jaccard score for Arabic and Spanish respectively. Next, we show that using
cross-lingual approaches with English data alone, we can achieve more than 90%
and 80% relative effectiveness of the Arabic and Spanish BERT models
respectively. Lastly, we use LIME to interpret the differences between models
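The abstract above reports results as absolute Jaccard score and as relative effectiveness against the monolingual BERT models. As a hedged sketch of how such figures are usually computed for multi-label emotion detection, the code below averages the per-example Jaccard index over a dataset and takes relative effectiveness as a simple ratio of two scores. The function names, the convention of scoring an example with no gold and no predicted labels as 1, and the example labels are assumptions, not details from the paper.

```python
# Illustrative multi-label Jaccard score and relative effectiveness.
def jaccard_score(gold, pred):
    """Mean of |G & P| / |G | P| over examples (sets of emotion labels).

    Assumption: an example with no gold and no predicted labels scores 1.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of examples")
    if not gold:
        raise ValueError("at least one example is required")
    total = 0.0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        union = g | p
        total += 1.0 if not union else len(g & p) / len(union)
    return total / len(gold)


def relative_effectiveness(cross_lingual_score, monolingual_score):
    """Cross-lingual score as a fraction of the monolingual score."""
    return cross_lingual_score / monolingual_score


# Made-up labels for three examples.
gold_labels = [{"joy", "love"}, {"anger"}, set()]
pred_labels = [{"joy"}, {"anger", "fear"}, set()]
score = jaccard_score(gold_labels, pred_labels)
```

Under this reading, a cross-lingual model scoring 0.45 against a monolingual model scoring 0.50 would have a relative effectiveness of 90%; these two numbers are illustrative only.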
Contemporary Arabic Idiomatic Expressions and Methods of Teaching Al-Ta’birat al-Ishthilahiyyah al-‘Arabiyyah al-Mu‘asharah wa Thara’iqu Ta’limiha
This study arose from the difficulty of teaching modern Arabic idioms to students of the Babussalam As-Sunnah Institute in Depok, even though proficiency in these idioms is important and plays a major role in acquiring the Arabic language fully. The study aimed to identify the difficulties of teaching these idioms and to clarify the proposed methods for teaching them. The research used a qualitative, descriptive approach, a form of analysis used here to collect metadata. It was conducted at the Babussalam Islamic Boarding School, an educational institution under the auspices of the Babussalam As-Sunnah Foundation, Cimanggis, Depok, West Java. The results show that, from the point of view of the Arabic teachers involved in the study, the difficulties stemmed from the following main aspects: the educational environment, the educational curriculum in its broadest sense, the learners, and the teachers. The research also identified the proposed methods for teaching modern Arabic idioms from the book: the inductive method, the standard method, the informative (or repeating, or speech) method, and the dialogue method.