Automatic Dialect Detection in Arabic Broadcast Speech
We investigate different approaches for dialect identification in Arabic
broadcast speech, using phonetic and lexical features obtained from a speech
recognition system, and acoustic features using the i-vector framework. We
studied both generative and discriminative classifiers, and we combined these
features using a multi-class Support Vector Machine (SVM). We validated our
results on an Arabic/English language identification task, with an accuracy of
100%. We used these features in a binary classifier to discriminate between
Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%. We
further report results using the proposed method to discriminate between the
five most widely used dialects of Arabic: namely Egyptian, Gulf, Levantine,
North African, and MSA, with an accuracy of 52%. We discuss dialect
identification errors in the context of dialect code-switching between
Dialectal Arabic and MSA, and compare the error patterns in manually
labeled data with those in the output of our classifier. We also release the
training and test data as a standard corpus for dialect identification.
The GW/LT3 VarDial 2016 shared task system for dialects and similar languages detection
This paper describes the GW/LT3 contribution to the 2016 VarDial shared task on the identification of similar languages (task 1) and Arabic dialects (task 2). For both tasks, we experimented with Logistic Regression and Neural Network classifiers in isolation. Additionally, we implemented a cascaded classifier that consists of coarse and fine-grained classifiers (task 1) and a classifier ensemble with majority voting (task 2). The submitted systems obtained state-of-the-art performance and ranked first for the evaluation on social media data (test sets B1 and B2 for task 1), with a maximum weighted F1 score of 91.94%.
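As a rough illustration of the majority-voting ensemble mentioned in this abstract, the sketch below combines per-sample label predictions from several classifiers. It is not the GW/LT3 implementation: the function names, the dialect labels in the example, and the tie-breaking rule (the label from the earliest-listed classifier wins) are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among one sample's predictions.

    Assumption: ties are broken in favour of the label that appears
    first in the list (i.e. from the earliest-listed classifier).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_predict(per_classifier_labels):
    """Combine predictions from several classifiers by majority vote.

    per_classifier_labels holds one list of labels per classifier,
    all aligned on the same samples.
    """
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_labels)]


if __name__ == "__main__":
    # Hypothetical outputs of three classifiers on three samples.
    clf_a = ["EGY", "GLF", "MSA"]
    clf_b = ["EGY", "LAV", "MSA"]
    clf_c = ["NOR", "LAV", "GLF"]
    print(ensemble_predict([clf_a, clf_b, clf_c]))
```

With an odd number of classifiers and two candidate labels, ties cannot occur; with more labels they can, so some explicit tie-breaking rule like the one assumed here is needed.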