5,899 research outputs found

    Using Prosody and Phonotactics in Arabic Dialect Identification

    While Modern Standard Arabic is the formal spoken and written language of the Arab world, dialects are the major communication mode for everyday life; identifying a speaker's dialect is thus critical to speech processing tasks such as automatic speech recognition, as well as speaker identification. We examine the role of prosodic features (intonation and rhythm) across four Arabic dialects: Gulf, Iraqi, Levantine, and Egyptian, for the purpose of automatic dialect identification. We show that prosodic features can significantly improve identification over a purely phonotactic-based approach, with an identification accuracy of 86.33% for 2m utterances.

    Automatic Identification of Arabic Dialects Using Hidden Markov Models

    The Arabic language has many different dialects, which must be identified before Automatic Speech Recognition can take place. This thesis examines the difficult task of properly identifying various Arabic dialects. We also present a novel design of an Arabic dialect identification system using Hidden Markov Models (HMM). Due to the similarities and the differences between Arabic dialects, we build an ergodic HMM that has two types of states: one represents the common sounds across Arabic dialects, while the other represents the unique sounds of the specific dialect. We tie the common states across all models since they share the same sounds. We focus only on two major dialects: Egyptian and Gulf. An improved initialization process is used to achieve better Arabic dialect identification. Moreover, we utilize many different combinations of speech features related to MFCC, such as time derivatives, energy, and the Shifted Delta Cepstra, in training and testing the system. We present a detailed comparison of the performance of our Arabic dialect identification system using the different combinations. The best result of the Arabic dialect identification system is 96.67% correct identification.
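
The Shifted Delta Cepstra features named in this abstract can be illustrated with a minimal sketch. It follows the common N-d-P-k formulation, where each output vector stacks k delta blocks spaced P frames apart and each delta is taken over a window of plus or minus d frames. The function name and the default parameters (d=1, P=3, k=7) are assumptions chosen for illustration; the abstract does not state which configuration the thesis used.

```python
def shifted_delta_cepstra(frames, d=1, P=3, k=7):
    """Compute Shifted Delta Cepstra from a sequence of cepstral vectors.

    frames: list of equal-length lists of floats (e.g. MFCC vectors).
    d: half-width of the delta window (assumed default).
    P: shift, in frames, between consecutive delta blocks (assumed default).
    k: number of delta blocks stacked per output vector (assumed default).
    Returns one stacked vector per frame that has enough context on both sides.
    """
    n = len(frames)
    out = []
    # The earliest index used is t - d and the latest is t + (k - 1) * P + d,
    # so t is restricted to the range where both stay inside the sequence.
    for t in range(d, n - (k - 1) * P - d):
        stacked = []
        for i in range(k):
            ahead = frames[t + i * P + d]
            behind = frames[t + i * P - d]
            # i-th delta block: c(t + iP + d) - c(t + iP - d)
            stacked.extend(a - b for a, b in zip(ahead, behind))
        out.append(stacked)
    return out


# Toy input: one coefficient per frame that increases by 1 each frame.
toy = [[float(i)] for i in range(10)]
sdc = shifted_delta_cepstra(toy, d=1, P=2, k=3)
```

On this toy ramp every delta spans two frames, so each of the resulting vectors holds three identical values. With real MFCC input each output vector has k times as many entries as an input frame.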

    The GW/LT3 VarDial 2016 shared task system for dialects and similar languages detection

    This paper describes the GW/LT3 contribution to the 2016 VarDial shared task on the identification of similar languages (task 1) and Arabic dialects (task 2). For both tasks, we experimented with Logistic Regression and Neural Network classifiers in isolation. Additionally, we implemented a cascaded classifier that consists of coarse and fine-grained classifiers (task 1) and a classifier ensemble with majority voting for task 2. The submitted systems obtained state-of-the-art performance and ranked first for the evaluation on social media data (test sets B1 and B2 for task 1), with a maximum weighted F1 score of 91.94%.
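
The majority-voting ensemble mentioned for task 2 can be sketched as follows. This is a generic illustration, not the GW/LT3 implementation: the function name, the tie-breaking rule (prefer the earliest-listed classifier among the tied labels), and the example dialect labels are all assumptions made for the example.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier predictions by majority vote.

    predictions: list of label lists, one per classifier, all the same length.
    Ties are broken in favour of the earliest-listed classifier whose label
    is among the most frequent (an assumed rule for this sketch).
    """
    if not predictions:
        raise ValueError("need at least one classifier's predictions")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all classifiers must predict the same number of items")

    voted = []
    for labels in zip(*predictions):  # labels for one item, one per classifier
        counts = Counter(labels)
        top = max(counts.values())
        winner = next(label for label in labels if counts[label] == top)
        voted.append(winner)
    return voted


# Hypothetical outputs of three classifiers on three utterances.
clf_a = ["EGY", "GLF", "LAV"]
clf_b = ["EGY", "LAV", "LAV"]
clf_c = ["MSA", "GLF", "NOR"]
ensemble = majority_vote([clf_a, clf_b, clf_c])
```

Each item receives the label chosen by most classifiers, so a single classifier's error is outvoted whenever the other two agree.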

    Automatic symptoms identification from a massive volume of unstructured medical consultations using deep neural and BERT models

    Automatic symptom identification plays a crucial role in assisting doctors during the diagnosis process in telemedicine. In general, physicians spend considerable time on clinical documentation and symptom identification, which is unfeasible given their full schedules. With text-based consultation services in telemedicine, identifying symptoms from a user's consultation is a sophisticated and time-consuming process. Moreover, at Altibbi, which is an Arabic telemedicine platform and the context of this work, users consult doctors and describe their conditions in different Arabic dialects, which makes the problem more complex and challenging. Therefore, in this work, an advanced deep learning approach is developed to identify symptoms from consultations written in multiple dialects. The approach is formulated as a multi-label multi-class classification using features extracted based on AraBERT and fine-tuned on the bidirectional long short-term memory (BiLSTM) network. The fine-tuning of BiLSTM relies on features engineered based on different variants of the bidirectional encoder representations from transformers (BERT). Evaluating the models based on precision, recall, and a customized hit rate showed a successful identification of symptoms from Arabic texts with promising accuracy. Hence, this paves the way toward deploying an automated symptom identification model in production at Altibbi, which can help general practitioners in telemedicine provide more efficient and accurate consultations.