
15 research outputs found

ArbDialectID at MADAR Shared Task 1: Language Modelling and Ensemble Learning for Fine Grained Arabic Dialect Identification

Author: Abu Kwaik Kathrein
Saad Motaz K
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019

In this paper, we present a dialect identification system (ArbDialectID) that competed in Task 1 of the MADAR shared task, MADAR Travel Domain Dialect Identification. We build a coarse-grained and a fine-grained identification model to predict the label (corresponding to a dialect of Arabic) of a given text. We build two language models by extracting features at two levels (words and characters). We first build a coarse identification model that classifies each sentence into one of six dialects, then use this label as a feature for the fine-grained model, which classifies the sentence among 26 dialects from different Arab cities. Finally, we apply an ensemble voting classifier to both sub-systems. Our system ranked 1st, achieving an F-score of 67.32%. Both the models and our feature engineering tools are made available to the research community.
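The two-stage design described in the abstract (a coarse label fed as a feature into word-level and character-level fine-grained models, followed by voting) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' system: the naive Bayes classifier, the soft-voting rule (averaging posteriors), the function names, and the toy sentences and labels (`R1`, `R2`, `A` to `D`) are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def word_feats(text):
    """Word-level features: one feature per whitespace-separated token."""
    return ["w=" + w for w in text.split()]


def char_feats(text, n=3):
    """Character-level features: overlapping character n-grams."""
    padded = f" {text} "
    return ["c=" + padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in classifier)."""

    def fit(self, feature_lists, labels):
        self.label_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(feature_lists, labels):
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def proba(self, feats):
        """Return normalized posterior probabilities for every label."""
        total = sum(self.label_counts.values())
        logs = {}
        for label, n in self.label_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logs[label] = math.log(n / total) + sum(
                math.log((counts[f] + 1) / denom) for f in feats
            )
        top = max(logs.values())
        exp = {label: math.exp(v - top) for label, v in logs.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}

    def predict(self, feats):
        p = self.proba(feats)
        return max(p, key=p.get)


def train(data):
    """data: list of (sentence, coarse_label, fine_label) triples."""
    sents = [s for s, _, _ in data]
    coarse_gold = [c for _, c, _ in data]
    fine_gold = [f for _, _, f in data]
    # Stage 1: coarse model over word and character features.
    coarse = NaiveBayes().fit(
        [word_feats(s) + char_feats(s) for s in sents], coarse_gold)
    # Stage 2: two fine-grained models, each given the coarse label as a feature.
    word_model = NaiveBayes().fit(
        [word_feats(s) + ["coarse=" + c] for s, c in zip(sents, coarse_gold)],
        fine_gold)
    char_model = NaiveBayes().fit(
        [char_feats(s) + ["coarse=" + c] for s, c in zip(sents, coarse_gold)],
        fine_gold)
    return coarse, word_model, char_model


def predict(models, sentence):
    coarse, word_model, char_model = models
    # Predict the coarse label first, then pass it to both fine-grained models.
    region = coarse.predict(word_feats(sentence) + char_feats(sentence))
    extra = ["coarse=" + region]
    p_word = word_model.proba(word_feats(sentence) + extra)
    p_char = char_model.proba(char_feats(sentence) + extra)
    # Soft voting (an assumption): average the two posterior distributions.
    avg = {label: (p_word[label] + p_char[label]) / 2 for label in p_word}
    return max(avg, key=avg.get)


# Hypothetical toy data with invented tokens; not taken from the MADAR corpus.
TOY_DATA = [
    ("zab tok mur", "R1", "A"),
    ("zab tok lin", "R1", "B"),
    ("pif gud mur", "R2", "C"),
    ("pif gud lin", "R2", "D"),
]
models = train(TOY_DATA)
```

Calling `predict(models, "zab tok mur")` runs the coarse model, appends its predicted region as an extra feature, and returns the fine-grained label with the highest averaged posterior. Averaging normalized posteriors rather than raw log scores keeps the character model, which sees many more features per sentence, from dominating the vote.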

Crossref

Institutional Repository of the Islamic University of Gaza

ST MADAR 2019 Shared Task: Arabic Fine-Grained Dialect Identification

Author: Abed Alhakim Freihat
Mohamed Lichouri
Mourad Abbas
Publication date: 01/01/2019

Crossref

Open Access Repository

Mawdoo3 AI at MADAR Shared Task: Arabic Fine-Grained Dialect Identification with Ensemble Learning

Author: Abdelrahman Mattar
Abed Alhakim Freihat
Ahmad Mustafa
Ahmad Ragab
Bashar Talafha
Haitham Seelawi
Hesham Al-Bataineh
Hussein Al-Natsheh
Mohammad Zaghloul
Mostafa Samir
Publication date: 01/01/2019

Crossref

Open Access Repository