research

Automatically generated, phonemic Arabic-IPA pronunciation tiers for the boundary annotated Qur'an dataset for machine learning (version 2.0)

Abstract

In this paper, we augment the Boundary Annotated Qur?an dataset published at LREC 2012 (Brierley et al 2012; Sawalha et al 2012a) with automatically generated phonemic transcriptions of Arabic words. We have developed and evaluated a comprehensive grapheme-phoneme mapping from Standard Arabic \ensuremath> IPA (Brierley et al under review), and implemented the mapping in Arabic transcription technology which achieves 100% accuracy as measured against two gold standards: one for Qur?anic or Classical Arabic, and one for Modern Standard Arabic (Sawalha et al [1]). Our mapping algorithm has also been used to generate a pronunciation guide for a subset of Qur?anic words with heightened prosody (Brierley et al 2014). This is funded research under the EPSRC " Working Together" theme

    Similar works