A review on corpus annotation for Arabic sentiment analysis

Authors: A alOwisheq, A Kaur, A Mountassir, AM Azmi, CC Aggarwal, G Leech, H ElSahar, H Ibrahim, J Carletta, J Cohen, M Saleh, NY Habash
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Mining publicly available data for meaning and value is an important research direction within social media analysis. Before a machine learning algorithm can classify collected textual data effectively, manual effort is needed to annotate the text, that is, to add a label to each data entry. Arabic is one of the languages growing rapidly in sentiment analysis research, despite limited resources and scarce annotated corpora. In this paper, we review the annotation processes reported in that research. A total of 27 papers from the years 2010 to 2016 were reviewed.

Crossref

Warwick Research Archives Portal Repository

Phonetically rich and balanced text and speech corpora for Arabic language

Authors: A Roberts, AM Elshafei, C Cieri, LR Rabiner, Mohammad A. M. Abushariah, Moustafa Elshafei, NY Habash, Othman O. Khalifa, R Bakis, Raja N. Ainon, Roziati Zainuddin, YA Alotaibi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/11/2011

This paper describes the preparation, recording, analysis, and evaluation of a new speech corpus for Modern Standard Arabic (MSA). The speech corpus contains a total of 415 sentences recorded by 40 native Arabic speakers (20 male and 20 female) from 11 Arab countries representing three major regions (Levant, Gulf, and Africa). Three hundred and sixty-seven sentences are considered phonetically rich and balanced and are used for training Arabic Automatic Speech Recognition (ASR) systems. The set is rich in the sense that it must contain all phonemes of the Arabic language, and balanced in the sense that it must preserve the phonetic distribution of the Arabic language. The remaining 48 sentences are created fo

Crossref

The International Islamic University Malaysia Repository