Exploring Twitter as a Source of an Arabic Dialect Corpus

Authors: Alshutayri AOO; Atwell E
Publication venue: CSC Journals
Publication date: 01/06/2017

Given the lack of Arabic dialect text corpora compared with what is available for dialects of English and other languages, there is a need to create dialect text corpora for use in Arabic natural language processing. Moreover, Arabic dialects are increasingly used on social media, which makes such text an appropriate source for a corpus. We collected 210,915K tweets from five groups of Arabic dialects: Gulf, Iraqi, Egyptian, Levantine, and North African. This paper explores Twitter as a source and describes the methods we used to extract tweets and classify them according to the geographic location of the sender. We classified the Arabic dialects using the Waikato Environment for Knowledge Analysis (WEKA), a data analytics tool that provides many alternative filters and classifiers for machine learning. Our approach to classifying tweets achieved an accuracy of 79%.

White Rose Research Online
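The abstract above labels tweets by the geographic location of the sender before classification. A minimal sketch of that labelling step follows, assuming each tweet arrives paired with an ISO 3166-1 alpha-2 country code for its sender. The country-to-group mapping and the names `DIALECT_GROUPS`, `COUNTRY_TO_GROUP` and `label_by_location` are hypothetical illustrations, not the authors' code; the labelled pairs would then go to a classifier (the authors used WEKA).

```python
# Hypothetical mapping from ISO 3166-1 alpha-2 country codes to the five
# dialect groups named in the abstract. The grouping itself is an assumption.
DIALECT_GROUPS = {
    "Gulf": {"SA", "KW", "AE", "QA", "BH", "OM"},
    "Iraqi": {"IQ"},
    "Egyptian": {"EG"},
    "Levantine": {"SY", "LB", "JO", "PS"},
    "North African": {"MA", "DZ", "TN", "LY"},
}

# Invert the mapping once so each lookup is a single dict access.
COUNTRY_TO_GROUP = {
    country: group
    for group, countries in DIALECT_GROUPS.items()
    for country in countries
}


def label_by_location(tweets):
    """Label (text, country_code) pairs with a dialect group.

    Tweets whose sender country is missing or outside the mapping are
    dropped, since their dialect cannot be inferred from location alone.
    """
    labelled = []
    for text, country in tweets:
        group = COUNTRY_TO_GROUP.get(country)
        if group is not None:
            labelled.append((text, group))
    return labelled
```

For example, `label_by_location([("a", "EG"), ("b", "FR"), ("c", "JO")])` keeps only the first and third tweets, labelled `"Egyptian"` and `"Levantine"`. Dropping unmapped tweets rather than guessing keeps the training labels clean at the cost of corpus size.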

PHONOLOGICAL VARIATION OF PASAR KLIWON ARABIC DIALECT SURAKARTA

Authors: Al Aziiz Arief Nur Rahman; Ridwan Muhammad
Publication venue: Universitas Sebelas Maret
Publication date: 11/05/2019

This article studies sound variation and sound change in the Arabic dialect of Pasar Kliwon. Data were collected with the observation (simak) and conversation (cakap) methods, using recording (rekam) and note-taking (catat) techniques, and were elicited from a question list of 120 Swadesh vocabulary items. The data were analysed with the padan method, based on the informants' speech organs, and interpreted using the sound change theories of Crowley (1992) and Muslich (2012). Vowels in the Pasar Kliwon Arabic dialect are of two kinds: short and long. There are twenty-seven consonant sounds, divided into seven kinds: plosive, fricative, affricate, liquid, voiced, voiceless, and velarized. The semi-vowels are wawu and ya>’. Vowel sound changes are of four kinds: lenition, anaptyxis, apocope, and metathesis. Consonant sound changes are also of four kinds: lenition, anaptyxis, apocope, and syncope. The diphthong sound change is monophthongization.

PRASASTI: Journal of Linguistics

Phonetic inventory for an Arabic speech corpus

Authors: Halabi Nawar; Wald Mike
Publication date: 25/05/2016

Corpus design for speech synthesis is a well-researched topic for languages such as English, much less so for Modern Standard Arabic, and existing work tends to focus on methods (usually greedy methods) that automatically generate the orthographic transcript to be recorded. In this work, a study of Modern Standard Arabic (MSA) phonetics and phonology is conducted in order to create criteria for a greedy method that builds a speech corpus transcript for recording. Using these optimisation methods with different parameters, the dataset is reduced several times, yielding a much smaller dataset with the same phonetic coverage as before the reduction; this output transcript is chosen for recording. This is part of a larger work to create a completely annotated and segmented speech corpus for MSA.

Southampton (e-Prints Soton)
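The greedy transcript reduction described above can be pictured as a set-cover heuristic: repeatedly keep the sentence that adds the most not-yet-covered phonetic units until the reduced set covers everything the full set did. This sketch assumes sentences are already phonemised and uses diphones as the coverage unit; the unit choice, the tie-breaking rule and the names `diphones` and `greedy_select` are assumptions for illustration, not the MSA-specific criteria derived in the paper.

```python
def diphones(phonemes):
    """Set of adjacent phoneme pairs (diphones) in a phoneme sequence."""
    return {(a, b) for a, b in zip(phonemes, phonemes[1:])}


def greedy_select(sentences):
    """Greedily pick sentences until every diphone in the full set is covered.

    sentences: dict mapping a sentence id to its list of phoneme symbols.
    Returns the chosen ids in selection order; their combined diphone
    coverage equals that of the whole input set.
    """
    units = {sid: diphones(ph) for sid, ph in sentences.items()}
    target = set().union(*units.values())
    covered, chosen = set(), []
    while covered != target:
        # Take the sentence contributing the most uncovered diphones.
        # Iterating over sorted ids makes max() break ties by smallest id.
        best = max(sorted(units), key=lambda sid: len(units[sid] - covered))
        chosen.append(best)
        covered |= units[best]
    return chosen
```

With the toy input `{"s1": ["b", "a", "t"], "s2": ["b", "a", "t", "a"], "s3": ["t", "a", "b"]}`, the sketch keeps `s2` (three new diphones) and then `s3` (the remaining `("a", "b")`), discarding `s1` without losing any coverage. Like any greedy set-cover heuristic it is not guaranteed to find the smallest possible subset.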

GENERAL AND LOCAL ISSUES IN FORENSIC LINGUISTICS: THE CASE OF ARABIC

Author: ROSENHOUSE Judith
Publication venue: Adam Mickiewicz University Poznan
Publication date: 06/01/2013

This paper is concerned with four main aspects of forensic linguistics: forensic linguistics in the spoken and written modes, the special status of Arabic, linguistic problems and possibilities of translation for forensics, and Language Analysis for Determination of Origin (LADO). After presenting these issues in the introduction, we describe the language situation of Arabic, mainly in Israel, in the context of these four issues. The discussion is based on the literature concerning problems of translation and LADO in courts of justice in various countries, including Israel. We consider LADO a developing field of forensic linguistics and use examples to demonstrate some problems that may arise from speech recordings of Arabic-speaking asylum seekers. Based on this survey, the conclusion points out some research needs in general forensic linguistics and in Arabic-related forensic linguistics.

Biblioteka Nauki - repozytorium artykułów

Crossref

Comparative Legilinguistics

Supervector pre-processing for PRSVM-based Chinese and Arabic dialect identification

Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

Unsupervised Phoneme Segmentation Based on Main Energy Change for Arabic Speech, Journal of Telecommunications and Information Technology, 2017, no. 1

Author: Lachachi Noureddine
Publication venue: National Institute of Telecommunications

In this paper, a new method for segmenting speech at the phoneme level is presented. For this purpose, the author uses the short-time Fourier transform of the speech signal. The goal is to identify the locations of main energy changes in frequency over time, which can be interpreted as phoneme boundaries. For further precision, a frequency range analysis searches for energy changes within individual spectral areas, identifying vowel and consonant segments that are confined to a small number of narrow spectral areas. The method uses only the power spectrum of the signal for segmentation. No parameter adaptation or advance training for different speakers is needed. In addition, it requires no transcript information, no prior linguistic knowledge about the phonemes, and no voiced/unvoiced decision. Segmentation results of the proposed method were compared with a manual segmentation and with three segmentation methods of the same kind. These results show that 81% of the boundaries are successfully identified. This research aims to improve the acoustic parameters for all Arabic speech processing systems.

Biblioteka Cyfrowa Instytutu Łączności / National Institute of Telecommunications: Digital Library
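To illustrate the idea of placing phoneme boundaries at main energy changes across frequency, the sketch below frames the signal, computes a Hann-windowed power spectrum per frame, sums it into a few equal-width frequency bands, and flags frames whose normalised band-energy profile shifts sharply. It is a simplified stand-in under stated assumptions, not the author's algorithm: the band layout, the parameters `frame_len`, `hop`, `n_bands` and `threshold`, and the run-merging rule are all hypothetical choices. The naive DFT keeps the example dependency-free; a real implementation would use an FFT.

```python
import cmath
import math


def power_spectrum(frame):
    """One-sided power spectrum of a Hann-windowed frame (naive O(n^2) DFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [
        abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def band_profile(spectrum, n_bands):
    """Share of frame energy in each of n_bands equal-width frequency bands."""
    width = len(spectrum) // n_bands
    bands = [sum(spectrum[b * width:(b + 1) * width]) for b in range(n_bands)]
    total = sum(bands) or 1.0  # avoid division by zero on silent frames
    return [e / total for e in bands]


def segment(signal, frame_len=64, hop=32, n_bands=4, threshold=0.5):
    """Return estimated boundary sample positions where band energy shifts sharply."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    profiles = [band_profile(power_spectrum(signal[s:s + frame_len]), n_bands)
                for s in starts]
    # Frames whose band profile differs from the previous frame's by more
    # than `threshold` in L1 distance (which ranges from 0 to 2).
    hits = [
        i for i in range(1, len(profiles))
        if sum(abs(a - b) for a, b in zip(profiles[i], profiles[i - 1])) > threshold
    ]
    # One change is seen by several overlapping frames, so merge runs of
    # consecutive hits into a single boundary between neighbouring frame centres.
    boundaries, run = [], []
    for i in hits + [None]:
        if run and (i is None or i != run[-1] + 1):
            mids = [starts[j] + frame_len // 2 - hop // 2 for j in run]
            boundaries.append(round(sum(mids) / len(mids)))
            run = []
        if i is not None:
            run.append(i)
    return boundaries
```

As a synthetic check, 256 samples of `math.sin(2 * math.pi * 4 * t / 64)` followed by 256 samples of `math.sin(2 * math.pi * 28 * t / 64)` move the energy from the lowest band to the highest, and `segment` reports a single boundary at sample 256, while a stationary tone yields no boundaries. Working on normalised band shares rather than raw energy means a change in loudness alone does not register as a boundary.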