Parsing Arabic Dialects

Authors: David Chiang, Mona T. Diab, Nizar Y. Habash, Rebecca Hwa, Vincent Lacey, Roger Levy, Carol Nichols, Owen C. Rambow, Safiullah Shareef, Khalil Sima'an
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2006

The Arabic language is a collection of spoken dialects with important phonological, morphological, lexical, and syntactic differences, along with a standard written language, Modern Standard Arabic (MSA). Since the spoken dialects are not officially written, it is very costly to obtain adequate corpora for training dialect NLP tools such as parsers. In this paper, we address the problem of parsing transcribed spoken Levantine Arabic (LA). We do not assume the existence of any annotated LA corpus (except for development and testing), nor of a parallel LA-MSA corpus. Instead, we use explicit knowledge about the relation between LA and MSA …

Columbia University Academic Commons

Word stress in Arabic

Author: Watson JCE
Publication venue: Wiley-Blackwell
Publication date: 01/01/2011

University of Salford Institutional Repository

White Rose Research Online

Natural language processing for similar languages, varieties, and dialects: A survey

Authors: Preslav Nakov, Yves Scherrer, Marcos Zampieri
Publication date: 20/11/2020

There has been a lot of recent interest in the natural language processing (NLP) community in the computational processing of language varieties and dialects, with the aim of improving the performance of applications such as machine translation, speech recognition, and dialogue systems. Here, we attempt to survey this growing field of research, with a focus on computational methods for processing similar languages, varieties, and dialects. In particular, we discuss the most important challenges when dealing with diatopic language variation, and we present some of the available datasets, the process of data collection, and the most common data collection strategies used to compile datasets for similar languages, varieties, and dialects. We further present a number of studies on computational methods developed and/or adapted for preprocessing, normalization, part-of-speech tagging, and parsing of similar languages, language varieties, and dialects. Finally, we discuss relevant applications such as language and dialect identification and machine translation for closely related languages, language varieties, and dialects.

Non peer reviewed

Helsingin yliopiston digitaalinen arkisto