535 research outputs found

    LANGUAGE A PHENOMENON OF CULTURE AND COMMUNICATION, PRESERVING THE LANGUAGE THROUGH A PHONETIC MAP

    Language is the key instrument through which we assimilate the culture of our country. Just as culture is shaped by history, traditions, and other factors, language is affected by the same phenomena. Their influence may differ from one region to another, producing region-specific varieties known as dialects. A dialect comprises the lexical, syntactic, or phonetic particularities of a language as used by part of the population or in certain regions of a nation. The Tunisian dialect is becoming increasingly important: it is used more and more by Tunisian TV channels, newspapers, etc. However, several expressions have become unfamiliar to some people, regions, or generations, because social and cultural factors influence the language. These dialects deserve to be studied in order to preserve them and keep them in memory. This paper consists of two distinct parts. In the first part, we present the experience gained from many years of fieldwork carried out by the staff of the Linguistic Atlas of Tunisia. As a result of this fieldwork, the project now has access to recordings of approximately 3000 speakers, documenting the Tunisian dialect in more than 250 Tunisian regions. We are currently conducting research on various aspects of the sound patterns of these dialects, that is, their phonetic system. The second part of the paper gives an overview of an automated system describing the phonetic system of the Tunisian dialect. This system can be very useful for understanding dialect variation across the geographical regions of Tunisia through an intelligent map that represents how sounds vary from one place to another.

    TArC: Incrementally and semi-automatically collecting a Tunisian arabish corpus

    This article describes the construction of the first morpho-syntactically annotated Tunisian Arabish Corpus (TArC). Arabish, also known as Arabizi, is a spontaneous encoding of Arabic dialects in Latin characters and arithmographs (numbers used as letters). This code-system was developed by Arabic-speaking social media users to facilitate writing in informal Computer-Mediated Communication (CMC) and text messaging contexts. The realization of Arabish varies among dialects, and each Arabish code-system is under-resourced, as are most Arabic dialects. In the last few years, the focus on Arabic dialects in the NLP field has increased considerably. With this in mind, TArC will be a useful resource for different types of analyses, both computational and linguistic, as well as for training NLP tools. In this article we describe preliminary work on the semi-automatic TArC construction process and some of the first analyses we carried out on TArC. In addition, to provide a complete overview of the challenges faced during the building process, we present the main characteristics of the Tunisian dialect and their encoding in Tunisian Arabish.
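
    The notion of arithmographs (numbers used as letters) can be illustrated with a minimal sketch. The digit-to-letter table below lists conventions commonly cited for Arabish in general; it is an assumption made for illustration, not the encoding scheme documented in TArC, and the function name `find_arithmographs` is hypothetical.

```python
from __future__ import annotations

# Assumed, commonly cited Arabish conventions (NOT taken from the TArC paper).
ARITHMOGRAPHS = {
    "2": "ء",  # hamza
    "3": "ع",  # ayn
    "5": "خ",  # kha
    "7": "ح",  # ha
    "9": "ق",  # qaf (a convention often reported for Tunisian Arabish)
}


def find_arithmographs(token: str) -> list[tuple[int, str, str]]:
    """Return (position, digit, Arabic letter) for digits used as letters.

    A digit is treated as an arithmograph only when the token also contains
    at least one alphabetic character, so plain numbers are skipped.
    """
    if not any(ch.isalpha() for ch in token):
        return []
    return [
        (i, ch, ARITHMOGRAPHS[ch])
        for i, ch in enumerate(token)
        if ch in ARITHMOGRAPHS
    ]


if __name__ == "__main__":
    # Example tokens are illustrative only.
    for tok in ["3aslema", "sa7a", "2020"]:
        print(tok, find_arithmographs(tok))
```

    Requiring at least one alphabetic character is a deliberately crude heuristic for separating arithmographs from ordinary numbers; a real annotation pipeline would need context-aware rules.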

    A Study of a Non-Resourced Language: The Case of one of the Algerian Dialects

    This paper presents a linguistic study of an Algerian Arabic dialect, namely the dialect of Annaba (AD). It also presents the methodology applied in the construction of a parallel MSA-AD corpus. This work is a step toward the future goal of developing a machine translation system from Modern Standard Arabic (MSA) to Algerian Arabic dialects.

    TArC: Incrementally and Semi-Automatically Collecting a Tunisian Arabish Corpus

    This article describes the construction of the first morpho-syntactically annotated Tunisian Arabish Corpus (TArC). Arabish, also known as Arabizi, is a spontaneous encoding of Arabic dialects in Latin characters and arithmographs (numbers used as letters). This code-system was developed by Arabic-speaking social media users to facilitate writing in informal Computer-Mediated Communication (CMC) and text messaging contexts. The realization of Arabish varies among dialects, and each Arabish code-system is under-resourced, as are most Arabic dialects. In the last few years, the focus on Arabic dialects in the NLP field has increased considerably. With this in mind, TArC will be a useful resource for different types of analyses, both computational and linguistic, as well as for training NLP tools. In this article we describe preliminary work on the semi-automatic TArC construction process and some of the first analyses we carried out on TArC. In addition, to provide a complete overview of the challenges faced during the building process, we present the main characteristics of the Tunisian dialect and their encoding in Tunisian Arabish. Comment: Paper accepted at the Language Resources and Evaluation Conference (LREC) 202

    DaCToR: A data collection tool for the RELATER project

    Collecting domain-specific data for under-resourced languages, e.g., dialects of languages, can be very expensive, potentially financially prohibitive, and time-consuming. Moreover, in the case of rarely written languages, the normalization of non-canonical transcriptions can be another time-consuming but necessary task. To collect domain-specific data in such circumstances in a time- and cost-efficient way, collecting read speech from pre-prepared texts is often a viable option. To collect data in the domain of psychiatric diagnosis in Arabic dialects for the project RELATER, we have prepared the data collection tool DaCToR, which collects texts read aloud by speakers in the countries and districts where the dialects are spoken. In this paper we describe our tool, its purpose within the project RELATER, and the dialects which we have started to collect with the tool.