32 research outputs found

    PADIC: extension and new experiments

    PADIC is a multidialectal parallel Arabic corpus. It initially comprised five Arabic dialects, three from the Maghreb and two from the Middle East, in addition to standard Arabic. In this paper, we present an augmented version of PADIC with a Moroccan dialect. We also give an evaluation, using the σ–index, of the computerization level of the Arabic dialects present in PADIC, which reveals that these languages are really under-resourced. Several experiments in machine translation, in both directions between all the combinations of language pairs, are discussed too. For each language, we interpolated the corresponding Language Model (LM) with an LM based on a large Arabic corpus. The results show that this interpolation in some cases has no effect on the performance of the translation systems and in others is rather penalizing.

    Arabic Dialect Texts Classification

    This study investigates how to classify Arabic dialects in text by extracting features which show the differences between dialects. There has been a lack of research on the classification of Arabic dialect texts, in comparison to English and some other languages, due to the lack of Arabic dialect text corpora in comparison with what is available for dialects of English and some other languages. What is more, there is an increasing use of Arabic dialects in social media, so this text is now considered quite appropriate as a medium of communication and as a source of a corpus. We collected tweets from Twitter, comments from Facebook and online newspapers from five groups of Arabic dialects: Gulf, Iraqi, Egyptian, Levantine, and North African. The research sought to: 1) create a dataset of Arabic dialect texts to use in training and testing the classification system, 2) find appropriate features to classify Arabic dialects: lexical (word and multi-word-unit) and grammatical variation across dialects, 3) build a more sophisticated filter to extract features from Arabic-character written dialect text files. In this thesis, the first part describes the research motivation to show the reason for choosing the Arabic dialects as a research topic. The second part presents some background information about the Arabic language and its dialects, and the literature review shows previous research about this subject. The research methodology part shows the initial experiment to classify Arabic dialects. The results of this experiment showed the need to create an Arabic dialect text corpus, by exploring Twitter and online newspapers. The corpus was used to train the ensemble classifier; to improve the accuracy of classification, the corpus was extended by collecting tweets from Twitter based on spatial coordinate points and comments from Facebook posts. The corpus was annotated with dialect labels and used in automatic dialect classification experiments. The last part of this thesis presents the results of classification, conclusions and future work.

    Investigating data sharing in speech recognition for an underresourced language: the case of algerian dialect

    The Arabic language has many varieties, including its standard form, Modern Standard Arabic (MSA), and its spoken forms, namely the dialects. Those dialects are representative examples of under-resourced languages for which automatic speech recognition is considered an unresolved issue. To address this issue, we recorded several hours of spoken Algerian dialect and used them to train a baseline model. This model was then boosted by taking advantage of other languages that influence this dialect, integrating their data into one large corpus and investigating three approaches: multilingual training, multitask learning and transfer learning. The best performance was achieved using a limited and balanced amount of acoustic data from each additional language, relative to the data size of the studied dialect. This approach led to an improvement of 3.8% in terms of word error rate in comparison to the baseline system trained only on the dialect data.

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    Research Developments in World Englishes

    This book is available as open access through the Bloomsbury Open Access programme and is available on www.bloomsburycollections.com. It is funded by the University of Klagenfurt, Austria. Discussing key issues of current relevance and setting the tone for future research in world Englishes, this book provides new perspectives on the diverse realities of Englishes around the world. Written by an international team of established and renowned scholars, it is the inaugural volume in the new series Bloomsbury Advances in World Englishes, dedicated to advancing research in the field. Chapters discuss important topics in contemporary world Englishes research, including de-colonial approaches, emerging varieties in post-protectorates and international uses as communicative events to highlight the globalizing aspect of English as a semiotic code. The book also expands on cultural conceptualizations to investigate the connections between Englishes and localized cultural knowledge and ongoing changes and attitudes towards local forms in multilingual settings. Closing with an examination of how world Englishes and the use of English as a lingua franca could influence the future teaching of Englishes, Research Developments in World Englishes presents a detailed picture of contemporary research approaches and points the way towards exciting future directions

    Multiple Agreement Constructions in Southern Italo-Romance. The Syntax of Sicilian Pseudo-Coordination

    In the present thesis, different configurations of Pseudo-Coordination are analysed. This is a monoclausal syntactic construction, formed by two finite verbs with an optional connector a between them (V1 a V2), which can be considered an instance of the Multiple Agreement Constructions found in most southern Italo-Romance dialects. This thesis discusses the main parameters of micro-variation characterising the Pseudo-Coordination found in the Sicilian dialects: i) the criteria for the selection of the V1; ii) the Moods and the Tenses in which this construction can occur; iii) the criteria for the selection of the V2; iv) the hierarchy regulating the occurrence of the Persons (from 1sg to 3pl) in the different paradigms; v) the grammaticalisation of the V1 "go" with its phonetic erosion and desemanticisation. In the second part of the thesis, the first quantitative study dedicated to Pseudo-Coordination, conducted in Delia (Caltanissetta) with 70 participants during 2017, is presented.

    Ethnographic monitoring and the study of complexity

    In this chapter, we explore the value of long-term fieldwork in the context of ever-increasing complexity in social life. This complexity stems from the phenomenon of ‘superdiversity’ (Vertovec, 2007) and the effects of globalization. These effects are visible in the contact between languages and cultures, which has spawned a range of new language-cultural phenomena. Sociolinguists and ethnographers concerned with superdiversity argue that the concepts of language and culture themselves, as separate, bounded entities, have become highly problematic and now invite new methodological approaches (Blommaert & Rampton, 2011). Linguistic and cultural change is the rule and not the exception.

    Accessibility at Film Festivals: Guidelines for Inclusive Subtitling

    In today's media-dominated world, the imperative for accessibility has never been greater, and ensuring that audiovisual experiences cater to individuals with sensory disabilities has become a pressing concern. One of the key initiatives in this endeavour is inclusive subtitling (IS), a practice rooted in the broader contexts of subtitling for the deaf and hard of hearing (SDH/CC), audiovisual translation studies (AVTS), media accessibility studies (MAS), and the evolving field of Deaf studies (DS). This study aims to offer a comprehensive exploration of how inclusive subtitling contributes to fostering accessible and inclusive audiovisual experiences, with a particular focus on its implications within the unique environment of film festivals. To gain a holistic perspective of inclusive subtitling, it is essential to examine its lineage in relation to analogous practices, which is the focus of the first chapter. Inclusive subtitling is an extension of SDH/CC, designed for individuals with hearing impairments, and SDH/CC, in turn, is a nuanced variation of traditional subtitling extensively explored within the realm of AVTS. To encapsulate the diverse techniques and modalities aimed at making audiovisual content universally accessible, the study recognises the term "Audiovisual Accessibility" (AVA). The second chapter explores the interconnection of accessibility studies (AS), AVTS, and MAS, highlighting their symbiotic relationship and their role in framing inclusive subtitles within these fields. These interconnections are pivotal in shaping a framework for the practice of inclusive subtitling, enabling a comprehensive examination of its applicability and research implications. The third chapter delves into Deaf studies and the evolution of Deafhood, which hinges on the history and culture of Deaf individuals. 
This chapter elucidates the distinction between ‘deafness’ as a medical construct and ‘Deafhood’ as a cultural identity, crucial to the understanding of audiovisual accessibility and its intersection with the Deaf community's perspectives. In the fourth chapter, the focus turns to the exploration of film festivals, with a specific emphasis on the crucial role of subtitles in enhancing accessibility, particularly when films are presented in their original languages. The chapter marks a critical point, highlighting the inherent connection between subtitles and the immersive nature of film festivals that aspire to promote inclusivity in the cinematic experience. The emphasis on inclusivity extends to the evolution of film festivals, giving rise to more advanced forms, including accessible film festivals and Deaf film festivals. At the core of the chapter is a thorough examination of the corpus, specifically, the SDH/CC of films spanning the editions from 2020 to 2023 of two highly significant film festivals, namely BFI Flare and the London Film Festival. The corpus serves as the foundation upon which my research unfolds, providing a nuanced understanding of the role subtitles play in film festival contexts. The main chapter, chapter five, thoroughly analyses the technical and linguistic aspects of inclusive subtitling, drawing insights from the Inclusive Subtitling Guidelines, a two-version document that I devised, and offering real-world applications supported by a case study at an Italian film festival and another case study of the short film Pure, with the relevant inclusive subtitles file annexed. In conclusion, the research sets the stage for a comprehensive exploration of inclusive subtitling's role in ensuring accessible and inclusive audiovisual experiences, particularly within film festivals. It underscores the importance of accessibility in the world of audiovisual media and highlights the need for inclusive practices to cater to diverse audiences.