580 research outputs found

    Natural language processing for similar languages, varieties, and dialects: A survey

    There has been a lot of recent interest in the natural language processing (NLP) community in the computational processing of language varieties and dialects, with the aim of improving the performance of applications such as machine translation, speech recognition, and dialogue systems. Here, we attempt to survey this growing field of research, with a focus on computational methods for processing similar languages, varieties, and dialects. In particular, we discuss the most important challenges when dealing with diatopic language variation, and we present some of the available datasets, the process of data collection, and the most common data collection strategies used to compile datasets for similar languages, varieties, and dialects. We further present a number of studies on computational methods developed and/or adapted for preprocessing, normalization, part-of-speech tagging, and parsing similar languages, language varieties, and dialects. Finally, we discuss relevant applications such as language and dialect identification and machine translation for closely related languages, language varieties, and dialects. Non peer reviewed

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article, we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction, and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    Interpreting language-learning data

    This book provides a forum for methodological discussions emanating from researchers engaged in studying how individuals acquire an additional language. Whereas publications in the field of second language acquisition generally report on empirical studies with relatively little space dedicated to questions of method, the current book gave authors the opportunity to more fully develop a discussion piece around a methodological issue in connection with the interpretation of language-learning data. The result is a set of seven thought-provoking contributions from researchers with diverse interests. Three main topics are addressed in these chapters: the role of native-speaker norms in second-language analyses, the impact of epistemological stance on experimental design and/or data interpretation, and the challenges of transcription and annotation of language-learning data, with a focus on data ambiguity. Authors expand on these crucial issues, reflect on best practices, and in many instances provide concrete examples of the impact they have on data interpretation.

    Developing Attitudes toward Learning Arabic as a Foreign Language among American University and College Students

    This study investigates the developing attitudes of American university and college students toward learning Arabic as a Foreign Language. The primary goal of this examination is to shed light on the ways in which students' attitudes toward learning Arabic affect their motivation to learn the language, as well as their commitment to learning it. A secondary goal of this study is to reveal students' perceptions of the use of both Spoken and Standard Arabic in the classroom, and what effect their perceptions may have on their developing attitudes toward Arabic, and their motivation to learn the language and study its culture. A self-report questionnaire was utilized, which was divided into three parts. The first part was designed to obtain background information and information regarding the participants' Arabic learning experience. The second part was developed to obtain attitudinal perceptions toward Arabic language varieties and Arabic culture, as well as participants' overall attitudes toward learning Arabic. This part of the questionnaire was designed to elicit information regarding the students' attitudes prior to taking any Arabic classes, and their attitudes upon completion of at least one Arabic course. The findings revealed that a more positive perception toward learning Spoken Arabic was developed over the course of the Arabic language classes; however, participants also reported less positive attitudes toward learning Modern Standard Arabic, along with a negative perception of the dominance of Modern Standard Arabic in the classroom. The findings also indicate that instrumental motivation is more important than any other type among students who continue in the program and take advanced Arabic.

    Proceedings of the Conference on Natural Language Processing 2010

    This book contains state-of-the-art contributions to the 10th conference on Natural Language Processing, KONVENS 2010 (Konferenz zur Verarbeitung natürlicher Sprache), with a focus on semantic processing. KONVENS in general aims at offering a broad perspective on current research and developments within the interdisciplinary field of natural language processing. The central theme draws specific attention towards addressing linguistic aspects of meaning, covering deep as well as shallow approaches to semantic processing. The contributions address both knowledge-based and data-driven methods for modelling and acquiring semantic information, and discuss the role of semantic information in applications of language technology. The articles demonstrate the importance of semantic processing, and present novel and creative approaches to natural language processing in general. Some contributions focus on developing and improving NLP systems for tasks like Named Entity Recognition or Word Sense Disambiguation, on semantic knowledge acquisition and exploitation with respect to collaboratively built resources, or on harvesting semantic information in virtual games. Others are set within the context of real-world applications, such as Authoring Aids, Text Summarisation and Information Retrieval. The collection highlights the importance of semantic processing for different areas and applications in Natural Language Processing, and provides the reader with an overview of current research in this field.

    Lexical and sociolinguistic variation in Qatari Arabic

    This thesis embodies the result of an investigation into two linguistic variables: the (d3) and the (Q) in QD. The basic issue tackled is this: are variations observed in these variables rule governed? If so, are they linguistic or non-linguistic? A close examination of the data has shown that the variables are governed to a great extent by the class of lexical item containing the variables. Moreover, they have demonstrated co-variation with paralinguistic factors such as social group membership, age, level of education and style. The social motivations for change and variation are highlighted. Such processes occur as a result of status-ranking of local social dialects and as a result of the tendency of the younger people to modify their speech in the direction of the superimposed variety, which is learnt at school. The impact of the process of modernization on linguistic change is also examined.

    Arabic and Globalization: Understanding the Arab Voice

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)
