6 research outputs found

    Audience and the Use of Minority Languages on Twitter

    On Twitter, many users tweet in more than one language. In this study, we examine the use of two Dutch minority languages. Users can engage with different audiences, and by analyzing different types of tweets, we find that characteristics of the audience influence whether a minority language is used. Furthermore, while most tweets are written in Dutch, in conversations users often switch to the minority language.


    Compression versus traditional machine learning classifiers to detect code-switching in varieties and dialects: Arabic as a case study

    The occurrence of code-switching in online communication, when a writer switches among multiple languages, presents a challenge for natural language processing tools, since they are designed for texts written in a single language. To answer the challenge, this paper presents detailed research on ways to detect code-switching in Arabic text automatically. We compare the prediction by partial matching (PPM) compression-based classifier, implemented in Tawa, and a traditional machine learning classifier, sequential minimal optimization (SMO), implemented in the Waikato Environment for Knowledge Analysis, working specifically on Arabic text taken from Facebook. Three experiments were conducted in order to: (1) detect code-switching among the Egyptian dialect and English; (2) detect code-switching among the Egyptian dialect, the Saudi dialect, and English; and (3) detect code-switching among the Egyptian dialect, the Saudi dialect, Modern Standard Arabic (MSA), and English. Our experiments showed that PPM achieved a higher accuracy rate than SMO with 99.8% versus 97.5% in the first experiment and 97.8% versus 80.7% in the second. In the third experiment, PPM achieved a lower accuracy rate than SMO with 53.2% versus 60.2%. Code-switching between Egyptian Arabic and English text is easiest to detect because Arabic and English are generally written in different character sets. It is more difficult to distinguish between Arabic dialects and MSA as these use the same character set, and most users of Arabic, especially Saudis and Egyptians, frequently mix MSA with their dialects. We also note that the MSA corpus used for training the MSA model may not represent MSA Facebook text well, being built from news websites. This paper also describes in detail the new Arabic corpora created for this research and our experiments.
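
The abstract's point that Arabic/English switches are easy to detect because the two languages use different character sets can be illustrated with a minimal sketch. This is not the paper's PPM or SMO classifier; it is a hypothetical script-based heuristic (the function names `script_of` and `switch_points` are assumptions made for illustration) that labels each whitespace-separated token by the Unicode script of its letters and reports where the script changes. By construction it cannot separate Egyptian, Saudi, and MSA text, which share one script, and that is the harder case the abstract describes.

```python
import unicodedata


def script_of(token: str) -> str:
    """Label a token 'arabic', 'latin', or 'other' by the majority script of its letters."""
    counts = {"arabic": 0, "latin": 0}
    for ch in token:
        if not ch.isalpha():
            continue  # ignore digits, punctuation, emoji
        name = unicodedata.name(ch, "")
        if name.startswith("ARABIC"):
            counts["arabic"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
    if counts["arabic"] == 0 and counts["latin"] == 0:
        return "other"
    return "arabic" if counts["arabic"] >= counts["latin"] else "latin"


def switch_points(text: str) -> list[int]:
    """Return the indices of tokens whose script differs from the previous labeled token."""
    points = []
    previous = None
    for index, token in enumerate(text.split()):
        label = script_of(token)
        if label == "other":
            continue  # tokens without Arabic or Latin letters do not count as a switch
        if previous is not None and label != previous:
            points.append(index)
        previous = label
    return points
```

For example, `switch_points("انا رايح the mall بكرة")` flags the token where the text moves into Latin script and the token where it returns to Arabic script. Arabic written in Latin characters would be mislabeled by this heuristic, which is one reason a trained classifier is still needed in practice.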

    Twitter Users #CodeSwitch Hashtags! #MoltoImportante #wow

    When code switching, individuals incorporate elements of multiple languages into the same utterance. While code switching has been studied extensively in formal and spoken contexts, its behavior and prevalence remain unexamined in many newer forms of electronic communication. The present study examines code switching in Twitter, focusing on instances where an author writes a post in one language and then includes a hashtag in a second language. In the first experiment, we perform a large scale analysis on the languages used in millions of posts to show that authors readily incorporate hashtags from other languages, and in a manual analysis of a subset of the hashtags, reveal prolific code switching, with code switching occurring for some hashtags in over twenty languages. In the second experiment, French and English posts from three bilingual cities are analyzed for their code switching frequency and its content.

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    #Languagemixing on Twitter

    The influence of the English language on the world stage is such that it now constitutes a kind of global lingua franca. As such, English has supplanted French as the language of diplomacy, of culture, and of social prestige. This role reversal entails some residual opposition in France, and in consequence, the use of English expressions and vocabulary by the French continues to be a controversial subject in France, as it has been for decades. Regulations are still being implemented to control the French language. Social media has become an important tool in our society. Twitter has become a popular means of communication used in a variety of fields, such as politics, journalism, and academia. This widely used online platform has an impact on the way people express themselves and is changing language usage worldwide at an unprecedented pace. The language used online reflects the linguistic battle that has been going on for several decades in French society. In my dissertation, I investigate the factors prompting the mixing of English and French on Twitter in France. Acronyms, hashtags, and a second language may all be used as strategies to reach a wider audience. The need for visibility and audience maximization seems to be an important factor in linguistic choice on Twitter. This study enables a deeper understanding of users' linguistic behavior online. The implications are important and allow for a rise in awareness of intercultural and cross-language exchanges. Includes bibliographical reference