    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    Social Media Based Algorithmic Clinical Decision Support Learning from Behavioral Predispositions

    Behavioral disorders are disabilities characterized by an individual’s mood, thinking, and social interactions. The prevalence of behavioral disorders in the United States population has increased in the last few years, with an estimated 50% of all Americans diagnosed with a behavioral disorder at some point in their lifetime. Attention-Deficit/Hyperactivity Disorder (ADHD) is one such behavioral disorder that is a severe public health concern because of its high prevalence, incurable nature, and significant impact on domestic life and peer relationships. Symptomatically, ADHD is characterized by inattention, hyperactivity, and impulsivity. Access to providers who can diagnose and treat the disorder varies by location. The ever-increasing use of social media can be effectively employed in the diagnosis and treatment of the disorder. The study of behavior and, by extension, the study of individuals with behavioral disorders is made easier by the uninhibited setting in which posts are created on social media platforms. Outside the United States, diagnosis rates of the disorder are low, as it is mainly considered to be an American disorder. This impression was reinforced by the perception that the disorder is caused by social and cultural factors common to American society. However, in reality, the disorder can just as easily affect people of different races and cultures worldwide, but recognition of the disorder in the medical community has been slow. This may be due to its adverse impact on an individual, their families, and society. This dissertation focuses on providing clinicians with a clinical decision support system to overcome the societal stigma associated with the disorder and to ensure the accurate and efficient diagnosis of individuals with the disorder. The results provided in this dissertation assist in the diagnosis of individuals with Attention-Deficit/Hyperactivity Disorder.
Data for individuals with the disorder is collected through posts of self-reported diagnoses on Twitter using the Twitter API. Previous research has proved that there are differences in behavior before and after the diagnosis of the disorder. To capitalize on this, symptomatic differences of the disease before and after diagnosis are discovered and evaluated. The symptoms of the disorder, namely, inattention, hyperactivity, and impulsivity, are quantified using measures of sentiment and semantics. A separate group of users without the disorder, the control group, is collected for validation. The analysis poses a three-class classification problem, with the classes being the pre-diagnosed, post-diagnosed, and control groups. Decision trees are used to force all possible outcomes in the semantic and sentiment differences in the three classes of users to create a clear delineation. Behavioral disorders diagnosed by a clinician are based on identifying whether a patient deviates from an identified normal. This is evaluated by answering a set list of questions that quantify behavior. To achieve the same without manual intervention, and for ease of interpretability, decision trees are chosen. Classification using a decision tree is performed on a tweet level and a user level. Four cases are used in both analyses: pre-diagnosed vs. post-diagnosed group, pre-diagnosed vs. control group, post-diagnosed vs. control group, and pre-diagnosed vs. post-diagnosed vs. control group. The analysis on a user level provides a higher degree of accuracy, with 93% accuracy for the case post-diagnosed vs. control group. The accuracy of the cases identifies the number of people who can be correctly classified into their respective groups. Low accuracy for the tweet-level results reinforces the view that the sparsity of information in tweet-level analysis is a disadvantage. This is overcome by analyzing on a user level.
The accuracy of the classifier can be further improved upon by the addition of features such as age and gender. The addition of these features may also be useful in predicting time to remission and peak of the disorder in future studies.
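
The tweet-level versus user-level distinction and the four comparison cases described in this abstract can be illustrated with a minimal sketch. It is not the dissertation's pipeline: the records, the single sentiment score, and the choice of averaging a user's tweets are all assumptions made for illustration, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

# Hypothetical tweet-level records: (user_id, group, sentiment_score).
# The values are invented for illustration only.
TWEETS = [
    ("u1", "pre-diagnosed", -0.4),
    ("u1", "pre-diagnosed", 0.1),
    ("u2", "post-diagnosed", 0.3),
    ("u2", "post-diagnosed", 0.5),
    ("u3", "control", 0.0),
]

GROUPS = ("pre-diagnosed", "post-diagnosed", "control")


def user_level_features(tweets):
    """Average each user's tweet-level scores into one user-level value.

    Averaging is an assumed aggregation; the abstract does not say how
    tweet-level measures are combined per user.
    """
    scores = defaultdict(list)
    labels = {}
    for user_id, group, score in tweets:
        scores[user_id].append(score)
        labels[user_id] = group
    return {u: (labels[u], mean(vals)) for u, vals in scores.items()}


def classification_cases(groups):
    """Enumerate the three pairwise cases plus the single three-class case."""
    return list(combinations(groups, 2)) + [tuple(groups)]


if __name__ == "__main__":
    for user, (group, score) in user_level_features(TWEETS).items():
        print(user, group, round(score, 3))
    for case in classification_cases(GROUPS):
        print(" vs. ".join(case))
```

Pooling several tweets per user gives each classified example more signal than a single tweet carries, which is consistent with the sparsity argument the abstract makes for the user-level analysis.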

    Analyzing Connections Between User Attributes, Images, and Text

    This work explores the relationship between a person’s demographic/psychological traits (e.g., gender, personality) and self-identity images and captions. We use a dataset of images and captions provided by N = 1,350 individuals, and we automatically extract features from both the images and captions. We identify several visual and textual properties that show reliable relationships with individual differences between participants. The automated techniques presented here allow us to draw interesting conclusions from our data that would be difficult to identify manually, and these techniques are extensible to other large datasets. We believe that our work on the relationship between user characteristics and user data has relevance in online settings, where users upload billions of images each day (Meeker M, 2014. Internet trends 2014–Code conference. Retrieved May 28, 2014).

    Language Management in a Japanese Multinational Company: A Data-Driven Approach

    Globalization poses a challenge for businesses with linguistically diverse staff, prompting the choice of English as the default corporate language. In Japan, research on the use of English in business contexts from both corporate and employees' perspectives is very limited, let alone studies adopting a data-driven approach. This study focuses on Rakuten, a Japanese multinational corporation (MNC), with the aim of illustrating the key challenges the company faces when it adopts English as its official language. The research is interdisciplinary and is positioned at the intersection of business communication, computational sociolinguistics, and language management. The first article, "Content analysis of language-sensitive recruitment influenced by corporate language policy using topic modeling", explores the match (or mismatch) between language-sensitive recruitment (English, Japanese, or bilingual) and corporate language policy. The second article, "It is all about TOEIC: discovering topics and trends in employee perceptions of corporate language policy", examines the barriers in multinational companies that have adopted a foreign language and analyzes employees' attitudes. The third and final article, "Analyzing cultural expatriates' attitude toward 'Englishnization' using dynamic topic modeling", investigates changes in employees' perceptions of Japanese work practices and values over time. The results of my study have implications for the implementation of language-sensitive recruitment in a multilingual corporate context. Furthermore, the thesis also highlights the evolutionary nature of corporate language policy topics by exploring and categorizing large amounts of text. Overall, the results presented in the three articles expand the understanding of the challenges associated with the use of English in a Japanese busines