Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
Social Media Based Algorithmic Clinical Decision Support Learning from Behavioral Predispositions
Behavioral disorders are disabilities that affect an individual’s mood, thinking, and social interactions. The prevalence of behavioral disorders in the United States population has increased in the last few years, with an estimated 50% of all Americans diagnosed with a behavioral disorder at some point in their lifetime. Attention-Deficit/Hyperactivity Disorder (ADHD) is one such behavioral disorder and a severe public health concern because of its high prevalence, incurable nature, and significant impact on domestic life and peer relationships. In theory, ADHD is characterized by the symptoms of inattention, hyperactivity, and impulsivity. Access to providers who can diagnose and treat the disorder varies by location.
The ever-increasing use of social media can be effectively employed in the diagnosis and treatment of the disorder. The study of behavior and, by extension, of individuals with behavioral disorders is made easier by the uninhibited setting in which posts are created on social media platforms.
Outside the United States, diagnosis rates of the disorder are low, as it is mainly considered to be an American disorder. This impression was reinforced by the perception that the disorder is caused by social and cultural factors common to American society. In reality, however, the disorder can just as readily affect people of different races and cultures worldwide, but recognition of the disorder in the medical community has been slow. This may be due to its adverse impact on an individual, their families, and society.
This dissertation focuses on providing clinicians with a clinical decision support system to overcome the societal stigma associated with the disorder and to ensure the accurate and efficient diagnosis of individuals with the disorder. The results provided in this dissertation assist in the diagnosis of individuals with Attention-Deficit/Hyperactivity Disorder. Data for individuals with the disorder are collected from posts of self-reported diagnoses on Twitter using the Twitter API. Previous research has shown that there are differences in behavior before and after the diagnosis of the disorder. To capitalize on this, symptomatic differences before and after diagnosis are discovered and evaluated. The symptoms of the disorder, namely inattention, hyperactivity, and impulsivity, are quantified using measures of sentiment and semantics. A separate group of users without the disorder, the control group, is collected for validation. The analysis poses a three-class classification problem, with the classes being the pre-diagnosed, post-diagnosed, and control groups. Decision trees are used to force all possible outcomes in the semantic and sentiment differences among the three classes of users and so create a clear delineation. Behavioral disorders are diagnosed by a clinician by identifying whether a patient deviates from an identified normal. This is evaluated by answering a set list of questions that quantify behavior. To achieve the same without manual intervention and with ease of interpretability, decision trees are chosen. Classification using a decision tree is performed at the tweet level and at the user level. Four cases are used in both analyses: pre-diagnosed vs. post-diagnosed group, pre-diagnosed vs. control group, post-diagnosed vs. control group, and pre-diagnosed vs. post-diagnosed vs. control group.
The analysis at the user level provides a higher degree of accuracy, with 93% accuracy for the post-diagnosed vs. control group case. Accuracy here measures the proportion of people who are correctly classified into their respective groups. The low accuracy of the tweet-level results supports the view that the sparsity of information in tweet-level analysis is a disadvantage. This is overcome by analyzing at the user level. The accuracy of the classifier can be further improved by adding features such as age and gender. These features may also be useful in predicting time to remission and peak of the disorder in future studies.
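The tweet-level versus user-level distinction and the four comparison cases described in this abstract can be illustrated with a small sketch. It is not the dissertation's implementation: the records, the single `sentiment_score` feature, the choice of averaging to reach the user level, and the helper names (`to_user_level`, `best_stump`, `select_case`) are all assumptions made for illustration, and a depth-1 tree (a stump) chosen by Gini impurity stands in for the full decision trees mentioned above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tweet-level records: (user_id, group, sentiment_score).
# The values are made up for illustration; they are not data from the study.
TWEETS = [
    ("u1", "pre", -0.5), ("u1", "pre", 0.25),
    ("u2", "post", 0.5), ("u2", "post", 0.25),
    ("u3", "control", 0.25), ("u3", "control", 0.0),
    ("u4", "post", 0.75), ("u4", "post", 0.25),
    ("u5", "control", -0.25), ("u5", "control", 0.25),
    ("u6", "pre", -0.75), ("u6", "pre", 0.25),
]

# The four comparison cases named in the abstract.
CASES = [
    ("pre", "post"),
    ("pre", "control"),
    ("post", "control"),
    ("pre", "post", "control"),
]

def to_user_level(tweets):
    """Average each user's tweet-level scores into one user-level row.

    Averaging is an assumed aggregation; the abstract does not say which one is used.
    """
    scores = defaultdict(list)
    groups = {}
    for user, group, score in tweets:
        scores[user].append(score)
        groups[user] = group
    return [(user, groups[user], mean(vals)) for user, vals in sorted(scores.items())]

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_stump(rows):
    """Return (threshold, weighted Gini impurity) of the best single split.

    Candidate thresholds are midpoints between consecutive distinct feature values.
    Returns None if all rows share the same feature value.
    """
    values = sorted({x for _, _, x in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [g for _, g, x in rows if x <= t]
        right = [g for _, g, x in rows if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[1]:
            best = (t, score)
    return best

def select_case(rows, case):
    """Keep only the rows whose group belongs to the given comparison case."""
    return [r for r in rows if r[1] in case]

users = to_user_level(TWEETS)
for case in CASES:
    print(" vs. ".join(case), best_stump(select_case(users, case)))
```

On this toy data, each pairwise case is separated by one threshold with zero impurity, while the three-class case is not, since a single split can produce only two leaves. Separating three groups takes a tree with more than one split.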
Analyzing Connections Between User Attributes, Images, and Text
This work explores the relationship between a person’s demographic/psychological traits (e.g., gender, personality) and self-identity images and captions. We use a dataset of images and captions provided by N = 1,350 individuals, and we automatically extract features from both the images and captions. We identify several visual and textual properties that show reliable relationships with individual differences between participants. The automated techniques presented here allow us to draw interesting conclusions from our data that would be difficult to identify manually, and these techniques are extensible to other large datasets. We believe that our work on the relationship between user characteristics and user data has relevance in online settings, where users upload billions of images each day (Meeker M, 2014. Internet trends 2014–Code conference. Retrieved May 28, 2014).
Language Management in a Japanese Multinational Company: A Data-Driven Approach
Globalization poses a challenge for businesses with linguistically diverse staff,
prompting the choice of English as the default corporate language. In Japan,
research on the use of English in business contexts from both corporate and
employees' perspectives is very limited, let alone studies adopting a data-driven
approach. This study focuses on Rakuten, a Japanese multinational corporation
(MNC), with the aim of illustrating the key challenges the company faces when
it adopts English as its official language. The research is interdisciplinary
and is positioned at the intersection of business communication, computational
sociolinguistics, and language management. The first article, "Content analysis
of language-sensitive recruitment influenced by corporate language policy using
topic modeling", explores the match (or mismatch) between language-sensitive
recruitment (English, Japanese, or bilingual) and corporate language policy.
The second article, "It is all about TOEIC: discovering topics and trends in
employee perceptions of corporate language policy", examines the barriers in
multinational companies that have adopted a foreign language and analyzes
employees' attitudes. The third and final article, "Analyzing cultural expatriates'
attitude toward 'Englishnization' using dynamic topic modeling", investigates
changes in employees' perceptions of Japanese work practices and values over
time. The results of my study have implications for the implementation of
language-sensitive recruitment in a multilingual corporate context. Furthermore,
the thesis also highlights the evolutionary nature of corporate language policy topics
by exploring and categorizing large amounts of text. Overall, the results presented
in the three articles expand the understanding of the challenges associated with the
use of English in a Japanese busines