From user-generated text to insight: context-aware measurement of social impacts and interactions using natural language processing
Recent improvements in information and communication technologies have contributed to an increasingly globalized and connected world. The digital data that are created as the result of people's online activities and interactions consist of different types of personal and social information that can be used to extract and understand people's implicit or explicit beliefs, ideas, and biases. This thesis leverages methods and theories from natural language processing and social sciences to study and analyze the manifestations of various attributes and signals, namely social impacts, personal values, and moral traits, in user-generated texts. This work provides a comprehensive understanding of people's viewpoints, social values, and interactions and makes the following contributions.
First, we present a study that combines review mining and impact assessment to provide an extensive discussion of the different types of impact that information products, namely documentary films, can have on people. We first establish a novel impact taxonomy and demonstrate that, with a rigorous analysis of user-generated texts and a theoretically grounded codebook, classification schema, and prediction model, we can detect multiple types of (self-reported) impact in texts and show that people's language can help in gaining insights about their opinions, socio-cultural information, and emotional states. Furthermore, the results of our analyses show that documentary films can shift people's perceptions and cognitions regarding different societal issues, e.g., climate change, and that, using a combination of informative features (linguistic, syntactic, and psychological), we can predict impact in sentences with high accuracy.
Second, we investigate the relationship between principles of human morality and the expression of stances in user-generated text data, namely tweets. More specifically, we first introduce and expand the Moral Foundations Dictionary and operationalize moral values to enhance the measurement of social effects. In addition, we provide a detailed explanation of how morality and stance are associated in user-generated texts. Through extensive analysis, we show that discussions related to various social issues have distinctive moral and lexical profiles, and that leveraging moral values as an additional feature can lead to measurable improvements in the prediction accuracy of stance analysis.
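As a rough illustration of what a dictionary-based "moral profile" of a text might look like, the sketch below counts lexicon hits per moral foundation and normalizes by text length. The lexicon shown is a hypothetical toy word list, not the actual Moral Foundations Dictionary or the expanded version described above, and the function name `moral_profile` is an assumption made for this example only.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only; these are NOT the
# entries of the Moral Foundations Dictionary or its expansion.
TOY_LEXICON = {
    "care": {"protect", "harm", "suffer"},
    "fairness": {"fair", "equal", "justice"},
    "loyalty": {"loyal", "betray", "nation"},
}


def moral_profile(text, lexicon=TOY_LEXICON):
    """Return the share of tokens in `text` that match each foundation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for foundation, words in lexicon.items():
            if token in words:
                counts[foundation] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {foundation: counts[foundation] / total for foundation in lexicon}


profile = moral_profile("We must protect equal rights and justice")
```

A vector like `profile` could then be appended to other features of a stance classifier; whether and how much that helps is an empirical question that this sketch does not address.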
Third, we utilize the representation of emotional and moral states in texts to study people's interactions in two different social networks. Specifically, we expand the analysis of structural balance to include direction and multi-level balance assessment (triads, subgroups, and the whole network). Our results show that analyzing different levels of networks and using various linguistic cues can grant a more inclusive view of people and the stability of their interactions; we found that, unlike sentiments, moral statuses in discussions stay balanced throughout the networks even in the presence of tension.
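For readers unfamiliar with structural balance, the triad-level check can be sketched as follows. This is the classic undirected formulation, in which a closed triad is balanced when the product of its three edge signs is positive; it is not the directed, multi-level extension developed in the thesis. The function name and the example network are hypothetical.

```python
from itertools import combinations


def triad_balance(signed_edges):
    """Count balanced and unbalanced closed triads in a signed graph.

    `signed_edges` maps an unordered node pair to +1 (positive tie)
    or -1 (negative tie). Open triads (a missing edge) are skipped.
    """
    sign = {frozenset(pair): s for pair, s in signed_edges.items()}
    nodes = sorted({node for pair in sign for node in pair})
    balanced = unbalanced = 0
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if not all(edge in sign for edge in edges):
            continue  # not a closed triad
        product = sign[edges[0]] * sign[edges[1]] * sign[edges[2]]
        if product > 0:
            balanced += 1
        else:
            unbalanced += 1
    return balanced, unbalanced


# Hypothetical example network with four people and five signed ties.
example = {
    ("ann", "bob"): 1,
    ("bob", "cat"): 1,
    ("ann", "cat"): 1,
    ("cat", "dan"): -1,
    ("bob", "dan"): 1,
}
counts = triad_balance(example)
```

Here the triad ann, bob, cat is balanced (all positive ties), while bob, cat, dan is unbalanced (exactly one negative tie), so `counts` is `(1, 1)`.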
Overall, this thesis aims to contribute to the emerging field of "social" NLP and broadens the scope of research in it by (1) utilizing a combination of novel taxonomies, datasets, and tools to examine user-generated texts and (2) providing more comprehensive insights about human language, cultures, and experiences.
Exploring Moral Principles Exhibited in OSS: A Case Study on GitHub Heated Issues
To foster collaboration and inclusivity in Open Source Software (OSS)
projects, it is crucial to understand and detect patterns of toxic language
that may drive contributors away, especially those from underrepresented
communities. Although machine learning-based toxicity detection tools trained
on domain-specific data have shown promise, their design lacks an understanding
of the unique nature and triggers of toxicity in OSS discussions, highlighting
the need for further investigation. In this study, we employ Moral Foundations
Theory to examine the relationship between moral principles and toxicity in
OSS. Specifically, we analyze toxic communications in GitHub issue threads to
identify and understand five types of moral principles exhibited in text, and
explore their potential association with toxic behavior. Our preliminary
findings suggest a possible link between moral principles and toxic comments in
OSS communications, with each moral principle associated with at least one type
of toxicity. The potential of MFT in toxicity detection warrants further
investigation.
The Evolution of Substance Use Coverage in the Philadelphia Inquirer
The media's representation of illicit substance use can lead to harmful
stereotypes and stigmatization for individuals struggling with addiction,
ultimately influencing public perception, policy, and public health outcomes.
To explore how the discourse and coverage of illicit drug use changed over
time, this study analyzes 157,476 articles published in the Philadelphia
Inquirer over a decade. Specifically, the study focuses on articles that
mentioned at least one commonly abused substance, resulting in a sample of
3,903 articles. Our analysis shows that cannabis and narcotics are the most
frequently discussed classes of drugs. Hallucinogenic drugs are portrayed more
positively than other categories, whereas narcotics are portrayed the most
negatively. Our research aims to highlight the need for accurate and inclusive
portrayals of substance use and addiction in the media.
People's Perceptions Toward Bias and Related Concepts in Large Language Models: A Systematic Review
Large language models (LLMs) have brought breakthroughs in tasks including
translation, summarization, information retrieval, and language generation,
gaining growing interest in the CHI community. Meanwhile, the literature shows
researchers' controversial perceptions about the efficacy, ethics, and
intellectual abilities of LLMs. However, we do not know how lay people perceive
LLMs that are pervasive in everyday tools, specifically regarding their
experience with LLMs around bias, stereotypes, social norms, or safety. In this
study, we conducted a systematic review to understand what empirical insights
papers have gathered about people's perceptions toward LLMs. From a total of
231 retrieved papers, we full-text reviewed 15 papers that recruited human
evaluators to assess their experiences with LLMs. We report different biases
and related concepts investigated by these studies, four broader LLM
application areas, the evaluators' perceptions toward LLMs' performances
including advantages, biases, and conflicting perceptions, factors influencing
these perceptions, and concerns about LLM applications.
An Empirical Methodology for Detecting and Prioritizing Needs during Crisis Events
In times of crisis, identifying the essential needs is a crucial step to
providing appropriate resources and services to affected entities. Social media
platforms such as Twitter contain a vast amount of information about the general
public's needs. However, the sparsity of the information as well as the amount
of noisy content present a challenge to practitioners to effectively identify
shared information on these platforms. In this study, we propose two novel
methods for two distinct but related needs detection tasks: the identification
of 1) a list of resources needed ranked by priority, and 2) sentences that
specify who-needs-what resources. We evaluated our methods on a set of tweets
about the COVID-19 crisis. For task 1 (detecting top needs), we compared our
results against two given lists of resources and achieved 64% precision. For
task 2 (detecting who-needs-what), we compared our results on a set of 1,000
annotated tweets and achieved a 68% F1-score.
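The precision and F1-score figures reported above follow the standard set-based definitions, which can be sketched as below. The resource names in the example are hypothetical and are not taken from the study's data or reference lists.

```python
def precision_recall_f1(predicted, gold):
    """Compute set-based precision, recall, and F1 for predicted items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: four predicted needs, three reference needs.
predicted_needs = ["masks", "ventilators", "tests", "food"]
reference_needs = ["masks", "tests", "vaccines"]
precision, recall, f1 = precision_recall_f1(predicted_needs, reference_needs)
```

In this toy case two of the four predictions are correct (precision 0.5) and two of the three reference items are found (recall 2/3), giving an F1 of 4/7.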