13 research outputs found

    From user-generated text to insight: context-aware measurement of social impacts and interactions using natural language processing

    Recent improvements in information and communication technologies have contributed to an increasingly globalized and connected world. The digital data that are created as the result of people's online activities and interactions consist of different types of personal and social information that can be used to extract and understand people's implicit or explicit beliefs, ideas, and biases. This thesis leverages methods and theories from natural language processing and social sciences to study and analyze the manifestations of various attributes and signals, namely social impacts, personal values, and moral traits, in user-generated texts. This work provides a comprehensive understanding of people's viewpoints, social values, and interactions and makes the following contributions. First, we present a study that combines review mining and impact assessment to provide an extensive discussion on different types of impact that information products, namely documentary films, can have on people. We first establish a novel impact taxonomy and demonstrate that, with a rigorous analysis of user-generated texts and a theoretically grounded codebook, classification schema, and prediction model, we can detect multiple types of (self-reported) impact in texts and show that people's language can help in gaining insights about their opinions, socio-cultural information, and emotional states. Furthermore, the results of our analyses show that documentary films can shift people's perceptions and cognitions regarding different societal issues, e.g., climate change, and that, using a combination of informative features (linguistic, syntactic, and psychological), we can predict impact in sentences with high accuracy. Second, we investigate the relationship between principles of human morality and the expression of stances in user-generated text data, namely tweets. More specifically, we first introduce and expand the Moral Foundations Dictionary and operationalize moral values to enhance the measurement of social effects. In addition, we provide a detailed explanation of how morality and stance are associated in user-generated texts. Through extensive analysis, we show that discussions related to various social issues have distinctive moral and lexical profiles, and that leveraging moral values as an additional feature can lead to measurable improvements in the prediction accuracy of stance analysis. Third, we utilize the representation of emotional and moral states in texts to study people's interactions in two different social networks. Specifically, we expand the analysis of structural balance to include direction and multi-level balance assessment (triads, subgroups, and the whole network). Our results show that analyzing different levels of networks and using various linguistic cues can grant a more inclusive view of people and the stability of their interactions; we found that, unlike sentiments, moral statuses in discussions stay balanced throughout the networks even in the presence of tension. Overall, this thesis aims to contribute to the emerging field of "social" NLP and broadens the scope of research in it by (1) utilizing a combination of novel taxonomies, datasets, and tools to examine user-generated texts and (2) providing more comprehensive insights about human language, cultures, and experiences.

    Exploring Moral Principles Exhibited in OSS: A Case Study on GitHub Heated Issues

    To foster collaboration and inclusivity in Open Source Software (OSS) projects, it is crucial to understand and detect patterns of toxic language that may drive contributors away, especially those from underrepresented communities. Although machine learning-based toxicity detection tools trained on domain-specific data have shown promise, their design lacks an understanding of the unique nature and triggers of toxicity in OSS discussions, highlighting the need for further investigation. In this study, we employ Moral Foundations Theory (MFT) to examine the relationship between moral principles and toxicity in OSS. Specifically, we analyze toxic communications in GitHub issue threads to identify and understand five types of moral principles exhibited in text, and explore their potential association with toxic behavior. Our preliminary findings suggest a possible link between moral principles and toxic comments in OSS communications, with each moral principle associated with at least one type of toxicity. The potential of MFT in toxicity detection warrants further investigation.

    The Evolution of Substance Use Coverage in the Philadelphia Inquirer

    The media's representation of illicit substance use can lead to harmful stereotypes and stigmatization for individuals struggling with addiction, ultimately influencing public perception, policy, and public health outcomes. To explore how the discourse and coverage of illicit drug use changed over time, this study analyzes 157,476 articles published in the Philadelphia Inquirer over a decade. Specifically, the study focuses on articles that mentioned at least one commonly abused substance, resulting in a sample of 3,903 articles. Our analysis shows that cannabis and narcotics are the most frequently discussed classes of drugs. Hallucinogenic drugs are portrayed more positively than other categories, whereas narcotics are portrayed the most negatively. Our research aims to highlight the need for accurate and inclusive portrayals of substance use and addiction in the media

    People's Perceptions Toward Bias and Related Concepts in Large Language Models: A Systematic Review

    Large language models (LLMs) have brought breakthroughs in tasks including translation, summarization, information retrieval, and language generation, gaining growing interest in the CHI community. Meanwhile, the literature shows researchers' controversial perceptions about the efficacy, ethics, and intellectual abilities of LLMs. However, we do not know how lay people perceive LLMs that are pervasive in everyday tools, specifically regarding their experience with LLMs around bias, stereotypes, social norms, or safety. In this study, we conducted a systematic review to understand what empirical insights papers have gathered about people's perceptions toward LLMs. From a total of 231 retrieved papers, we full-text reviewed 15 papers that recruited human evaluators to assess their experiences with LLMs. We report different biases and related concepts investigated by these studies, four broader LLM application areas, the evaluators' perceptions toward LLMs' performances including advantages, biases, and conflicting perceptions, factors influencing these perceptions, and concerns about LLM applications

    An Empirical Methodology for Detecting and Prioritizing Needs during Crisis Events

    In times of crisis, identifying essential needs is a crucial step toward providing appropriate resources and services to affected entities. Social media platforms such as Twitter contain a vast amount of information about the general public's needs. However, the sparsity of this information, as well as the amount of noisy content, makes it challenging for practitioners to effectively identify shared information on these platforms. In this study, we propose two novel methods for two distinct but related needs detection tasks: the identification of 1) a list of resources needed ranked by priority, and 2) sentences that specify who-needs-what resources. We evaluated our methods on a set of tweets about the COVID-19 crisis. For task 1 (detecting top needs), we compared our results against two given lists of resources and achieved 64% precision. For task 2 (detecting who-needs-what), we compared our results on a set of 1,000 annotated tweets and achieved a 68% F1-score.